Opened 3 years ago

Last modified 3 months ago

#15522 new enhancement

Write Protobufs for any BridgeDB data which must be sent over a network or IPC channel

Reported by: isis Owned by: isis
Priority: Medium Milestone:
Component: Obfuscation/BridgeDB Version:
Severity: Normal Keywords: bridgedb-db, metrics, protobuf, TorCoreTeam201608
Cc: isis, sysrqb Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

BridgeDB should have Protobufs for any data structures which must be sent over either network or IPC channels. This includes bridges parsed from Stem (which should be sent to the database manager from #12031), any data which is going to be exported to CollecTor (e.g. if we were to design and export a new pool "assignments.log" format, as in #2755), and any data which the client-side Social Distributor (#7520), built into a Tor Browser extension, plans to send to BridgeDB and vice-versa.

Protocol buffers have had extensive security reviews, are widely used in many projects, and would provide automatic code generation for serialisers/marshallers in Python/Java/C++/C/Go, meaning that, for example, both Metrics and BridgeDB could use the same generated code to read the same data format.
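
To make the proposal a bit more concrete, here is a rough sketch of what a bridge record could look like on the wire. The schema (a `Bridge` message with `fingerprint`, `address`, and `or_port` fields, and their field numbers) is purely hypothetical and not an agreed BridgeDB format. The encoder is hand-written only to show the wire format; in practice the serialiser would be generated by `protoc` from a `.proto` file.

```python
# Hand-rolled encoder for the Protobuf wire format, for illustration only.
# Hypothetical schema (an assumption, NOT an agreed BridgeDB format):
#   message Bridge {
#     string fingerprint = 1;
#     string address     = 2;
#     uint32 or_port     = 3;
#   }

WIRETYPE_VARINT = 0  # used for integer fields
WIRETYPE_LEN = 2     # used for length-delimited fields (strings, bytes)


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("only non-negative integers are supported here")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A field tag is the field number shifted left by 3, OR'd with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number, text):
    """Length-delimited field: tag, byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_tag(field_number, WIRETYPE_LEN) + encode_varint(len(data)) + data


def encode_uint32(field_number, value):
    """Varint field: tag followed by the varint-encoded value."""
    return encode_tag(field_number, WIRETYPE_VARINT) + encode_varint(value)


def encode_bridge(fingerprint, address, or_port):
    """Serialise the hypothetical Bridge message described above."""
    return (encode_string(1, fingerprint)
            + encode_string(2, address)
            + encode_uint32(3, or_port))
```

Any reader that knows the same schema, whether in Python, Java, or Go, could decode these bytes, which is the property that would let Metrics and BridgeDB share one format.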

Child Tickets

Change History (3)

comment:1 Changed 3 years ago by isis

Yawning mentioned on IRC that the designer of Protobufs quit Google and made a newer thing called Cap'n Proto, which is unfortunately still beta at this time but claims to provide significant ("infinite percent", to use their words) speed increases, along with the following somewhat ridiculous but definitely-amusing-and-possibly-worth-looking-into claims:

  • Incremental reads: It is easy to start processing a Cap’n Proto message before you have received all of it since outer objects appear entirely before inner objects (as opposed to most encodings, where outer objects encompass inner objects).
  • Random access: You can read just one field of a message without parsing the whole thing.
  • mmap: Read a large Cap’n Proto file by memory-mapping it. The OS won’t even read in the parts that you don’t access.
  • Inter-language communication: Calling C++ code from, say, Java or Python tends to be painful or slow. With Cap’n Proto, the two languages can easily operate on the same in-memory data structure.
  • Inter-process communication: Multiple processes running on the same machine can share a Cap’n Proto message via shared memory. No need to pipe data through the kernel. Calling another process can be just as fast and easy as calling another thread.
  • Arena allocation: Manipulating Protobuf objects tends to be bogged down by memory allocation, unless you are very careful about object reuse. Cap’n Proto objects are always allocated in an “arena” or “region” style, which is faster and promotes cache locality.
  • Tiny generated code: Protobuf generates dedicated parsing and serialization code for every message type, and this code tends to be enormous. Cap’n Proto generated code is smaller by an order of magnitude or more. In fact, usually it’s no more than some inline accessor methods!
  • Tiny runtime library: Due to the simplicity of the Cap’n Proto format, the runtime library can be much smaller.
  • Time-traveling RPC: Cap’n Proto features an RPC system that implements time travel such that call results are returned to the client before the request even arrives at the server!
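
The "random access" and "mmap" claims can be illustrated in general terms with a fixed-offset layout. This is only a sketch of the idea, not Cap'n Proto's actual encoding or API; the record layout (20-byte fingerprint, 4-byte IPv4 address, 2-byte port) and all names are made up for the example.

```python
import mmap
import struct
import tempfile

# Hypothetical fixed-width record: 20-byte fingerprint, 4-byte IPv4 address,
# 2-byte port, big-endian, no padding. NOT Cap'n Proto's actual encoding.
RECORD = struct.Struct(">20s4sH")
PORT_OFFSET = 24  # the port follows the 20-byte and 4-byte fields


def write_records(fileobj, records):
    """Write each (fingerprint, address, port) tuple as one fixed-width record."""
    for fingerprint, address, port in records:
        fileobj.write(RECORD.pack(fingerprint, address, port))
    fileobj.flush()


def read_port(buf, index):
    """Read only the port of record `index`, without decoding anything else."""
    (port,) = struct.unpack_from(">H", buf, index * RECORD.size + PORT_OFFSET)
    return port


def demo():
    """Write two records to a temporary file, map it, and read one field."""
    records = [
        (b"\x01" * 20, bytes([192, 0, 2, 1]), 443),
        (b"\x02" * 20, bytes([192, 0, 2, 2]), 9001),
    ]
    with tempfile.TemporaryFile() as f:
        write_records(f, records)
        # Map the whole file read-only; only the bytes that are actually
        # accessed need to be read by the process.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            return read_port(buf, 1)
```

Because every field sits at a known offset, a reader can jump straight to the one value it wants, whereas a Protobuf message generally has to be scanned field by field to find it.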

comment:2 Changed 19 months ago by isis

Keywords: TorCoreTeam201608 added

Adding to my August tickets.

comment:3 Changed 3 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"
