Opened 5 years ago

Closed 16 months ago

#15522 closed enhancement (wontfix)

Write Protobufs for any BridgeDB data which must be sent over a network or IPC channel

Reported by: isis
Owned by:
Priority: Medium
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity: Normal
Keywords: bridgedb-db, metrics, protobuf
Cc: isis, sysrqb, metrics-team
Actual Points:
Parent ID:
Points: 13
Reviewer:
Sponsor: Sponsor19


BridgeDB should have Protobufs for any data structures which must be sent over either network or IPC channels. This includes bridges parsed by Stem (which should be sent to the database manager from #12031), any data which is going to be exported to CollecTor (e.g. a redesigned pool "assignments.log" format, as in #2755), and any data which the client-side Social Distributor (#7520), built into a Tor Browser extension, plans to exchange with BridgeDB.

Protocol buffers have had extensive security review, are widely used in many projects, and would provide automatically generated serialisers/marshallers for Python/Java/C++/C/Go. This means that, for example, both Metrics and BridgeDB could use code generated from the same definitions to read the same data format.
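As a rough illustration of the data format being proposed, the following sketch hand-encodes a single record in the Protobuf wire format using plain Python. The message layout (field 1 as a fingerprint string, field 2 as an ORPort integer) and the function names are hypothetical and not an agreed BridgeDB schema; in practice the layout would live in a .proto file and the encoding would be done by generated code.

```python
WIRE_VARINT = 0  # wire type for integers
WIRE_LEN = 2     # wire type for length-delimited data (strings, bytes)


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number, wire_type):
    """Each field starts with a varint key: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_bridge(fingerprint, or_port):
    """Encode the hypothetical two-field bridge record."""
    fp = fingerprint.encode("utf-8")
    return (encode_key(1, WIRE_LEN) + encode_varint(len(fp)) + fp
            + encode_key(2, WIRE_VARINT) + encode_varint(or_port))


# "ABCD" is a placeholder, not a real bridge fingerprint.
message = encode_bridge("ABCD", 443)
print(message.hex())  # 0a044142434410bb03
```

Any reader in any language that knows the same field numbers and types can decode these nine bytes, which is the interoperability argument made above.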

Change History (7)

comment:1 Changed 5 years ago by isis

Yawning mentioned on IRC that the designer of Protobufs quit Google and made a newer thing called Cap'n Proto, which is unfortunately still beta at this time but claims to provide significant ("infinite percent", to use their words) speed increases, along with the following somewhat ridiculous but definitely-amusing-and-possibly-worth-looking-into claims:

  • Incremental reads: It is easy to start processing a Cap’n Proto message before you have received all of it since outer objects appear entirely before inner objects (as opposed to most encodings, where outer objects encompass inner objects).
  • Random access: You can read just one field of a message without parsing the whole thing.
  • mmap: Read a large Cap’n Proto file by memory-mapping it. The OS won’t even read in the parts that you don’t access.
  • Inter-language communication: Calling C++ code from, say, Java or Python tends to be painful or slow. With Cap’n Proto, the two languages can easily operate on the same in-memory data structure.
  • Inter-process communication: Multiple processes running on the same machine can share a Cap’n Proto message via shared memory. No need to pipe data through the kernel. Calling another process can be just as fast and easy as calling another thread.
  • Arena allocation: Manipulating Protobuf objects tends to be bogged down by memory allocation, unless you are very careful about object reuse. Cap’n Proto objects are always allocated in an “arena” or “region” style, which is faster and promotes cache locality.
  • Tiny generated code: Protobuf generates dedicated parsing and serialization code for every message type, and this code tends to be enormous. Cap’n Proto generated code is smaller by an order of magnitude or more. In fact, usually it’s no more than some inline accessor methods!
  • Tiny runtime library: Due to the simplicity of the Cap’n Proto format, the runtime library can be much smaller.
  • Time-traveling RPC: Cap’n Proto features an RPC system that implements time travel such that call results are returned to the client before the request even arrives at the server!
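The "random access" and "mmap" items in the list above both rely on fields living at known offsets. The sketch below illustrates that general idea with Python's `struct` and `mmap` modules. It is not the Cap'n Proto format, and the record layout and function names are made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-layout record, little-endian, no padding:
# 20-byte fingerprint, 2-byte port, 4-byte flags.
RECORD = struct.Struct("<20sHI")
PORT_OFFSET = 20  # the port always starts 20 bytes into a record


def write_records(path, records):
    """Write (fingerprint, port, flags) tuples back to back."""
    with open(path, "wb") as f:
        for fingerprint, port, flags in records:
            f.write(RECORD.pack(fingerprint, port, flags))


def read_port(path, index):
    """Read only the port of record `index`, without decoding anything else."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            offset = index * RECORD.size + PORT_OFFSET
            (port,) = struct.unpack_from("<H", mapped, offset)
            return port


fd, path = tempfile.mkstemp()
os.close(fd)
write_records(path, [(b"\x01" * 20, 443, 0), (b"\x02" * 20, 9001, 1)])
print(read_port(path, 1))
os.remove(path)
```

With a variable-length encoding such as Protobuf, the reader generally has to walk the preceding fields to find the one it wants, which is the contrast the list is drawing.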

comment:2 Changed 4 years ago by isis

Keywords: TorCoreTeam201608 added

Adding to my August tickets.

comment:3 Changed 2 years ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:4 Changed 17 months ago by gaba

Keywords: TorCoreTeam201608 removed
Owner: isis deleted
Points: 10
Sponsor: Sponsor19
Status: new → assigned

comment:5 Changed 17 months ago by karsten

Cc: metrics-team added

Commenting on this ticket, because it has the metrics tag and gaba asked me to comment.

I'm skeptical regarding the idea to use Protobufs for BridgeDB statistics. The idea to have automatically generated code for parsing is certainly tempting.

But having to use tools to look at binary-encoded messages seems annoying. This would also affect other folks who want to peek at BridgeDB statistics without relying on any of our code.
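To illustrate the readability concern, the sketch below compares a JSON encoding of a hypothetical two-field record with a hand-built Protobuf encoding of the same data, and includes the minimal decoder needed just to list the raw fields. The record layout, field names, and function names are invented for the example, and the decoder only handles the two wire types used here.

```python
import json

# Hand-encoded Protobuf bytes for a hypothetical record:
# field 1 (length-delimited) = "ABCD", field 2 (varint) = 443.
PROTOBUF_BYTES = bytes.fromhex("0a044142434410bb03")
JSON_TEXT = json.dumps({"fingerprint": "ABCD", "or_port": 443})


def read_varint(data, pos):
    """Decode a base-128 varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(data):
    """List (field_number, value) pairs without knowing the schema."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((field_number, value))
    return fields


print(JSON_TEXT)                   # readable with any text tool
print(PROTOBUF_BYTES)              # opaque without a decoder
print(decode_raw(PROTOBUF_BYTES))  # field numbers only, no field names
```

Even after decoding, the wire format carries only field numbers, so someone without the schema still has to guess what each field means.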

I'm even more skeptical regarding Cap'n Proto, because there doesn't seem to be a Debian package that we can use as a Java dependency.

Adding metrics-team to cc anyway to see where this goes.

comment:6 Changed 16 months ago by gaba

Points: 10 → 13

comment:7 Changed 16 months ago by nickm

Resolution: wontfix
Status: assigned → closed

We're not currently planning to converge on protobufs, but we can reopen this if we change our minds.
