Opened 6 years ago

Closed 14 months ago

Last modified 13 months ago

#14453 closed task (duplicate)

Implement statistics gathering for number of Bridges-per-Transport in BridgeDB

Reported by: isis
Owned by:
Priority: Medium
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity: Normal
Keywords: tor-bridge, bridgedb, anti-censorship-roadmap-august, s30-o21a1
Cc: Yawning, isis
Actual Points:
Parent ID: #31274
Points: 5
Reviewer:
Sponsor: Sponsor30-must


As part of the SponsorS PT work, we promised a way to gather statistics on the number of bridges per transport.

The proposal states this is a task for Metrics. However, it's possible to do this on the BridgeDB side. In fact, it would help BridgeDB in the future to determine how to better allocate bridges to its Distributors (and help the Distributors hand them out to users in smarter ways).

Technically, BridgeDB already sort-of has data on the number of Bridges-per-Transport… or, rather, when a client requests a certain type of bridge from a certain Distributor (e.g. "give me an IPv4 obfs3 bridge from the HTTPS Distributor"), BridgeDB creates (or retrieves from a cache) a "filtered" subhashring containing only Bridges which fit the client's request. BridgeDB even logs the number of Bridges in these subhashrings in its DEBUG and INFO logs:

22:19:16 INFO    L1361:Bridges.addRing()        Bridges inserted into HTTPS-Transpo subring: 235
22:19:16 DEBUG     L75:Dist.getNumBridgesPerA() Returning 3 bridges from ring of len: 235

The problem with using those numbers for statistics is that BridgeDB's Distributors may have multiple adjacent subhashrings, usually about 5. So, in the above case, there are roughly 5 × 235 = 1175 obfs3 bridges in the HTTPS Distributor. (These numbers aren't from the real deployed BridgeDB, by the way.)

A better way to do this would be to provide a database query (as part of #12031) which counts the number of Bridges which claim to offer a PT. One way to do this in Redis would be to keep a hash (i.e. using HSET or HINCRBY) for each Bridge which has any PTs, keyed by the Bridge fingerprint, with one field for each type of PT. If not using HINCRBY, the field's value would be IP:PORT[,IP:PORT[,IP:PORT[…]]], for example:

redis> HSET 26F6A7570E0F655DFDD054E79ACBB127112C2D7B obfs4 ","

With that scheme, a new HSET would be necessary each time the @type bridge-extrainfo descriptors are parsed, but each HSET of a single field only has time complexity O(1).
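To make the hash-per-fingerprint idea concrete, here is a minimal Python sketch. It uses a plain dict as a stand-in for Redis so that it runs without a server; the `hset` helper, the `store` variable, the placeholder fingerprints, and the documentation-range addresses are all assumptions for illustration, not BridgeDB code.

```python
from collections import Counter

# In-memory stand-in for the Redis hashes described above (an assumption):
# outer key = bridge fingerprint, field = PT name, value = "IP:PORT[,IP:PORT...]".
store = {}

def hset(fingerprint, transport, addresses):
    """Mimic HSET: set one field on the hash stored at `fingerprint`."""
    store.setdefault(fingerprint, {})[transport] = addresses

def bridges_per_transport():
    """Count how many distinct bridges have a field for each transport."""
    counts = Counter()
    for fields in store.values():
        counts.update(fields.keys())
    return dict(counts)

# Hypothetical data; fingerprints are placeholders.
hset("A" * 40, "obfs4", "192.0.2.1:443")
hset("A" * 40, "obfs3", "192.0.2.1:8443")
hset("B" * 40, "obfs4", "198.51.100.7:443,198.51.100.7:9001")
# Re-parsing a descriptor overwrites the field, so nothing is double counted.
hset("B" * 40, "obfs4", "198.51.100.7:443")

print(bridges_per_transport())
```

Because each write replaces the field's value, repeating the write on every descriptor parse keeps the per-transport count stable. The count itself still needs a pass over all the hashes.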

Some considerations / additional query parameters:

  • For these statistics, should we only count Bridges with the Running flag? Or only if the OONI machine says the PT is reachable?
  • What sanitisations should be done on these numbers? Should we round them? Or provide a scale, i.e. "between 1000-5000 obfs4 bridges"?
  • Do we want only the Bridges with a given PT? Or do we want the number of instances of a given PT (e.g. if a Bridge has multiple obfs3 instances)?
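
To illustrate the last two questions, the following Python sketch shows two possible sanitisations (rounding up to a multiple, and reporting a coarse range) and the difference between counting Bridges and counting PT instances. The function names, the multiple of 8, the range edges, and the sample data are arbitrary assumptions, not decisions made in this ticket.

```python
import math

def round_up(count, multiple=8):
    """Round a count up to the next multiple to blur the exact value."""
    return math.ceil(count / multiple) * multiple

def bucket(count, edges=(0, 100, 500, 1000, 5000)):
    """Return a coarse range label such as 'between 1000-5000'."""
    for low, high in zip(edges, edges[1:]):
        if low <= count < high:
            return f"between {low}-{high}"
    return f"{edges[-1]} or more"

def count_bridges_and_instances(bridges, transport):
    """`bridges` maps fingerprint -> list of (transport, address) pairs.

    Returns (bridges offering the transport, total instances of it).
    """
    n_bridges = sum(
        1 for pts in bridges.values() if any(t == transport for t, _ in pts)
    )
    n_instances = sum(
        1 for pts in bridges.values() for t, _ in pts if t == transport
    )
    return n_bridges, n_instances

# Hypothetical data: the first bridge runs two obfs3 instances.
sample = {
    "A" * 40: [("obfs3", "192.0.2.1:443"), ("obfs3", "192.0.2.1:8443")],
    "B" * 40: [("obfs3", "198.51.100.7:443"), ("obfs4", "198.51.100.7:9001")],
}
print(count_bridges_and_instances(sample, "obfs3"))
print(round_up(1175), bucket(1175))
```

Which of the two counts to publish, and how coarse the ranges should be, remain open questions from the list above.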

Change History (13)

comment:1 Changed 6 years ago by karsten

I'm not clear whether we need statistics on the number of available bridges by transport, or on the number of requests to BridgeDB by transport. The former is a task for Metrics, the latter requires specifying a data format for BridgeDB statistics which CollecTor can fetch and then it requires writing code for CollecTor and then Metrics. I hope we didn't promise the latter.

Assuming we want number of bridges by transport, would you want to write the (Python) code for Metrics? Here's how this code could work:

I would run that script in a cronjob on yatei, and I'd write the necessary HTML and R/ggplot2 to turn the .csv into a shiny graph on Metrics.

Does that make sense?

comment:2 Changed 3 years ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:3 Changed 21 months ago by gaba

Owner: isis deleted
Points: 5
Sponsor: Sponsor19
Status: changed from new to assigned

comment:4 Changed 21 months ago by gaba

Keywords: SponsorS-pt removed

comment:5 Changed 17 months ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:6 Changed 17 months ago by phw

Sponsor: changed from Sponsor19 to Sponsor30-must

Moving from Sponsor 19 to Sponsor 30.

comment:7 Changed 15 months ago by phw

Doesn't BridgeDB's assignments.log already contain exactly this information? Its format is:
Fingerprint, distributor (e.g., HTTPS), IP version, flags, ring number, transports.

For each bridge fingerprint, it tells us what transports the bridge is running. If assignments.log is indeed all we want, we may want to tackle this together with #29480.
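
If assignments.log is enough, counting bridges per transport could look roughly like the Python sketch below. The exact line layout is an assumption: whitespace-separated tokens, fingerprint first, distributor second, and an optional `transport=` token with a comma-separated list. The sample lines are made up and would need adjusting to the real file.

```python
from collections import Counter

def count_bridges_per_transport(lines):
    """Count distinct bridge fingerprints per transport.

    Assumes (hypothetically) lines such as:
    <fingerprint> <distributor> ip=4 flag=stable ring=2 transport=obfs3,obfs4
    """
    seen = set()       # (fingerprint, transport) pairs already counted
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue   # skip blank or malformed lines
        fingerprint = tokens[0]
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            if key != "transport":
                continue
            for transport in value.split(","):
                if transport and (fingerprint, transport) not in seen:
                    seen.add((fingerprint, transport))
                    counts[transport] += 1
    return dict(counts)

# Made-up sample lines in the assumed layout.
SAMPLE = [
    "A" * 40 + " https ip=4 flag=stable ring=2 transport=obfs3,obfs4",
    "B" * 40 + " email ip=4 ring=0 transport=obfs4",
    "C" * 40 + " unallocated",
]
print(count_bridges_per_transport(SAMPLE))
```

Each bridge is counted once per transport even if it shows up on several lines, so a fingerprint repeated in the file would not inflate the totals.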

comment:8 Changed 15 months ago by gaba

Keywords: anti-censorship-roadmap-august added; ex-sponsor-19 removed

comment:9 Changed 15 months ago by phw

Parent ID: #31268

comment:10 Changed 15 months ago by phw

Parent ID: changed from #31268 to #31274

comment:11 Changed 14 months ago by phw

Resolution: duplicate
Status: changed from assigned to closed

I'm marking this as a duplicate because we can collect these statistics as part of our work for #31422. Basically, this can be just another BridgeDB-internal metric if I understand this ticket's description correctly.

comment:12 Changed 13 months ago by gaba

Keywords: s30-a1 added

comment:13 Changed 13 months ago by gaba

Keywords: s30-o21a1 added; s30-a1 removed