Opened 3 years ago

Last modified 6 months ago

#14453 new task

Implement statistics gathering for number of Bridges-per-Transport in BridgeDB

Reported by: isis
Owned by: isis
Priority: Medium
Milestone:
Component: Obfuscation/BridgeDB
Version:
Severity: Normal
Keywords: tor-bridge, bridgedb, SponsorS-pt
Cc: Yawning, isis
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

As part of the SponsorS PT work, we promised a way to gather statistics on the number of bridges per transport.

The proposal states this is a task for Metrics. However, it's possible to do this on the BridgeDB side. In fact, it would help BridgeDB in the future to determine how to better allocate bridges to its Distributors (and help the Distributors hand them out to users in smarter ways).

Technically, BridgeDB already has rough data on the number of Bridges-per-Transport: when a client requests a certain type of bridge from a certain Distributor (e.g. "give me an IPv4 obfs3 bridge from the HTTPS Distributor"), BridgeDB creates (or retrieves from a cache) a "filtered" subhashring containing only the Bridges which fit the client's request. BridgeDB even logs the number of Bridges in these subhashrings in its DEBUG and INFO logs:

22:19:16 INFO    L1361:Bridges.addRing()        Bridges inserted into HTTPS-Transpo subring: 235
22:19:16 DEBUG     L75:Dist.getNumBridgesPerA() Returning 3 bridges from ring of len: 235

The problem with using those numbers for statistics is that BridgeDB's Distributors may have multiple adjacent subhashrings, usually about 5. So, in the above case, there are roughly 5 * 235 = 1175 obfs3 bridges in the HTTPS Distributor. (These numbers aren't from the real deployed BridgeDB, by the way.)
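The rough estimate above can be sketched in a few lines of Python. This is only an illustration: the regular expression is fitted to the sample INFO line quoted in this ticket, `subring_size` and `estimate_total` are hypothetical helpers, and the default of 5 subhashrings is the "usually about 5" figure rather than a value read from BridgeDB.

```python
import re

# Pattern fitted to the sample INFO line in this ticket; the real BridgeDB
# log format may differ (assumption).
SUBRING_RE = re.compile(
    r"Bridges inserted into (?P<ring>\S+) subring: (?P<count>\d+)"
)


def subring_size(log_line):
    """Return (ring_name, bridge_count) from an addRing() log line, or None."""
    match = SUBRING_RE.search(log_line)
    if match is None:
        return None
    return match.group("ring"), int(match.group("count"))


def estimate_total(per_ring_count, num_subrings=5):
    """Rough estimate that assumes every subhashring has the same size."""
    return per_ring_count * num_subrings


line = ("22:19:16 INFO    L1361:Bridges.addRing()        "
        "Bridges inserted into HTTPS-Transpo subring: 235")
ring, count = subring_size(line)
print(ring, estimate_total(count))
```

Because the number of subhashrings is only approximately known, the result is an estimate, not a count.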


A better way to do this would be to provide a database query (as part of #12031) which counts the number of Bridges which claim to offer a PT. An example mechanism for doing this in Redis would be to keep a hash (i.e. using HSET or HINCRBY) for each Bridge which has any PTs, where the key is the Bridge fingerprint and there is a field for each type of PT. If not using HINCRBY, the field value would store IP:PORT[,IP:PORT[,IP:PORT[…]]], for example:

redis> HSET 26F6A7570E0F655DFDD054E79ACBB127112C2D7B obfs4 "4.4.4.4:4444,5.5.5.5:5555"

With that scheme, a new HSET would be necessary each time the @type bridge-extrainfo descriptors are parsed, but each HSET only has time complexity O(1).
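To make the counting step concrete, here is a minimal sketch of how per-transport totals could be derived from that layout. It uses a plain in-memory dict as a stand-in for Redis so that it runs without a server; `hset` and `count_per_transport` are hypothetical helpers, and the second fingerprint is made up for illustration.

```python
from collections import Counter

# Stand-in for the Redis hashes: {fingerprint: {transport: "IP:PORT,IP:PORT"}}.
# In Redis, each inner dict would be one hash written with HSET (assumption:
# one hash per Bridge fingerprint, as in the example command above).
bridges = {}


def hset(fingerprint, transport, addresses):
    """Mimic `HSET fingerprint transport addresses` on the in-memory stand-in."""
    bridges.setdefault(fingerprint, {})[transport] = addresses


def count_per_transport(store):
    """Count Bridges and address instances for each transport."""
    bridge_counts = Counter()
    instance_counts = Counter()
    for fields in store.values():
        for transport, addresses in fields.items():
            bridge_counts[transport] += 1
            instance_counts[transport] += len(addresses.split(","))
    return bridge_counts, instance_counts


hset("26F6A7570E0F655DFDD054E79ACBB127112C2D7B", "obfs4",
     "4.4.4.4:4444,5.5.5.5:5555")
# Made-up second Bridge, for illustration only.
hset("0000000000000000000000000000000000000001", "obfs4", "6.6.6.6:6666")
hset("0000000000000000000000000000000000000001", "obfs3", "6.6.6.6:3333")

per_bridge, per_instance = count_per_transport(bridges)
print(dict(per_bridge))
print(dict(per_instance))
```

Note that while each write is O(1), a count like this has to visit every Bridge hash, so against a real Redis instance it would be linear in the number of Bridges (e.g. iterating with SCAN and HGETALL), unless running totals are maintained separately.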

Some considerations / additional query parameters:

  • For these statistics, should we only count Bridges with the Running flag? Or only if the OONI machine says the PT is reachable?
  • What sanitisations should be done on these numbers? Should we round them? Or provide a scale, i.e. "between 1000-5000 obfs4 bridges"?
  • Do we want only the Bridges with a given PT? Or do we want the number of instances of a given PT (e.g. if a Bridge has multiple obfs3 instances)?
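The two sanitisation options raised above (rounding, or reporting a range) could look something like the following sketch. The step size and the bucket edges are arbitrary assumptions chosen for illustration, not values anyone has agreed on, and `round_up` and `bucket` are hypothetical helpers.

```python
def round_up(count, step=10):
    """Round a count up to the next multiple of `step` (integer arithmetic)."""
    return ((count + step - 1) // step) * step


def bucket(count, edges=(0, 100, 500, 1000, 5000)):
    """Return a coarse range label for `count`, e.g. "between 1000-5000"."""
    for low, high in zip(edges, edges[1:]):
        if low <= count < high:
            return "between {}-{}".format(low, high)
    return "{} or more".format(edges[-1])


print(round_up(1175))      # rounded count
print(bucket(1175))        # coarse range label
```

Rounding keeps more precision for graphs, while ranges reveal less about small changes in the bridge pool; which trade-off is appropriate is exactly the open question.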

Change History (2)

comment:1 Changed 3 years ago by karsten

I'm not clear whether we need statistics on the number of available bridges by transport, or on the number of requests to BridgeDB by transport. The former is a task for Metrics; the latter requires specifying a data format for BridgeDB statistics which CollecTor can fetch, and then writing code for CollecTor and then Metrics. I hope we didn't promise the latter.

Assuming we want number of bridges by transport, would you want to write the (Python) code for Metrics? Here's how this code could work:

I would run that script in a cronjob on yatei, and I'd write the necessary HTML and R/ggplot2 to turn the .csv into a shiny graph on Metrics.

Does that make sense?

comment:2 Changed 6 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"
