Implement a feedback loop between BridgeDB and OONI

Trac:
Parent Ticket: #31280 (moved)
Child Ticket(s): #34116 (moved), #34212 (moved), #34089 (moved), #34154 (moved), #34195 (moved), #34259 (moved), #34260 (moved)

added anti-censorship-roadmap-2020 component::circumvention/bridgedb owner::phw parent::31280 points::10 priority::medium s30-o23a2 severity::normal sponsor::30-must status::assigned type::project labels

Trac:
Keywords: N/A deleted, anti-censorship-roadmap-2020Q1 added

We discussed this task at today's sponsor 30 sync meeting. Here's a summary:

OONI currently tests our default bridges and directory authorities. The test targets can however be changed dynamically on the server side.
OONI recently made it possible for their Tor test to take a test target. There's a spec for this. The list of targets can even include private bridges, which wouldn't end up in a public git repository and OONI can also redact a bridge's IP address (and fingerprint?) from the test results. It's also possible to retrieve private bridges at run time using OONI Probe Services. (The current code for this is very basic.)
Arturo mentioned that the following query fetches all Tor test results: https://api.ooni.io/api/v1/measurements?test_name=tor
...for casual browsing, the Explorer may be more useful: https://explorer.ooni.org/search?test_name=tor&until=2020-02-12 To get an idea of what a test result looks like, take a look at this result from China and this result from Iran.
To get test results from OONI back to BridgeDB, Arturo suggested that we shouldn't use OONI's API because it's not designed for batch use. Over at #32126, Arturo provided some more details. It's okay, however, to use the API for testing.
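For testing only (not batch use, per the note above), the query URL can be assembled programmatically. This is a minimal sketch: the base URL and `test_name=tor` come from Arturo's example, while the `probe_cc` and `limit` parameters and the helper name `tor_measurements_url` are assumptions that should be checked against OONI's API documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the query Arturo mentioned.
OONI_MEASUREMENTS = "https://api.ooni.io/api/v1/measurements"


def tor_measurements_url(probe_cc=None, limit=None):
    """Build a query URL for Tor test results.

    probe_cc (country code) and limit are assumed filter parameters;
    only test_name=tor is confirmed by the discussion above.
    """
    params = {"test_name": "tor"}
    if probe_cc is not None:
        params["probe_cc"] = probe_cc
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode keeps the insertion order of the dict.
    return OONI_MEASUREMENTS + "?" + urlencode(params)
```

Calling `tor_measurements_url()` with no arguments reproduces the URL quoted above.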

In the next step, the anti-censorship team should think about how OONI should partition the test targets it hands out to probes. What bridges should be tested? And how often? By what probes?

Trac:
Status: new to assigned
Owner: N/A to phw

Here are preliminary design considerations:

We want a standalone service (let's call it wolpertinger) that lives on polyanthum, alongside BridgeDB. Wolpertinger exposes an API that OONI and others (e.g., ICLab) can query to fetch bridges to test. Upon receiving a request, wolpertinger uses BridgeDB's SQL database and yet-to-be-defined heuristics to find a bridge that's worth testing, and returns its bridge line. While we are specifically designing wolpertinger to work well with OONI, other censorship measurement platforms should be able to use it too.
Arturo mentioned that OONI probes may not talk to wolpertinger directly, but rather proxy their requests over OONI's infrastructure. In this case, we don't need to worry about making wolpertinger resistant to censorship, but we may still want to make it available over domain fronting so we are prepared for a future in which censorship measurement probes (which are unlikely to be able to talk to *.torproject.org) connect directly.
When requesting a bridge to test, a censorship measurement probe should tell us the country it's in. We may also want to know its autonomous system. What else do we want to know?
We must authenticate incoming requests, so we can be sure that they are from OONI, and not from an attacker who seeks to collect bridges. A simple authentication token would do the job.
Once OONI has test results for a given bridge, these results have to make it back to wolpertinger somehow, so it can write them to BridgeDB's SQL database. Both a push and pull model are conceivable here; OONI could make another request containing the test results, or wolpertinger fetches the results from OONI's API.
Asked about the number of requests wolpertinger would be seeing, Arturo said "Looking at the number of opened reports per day (which is ~20k), we can estimate that it’s probably not going to be more than 20k requests per day for some time. Or somewhere in the range of 10-15 requests per minute."
Considering all of the above, wolpertinger's API could take the following JSON request format as input:

{
  "type": "TYPE",
  "country_code": "COUNTRY_CODE",
  "auth_token": "AUTH_TOKEN"
}

TYPE identifies the censorship measurement probe, and could be "ooni" for OONI. COUNTRY_CODE identifies the country the probe is in and AUTH_TOKEN is an authentication token. Upon receiving this request, wolpertinger would then respond with:

{
  "bridge_lines": ["BRIDGE_LINE_1", ..., "BRIDGE_LINE_N"]
}
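To make the proposed exchange concrete, here is a rough sketch of how a service could validate such a request and build the response. It is not wolpertinger's actual code: the function names, the in-memory token table, and the example token are all hypothetical. The token check uses a constant-time comparison so that response timing doesn't leak how much of a guessed token was correct.

```python
import hmac
import json

# Hypothetical token table: probe type -> shared secret.
AUTH_TOKENS = {"ooni": "example-secret-token"}


def parse_request(body, tokens=AUTH_TOKENS):
    """Return (type, country_code) for a valid, authenticated request.

    Raises ValueError if the JSON is malformed, a field is missing,
    or the authentication token doesn't match.
    """
    try:
        req = json.loads(body)
    except json.JSONDecodeError as err:
        raise ValueError("malformed JSON") from err
    if not isinstance(req, dict):
        raise ValueError("request must be a JSON object")
    for field in ("type", "country_code", "auth_token"):
        if not isinstance(req.get(field), str):
            raise ValueError("missing or non-string field: " + field)
    expected = tokens.get(req["type"])
    # Compare as bytes in constant time.
    if expected is None or not hmac.compare_digest(
        expected.encode(), req["auth_token"].encode()
    ):
        raise ValueError("authentication failed")
    return req["type"], req["country_code"]


def build_response(bridge_lines):
    """Serialise bridge lines in the proposed response format."""
    return json.dumps({"bridge_lines": list(bridge_lines)})
```

How bridge lines are selected is left out on purpose, since the heuristics are still to be defined.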

Any thoughts on the above?

Replying to phw:

Here are preliminary design considerations:

We want a standalone service (let's call it wolpertinger) that lives on polyanthum, alongside BridgeDB. Wolpertinger exposes an API that OONI and others (e.g., ICLab) can query to fetch bridges to test. Upon receiving a request, wolpertinger uses BridgeDB's SQL database and yet-to-be-defined heuristics to find a bridge that's worth testing, and returns its bridge line. While we are specifically designing wolpertinger to work well with OONI, other censorship measurement platforms should be able to use it too.

Cool! I like the idea of having a standalone service with a general API that multiple external measurement platforms can use.

I'm mostly thinking about this from a bridge enumeration standpoint at the moment, since this opens up another vector for attack. I guess my first question here, is what are we most interested in learning from this? Is it whether specific bridges have been blocked in specific places, or that countries X, Y, and Z are very effective at blocking bridges of type A?

If we want general stats and information about what different censors are doing, then I would suggest making another partition of bridges and giving out these bridges to the probing services (as well as to users). This will limit the damage of a censor that uses an OONI client to figure out what OONI is probing.
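One way such a partition could be carved out is a keyed hash over the bridge fingerprint, so that membership is stable and can't be predicted without the key. The sketch below is purely illustrative: the function name, the key, and the 10% share are assumptions, and it isn't meant to describe how BridgeDB assigns bridges to its existing distributors.

```python
import hashlib
import hmac


def in_probe_partition(fingerprint, key, share=0.1):
    """Deterministically decide whether a bridge is in the probe partition.

    fingerprint: the bridge's fingerprint as a string.
    key: secret bytes, so outsiders can't predict the assignment.
    share: fraction of bridges to place in the partition (hypothetical).
    """
    digest = hmac.new(key, fingerprint.encode(), hashlib.sha256).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and
    # compare against the cut-off that corresponds to the share.
    position = int.from_bytes(digest[:8], "big")
    return position < int(share * 2**64)
```

Because the decision depends only on the key and the fingerprint, a bridge stays in the same partition across restarts.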

If we do want to know when and where each specific bridge is blocked, then we should make sure we know how useful this information is to us and what we're going to do with it. If it's not useful, perhaps we should re-evaluate whether it's worth the exposure. Or if there's a less risky (more passive) way to get this information.

Arturo mentioned that OONI probes may not talk to wolpertinger directly, but rather proxy their requests over OONI's infrastructure. In this case, we don't need to worry about making wolpertinger resistant to censorship, but we may still want to make it available over domain fronting so we are prepared for a future in which censorship measurement probes (which are unlikely to be able to talk to *.torproject.org) connect directly.

Another question for the OONI side of things: are all OONI clients testing each bridge? Or just a subset of them? A subset will again limit exposure and make it difficult for a censor to be able to enumerate bridges just by running an OONI client.

I like this design for now where OONI gets the bridge information and distributes it to probes as opposed to probes asking for it directly. This is much easier for us to secure and I'm not sure we'd ever want the latter situation because of the potential for enumeration.

When requesting a bridge to test, a censorship measurement probe should tell us the country it's in. We may also want to know its autonomous system. What else do we want to know?

A timestamp for sure. I think it would be useful for the same probe to try multiple times within some time frame (4x/day for 2-3 days).
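To illustrate what "4x/day for 2-3 days" would mean in practice, here is a small sketch that spreads the retests evenly over the time frame. The function name and the even spacing are assumptions; nothing in the discussion fixes how the attempts should be distributed.

```python
from datetime import datetime, timedelta, timezone


def retest_schedule(start, per_day=4, days=3):
    """Return evenly spaced test times: per_day tests on each of days days."""
    interval = timedelta(days=1) / per_day  # 6 hours for 4x/day
    return [start + i * interval for i in range(per_day * days)]


if __name__ == "__main__":
    first = datetime(2020, 3, 1, tzinfo=timezone.utc)
    for when in retest_schedule(first):
        print(when.isoformat())
```

With the defaults this yields twelve test times, six hours apart.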

Thanks for your feedback!

Replying to cohosh:

I'm mostly thinking about this from a bridge enumeration standpoint at the moment, since this opens up another vector for attack. I guess my first question here, is what are we most interested in learning from this? Is it whether specific bridges have been blocked in specific places, or that countries X, Y, and Z are very effective at blocking bridges of type A?

It's the former. We want censorship measurement platforms to tell us if a given bridge is reachable in country X. BridgeDB already has code that takes into account where a bridge is blocked and where a user is from. The goal is to optimise bridge distribution, so users end up with bridges that are (most likely) unblocked in their country.
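As a rough illustration of how per-country reachability results could feed into distribution, the sketch below filters out bridges that were measured as blocked in the requesting user's country. The data structure and function name are hypothetical and don't reflect BridgeDB's actual code or database schema.

```python
def unblocked_bridges(bridge_lines, blocked_in, country_code):
    """Return the bridge lines not known to be blocked in country_code.

    blocked_in maps a bridge line to the set of country codes in which
    a measurement platform found it unreachable (hypothetical layout).
    Bridges without any measurement are treated as unblocked.
    """
    return [
        line
        for line in bridge_lines
        if country_code not in blocked_in.get(line, set())
    ]


if __name__ == "__main__":
    lines = ["BRIDGE_LINE_1", "BRIDGE_LINE_2", "BRIDGE_LINE_3"]
    blocked = {"BRIDGE_LINE_1": {"cn", "ir"}, "BRIDGE_LINE_3": {"cn"}}
    print(unblocked_bridges(lines, blocked, "cn"))
```

Treating unmeasured bridges as unblocked is a design choice made here for simplicity; a real system might want to track how recent each measurement is.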

If we do want to know when and where each specific bridge is blocked, then we should make sure we know how useful this information is to us and what we're going to do with it. If it's not useful, perhaps we should re-evaluate whether it's worth the exposure. Or if there's a less risky (more passive) way to get this information.

I suggest we start with low-risk bridges like our default bridges (which are already public anyway) and bridges in our HTTPS/Proxy bucket. We lose little to nothing if these bridges get into our adversaries' hands.

Arturo mentioned that OONI probes may not talk to wolpertinger directly, but rather proxy their requests over OONI's infrastructure. In this case, we don't need to worry about making wolpertinger resistant to censorship, but we may still want to make it available over domain fronting so we are prepared for a future in which censorship measurement probes (which are unlikely to be able to talk to *.torproject.org) connect directly.

Another question for the OONI side of things: are all OONI clients testing each bridge? Or just a subset of them? A subset will again limit exposure and make it difficult for a censor to be able to enumerate bridges just by running an OONI client.

That's a great question and I don't have satisfying answers yet. But I agree that we should start with a small set of bridges and iterate as we gain experience. We should at least build this system in a way that it doesn't make it easier for an adversary to collect bridges.

I like this design for now where OONI gets the bridge information and distributes it to probes as opposed to probes asking for it directly. This is much easier for us to secure and I'm not sure we'd ever want the latter situation because of the potential for enumeration.

Yes, agreed.

When requesting a bridge to test, a censorship measurement probe should tell us the country it's in. We may also want to know its autonomous system. What else do we want to know? A timestamp for sure. I think it would be useful for the same probe to try multiple times within some time frame (4x/day for 2-3 days).

I don't follow: do you mean an OONI probe should embed a timestamp when requesting a bridge to test? Why do we want a probe to request bridges multiple times per time frame?

OONI created a GitHub ticket for this.

Trac:
Cc: phw to phw, cohosh

FYI, I filed #34089 (moved) to expose wolpertinger's REST API on polyanthum.

I pushed work-in-progress code to the new wolpertinger repository.

No more Q1 for 2020.

Trac:
Keywords: anti-censorship-roadmap-2020Q1 deleted, anti-censorship-roadmap-2020 added

Trac:
Cc: phw, cohosh to phw, cohosh, gaba

cohosh cc'ing myself on sponsor 30 work

changed time estimate to 80h

mentioned in issue #34089 (moved)

mentioned in issue #34116 (moved)

mentioned in issue #34154 (moved)

mentioned in issue #31280 (moved)

mentioned in issue #34195 (moved)

mentioned in issue #34212 (moved)

mentioned in issue #34259 (moved)

mentioned in issue #34260 (moved)

moved to tpo/anti-censorship/bridgedb#32740 (closed)

mentioned in issue tpo/anti-censorship/bridgedb#34116 (closed)

mentioned in issue tpo/anti-censorship/bridgedb#34154 (closed)
