Opened 4 months ago

Closed 3 months ago

#30006 closed defect (fixed)

Monitor "aliveness" of default bridges in Tor Browser

Reported by: phw
Owned by: phw
Priority: Medium
Milestone:
Component: Applications/Quality Assurance and Testing
Version:
Severity: Normal
Keywords: default bridge tbb-bridges
Cc: cohosh, anarcat, boklm, gaba, mrphs
Actual Points:
Parent ID: #30152
Points:
Reviewer:
Sponsor: Sponsor19-can

Description (last modified by phw)

Tor Browser ships with several dozen default bridges. We currently have no automated way of learning when one of these bridges disappears, e.g., because the owner cannot afford her VPS bill anymore. If this happens, Tor Browser would then ship with dead bridges, which don't help anyone. This has happened before. Also, note that this ticket is not about measuring censorship.

We should find out when any of our default bridges disappears. A simple way to do so would be to test each bridge's TCP reachability, i.e., see if it's possible to complete a TCP handshake with its bridge port. Ideally, we want a test like: "ping the bridge n times per week and mark it as 'alive' if it responded to at least n - m SYN segments." We should also think about how to sync a list of default bridges to test with the actual list of default bridges in Tor Browser, so we are testing what needs testing.
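A minimal Python sketch of the "n - m" rule, assuming that a completed TCP connect() counts as the bridge having "responded". `tcp_reachable` and `is_alive` are hypothetical names, not part of any existing tool, and how the n probes are spread over the week is left to whatever scheduler runs them.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection() returns only once the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, DNS failure: all count as "no response".
        return False


def is_alive(results: list[bool], m: int) -> bool:
    """The "n - m" rule: with n = len(results) probes, the bridge is alive
    if at least n - m of them succeeded (i.e., at most m failures)."""
    n = len(results)
    return sum(results) >= n - m
```

For example, with one probe per day (n = 7) and m = 2, a bridge that missed two daily probes is still alive, while three misses mark it as dead.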

Prometheus may be able to help here: it has an exporter that can measure a machine's TCP reachability. Here's what anarcat found:

$ curl 'http://localhost:9115/probe?target=google.com:80&module=tcp_connect' 
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.046783699
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.060479041
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 6
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
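The exporter's reply is Prometheus's plain-text exposition format, so a one-off check does not strictly need a Prometheus server. A minimal sketch of a parser for unlabeled samples like the ones above; `parse_metrics` is a hypothetical helper that deliberately skips labeled series and drops optional timestamps.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map metric names to values for simple, unlabeled samples.

    Skips '# HELP' / '# TYPE' comments and blank lines. Lines with
    labels ({...}) are ignored rather than parsed, to keep this short.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]  # an optional trailing timestamp is dropped
        metrics[name] = float(value)
    return metrics


SAMPLE = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.060479041
"""

print(parse_metrics(SAMPLE)["probe_success"] == 1.0)  # True
```

Against a live exporter, the text could be fetched with `urllib.request.urlopen(url).read().decode()`, using the same URL as the curl command above.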


Change History (19)

comment:1 Changed 4 months ago by phw

Description: modified

comment:2 Changed 4 months ago by anarcat

Some more details on how this works. Prometheus is just a scraping/alerting system and relies on "exporters" to do the work. For example, we have "node exporters" installed on every TPA machine, which provide stats like disk, CPU, and memory usage, and we also have "apache exporters", which provide internal stats on web servers. Details of that deployment are in #29681.

The exporter that seems to fit the bill of "probe a TCP port for liveness" is the blackbox exporter. It could be deployed on the Prometheus server and check each public Tor bridge for reachability. The blackbox exporter is not very well documented (not surprising considering its name), so I found more documentation on how it works here and here.

The example you pasted was run on my home workstation, and was simply a matter of running:

apt install prometheus-blackbox-exporter

The exporter supports probing arbitrary hosts on the fly like this. The final targets would need to be added to the configuration file (see also this example). This could all be done somewhat automatically as well, with a cron job polling the list of bridges from some canonical location.

The blackbox exporter is pretty powerful: in theory, we could make it do a simple send/expect dialog to verify the other end is really a Tor server, if that would be useful.

Once the exporter is set up, the Prometheus server would be configured to scrape those metrics, which would be collected every "scrape interval" (currently 15 seconds).

Note that we do not have alerting capabilities yet: this is still handled by Icinga, a fork of Nagios (see #29864 and #29863 for that discussion). Instead, we could make a Grafana dashboard that displays those metrics. A few existing dashboards already process those metrics out of the box, but they would probably require at least some tweaking:

I'm not sure alerting is really a necessity. It might be sufficient to check that dashboard as part of the release process, for example.

The open questions for me are:

  1. Is this the metrics team's responsibility, or TPA's?
  2. What is the canonical reference for the list of public bridges? This JavaScript file? How stable is that file format? Do I need to parse it as JavaScript, or can I get away with a regex?
  3. What is the threshold for failure? Say we ping the bridge every 15 seconds: how many failures in what time period count as a failure? One example would be fewer than 50% of probes succeeding in the last day. We can also check latency.
  4. Are latency metrics sensitive? Currently, the Prometheus metrics are more or less publicly accessible, so if this is implemented, it would expose the latency of those hosts, which could be leveraged for correlation attacks (although arguably *anyone* could run a similar setup and do a similar attack). If we are worried about this, a separate Prometheus server could be deployed with stronger security (see also the discussion in #29863).
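To make question 3 concrete: at a 15-second scrape interval, one day is 86400 / 15 = 5760 probes, so "fewer than 50% in the last day" means fewer than 2880 successes. A minimal Python sketch of that check over stored (timestamp, success) samples; inside Prometheus, the same idea would roughly be `avg_over_time(probe_success[1d]) < 0.5`. The function names and the 50% default are illustrative, not decided values.

```python
SCRAPE_INTERVAL_S = 15
WINDOW_S = 24 * 60 * 60
PROBES_PER_WINDOW = WINDOW_S // SCRAPE_INTERVAL_S  # 5760 at one probe every 15 s


def success_ratio(samples, now: float, window_s: int = WINDOW_S) -> float:
    """samples: iterable of (unix_time, succeeded) pairs, e.g. from probe_success."""
    recent = [ok for t, ok in samples if now - window_s < t <= now]
    # No data at all is treated as total failure, not as "healthy".
    return sum(recent) / len(recent) if recent else 0.0


def is_failing(samples, now: float, threshold: float = 0.5) -> bool:
    return success_ratio(samples, now) < threshold
```

This divides by the samples actually present, so missed scrapes are ignored; dividing by `PROBES_PER_WINDOW` instead would count them as failures.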

comment:3 Changed 4 months ago by anarcat

the more i look at the list of bridges, the less happy i am. :) i was able to extract a list of host:port things with the atrocious:

curl -sSL 'https://gitweb.torproject.org/builders/tor-browser-build.git/plain/projects/tor-browser/Bundle-Data/PTConfigs/bridge_prefs.js' | sed -n '/default_bridge\.obfs/{s/.*obfs. //;s/ .*$//;p}'

... but that's pretty nasty. ideally, we'd have a plain-text file listing each host, one per line, without anything else. then we could do a stronger regex that would sanitize the output for inclusion in the prometheus server. is there another, simpler data format, canonical source for this?
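On the regex-vs-JavaScript question from comment:2, a regex plus strict validation of every captured value is probably enough. A minimal Python sketch, assuming each bridge pref sits on one line shaped like the hypothetical example in the code comment (the sed command above implies this shape but does not guarantee it). Anything that is not an IP literal plus a port in 1..65535 is dropped rather than passed on to Prometheus.

```python
import ipaddress
import re

# Hypothetical example of the assumed line shape:
# pref("extensions.torlauncher.default_bridge.obfs4.1", "obfs4 192.0.2.1:443 FINGERPRINT cert=... iat-mode=0");
PREF_RE = re.compile(r'default_bridge\.obfs\w*\.\w+"\s*,\s*"obfs\w*\s+([^\s"]+)')
PORT_RE = re.compile(r"[0-9]{1,5}")


def is_ip_port(endpoint: str) -> bool:
    """Accept only "IPv4:port" or "[IPv6]:port" with a port in 1..65535."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not PORT_RE.fullmatch(port) or not 0 < int(port) < 65536:
        return False
    bracketed = host.startswith("[") and host.endswith("]")
    try:
        ip = ipaddress.ip_address(host[1:-1] if bracketed else host)
    except ValueError:
        return False
    # IPv6 must be bracketed, IPv4 must not be.
    return bracketed == (ip.version == 6)


def extract_endpoints(js_text: str) -> list[str]:
    """Return the host:port of every obfs* default bridge that validates."""
    return [m.group(1) for m in PREF_RE.finditer(js_text) if is_ip_port(m.group(1))]
```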

comment:4 Changed 4 months ago by anarcat

also, I researched how the blackbox exporter works, and it seems like the blackbox exporter's own configuration would be essentially empty: everything happens on the Prometheus side, as the targets are passed to the exporter from there.

the scrape_config would look something like this:

scrape_configs:
  - job_name: blackbox_tor_bridges
    scrape_interval: 15s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 169.229.59.74:31493
        - 169.229.59.75:46328
        # ...
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
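The relabel_configs are what turn a bridge address into a request to the exporter. A rough Python model of the three rules above, for illustration only: it is not Prometheus code, and the query-string order Prometheus actually produces may differ.

```python
from typing import Optional
from urllib.parse import urlencode

EXPORTER = "127.0.0.1:9115"  # the replacement value in the last relabel rule


def relabel(labels: dict[str, str]) -> dict[str, str]:
    """Apply the three relabel_configs rules above, in order."""
    out = dict(labels)
    out["__param_target"] = out["__address__"]  # bridge address becomes ?target=
    out["instance"] = out["__param_target"]     # series are labeled with the bridge
    out["__address__"] = EXPORTER               # the scrape itself goes to the exporter
    return out


def scrape_url(labels: dict[str, str], metrics_path: str = "/probe",
               params: Optional[dict[str, str]] = None) -> str:
    """Simplified: static params plus any __param_<name> labels as the query string."""
    query = dict(params or {})
    query.update({k[len("__param_"):]: v for k, v in labels.items()
                  if k.startswith("__param_")})
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(query)}"


labels = relabel({"__address__": "169.229.59.74:31493"})
print(scrape_url(labels, params={"module": "tcp_connect"}))
# http://127.0.0.1:9115/probe?module=tcp_connect&target=169.229.59.74%3A31493
```

This is why the exporter itself needs no per-bridge configuration: each bridge arrives as a `target` parameter on the scrape request.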

the targets line is the tricky part to get right. I think we *might* be able to get away with a file_sd_config the same way we did for the node_exporter stuff, with something like this:

    # ...
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
    - files:
      - "/etc/prometheus/file_sd_config.d/blackbox_tor_bridge_*.yaml"
    relabel_configs:
    # ...    

... instead of static_configs. Each blackbox_tor_bridge file would be generated dynamically with a cronjob, which is why it would be critical that it's safe to drink from that file, because wrong data might crash the Prometheus server or worse.
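A minimal sketch of that cron job's output step in Python, assuming it receives candidate host:port strings from whatever canonical list is chosen; the helper names and the output filename are hypothetical. Two safety choices: every entry is validated as an IP literal plus port and quoted in the YAML, and the file is written under a temporary name and renamed into place, so Prometheus never reads a half-written file.

```python
import ipaddress
import os
import re
import tempfile


def is_safe_target(target: str) -> bool:
    """Accept only "IPv4:port" or "[IPv6]:port" with a port in 1..65535."""
    host, sep, port = target.rpartition(":")
    if not sep or not re.fullmatch(r"[0-9]{1,5}", port) or not 0 < int(port) < 65536:
        return False
    bracketed = host.startswith("[") and host.endswith("]")
    try:
        ip = ipaddress.ip_address(host[1:-1] if bracketed else host)
    except ValueError:
        return False
    return bracketed == (ip.version == 6)


def render_file_sd(targets: list[str]) -> str:
    # Quote each target: an unquoted "[2001:db8::1]:443" starts a YAML flow sequence.
    lines = ["- targets:"] + [f"  - '{t}'" for t in targets]
    return "\n".join(lines) + "\n"


def update_file_sd(candidates: list[str], path: str) -> None:
    targets = sorted({c for c in candidates if is_safe_target(c)})
    if not targets:
        # Keep the previous file rather than silently monitoring nothing.
        raise ValueError("no valid bridge targets found")
    directory = os.path.dirname(path) or "."
    # The ".tmp-" prefix keeps the temp file out of the blackbox_tor_bridge_*.yaml glob.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".yaml")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(render_file_sd(targets))
        os.chmod(tmp, 0o644)   # mkstemp creates 0600; Prometheus must be able to read it
        os.replace(tmp, path)  # atomic rename within one POSIX filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

Creating the temporary file in the destination directory keeps the rename on a single filesystem, which is what makes `os.replace` atomic there.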

comment:5 in reply to:  3 Changed 3 months ago by gk

Replying to anarcat:

> the more i look at the list of bridges, the less happy i am. :) i was able to extract a list of host:port things with the atrocious:
>
> curl -sSL 'https://gitweb.torproject.org/builders/tor-browser-build.git/plain/projects/tor-browser/Bundle-Data/PTConfigs/bridge_prefs.js' | sed -n '/default_bridge\.obfs/{s/.*obfs. //;s/ .*$//;p}'
>
> ... but that's pretty nasty. ideally, we'd have a plain-text file listing each host, one per line, without anything else. then we could do a stronger regex that would sanitize the output for inclusion in the prometheus server. is there another, simpler data format, canonical source for this?

I guess .csv would work for you? See over at #29275 the OONI folks have such a thing it seems (even though it might be outdated): https://github.com/citizenlab/test-lists/blob/master/lists/services/tor/bridges.csv.

So, I'd suggest to make sure that one is always kept up-to-date (not sure how the .csv file is generated on the OONI side, but hopefully in a scriptable way) and then both OONI and the solution for this bug could use that one as canonical source.

comment:6 Changed 3 months ago by anarcat

> I guess .csv would work for you?

Yes, CSV would work but it doesn't need to be anything fancy. Just not something that is actually source code, which can change radically and is difficult to parse reliably in the long term.

And yes, it seems like having a canonical copy of this, whether it's a plain text file, CSV, YAML, JSON or whatever would be better than the current "Mozilla prefs.js" approach. ;)

comment:7 Changed 3 months ago by phw

I'm considering creating a git repository, maintained by the anti-censorship team, that contains an up-to-date CSV file (which would be simple for anarcat to fetch and parse) for our default bridges with the following information:

  • Fingerprint
  • IP address and port(s)
  • Email address (or other contact info) of owner
  • What protocols the bridge speaks (e.g., vanilla Tor, obfs3, ...)
  • Date of when the bridge was set up
  • ...anything else?

What do you think? Should we rather keep OONI's list up-to-date? Mostly, I want a single source of truth that includes contact information of the operator.
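One possible shape for that CSV, with a loader that rejects malformed rows early. The column names, the space-separated multi-value fields, and the ISO 8601 date are assumptions for illustration, not an agreed format.

```python
import csv
import io
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class DefaultBridge:
    fingerprint: str       # 40 hex characters
    endpoints: list[str]   # one or more "ip:port"
    contact: str           # operator email or other contact info
    protocols: list[str]   # e.g. ["vanilla", "obfs4"]
    setup_date: date


def load_bridges(csv_text: str) -> list[DefaultBridge]:
    """Parse rows with header: fingerprint,endpoints,contact,protocols,setup_date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bridges = []
    for row in reader:
        fingerprint = row["fingerprint"].strip().upper()
        if not re.fullmatch(r"[0-9A-F]{40}", fingerprint):
            # Fail loudly so a typo never reaches Tor Browser or the monitoring.
            raise ValueError(f"line {reader.line_num}: bad fingerprint {fingerprint!r}")
        bridges.append(DefaultBridge(
            fingerprint=fingerprint,
            endpoints=row["endpoints"].split(),
            contact=row["contact"].strip(),
            protocols=row["protocols"].split(),
            setup_date=date.fromisoformat(row["setup_date"].strip()),
        ))
    return bridges
```

Multiple ports or protocols share one field, separated by spaces, which keeps one row per bridge.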

comment:8 in reply to:  7 ; Changed 3 months ago by boklm

Replying to phw:

> I'm considering creating a git repository, maintained by the anti-censorship team, that contains an up-to-date CSV file (which would be simple for anarcat to fetch and parse) for our default bridges with the following information:
>
>   • Fingerprint
>   • IP address and port(s)
>   • Email address (or other contact info) of owner
>   • What protocols the bridge speaks (e.g., vanilla Tor, obfs3, ...)
>   • Date of when the bridge was set up
>   • ...anything else?
>
> What do you think? Should we rather keep OONI's list up-to-date? Mostly, I want a single source of truth that includes contact information of the operator.

If there is a git repository containing this CSV file, then maybe we could use it in tor-browser-build to generate the .js file containing the prefs for Tor Browser.
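A minimal sketch of that generation step in Python. It assumes pref names of the form `extensions.torlauncher.default_bridge.<transport>.<n>` (the sed command in comment:3 only confirms the `default_bridge.obfs` part) and that every bridge line starts with its transport name; `render_prefs` is hypothetical. `json.dumps` emits the string literals, since a JSON string is also a valid JavaScript string literal for plain ASCII bridge lines.

```python
import json
from collections import defaultdict


def render_prefs(bridge_lines: list[str]) -> str:
    """Emit one pref() call per bridge line, numbered per transport from 1."""
    counters = defaultdict(int)
    prefs = []
    for line in bridge_lines:
        transport = line.split()[0]  # assumes lines start with the transport name
        counters[transport] += 1
        name = f"extensions.torlauncher.default_bridge.{transport}.{counters[transport]}"
        prefs.append(f"pref({json.dumps(name)}, {json.dumps(line)});")
    return "\n".join(prefs) + "\n"
```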

comment:9 Changed 3 months ago by boklm

Cc: boklm added

comment:10 Changed 3 months ago by gaba

Cc: gaba added

comment:11 Changed 3 months ago by mrphs

Cc: mrphs added

comment:12 in reply to:  7 Changed 3 months ago by dcf

Keywords: tbb-bridges added

Replying to gk:

> I guess .csv would work for you? See over at #29275 the OONI folks have such a thing it seems (even though it might be outdated): https://github.com/citizenlab/test-lists/blob/master/lists/services/tor/bridges.csv.

That list is not maintained. An up-to-date one (used for the tcp_connect test, and containing more than just default bridges) is at https://github.com/OpenObservatory/ooni-resources/blob/master/bridge_reachability/tor-bridges-ip-port.csv. However that, too, is probably going to change in the future as OONI deploys its orchestrator.

Replying to phw:

> I'm considering creating a git repository, maintained by the anti-censorship team, that contains an up-to-date CSV file (which would be simple for anarcat to fetch and parse) for our default bridges with the following information:
>
>   • Fingerprint
>   • IP address and port(s)
>   • Email address (or other contact info) of owner
>   • What protocols the bridge speaks (e.g., vanilla Tor, obfs3, ...)
>   • Date of when the bridge was set up
>   • ...anything else?

You may want a list separate from the OONI one, because the OONI one doesn't have all the information you want.

There's some past data (2015–2018) at https://repo.eecs.berkeley.edu/git-anon/users/fifield/proxy-probe.git; see the files proxy-probe.csv and significant_dates.txt.

I try to make sure that every ticket about default bridges is tagged with the tbb-bridges tag. You can look over those tickets to get information about timing and who the operator is.

comment:13 in reply to:  8 Changed 3 months ago by phw

Replying to boklm:

> Replying to phw:
>
>> I'm considering creating a git repository, maintained by the anti-censorship team, that contains an up-to-date CSV file (which would be simple for anarcat to fetch and parse) for our default bridges with the following information:
>>
>>   • Fingerprint
>>   • IP address and port(s)
>>   • Email address (or other contact info) of owner
>>   • What protocols the bridge speaks (e.g., vanilla Tor, obfs3, ...)
>>   • Date of when the bridge was set up
>>   • ...anything else?
>>
>> What do you think? Should we rather keep OONI's list up-to-date? Mostly, I want a single source of truth that includes contact information of the operator.
>
> If there is a git repository containing this CSV file, then maybe we could use it in tor-browser-build to generate the .js file containing the prefs for Tor Browser.

I just filed #30121 to discuss this.

comment:14 Changed 3 months ago by anarcat

a comment in #30121 made me think that the CI / build process could be the thing that checks for integrity in the bridge list. if CI is able to access the network, it could just try to ping those bridges as part of the build and warn/fail if they are not reachable.

that would be simpler and better integrated than running a different monitoring server.
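A minimal sketch of such a CI step in Python, assuming the build environment can open outbound TCP connections and that the bridge endpoints are passed as arguments (for example, the output of the sed command in comment:3). It probes all bridges in parallel and returns a non-zero status if any is unreachable; whether that should warn or fail the build is a policy choice.

```python
import socket
import sys
from concurrent.futures import ThreadPoolExecutor


def tcp_reachable(host: str, port: int, timeout: float) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def split_endpoint(endpoint: str) -> tuple[str, int]:
    """Split 'ip:port' or '[ipv6]:port' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host.strip("[]"), int(port)


def check_bridges(endpoints: list[str], timeout: float = 5.0) -> list[str]:
    """Return the endpoints that did not complete a TCP handshake."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda e: tcp_reachable(*split_endpoint(e), timeout), endpoints))
    return [e for e, ok in zip(endpoints, results) if not ok]


def main(argv: list[str]) -> int:
    dead = check_bridges(argv[1:])
    for endpoint in dead:
        print(f"unreachable: {endpoint}", file=sys.stderr)
    return 1 if dead else 0  # a non-zero exit status fails the CI job
```

A CI job would call `sys.exit(main(sys.argv))` from a small wrapper script. Since a single failed handshake can be transient, it might also retry a dead bridge once before failing.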

comment:15 in reply to:  14 Changed 3 months ago by phw

Replying to anarcat:

> a comment in #30121 made me think that the CI / build process could be the thing that checks for integrity in the bridge list. if CI is able to access the network, it could just try to ping those bridges as part of the build and warn/fail if they are not reachable.
>
> that would be simpler and better integrated than running a different monitoring server.

Alternatively, we had Rabbi Rob and gman999 offer to run monitoring for us -- using Nagios and sysmon, respectively. We may want to take them up on their offer, and while we're at it, also have our bridge authority monitored, as discussed in #29229. BridgeDB also needs monitoring.

comment:16 Changed 3 months ago by gaba

Cc: mrphs removed
Keywords: tbb-bridges removed
Sponsor: Sponsor19-can

comment:17 Changed 3 months ago by gaba

Cc: mrphs added
Keywords: tbb-bridges added

comment:18 Changed 3 months ago by phw

Parent ID: #30152

comment:19 Changed 3 months ago by phw

Resolution: fixed
Status: changed from assigned to closed

We took gman999 up on his offer to have our default bridges monitored. The sysmon instance is testing the port of our default bridges every five minutes. The monitoring details are documented in our wiki page.
