Opened 7 years ago

Closed 5 years ago

#6414 closed project (not a bug)

Automating Bridge Reachability Testing

Reported by: isis Owned by: isis
Priority: Medium Milestone:
Component: Archived/Ooni Version:
Severity: Keywords: bridge-reachability metrics-db automation testing SponsorZ
Cc: karsten, arma, ln5, aagbsn, identity.function@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

An effort was made earlier this year to create a discovery system for current
bridge reachability status #5028. This resulted in the development and
deployment of OONI's BridgeT [26], which uses txtorcon to attempt a
connection, speaking the full Tor protocol, to the set of bridges being
tested. Some bridges were scanned, and results were gathered. We would like to
go back and automate this process, and possibly revise it if a better
methodology is proposed. Anyone with ideas or interest should feel free to
join the discussion here.

While this automation is intended to be geolocationally agnostic, it is
trivial to test a bridge's reachability from a country which does not block
Tor, and therefore automation methodology should be developed according to the
worst-case scenarios. Countries which block Tor, or have blocked Tor, include
China, Iran, Lebanon, Qatar, United Arab Emirates, and Ethiopia. In order to
ensure that as few Tor bridges as possible are blocked during reachability
testing, it seems wise to assume that the test is being conducted from one of
these countries. Also, any test methodology which produces accurate results
from inside China or Iran would likely work just as well from any
non-Tor-blocking country.

Brief Overview of Dynamic Tor Bridge Blocking

From my understanding so far (please correct me if I have misunderstood
something, or if there is more information), China's mechanism for
blocking Tor bridges takes the following steps (unconfirmed data is
prefaced by a question mark):

  1. OP --> OR/Bridge Connection
    1. Alice (OP/client in China) connects to Bob (OR/bridge), completes the TLS handshake, and sets up circuits.
    2. This works for roughly fifteen minutes.
  2. Protocol Identification & Fingerprinting
    1. The GFC identifies Tor via fingerprinting the cipher list in the TLS ServerHello.
    2. Tests for the precise trigger in the fingerprint were conducted (I'll leave said tester(s) anonymous unless they would like to speak up) by fuzzing the TLS handshake ServerHello, and the precise fingerprint for triggering the GFC's nascent probes was determined to be a specific 5 bytes. (?) It was also found that the GFC blocks packets <= 79 bits.
    3. Philipp Winter's research showed that fragmentation of the ciphersuite list would not trigger a probe [5].
  3. Network Enumeration
    1. The GFC adds Bob's IP and port to a queue of addresses to be checked. These queues are processed every fifteen minutes (hence why Alice's connection functions normally at first).
    2. A probe is sent to Bob during queue processing. The GFC probes are not yet fully understood, and unverified data in this section is prefaced by a '?'. Thus far, the following is believed to occur:
      • (?) Reportedly (speak up if you wish), there are eight "edge routers" in China. The reporter stated that there was "one for each province", however there are twenty-two Provinces in PRC -- twenty-three if you count Taiwan. There is one "core router" which controls/routes to the eight "edge routers". Because all traffic into and out of China passes through these eight routers, all netblocks within China are essentially a private network behind the "edge routers". (See question #2 below.)
      • (?) Because these "edge routers" are intercepting all traffic, they are able to temporarily hijack any IP from the contained netblocks.
      • A hijacked IP and a random port (the range appears to be ~35000-60000) are used as the source to send a probe to the queued IP:port of the suspected bridge. (See question #3 below.)
      • The probe does a TCP connect.
      • Then it sends a TLS ClientHello and waits for the cipher list in the ServerHello message.
      • If the cipher list matches that used by Tor, the IP:port gets blacklisted. Previous research has shown that this blacklisting is not permanent, but lasts for 12 hours after the last successful connection by a probe [1]. (See question #4)

Testing Bridge Reachability

As Roger has stated on the Tor Blog, we can either do active or passive scans
to check if a bridge has been blocked [4]. Passive scans, wherein either the
bridge or the client report connections, are unreliable without results from
active scans in the former case [5], and could potentially reduce privacy and
anonymity in the latter case.

Active Scans

Direct Methods
From most innocuous (least Tor-like) to most conspicuous (most Tor-like):

ICMP type-8 ping / echo

Tells us if the host running the Tor bridge is online, but not necessarily
if the ORPort is open.

TCP ping / ACK

If TCP ACKs are timed to be sent infrequently (probably no more than one
every five minutes or so), they can appear to be random network noise
rather than a scan. If we get a RST back, we know that we can at least
communicate with the bridge's ORPort through the GFC. This might look odd,
if it gets noticed, especially since the GFC is stateful and might realize
the ACKs are unsolicited.

TCP SYN

This still doesn't tell us if Tor is running, but, again, a SYN/ACK would
let us know if the ORPort is reachable and accepting connections, a RST
that it is reachable and not accepting connections (or the GFC is sending
false TCP RSTs), and no response would mean that the GFC, or some other
hop is dropping packets. Philipp Winter's research showed that the
client's SYN is transmitted through the GFC, which instead drops the
SYN/ACK response of known Tor relays/bridges [2].

TCP connect()

We could try a normal full TCP connect (SYN, SYN/ACK, ACK). This would be the most
genuine-to-the-Tor-protocol test available for regions where SSL is being
blocked. It could be useful here to test different types of fragmentation,
for example, the old trick involving overlapping fragments to rewrite the
TCP headers in the first fragment [25].
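As a rough sketch of how such a connect() probe might classify what it sees (the helper names and verdict strings here are hypothetical, not part of any existing scanner):

```python
import errno
import socket

def classify_connect(bridge_ip, orport, timeout=10):
    """Attempt a full TCP connect() to a bridge's ORPort and classify the
    outcome. Note that a RST here is ambiguous: it may come from the host
    itself, or be a false RST injected by the GFC."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        err = s.connect_ex((bridge_ip, orport))
    except socket.timeout:
        return 'filtered'              # packets silently dropped en route
    finally:
        s.close()
    return interpret_errno(err)

def interpret_errno(err):
    """Map connect_ex()'s errno result to a reachability verdict."""
    if err == 0:
        return 'open'                  # SYN/ACK received, ORPort accepting
    if err in (errno.ECONNREFUSED, errno.ECONNRESET):
        return 'refused-or-reset'      # RST: port closed, or injected RST
    return 'filtered'                  # ETIMEDOUT, EHOSTUNREACH, etc.
```

Distinguishing a genuine RST from an injected one would still require extra evidence (TTL analysis, a second vantage point), which this sketch doesn't attempt.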

SSL Handshake

We could try doing a normal SSL handshake, as if contacting, for example,
an Apache webserver over HTTPS. Another interesting idea would be to run
an SSLObservatory from inside China, and simply pretend that the bridges
are HTTPS webservers, which would look just like the normal SSLObservatory
for bridges whose ORPort is set to :443 [14, 15]. As of this morning, a
quick check on Tor relays shows that 27% of relays are run on :443 :

    isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" \
    >| grep " 443 " -c
    779
    isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" -c
    2912
    isis@acab:/var/lib/tor$ python -c 'from __future__ import division;a=799/2912;\
    >print a'
    0.274381868132

with the most common ports being:

    isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" \
    >| cut -d " " -f 7 | sort | uniq -ic | sort -gr
           1592 9001
            762 443
            217 80
             34 9090
             33 8080
             21 9002
             20 444
             11 9031
             11 110
              9 22
              7 21
    [...]
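For automation, the same tally could be done in Python rather than shelling out to grep and cut; a minimal sketch, assuming the microdesc-consensus "r"-line layout used above (the sample lines are made up for illustration):

```python
from collections import Counter

def orport_histogram(consensus_lines):
    """Tally ORPorts from the 'r' lines of a (microdesc) consensus.
    Each 'r' line looks like:
      r <nickname> <identity> <date> <time> <address> <ORPort> <DirPort>
    so the ORPort is the 7th whitespace-separated field (cut -f 7)."""
    ports = Counter()
    for line in consensus_lines:
        fields = line.split()
        if len(fields) >= 7 and fields[0] == 'r':
            ports[fields[6]] += 1
    return ports

# Hypothetical sample lines, not real consensus data:
sample = [
    'r moria1 AAAA 2012-07-20 12:00:00 128.31.0.34 9101 9131',
    'r gabelmoo BBBB 2012-07-20 12:00:00 212.112.245.170 443 80',
    'r example CCCC 2012-07-20 12:00:00 10.0.0.1 443 0',
    's Fast Running Valid',
]
hist = orport_histogram(sample)
```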

I would assume that the percentage of bridges running on :443 is higher than
that of relays (question #5). We could safely automate the testing of those
relays without actually speaking Tor to them, by appearing to be an
SSLObservatory (question #6). This would provide us with an extensive set of
canaries to help mitigate the zig-zag enumeration attack [9] (see question
#7). However, in regions which block Tor based on the ciphersuite list in
the ServerHello, such as in Iran in June 2011, it doesn't matter what
ciphersuite we send as the client [16].

For those bridges not running on :443, we could have the bridge scanner
mimic another protocol or service which uses TLS/SSL, such as IMAPS or SFTP;
for instance, it could pretend to be a client connecting to a Dovecot or
vsftpd server.

Tor TLS/SSLv3 Handshake

We can drive a Tor client, or a script pretending to be Tor (which should
know about the different handshake versions, specifically their command
and CERT cells [10]), to handle the TLS negotiation. Interestingly, for
the v2 and v3 protocols, we can use any ciphersuite list we like, as long
as we include

TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA

in addition to at least one extra that is not any of those four. Tor clients
before 0.2.3.11-alpha send a fixed ciphersuite list, and the GFC sends a
probe based on this fixed ciphersuite list [12]. It is apparently also the
case that the GFC will not send a probe if the standard fixed ciphersuite
list is altered by at least two ciphers [12]. To assist with this, hellais
wrote a handy Python script for grabbing the default ciphersuite list from
the source code of Firefox [13]. Also, as mentioned previously, we can
fragment the sending of the ciphersuite list to avoid triggering a probe [5].
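A tiny sketch of the constraint described above: the ClientHello ciphersuite list must contain the four required suites plus at least one other, so that it differs from the fixed, fingerprinted list (the function name is hypothetical):

```python
# The four suites Tor requires for the v2/v3 handshake, per the list above.
REQUIRED_SUITES = [
    'TLS_DHE_RSA_WITH_AES_256_CBC_SHA',
    'TLS_DHE_RSA_WITH_AES_128_CBC_SHA',
    'SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA',
    'SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA',
]

def valid_scanner_ciphersuites(candidate):
    """Check a candidate ClientHello ciphersuite list: it must contain
    all four required suites, plus at least one suite not among those
    four, so the list is not the fixed one the GFC fingerprints."""
    extras = [c for c in candidate if c not in REQUIRED_SUITES]
    return all(r in candidate for r in REQUIRED_SUITES) and len(extras) >= 1
```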

Indirect Methods

As Roger also mentions, we could use some variant of the idle scan. [4, 8,
17] There are a few:

  1. Use nmap / hping.
    1. For nmap, there is an NSE script for zombie discovery, which can be combined with blockfinder to collect lists of hosts (probably printers or other archaic networked devices) with globally sequential IPIDs [7, 18].
  2. Use idlescanner, a Python script which uses the "content upload" feature of popular sites, e.g. Reddit, Imgur, Facebook, Digg, Tinypic, Tineye, etc., to attempt a connection to the bridge [19, 20]. This may not be entirely accurate, because it is based purely on waiting for the upload site to time out.
  3. Use FTP PROXY or some other obscure bounce mechanism [21]. These need to be further researched.
  4. Now we start to get into some crazier ideas. If we set up a bridge purposefully to act as a canary, then we could send, from a box inside China, a bunch of TCP SYNs with spoofed IP headers to the canary bridge in order to trigger probes (Winter wrote a program to do this called tcis [22, 23], and hellais ported it to Python in OONI [24]). During the two minutes that the probes have hijacked IP addresses, we use those hijacked addresses as zombies for an idle scan of the bridge we actually care about. This would require some preliminary mucking with the probes to see if they have any mechanism we could leverage to "see" whether the bridge's packets made it to the probe. Basically, we force the probe to hijack an IP, which we then zombify while it is chasing the canary, and get the zombie probe to scan the bridge for us without it actually scanning the bridge itself, so it doesn't get blocked, and the traffic doesn't look suspicious to anyone keeping an eye on the probes.
  5. A commenter on the Tor blog had the idea to try to "borrow a Chinese botnet" to do the scans for us, since the botnet would probably attract a lot more attention from Chinese officials than any number of Tor bridges. Also, with this idea, the scan could be made to look like your standard botnet running around launching PHP exploits at everyone and their mothers. This is a highly entertaining idea, but it's also a bit unethical (though I'm not certain -- do the ends justify the means in this case?), and it might come back to bite us.
    1. If there were a way to get an in-country botnet to "take notice" of certain bridges, we could do a sort of "Here boy, fetch!" trick. For example, if a botnet appears to have infected hosts report back to an IRC channel, or is scanning for Windows hosts with port 139 open, we could mimic the responses an infected host would give while spoofing the bridge's IP. I have no idea how feasible or reliable that would be.
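For method 1 (and the zombified-probe idea in method 4), the core IPID arithmetic of an idle scan is small enough to sketch. This is the standard nmap-style inference, not code taken from any of the tools above:

```python
def idle_scan_verdict(ipid_before, ipid_after):
    """Infer the target port's state from a zombie host's globally
    incremental IPID counter, as in nmap's idle scan:
      delta 2 -> the zombie answered our SYN/ACK probe *and* sent a RST
                 in response to the target's SYN/ACK, so the port is open;
      delta 1 -> the zombie only answered our probe (the target sent a
                 RST or nothing), so the port is closed or filtered;
    anything else means the zombie saw unrelated traffic in between and
    is not usable (not idle)."""
    delta = (ipid_after - ipid_before) % 65536   # IPID wraps at 16 bits
    if delta == 2:
        return 'open'
    if delta == 1:
        return 'closed-or-filtered'
    return 'zombie-not-idle'
```

In practice each measurement would be repeated several times, since a single stray packet hitting the zombie invalidates the verdict.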

Automation Concerns and Desired Features

We should avoid scanning bridges that we suspect are not
blocked. Therefore, eventually there should be an easy way to automate
feedback loops between Karsten's metrics and the bridge scanner. That way,
once connections in a certain country drop significantly, the automated
tests initiate in order to discover if those bridges are in fact
unreachable.
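A minimal sketch of what such a feedback trigger might look like, assuming we can get per-country usage counts out of metrics (the function, thresholds, and data shapes are all hypothetical):

```python
def countries_to_scan(history, latest, drop_threshold=0.5, min_baseline=32):
    """Given per-country bridge-usage counts (history: country -> list of
    recent daily counts; latest: country -> today's count), return the
    countries whose usage dropped sharply enough to justify triggering
    active reachability scans there."""
    suspects = []
    for country, counts in history.items():
        if not counts:
            continue
        baseline = sum(counts) / len(counts)
        if baseline < min_baseline:
            continue                     # too little data to judge a drop
        if latest.get(country, 0) < baseline * (1 - drop_threshold):
            suspects.append(country)
    return sorted(suspects)
```

The real version would have to work from sanitized metrics data and smooth over normal day-to-day variance; this only shows the shape of the feedback loop.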

Design Features:

  1. Allow for either eventual integration with, or some type of feedback mechanism for, metrics-db.
  2. Should be automatable in a safe manner, i.e. the bridge scanner should know that a full Tor connection to a specific bridge will likely result in that bridge being blocked, and thereby skip running any test which includes a full Tor connection.
  3. Should be easily incrementable, meaning it should be simple to tell the test "only try TCP SYNs for this list of bridges", or "try everything up until a Tor-specific TLS/SSL handshake".
  4. GeoIP awareness.
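Features 2 and 3 could be sketched as a simple "test ladder", ordered as in the Active Scans section above (the names and interface are hypothetical):

```python
# Tests ordered from least to most Tor-like, per the Active Scans section.
TEST_LADDER = [
    'icmp-echo',
    'tcp-ack',
    'tcp-syn',
    'tcp-connect',
    'ssl-handshake',
    'tor-tls-handshake',
]

def select_tests(up_to, skip_full_tor=True):
    """Return the tests to run, stopping at `up_to` inclusive (design
    feature 3). With skip_full_tor set, the full Tor handshake is never
    scheduled automatically (design feature 2), since it is the test
    most likely to get a bridge blocked."""
    idx = TEST_LADDER.index(up_to)
    chosen = TEST_LADDER[:idx + 1]
    if skip_full_tor and 'tor-tls-handshake' in chosen:
        chosen.remove('tor-tls-handshake')
    return chosen
```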

Implementation

I propose the test have all of the Active Direct Methods outlined above, and
an easy way to test one at a time. For the actual testing, I want to err on
the side of caution, in order to avoid getting bridges blocked. Therefore,
during bridge reachability testing, we should test via the most innocuous
first, wait a while (probably a day or two), see what we learn, then proceed
to the next method.

I was planning to use Python, because it's fast (in terms of coding time), we
don't need to worry about portability in this instance, and it gives me fewer
headaches than C. And Java makes me want to set things on fire. James Arthur
Gosling, take it back.

For the indirect scanning methods, I believe these will be difficult to
entirely automate, but I plan to implement them so that they require as little
human interaction as possible. If any of them prove reliable, they can be used
as fallback methods when information concerning specific bridges is needed
immediately and there is a human willing to run the tests.

Project Timeline

July 2012

Two weeks of continued research and discussion until end of July.

August 2012

Four weeks for initial development phase. Beta tests should be deployed by
31 August, and gathered data saved for evaluation of testing methods.

September 2012

Four weeks for evaluation of data previously gathered from beta testing,
and continued development of bridge reachability testing tools. Alpha
release should be deployed by 30 September.

October 2012

Two weeks for final development, with a usable, automated bridge
reachability testing tool produced by 14 October. Two weeks for final
testing, data collection and report generation, and discussion of further
steps for integrating the automation of bridge reachability testing with
general Tor metrics.

November 2012

The project should be completed by 1 November 2012.

Active Questions:

  1. Should this automation be considered part of OONI? Or BridgeDB? Or is it part of some other project?
  2. If there are only eight "edge routers":
    1. What are their IP addresses?
    2. Which protocols return traceroute data for these routers?
    3. Is the "core router" on this side of the "edge routers", or the other?
    4. What is the usual TTL of packets from the probes?
  3. For how long is an IP hijacked by the GFC probe?
  4. Roger mentions that "if the bridge had no other interesting services running (like a webserver), they just blackholed the IP address...but if there was an interesting service, they blocked the bridge by IP and port." Do the probes enumerate all ports, just common ones, or just privileged ports?
  5. What percentage of current bridges are running on port 443?
  6. Does the GFC automatically flag connections to TLS/SSL services which did not previously complete a DNS resolve?
    1. If so (because most browsers cache DNS resolutions), what is the maximum time interval between the last successful client-side DNS resolution and a client's connection for which the GFC still remembers that DNS was resolved?
    2. Do connections directly to IP addresses on port 443 stand out due to a lack of DNS resolution?
  7. Does the GFC queue all TLS/SSL connections for later enumeration?

References

[1] "How China Is Blocking Tor". Winter, Philipp, and Lindskog, Stefan.
Karlstad University, Sweden (2011). p.7, section 5.1 http://www.cs.kau.se/philwint/pdf/torblock2012.pdf
[2] Ibid. p.6, section 4.2.
[3] Ibid. p.19, section 6.3.
[4] "Research problem: Five ways to test bridge reachability". Dingledine, Roger.
The Tor Project (2011). https://blog.torproject.org/blog/research-problem-five-ways-test-bridge-reachability
[5] "Case study: Learning whether a Tor bridge is blocked by looking at its aggregate usage statistics".
Loesing, Karsten. The Tor Project (2011). https://metrics.torproject.org/papers/blocking-2011-09-15.pdf
[6] "Level Four Traceroute". http://pwhois.org/lft/
[7] "ipidseq.nse - nmap script for globally sequential IP ID discovery"
http://nmap.org/nsedoc/scripts/ipidseq.html
[8] "Idle Scan". http://nmap.org/book/idlescan.html
[9] "paketto". http://dankaminsky.com/2002/11/18/77/
[10] "Research problems: Ten ways to discover Tor bridges". Dingledine, Roger.
The Tor Project (2011). Point #10. https://blog.torproject.org/blog/research-problems-ten-ways-discover-tor-bridges
[11] "Tor Protocol Specification". Dingledine, Roger, and Mathewson, Nick.
The Tor Project (2012). Sections 2-4. https://gitweb.torproject.org/torspec.git/blob_plain/HEAD:/tor-spec.txt
[12] "GFW probes based on Tor's SSL cipher list".
https://trac.torproject.org/projects/tor/ticket/4744
[13] "get_mozilla_ciphers.py - Get the default ciphers of Mozilla Firefox".
https://trac.torproject.org/projects/tor/attachment/ticket/4744/get_mozilla_ciphers.py
[14] "EFF's SSL Observatory". https://www.eff.org/observatory
[15] "SSLObservatory git repository". https://git.eff.org/public/observatory.git
[16] "Iran blocks Tor; Tor releases same-day fix". Dingledine, Roger.
The Tor Project (2011). https://blog.torproject.org/blog/iran-blocks-tor-tor-releases-same-day-fix
[17] "new tcp scan method". Sanfilippo, Salvatore. (1998).
http://seclists.org/bugtraq/1998/Dec/79
[18] "Ioerror's blockfinder git repository". https://github.com/ioerror/blockfinder
[19] "Zombie Scans using Unintended Public Services".
http://blog.makensi.es/post/3884103946/zombie-scans-using-unintended-public-services
[20] "idlescanner.py - Use unintentional web services for portscanning".
http://makensi.es/tools/idlescanner.txt
[21] "FTP Bouncing for Portscanners - FTP PROXY".
http://nmap.org/nmap_doc.html#bounce
[22] "How the Great Firewall of China is Blocking Tor". Winter, Philipp.
Karlstads Universitet (2012). http://www.cs.kau.se/philwint/static/gfc/
[23] "NullHypothesis' tcis git repository". https://github.com/NullHypothesis/tcis
[24] "OONI - chinatrigger.py - Python port of tcis".
https://github.com/hellais/ooni-probe/blob/master/ooni/plugins/chinatrigger.py
[25] "An Analysis of Fragmentation Attacks". Anderson, Jason. (2001).
http://www.ouah.org/fragma.html
[26] "bridget.py". https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/plugins/bridget.py

Child Tickets

Ticket  Status  Owner  Summary                                      Component
#5272   closed  isis   BridgeT: Check if bridges are public relays  Archived/Ooni
#6714   closed  isis   option to specify custom tor binary path     Archived/Ooni
#6804   closed  isis   Reachability tests for obfuscated bridges    Archived/Ooni
#6865   closed  isis   Bridge Testing: Active Scans                 Archived/Ooni
#6874   closed  isis   Bridge Testing: Indirect Scans               Archived/Ooni

Change History (18)

comment:1 in reply to:  description ; Changed 7 years ago by karsten

Keywords: bridge-reachability SponsorF20121101 added; bridge reachability removed

Wow, this project plan is fantastic! Awesome work! :)

First of all, I'm making this the new ticket for sponsor F year 2 item 10 by adding the keyword SponsorF20121101. (I also removed the commas from existing keywords, because I think that's how keywords work in Trac; please change back if I'm wrong.) The ticket will now be listed on the sponsor F year 2 wiki page instead of #5028.

As for the text that is now in the description, I could imagine that it will turn into a tech report that we can make part of the sponsor deliverable. How do you think about starting a LaTeX document by cloning my public tech-reports.git repo (branch fivereports) and creating a new directory 2012/automatic-bridge-reachability-testing/ there? You could ask for your own public tech-reports.git repo to host your report sources.

Also, I think I can help with two of your questions:

  1. Should this automation be considered part of OONI? Or BridgeDB? Or is it part of some other project?

I'd think the scanner should be considered part of OONI. There's already a defined interface for BridgeDB to use the scanner's results (#5484). Ideally, the scanner would output its results in that format, so that BridgeDB can make its decisions which bridges to give out to which users. Also, metrics-db should learn about the very same file, sanitize any sensitive information in it, archive it, and make it public.

Related to the overall architecture question, you briefly discussed a feedback loop between metrics and the scanner to learn about passively obtained reachability information. Note that metrics-db is dumb, and it should stay dumb; it collects and sanitizes Tor network information, but it doesn't do smart things with them. If the bandwidth scanner wants to extract information from collected bridge descriptors containing statistics, it should do that itself. I'm happy to discuss how to extract that information, but the code should live in the bridge scanner codebase, maybe in a different module than the active scanning code. Of course, if we want to archive the results from looking at passive stats and how they influence which bridges we scan, that would be something for metrics-db to collect, sanitize, archive, and publicize.

At least that's what I came up with when thinking about the architecture a bit. Does that make sense to you?

  1. What percentage of current bridges are running on port 443?

You can look this up in the sanitized bridge network statuses, similar to how you looked up the numbers in the microdescriptor consensus. You'll find the last three days of sanitized bridge network statuses here:

$ rsync -arz metrics.torproject.org::metrics-recent/bridge-descriptors/statuses/ statuses
$ cd statuses/
$ grep -B1 "^s.* Running" 20120719-103704-* | grep "^r .* 443 " | wc -l
     499
$ grep -B1 "^s.* Running" 20120719-103704-* | grep "^r" | wc -l
     998

So, half of them. (No, I'm not faking the numbers here, it's 50.0% for real!)

comment:2 Changed 7 years ago by asn

Some comments:

a) The GFC DPI/probing description is not entirely correct, but it shouldn't matter too much for reachability testing. Read Philipp's paper for more information (for example, the fpr is in the ClientHello, the probers do full SSL and send a CREATE Tor cell, etc.).

b) How many bridges should you test each time? Should we test _all_ bridges, or just a small sample of bridges (with diverse characteristics (like country, tor version, etc.))?

c) How much do we care about burning a bridge during reachability testing?

d) In which cases can we detect blocking during reachability testing in real-time, so that we don't burn our whole list of bridges in a single testing session? Is the price of bridges higher than the implementation pain of detecting real-time blocking?

e) Should we set our own bridges for reachability testing? This way, we have control over the bridges and we can pivot their TCP port if the blocking is IP:PORT-specific etc..

f) What about reachability testing on bridges that support pluggable transports?

g) Is there a point in performing less-useful tests than Tor TLS/SSLv3 Handshake? Since we will always be interested in performing the "dangerous" Tor TLS/SSLv3 Handshake test we might as well start with it, instead of incrementally performing less-dangerous tests. This comes down to "how much do we care if we burn a single bridge"?

Or are you interested in finding if they will block you in real-time, and the point of all the incremental testing is to bisect in which layer it happens? This sounds like a fun idea, but maybe we should separate 'reachability testing' and 'real-time DPI censorship detection' for now so that the implementation plan does not get too bloated.

comment:3 Changed 7 years ago by ln5

Cc: ln5 added

comment:4 in reply to:  2 ; Changed 7 years ago by aagbsn

Cc: aagbsn added

Replying to asn:

Some comments:

a) The GFC DPI/probing description is not entirely correct, but it shouldn't matter too much for reachability testing. Read Philipp's paper for more information (for example, the fpr is in the ClientHello, the probers do full SSL and send a CREATE Tor cell, etc.).

b) How many bridges should you test each time? Should we test _all_ bridges, or just a small sample of bridges (with diverse characteristics (like country, tor version, etc.))?

No single measurement point should have a complete view of all the bridges.

How often are bridges being scanned? Hourly? Daily? Weekly? Longer?

Keep in mind that if BridgeDB stops handing out bridges that are known to be blocked, and replaces them with new bridges, those bridges may get blocked too (for example, a client that is mining bridges receives the new bridges and blocks those as well). We can control the rate at which BridgeDB consumes reachability data -- that gives us a knob for playing with the rate at which bridges get burned (though this rate can be different than the scan rate).

c) How much do we care about burning a bridge during reachability testing?

What scenarios do you think could cause a bridge to get burned in a way that would not also apply to every other bridge being scanned as well?

d) In which cases can we detect blocking during reachability testing in real-time, so that we don't burn our whole list of bridges in a single testing session? Is the price of bridges higher than the implementation pain of detecting real-time blocking?

Perhaps double-checking bridges from another host and aborting the scan if the results differ by some configurable threshold would work for the active-direct methods.

e) Should we set our own bridges for reachability testing? This way, we have control over the bridges and we can pivot their TCP port if the blocking is IP:PORT-specific etc..

This sounds like a good use of contact information in the bridge-descriptor.

f) What about reachability testing on bridges that support pluggable transports?

This is also a necessary component for the Bridge Authority -- bridges (0.2.4) can spam whatever transport lines they please, and BridgeDB eats it up and advertises it. For every pluggable transport type, there ought to be a corresponding reachability test.

g) Is there a point in performing less-useful tests than Tor TLS/SSLv3 Handshake? Since we will always be interested in performing the "dangerous" Tor TLS/SSLv3 Handshake test we might as well start with it, instead of incrementally performing less-dangerous tests. This comes down to "how much do we care if we burn a single bridge"?

Yes, if it means that the *scanner* is harder to detect. We do not want the measurement points to be targeted.

Or are you interested in finding if they will block you in real-time, and the point of all the incremental testing is to bisect in which layer it happens? This sounds like a fun idea, but maybe we should separate 'reachability testing' and 'real-time DPI censorship detection' for now so that the implementation plan does not get too bloated.

comment:5 Changed 7 years ago by phw

Cc: identity.function@… added

comment:6 Changed 7 years ago by phw

I thought a little bit about using David's flash proxy concept for bridge reachability testing. Here is a short summary of the idea.

comment:7 Changed 7 years ago by phw

I am probably just stating the obvious here but perhaps I have some useful remarks:

  • UAE and Ethiopia do not actually block bridges but rather network packets. We need to distinguish between blocking Tor by protocol and by end points (bridges/relays). Also, Lebanon and Qatar are new to me. It would be helpful to add them to the censorship wiki in case anybody knows more about that. Apart from China, is there any country actually blocking bridges by IP and port at the moment?
  • I have a pcap file containing the connections of several hundred Chinese probes over a period of several weeks. It could be used for TTL analysis etc. Drop me an e-mail if you are interested in the data set.
  • A censor could also silently block bridges while at the same time initiating dummy Tor connections to them so we wouldn't see the block in the usage statistics. However, we will probably still be able to detect this by users complaining over e-mail.
  • The following is probably non-trivial but aside from the country-level usage statistics, the feedback loop to scan bridges could also consider passive individual bridge usage statistics. E.g., if a bridge recently saw a connection from a censoring country, it might be reachable. On the other hand, if a bridge has not seen connections in n days, it might be worth scanning.
  • As mentioned above, I think that packet fragmentation could be a good way to scan bridges in China without triggering follow-up scanning. On a more general note, we might have to come up with country-specific plans to scan bridges without leaving dozens of blocked bridges behind us.

Regarding the open questions:

  1. This paper contains some additional information about the Chinese filtering infrastructure; especially with respect to the filtering ASes.
  2. That's hard to answer, given that I could not initiate any communication with the probes other than the bridge scan. Figure 2a) in my paper shows when we were able to communicate with what we believed was the host "behind" the probe. Also, keep in mind that the IP hijacking is just a hypothesis at this point and the evidence is not strong enough to consider it as fact.
  3. I haven't seen any evidence for service enumeration. Is this still supposed to happen?
  4. It looks like the old concept of "scanning queues" in 15 minute intervals changed a couple of weeks ago. What I am seeing now is real-time scans which happen immediately after the GFC detected a potential Tor connection. Then, you won't see any scanners for ~20 minutes even though you continue initiating Tor connections. After ~20 minutes, there are real-time scans again. I have yet to take a closer look at the data. If anybody wants to help, drop me an e-mail.

comment:8 in reply to:  1 Changed 7 years ago by isis

Replying to karsten:

As for the text that is now in the description, I could imagine that it will turn into a tech report that we can make part of the sponsor deliverable. How do you think about starting a LaTeX document by cloning my public tech-reports.git repo (branch fivereports) and creating a new directory 2012/automatic-bridge-reachability-testing/ there? You could ask for your own public tech-reports.git repo to host your report sources.

Cloned it, and it'll take me a second to re-remember for the zillionth time how LaTeX works. I'll poke weasel for a repo.

Also, I think I can help with two of your questions:

  1. Should this automation be considered part of OONI? Or BridgeDB? Or is it part of some other project?

I'd think the scanner should be considered part of OONI.

Hum. I should see what ioerror and hellais think. I don't want them to feel like OONI is getting cluttered with tickets that I'm the only one working on. And I know hellais is too busy to take on this project, but perhaps ioerror would want to hack on it as well. I'll poke them as well!

There's already a defined interface for BridgeDB to use the scanner's results (#5484). Ideally, the scanner would output its results in that format, so that BridgeDB can make its decisions which bridges to give out to which users. Also, metrics-db should learn about the very same file, sanitize any sensitive information in it, archive it, and make it public.

Seems simple enough: "BridgeDB will process a file with lines "fingerprint address:port cc,cc,cc" meaning that the bridge running on the given address and port is unreachable from the given countries."
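That interface is simple enough that a parser is a one-liner. A minimal sketch of consuming the format quoted above ("fingerprint address:port cc,cc,cc"); the function name and the sample fingerprint are made up for illustration, not BridgeDB's actual API:

```python
# Hypothetical parser for the reachability file format described in #5484.
def parse_reachability_line(line):
    """Split one line into (fingerprint, address, port, blocked country codes)."""
    fingerprint, addr_port, countries = line.split()
    address, port = addr_port.rsplit(":", 1)  # rsplit in case of IPv6 later
    return fingerprint, address, int(port), countries.split(",")

line = "A1B2C3D4E5F6A7B8C9D0A1B2C3D4E5F6A7B8C9D0 203.0.113.5:443 cn,ir"
fpr, addr, port, ccs = parse_reachability_line(line)
```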

Related to the overall architecture question, you briefly discussed a feedback loop between metrics and the scanner to learn about passively obtained reachability information. Note that metrics-db is dumb, and it should stay dumb; it collects and sanitizes Tor network information, but it doesn't do smart things with them. If the bandwidth scanner wants to extract information from collected bridge descriptors containing statistics, it should do that itself. I'm happy to discuss how to extract that information, but the code should live in the bridge scanner codebase, maybe in a different module than the active scanning code. Of course, if we want to archive the results from looking at passive stats and how they influence which bridges we scan, that would be something for metrics-db to collect, sanitize, archive, and publicize.


Right. I was imagining that one would take the usage statistics, probably the number of connections, from metrics-db, and when the number of connections from a specific country drops drastically, the bridge scanner would jump in and try to figure out what was going on by testing a subset of bridges from that country. Also, the scanner would be the one polling metrics-db to figure out when connections are dropping.

At least that's what I came up with when thinking about the architecture a bit. Does that make sense to you?

Yep! Totally makes sense.
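A rough sketch of that feedback loop: the scanner polls the per-country connection counts from metrics-db and flags a country for active scanning when usage drops sharply. The threshold and window sizes here are made-up tuning knobs, not measured values.

```python
# Hypothetical trigger for the scanner's polling loop; drop_ratio and
# window would need real-world tuning.
def should_scan(history, drop_ratio=0.5, window=3):
    """history: chronological list of daily connection counts for one country.
    Trigger when the latest count falls below drop_ratio times the mean
    of the preceding `window` days."""
    if len(history) < window + 1:
        return False
    baseline = sum(history[-window - 1:-1]) / window
    return history[-1] < drop_ratio * baseline
```

For example, a country that usually sees ~1000 connections a day suddenly reporting 100 would trigger a scan, while ordinary day-to-day noise would not.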

  1. What percentage of current bridges are running on port 443?

You can look this up in the sanitized bridge network statuses, similar to how you looked up the numbers in the microdescriptor consensus. You'll find the last three days of sanitized bridge network statuses here:

$ rsync -arz metrics.torproject.org::metrics-recent/bridge-descriptors/statuses/ statuses
$ cd statuses/
$ grep -B1 "^s.* Running" 20120719-103704-* | grep "^r .* 443 " | wc -l
     499
$ grep -B1 "^s.* Running" 20120719-103704-* | grep "^r" | wc -l
     998

So, half of them. (No, I'm not faking the numbers here, it's 50.0% for real!)

Sweet, this makes it easier to test things by pretending to be an SSLObservatory, which is also already in Python.
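For scripting the same count, here is a Python equivalent of the grep pipeline above. It assumes each "r" line in a sanitized bridge network status ends with "... address or-port dir-port" and is followed by an "s" line carrying the flags; if the sanitized format differs, the field index would need adjusting.

```python
# Count Running bridges and how many of them listen on a given OR port,
# given the lines of a sanitized bridge network status document.
def count_running_on_port(status_lines, port="443"):
    """Return (bridges running on `port`, total running bridges)."""
    total = on_port = 0
    fields = None
    for line in status_lines:
        if line.startswith("r "):
            fields = line.split()
        elif line.startswith("s ") and fields is not None:
            if "Running" in line.split():
                total += 1
                if fields[-2] == port:  # or-port is the next-to-last field
                    on_port += 1
            fields = None
    return on_port, total
```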

comment:9 in reply to:  4 ; Changed 7 years ago by isis

Replying to aagbsn:

Replying to asn:

Some comments:

a) The GFC DPI/probing description is not entirely correct, but it shouldn't matter too much for reachability testing. Read Philipp's paper for more information (for example, the fpr is in the ClientHello, the probers do full SSL and send a CREATE Tor cell, etc.).

You're right, I totally wrote the wrong thing, thanks! I did read Philipp's paper, and it was quite informative. Though I must have gotten mixed up when writing: I had understood that China fingerprinted on the TLS ClientHello (the up-front cert exchange in the v1 case, and the static ciphersuite list in the v2/v3 case), while Iran had actually censored based on the ServerHello (I think it was the two-hour expiration time on the cert?).

So oops, I'll be sure to correct it in the paper.

b) How many bridges should you test each time? Should we test _all_ bridges, or just a small sample of bridges (with diverse characteristics (like country, tor version, etc.))?

No single measurement point should have a complete view of all the bridges.

How often are bridges being scanned? Hourly? Daily? Weekly? Longer?

For testing the reachability tests, I was assuming that we'd set up our own bridges. In addition to not risking burning volunteers' bridges, we'd also have a more controlled setting for getting better data about what's safe to do from a given country and what isn't (at least for the present).

And, for the general case, once the tests are established, I don't think unwarranted scanning should be done very often, perhaps once per week or likely even less. By unwarranted, I mean, "we're not noticing a drastic drop in connections to bridges from this country, but we're going to scan from there anyway just as a check."

Also, in the case of unwarranted scanning, I would guess that scanning about 5 bridges would suffice, but I do not know the statistical percentage of them likely to be duds to begin with. Do either of you have any opinions on what would be a good number to scan, that would give us accurate results, but also be as small and risk averse as possible?

Keep in mind that if BridgeDB stops handing out bridges that are known to be blocked, and replaces them with new bridges, those bridges may get blocked too (for example, a client that is mining bridges receives new bridges and blocks those too). We can control the rate at which BridgeDB consumes reachability data -- that gives us a knob to play with the rate at which bridges get burned (though this rate can be different from the scan rate).

Hmm. This is interesting. Is it okay if the scanner always reports truthful information to bridge-db, and bridge-db is in charge of the lying? Because making a scanner that sometimes tells lies seems not as useful to me...

c) How much do we care about burning a bridge during reachability testing?

What scenarios do you think could cause a bridge to get burned in a way that would not also apply to every other bridge being scanned as well?

I'm not sure if I understand this question? Could you please explain more?

I was working under the assumption that if TestX gets BridgeA blocked in a country, that TestX would also get BridgeB and BridgeC blocked. I'm not sure if this is always correct, nor if that is what you were asking.

d) In which cases can we detect blocking during reachability testing in real-time, so that we don't burn our whole list of bridges in a single testing session? Is the price of bridges higher than the implementation pain of detecting real-time blocking?

Perhaps double-checking bridges from another host and aborting the scan if the results differ by some configurable threshold would work for the active-direct methods.

Well, preferably we should have some bridges for testing that we control, so that we can see if the scanner is making connections to them, and we can also try connecting again later to see if the connection still works.
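The cross-check suggested above could look something like this: compare scan results from two vantage points and abort when they disagree on more than a configurable fraction of bridges. The names and the 20% default threshold are illustrative assumptions, not an agreed design.

```python
# Hypothetical abort condition for a scanning session run from two hosts.
def results_disagree(results_a, results_b, threshold=0.2):
    """results_*: dicts mapping bridge fingerprint -> reachable (bool).
    Return True if the two vantage points disagree on more than
    `threshold` of the bridges they both tested."""
    common = set(results_a) & set(results_b)
    if not common:
        return False
    disagreements = sum(1 for fpr in common if results_a[fpr] != results_b[fpr])
    return disagreements / len(common) > threshold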

e) Should we set our own bridges for reachability testing? This way, we have control over the bridges and we can pivot their TCP port if the blocking is IP:PORT-specific etc..

This sounds like a good use of contact information in the bridge-descriptor.

Ah, or that! But I wouldn't want to cause more work for all the awesome people who volunteer.

f) What about reachability testing on bridges that support pluggable transports?

This is also a necessary component for the Bridge Authority -- bridges (0.2.4) can spam whatever transport lines they please, and BridgeDB eats it up and advertises it. For every pluggable transport type, there ought to be a corresponding reachability test.

I was wondering about this and forgot to add it as a question. Is there any way to test that an Obfs2 bridge is actually running without compiling Obfsproxy and controlling an Obfs2-configured Tor client?

g) Is there a point in performing less-useful tests than Tor TLS/SSLv3 Handshake? Since we will always be interested in performing the "dangerous" Tor TLS/SSLv3 Handshake test we might as well start with it, instead of incrementally performing less-dangerous tests. This comes down to "how much do we care if we burn a single bridge"?

Yes, if it means that the *scanner* is harder to detect. We do not want the measurement points to be targeted.

I ordered them by how innocuous I believe they will be to any watching party. Of course we're interested in whether or not a full Tor handshake can be completed, but this seems to carry a pretty high risk of fiery death.

Or are you interested in finding if they will block you in real-time, and the point of all the incremental testing is to bisect in which layer it happens? This sounds like a fun idea, but maybe we should separate 'reachability testing' and 'real-time DPI censorship detection' for now so that the implementation plan does not get too bloated.

That's interesting too, but I'd only do that on my own bridges, and I'd probably run some sort of service faker and maybe a lighttpd server with some crap on it, so that they block by IP:port and I can just change ports to continue testing. Also, incremental testing would be useful for places that have just implemented censorship/DPI devices, to figure out what level they're blocking Tor at.

It's not that much more work to write all of those tests, and I'd imagine they would all come in useful at some point. Even if not, they are things that OONI might be able to recycle into some other test. Plus, it's Python, yo'. None of this C 'static void somefunction(struct foo, int bar, char baz){blah blah blah}' nonsense. :) I hear we live in teh futures. /endlanguagetrolling

comment:10 in reply to:  7 ; Changed 7 years ago by isis

Status: new → accepted

Replying to phw:

I am probably just stating the obvious here but perhaps I have some useful remarks:

  • UAE and Ethiopia do not actually block bridges but rather network packets. We need to distinguish between blocking Tor by protocol and by end points (bridges/relays). Also, Lebanon and Qatar are new to me. It would be helpful to add them to the censorship wiki in case anybody knows more about that. Apart from China, is there any country actually blocking bridges by IP and port at the moment?

UAE and Ethiopia block all SSL, correct?

If so, there seems to be not much that Tor or any bridge tests can do about that, at least not until more pluggable transports are deployed.

  • I have a pcap file containing the connections of several hundred Chinese probes over a period of several weeks. It could be used for TTL analysis etc. Drop me an e-mail if you are interested in the data set.

Awesome!

  • A censor could also silently block bridges while at the same time initiating dummy Tor connections to them so we wouldn't see the block in the usage statistics. However, we will probably still be able to detect this by users complaining over e-mail.

That would be horrible. Though we could still do frequency analysis on the traffic going through a bridge we suspect that is happening to, because I doubt that a dummy connection would be capable of generating realistic-looking traffic (it's possible, it would just be more work on the censor's part). However, monitoring traffic frequency would require extra code added to little-t tor; it's obviously not something we could do from a scanner.

  • The following is probably non-trivial but aside from the country-level usage statistics, the feedback loop to scan bridges could also consider passive individual bridge usage statistics. E.g., if a bridge recently saw a connection from a censoring country, it might be reachable. On the other hand, if a bridge has not seen connections in n days, it might be worth scanning.

That seems do-able...and useful too.

If that were to be implemented, I think it would make sense for metrics-db to have an extra folder or field, some way of keeping state and storing the number of connections from a given country over time, and then the scanner could both update that data and parse it every so often, then feed it into whatever algorithm decides if changes look too drastic or sketchy.
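The per-bridge half of that heuristic ("if a bridge has not seen connections in n days, it might be worth scanning") is easy to sketch. The data layout and the 7-day default below are assumptions for illustration only.

```python
# Hypothetical staleness check: pick bridges that have gone quiet from
# the censoring country's point of view.
from datetime import datetime, timedelta

def bridges_worth_scanning(last_seen, now, max_idle_days=7):
    """last_seen: dict mapping bridge fingerprint -> datetime of the most
    recent connection observed from the country in question. Return the
    fingerprints idle for longer than max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(fpr for fpr, seen in last_seen.items() if seen < cutoff)
```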

  • As mentioned above, I think that packet fragmentation could be a good way to scan bridges in China without triggering follow-up scanning. On a more general note, we might have to come up with country-specific plans to scan bridges without leaving dozens of blocked bridges behind us.

Yes, either the scanner will have to be very "intelligent", or there will have to be per-country rules. I'd opt for the former, because it's less work later if the thing adapts by itself.

Regarding the open questions:

  1. This paper contains some additional information about the Chinese filtering infrastructure; especially with respect to the filtering ASes.

Neat, reading material for the next flight! :)

  1. That's hard to answer, given that I could not initiate any communication with the probes other than the bridge scan. Figure 2a) in my paper shows when we were able to communicate with what we believed was the host "behind" the probe. Also, keep in mind that the IP hijacking is just a hypothesis at this point and the evidence is not strong enough to consider it as fact.

Right. I looked a bit more into how this could be done, and I found a tool called exabgp which enables the injection of BGP routes (actually most state-level actors probably don't need that, because they control the ASes anyway), and then applying packet labels with LSRs and using MPLS to direct the appropriate flow to/from the probe and the rest to the actual host. This IP hijacking, if it is indeed happening, is something which I do not yet fully comprehend, but it is very interesting.

  1. I haven't seen any evidence for service enumeration. Is this still supposed to happen?

This is what I read, and if it is the case, i.e., that China blocks by IP:port if there are other services, and elif by IP blackholing, then it would make testing easier because, as I mentioned earlier, it would be pretty trivial to fake services on a bridge that we set up (that way we could just change the port to continue bridge reachability testing). If you're not seeing it though, I suppose you've probably got more info on this than anyone else, so it might not be happening anymore. Does this mean they always dumb block on IP:port? Or are they always blackholing?

  1. It looks like the old concept of "scanning queues" in 15 minute intervals changed a couple of weeks ago. What I am seeing now is real-time scans which happen immediately after the GFC detected a potential Tor connection. Then, you won't see any scanners for ~20 minutes even though you continue initiating Tor connections. After ~20 minutes, there are real-time scans again. I have yet to take a closer look at the data. If anybody wants to help, drop me an e-mail.

I'd take a look at it. Do you think the "twelve hours without a successful connection from the probe" still applies to unblocking?

comment:11 in reply to:  10 Changed 7 years ago by phw

Replying to isis:

Replying to phw:

  • UAE and Ethiopia do not actually block bridges but rather network packets. We need to distinguish between blocking Tor by protocol and by end points (bridges/relays). Also, Lebanon and Qatar are new to me. It would be helpful to add them to the censorship wiki in case anybody knows more about that. Apart from China, is there any country actually blocking bridges by IP and port at the moment?

UAE and Ethiopia block all SSL, correct?

If so, there seems to be not much that Tor or any bridge tests can do about that, at least not until more pluggable transports are deployed.

It does not look like UAE is blocking Tor at the moment but Ethiopia is dropping the Tor TLS client hello and server hello. All the details are in the one and only censorship wiki! SSL in general is not blocked, though.

  1. I haven't seen any evidence for service enumeration. Is this still supposed to happen?

This is what I read, and if it is the case, i.e., that China blocks by IP:port if there are other services, and elif by IP blackholing, then it would make testing easier because, as I mentioned earlier, it would be pretty trivial to fake services on a bridge that we set up (that way we could just change the port to continue bridge reachability testing). If you're not seeing it though, I suppose you've probably got more info on this than anyone else, so it might not be happening anymore. Does this mean they always dumb block on IP:port? Or are they always blackholing?

Yes, it looks like bridges (and even relays) are blocked by IP:Port. I only saw IP blackholing for the directory authorities. I think that from the GFC's point of view that's quite good: It has no collateral damage and is still highly effective.

  1. It looks like the old concept of "scanning queues" in 15 minute intervals changed a couple of weeks ago. What I am seeing now is real-time scans which happen immediately after the GFC detected a potential Tor connection. Then, you won't see any scanners for ~20 minutes even though you continue initiating Tor connections. After ~20 minutes, there are real-time scans again. I have yet to take a closer look at the data. If anybody wants to help, drop me an e-mail.

I'd take a look at it. Do you think the "twelve hours without a successful connection from the probe" still applies to unblocking?

I reproduced this a couple of weeks ago and was able to unblock bridges after just 2-3 hours. The time seems to vary but it should still work, yes.

comment:12 in reply to:  9 Changed 7 years ago by asn

Replying to isis:

Replying to aagbsn:

Replying to asn:

b) How many bridges should you test each time? Should we test _all_ bridges, or just a small sample of bridges (with diverse characteristics (like country, tor version, etc.))?

No single measurement point should have a complete view of all the bridges.

How often are bridges being scanned? Hourly? Daily? Weekly? Longer?

For testing the reachability tests, I was assuming that we'd set up our own bridges. In addition to not risking burning volunteer's bridges, we'd also have a more controlled setting for getting better data about what's safe to do from a given country and what isn't (at least for the present).

And, for the general case, once the tests are established, I don't think unwarranted scanning should be done very often, perhaps once per week or likely even less. By unwarranted, I mean, "we're not noticing a drastic drop in connections to bridges from this country, but we're going to scan from there anyway just as a check."

Also, in the case of unwarranted scanning, I would guess that scanning about 5 bridges would suffice, but I do not know the statistical percentage of them likely to be duds to begin with. Do either of you have any opinions on what would be a good number to scan, that would give us accurate results, but also be as small and risk averse as possible?

I think I would approach this problem by finding some bridge properties that are interesting from a reachability PoV (tor version, country, uptime, etc.) and then I would try to compile a diverse list of bridges wrt those properties.

Finding the right properties and the right amount of bridges will probably require some tweaking and real life testing.
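One way to realize that approach is to bucket bridges by the interesting properties and sample a few from each bucket, so the test set stays small but covers the property space. The property names and bucket size below are illustrative, not decided.

```python
# Hypothetical diverse-sample selection for a scanning run.
import random

def diverse_sample(bridges, keys=("country", "tor_version"), per_bucket=1, seed=0):
    """bridges: list of dicts carrying the chosen properties.
    Group bridges by those properties, then pick per_bucket from each group."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    buckets = {}
    for bridge in bridges:
        buckets.setdefault(tuple(bridge[k] for k in keys), []).append(bridge)
    sample = []
    for group in buckets.values():
        sample.extend(rng.sample(group, min(per_bucket, len(group))))
    return sample
```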

c) How much do we care about burning a bridge during reachability testing?

What scenarios do you think could cause a bridge to get burned in a way that would not also apply to every other bridge being scanned as well?

I'm not sure if I understand this question? Could you please explain more?

Oops, sorry for the confusion. I wanted to ask: how much do we care if a bridge gets blocked during reachability testing? That is, how much do we value our bridges?

I think that answering this question will help us tackle many other questions (including "how many bridges should we test each time?" and "which tests should we run for each bridge?").

I suspect that the answer to this question is variable and depends on many factors (like, how many unblocked bridges we have in total, where the bridge is located, how fast it is, how much usage it sees, etc.).

f) What about reachability testing on bridges that support pluggable transports?

This is also a necessary component for the Bridge Authority -- bridges (0.2.4) can spam whatever transport lines they please, and BridgeDB eats it up and advertises it. For every pluggable transport type, there ought to be a corresponding reachability test.

I was wondering about this and forgot to add it as a question. Is there any way to test that an Obfs2 bridge is actually running without compiling Obfsproxy and controlling an Obfs2-configured Tor client?

You can't be 100% sure without running an obfs2-configured Tor client.

I mean, you can run cheap tests like TCP port scans, or checking the entropy of the data returned by the server, but you will never be sure that it's obfs2 without speaking the Tor protocol with the bridge.

This pretty-much comes down to #6396.

comment:13 Changed 7 years ago by karsten

Isis and I talked about the project timeline. We moved the August 31 milestone to September 15 and the September 30 milestone to October 7.

comment:14 Changed 6 years ago by karsten

Keywords: SponsorZ added; SponsorF20121101 removed

The November deadline has passed, and we haven't made any visible progress on this ticket. That doesn't mean we should give up on it, but it won't be part of sponsor F year 2. Removing the sponsor deadline tag and turning this ticket into a sponsor Z project.

comment:15 Changed 6 years ago by isis

Status: accepted → needs_revision

At the end of September, it was decided that OONI's framework, which is what the bridge tests were based on, should be changed. The change was supposed to be backwards compatible, though it was not. At first I thought that I should just write the tests using whatever I felt like using, and have them be their own separate project, but I really wanted them to be part of OONI.

I had already been fighting and hacking around OONI's earlier framework some fair amount to get the tests to work, and I figured it shouldn't be difficult to get them to run with the new framework. I completely rewrote the class converter which was supposed to provide backwards compatibility. Then I rewrote more things. Then I fixed more things. I spent the entire month of October trying to get everything to work, and it was incredibly frustrating, to say the least. I really dislike failing, and even more admitting when I have (which is why it's so difficult for me to update these tickets), but that's bullshit and I need to swallow my pride.

The biggest problem I am facing is twisted's reactor. In the old framework, tests would get called by a parent/wrapper script which controlled starting and stopping the reactor. I need to have a persistent Tor process connected to the reactor, which means fighting the wrapper.

In the new framework, twisted trial (twisted's unittesting framework/module) is used as the basis of the design for OONI's framework. Trial also controls the reactor, and in a somewhat more severe manner. Per test (i.e. per single unittest method), the reactor is started, the class is set up and initialized, the parameters to the test method are set up in the instantiation and passed to the test method, and then the reactor is cleared and stopped. This is crucial to trial, because events left sitting on the reactor can fire the next time it is started, but not necessarily; if the reactor isn't cleared, it leads to very strange behaviour that is difficult to debug, so it makes sense for trial to do such a thing. I am still not entirely sure that the bridge tests will work with the new OONI framework, given that I already had an unexpected amount of difficulty with the old.

Also, I am sorry that sometimes I fail at communicating.

So, tl;dr: OONI's alpha release got pushed back by one month. The bridge tests are part of OONI, and should be done at that time.

comment:16 Changed 6 years ago by hellais

What is the current status on this? Should this ticket be closed or what should be done of it?

comment:17 Changed 6 years ago by asn

I guess this ticket could be closed since Isis stopped working on this (am I right?), and related work should happen in related tickets like #6396.

Isis?

comment:18 Changed 5 years ago by hellais

Resolution: not a bug
Status: needs_revision → closed

Closing, since most of the bridge reachability work is now being done in #12544. All updates should go there.
