Opened 7 months ago

Last modified 2 days ago

#30368 needs_review task

Run some tests to check reachability of snowflake proxies

Reported by: cohosh
Owned by: cohosh
Priority: Medium
Milestone:
Component: Circumvention/Snowflake
Version:
Severity: Normal
Keywords: anti-censorship-roadmap
Cc: dcf, arlolra, cohosh, phw
Actual Points:
Parent ID:
Points:
Reviewer: phw
Sponsor: Sponsor28-can

Description

Our standalone proxies were recently blocked in China: #30350

We should start running some probe tests like we are for obfs4 to see whether this blocking was a one-off event and detect blocking of new proxy instances.


Attachments (4)

snowflake-reachability-2019-05-06.pdf (7.3 KB) - added by cohosh 7 months ago.
snowflake-reachability-2019-05-09.pdf (9.7 KB) - added by cohosh 7 months ago.
snowflake-reachability-2019-05-27.pdf (20.3 KB) - added by cohosh 7 months ago.
snowflake-reachability-2019-06-13.pdf (37.4 KB) - added by cohosh 6 months ago.


Change History (26)

comment:1 Changed 7 months ago by cohosh

Summary: changed from "Run some tests to check reachability of snowflake proxies in China" to "Run some tests to check reachability of snowflake proxies"

comment:2 Changed 7 months ago by cohosh

Interestingly, the server that runs the standalone proxies is still TCP reachable from the VPS in China. I can telnet into ports 22, 80, and 443.

For these tests I think we'll have to go the full bootstrapping route and send actual STUN traffic to the proxies.
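The manual telnet check amounts to a plain TCP handshake test. A minimal Python sketch of it (the helper names tcp_reachable and check_ports are hypothetical, not taken from bridgetest) makes the limitation concrete: passing it only shows that the server's IP address accepts TCP connections, not that snowflake's WebRTC traffic gets through.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to (host, port) completes within timeout.

    This mirrors a manual `telnet host port` check. It does not exercise
    STUN or WebRTC, so a proxy can pass this test and still be blocked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass), and DNS resolution failures.
        return False


def check_ports(host: str, ports=(22, 80, 443), timeout: float = 5.0) -> dict:
    """Check several ports on one host, defaulting to the ports tried above."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

Because the result for the standalone proxy server was "reachable" on every port, a test like this cannot explain the reported blocking, which is why the probes have to go through a full snowflake bootstrap.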

comment:3 Changed 7 months ago by cohosh

Here's a branch that does probing of snowflake proxies: https://github.com/cohosh/bridgetest/tree/snowflake

Some notes:

  • We make 10 attempts to connect through snowflake per probe period, to get a reasonable sample of the available proxies. There's no guarantee we'll see all of them, but this seems like a good number of tries for now.
  • I cut the Tor connection timeout from 180 seconds down to 90. This is because once a proxy hits the 30-second timeout (as described in #25429), the client will try to find another snowflake, and we only want to measure bootstrap progress for one snowflake at a time. On the other hand, this means that even when the proxy is reachable, we sometimes only see bootstrap progress up to 25% or 50% because we didn't give it enough time to complete.
  • Snowflake doesn't log the IP addresses of the snowflake proxies it connects to (since #21304), so we have to capture traffic with tcpdump and analyze it afterwards to see which proxy we actually got.
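The probing schedule in these notes could be sketched roughly as follows. This is not the code from the bridgetest branch: run_tor_once is a hypothetical stand-in for launching a fresh tor client, and matching "Bootstrapped N%" assumes Tor's usual notice-level bootstrap log messages.

```python
import re
from typing import Callable, Iterable, List

# Tor logs notice lines such as "Bootstrapped 25% ..." as it makes progress.
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")

ATTEMPTS_PER_PERIOD = 10   # tries per probe period, as described above
TOR_TIMEOUT_SECONDS = 90   # cut from 180 so only one snowflake is measured


def max_bootstrap(log_lines: Iterable[str]) -> int:
    """Return the highest bootstrap percentage seen in tor's log output."""
    best = 0
    for line in log_lines:
        m = BOOTSTRAP_RE.search(line)
        if m:
            best = max(best, int(m.group(1)))
    return best


def run_probe_period(run_tor_once: Callable[[int], List[str]],
                     attempts: int = ATTEMPTS_PER_PERIOD,
                     timeout: int = TOR_TIMEOUT_SECONDS) -> List[int]:
    """Run several independent snowflake bootstraps and record progress.

    `run_tor_once` is a hypothetical callable that starts a fresh tor
    client configured for snowflake, stops it after `timeout` seconds,
    and returns its log lines. Which proxy each attempt used still has
    to be recovered separately from a packet capture.
    """
    return [max_bootstrap(run_tor_once(timeout)) for _ in range(attempts)]
```

Recording only the maximum percentage per attempt keeps the output small, and it is the one number needed to place an attempt above or below the 10% line.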

I'll post some graphs here once we have more data.

comment:4 Changed 7 months ago by cohosh

Added a plot of snowflake reachability from the VPS in China.

Connections at or above the green line (at bootstrap = 11%) indicate that the snowflake proxy IP address was reachable. Connections below the green line indicate they were blocked. As mentioned above, the timeout for bootstrapping is set very low (90s) so not all connections through a reachable proxy were able to bootstrap fully.

snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

snowflake-cohosh is another standalone proxy-go instance I set up in response to the blocking.

snowflake-proxy-[1,2] are additional proxies that we don't know anything about.

Last edited 7 months ago by cohosh

Changed 7 months ago by cohosh

comment:5 in reply to: 4; Changed 7 months ago by dcf

Replying to cohosh:

> snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

Yeah. A >50% success rate, a rate that matches that of the other proxy you set up, doesn't look like IP blocking. Possibly not even blocking at all. It could be that in #30350 the reporter just experienced Snowflake not working very well yet, and wrongly interpreted it as the result of blocking. I know that I've never been able to use Snowflake for more than an hour or so because it always quits working--I suspected #25429 but never tracked it down, and of course there have been plenty of other bugs.

comment:6 in reply to: 5; Changed 7 months ago by cohosh

Replying to dcf:

> Replying to cohosh:

>> snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

> Yeah. A >50% success rate, a rate that matches that of the other proxy you set up, doesn't look like IP blocking. Possibly not even blocking at all. It could be that in #30350 the reporter just experienced Snowflake not working very well yet, and wrongly interpreted it as the result of blocking. I know that I've never been able to use Snowflake for more than an hour or so because it always quits working--I suspected #25429 but never tracked it down, and of course there have been plenty of other bugs.

Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.

What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).

I think there's more going on here than just the usual snowflake bugs, but I think #25429 will go a long way to mitigate the impact of whatever is going on.

comment:7 in reply to: 6; Changed 7 months ago by dcf

Replying to cohosh:

> Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.

I believe you. That's good evidence that there is some sort of targeted blocking. It seems to be less severe, at least, since May 3 according to the tests. We don't have tests from beforehand to know whether it used to be equally unreliable.

> What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).

I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less than complete bootstrap as complete "success" because to a user it looks like failure. E.g. in comment:16:ticket:30350 the user got to 75% after 13 seconds but then no further progress. So I'm thinking of it in kind of a "works/doesn't work" way, and in that way, snowflake-bridge and snowflake-cohosh seem to have roughly equal utility according to the data so far. While we know that the GFW sometimes fails open and allows access to blocked IP addresses, this doesn't look like that because the success rate is too high.
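The two notions of success in this exchange (the proxy's IP was reachable versus the bootstrap actually completed) can be computed side by side from the same probe data. A minimal sketch, assuming results are available as (proxy name, highest bootstrap percentage) pairs; the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Every snowflake connection reaches 10% on its own, so anything above
# that means the proxy's IP address was reachable.
REACHABLE_ABOVE = 10
COMPLETE = 100


def summarize(results: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[float, float]]:
    """Map proxy name -> (fraction reachable, fraction fully bootstrapped).

    `results` holds (proxy, max bootstrap percentage) pairs, one per attempt.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # attempts, reachable, complete
    for proxy, pct in results:
        c = counts[proxy]
        c[0] += 1
        c[1] += int(pct > REACHABLE_ABOVE)
        c[2] += int(pct >= COMPLETE)
    return {p: (r / n, d / n) for p, (n, r, d) in counts.items()}
```

A proxy can score 1.0 on the first number and much lower on the second, which is exactly the gap between "reachable" and "works" discussed here; the short 90-second test timeout depresses the second number for every proxy.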

Or maybe there really is some kind of protocol detection happening, once the WebRTC DataChannel is connected, and it's not simple IP blocking. That would be consistent with the evidence. I would not expect it as a first step of blocking, but certainly my intuition has been wrong before.

Changed 7 months ago by cohosh

comment:8 Changed 7 months ago by cohosh

Attached an updated reachability graph.

There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.

> I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less than complete bootstrap as complete "success" because to a user it looks like failure. E.g. in comment:16:ticket:30350 the user got to 75% after 13 seconds but then no further progress.

That's fair. I had to set the circuit timeout really low to prevent the snowflake client from trying to reconnect to another snowflake after 30 seconds, which would interfere with our test results the way I've set them up now. I think the connections that actually got to 75% would have reached 100% in a few more seconds, but maybe that doesn't matter because it's taking so long anyway.

comment:9 Changed 7 months ago by cohosh

Starting this afternoon (it may have been happening earlier, but I hadn't noticed), access to stun.l.google.com:19302 has been blocked.

Simply changing the client line to use -ice stun:stunserver.org resulted in a full bootstrap.
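Whether the client's STUN server itself is reachable (the ICE gathering stage) can be checked independently of snowflake by sending a bare STUN Binding Request. A minimal sketch, assuming the RFC 5389 message format; stun_reachable and its helpers are hypothetical names, and since a single unanswered UDP packet could also just be loss, a real test would retry a few times.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(txid: bytes) -> bytes:
    """Build an attribute-less STUN Binding Request (20-byte header only)."""
    if len(txid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # type, message length (0: no attributes), magic cookie, transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def is_binding_success(resp: bytes, txid: bytes) -> bool:
    """True if resp is a Binding Success Response to our transaction."""
    if len(resp) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", resp[:8])
    return (msg_type == BINDING_SUCCESS and cookie == MAGIC_COOKIE
            and resp[8:20] == txid)


def stun_reachable(host: str, port: int = 3478, timeout: float = 5.0) -> bool:
    """Send one Binding Request over UDP and wait for a matching reply."""
    txid = os.urandom(12)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_binding_request(txid), (host, port))
            resp, _addr = sock.recvfrom(2048)
    except OSError:
        # Timeout, DNS failure, or ICMP unreachable: treat as unreachable.
        return False
    return is_binding_success(resp, txid)
```

For example, stun_reachable("stun.l.google.com", 19302) probes the server from the original client line, and the same call against another server shows whether switching the -ice option would help from a given vantage point.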

I'm going to refine the snowflake tests to tell us more precisely where the blocking is occurring, because this appears to be changing. The candidates for blocking I can think of are:

  • At the ICE Gathering stage (the connection to the client's STUN server)*
  • At the signaling stage (the connection to the domain-fronted broker)
  • At the Connectivity checking stage (the UDP connection to the snowflake proxy)*
  • At the connected stage (the WebRTC DataChannel connection to the snowflake proxy)

The stages marked with an asterisk (*) are where we've seen blocking occur so far.
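Because these stages happen in a fixed order, a probe only needs to record the furthest stage it completed to say where it stalled. A sketch of that bookkeeping, with hypothetical names; it assumes the probe can observe the completion of each stage on the client side.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Stage(IntEnum):
    """Client-side stages, in the order a snowflake connection passes them."""
    ICE_GATHERING = 1   # contact the client's STUN server
    SIGNALING = 2       # exchange offer/answer via the domain-fronted broker
    CONNECTIVITY = 3    # ICE connectivity checks (UDP) with the proxy
    CONNECTED = 4       # data flowing to the proxy


def blocked_at(completed: Iterable[Stage]) -> Optional[Stage]:
    """Return the first stage that was never completed, or None if all were.

    `completed` holds the stages a probe finished; the stage right after
    the furthest one completed is where the connection stalled.
    """
    furthest = max(completed, default=None)
    if furthest is None:
        return Stage.ICE_GATHERING
    if furthest == Stage.CONNECTED:
        return None
    return Stage(furthest + 1)
```

Tallying blocked_at over many probes per vantage point would show, for instance, whether failures cluster at ICE_GATHERING (STUN server blocked) or at CONNECTIVITY (proxy address blocked).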

comment:10 in reply to: 8; Changed 7 months ago by dcf

Replying to cohosh:

> There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.

Ok, I agree, it looks like a qualitative difference between snowflake-bridge and snowflake-cohosh.

comment:11 Changed 7 months ago by cohosh

I've added snowflaketest and snowflake-stage.lua to probe and analyze snowflake reachability at a finer level: https://github.com/cohosh/bridgetest/commits/snowflake

I've also noticed while running this test that China now appears to have blocked stunserver.org. That was fast.

Changed 7 months ago by cohosh

comment:12 Changed 7 months ago by cohosh

Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol got to on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems once they blocked those stun servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.

Some of the plots look weird (only the China location accesses some of the snowflakes). This is probably due to an imperfect pcap parsing script that accidentally records the VPS's own IP address as the snowflake's IP.
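Attributing each probe to a proxy hinges on picking the remote end of each UDP flow, never the capturing host. A stdlib-only sketch of that idea for classic (libpcap, microsecond-resolution, Ethernet) capture files; this is not the parser in bridgetest, the function names are hypothetical, and treating the busiest remaining UDP peer as the proxy is an assumption (known STUN server addresses can be passed in ignore).

```python
import ipaddress
import struct
from collections import Counter
from typing import Iterable, Optional

LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def udp_peers(pcap: bytes, local_ips: Iterable[str]) -> Counter:
    """Count UDP packets per remote IPv4 address in a classic pcap file.

    For each packet, whichever endpoint is not one of `local_ips` is
    counted, so the capturing host's own address is never reported.
    """
    local = {ipaddress.IPv4Address(ip) for ip in local_ips}
    magic = pcap[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    linktype = struct.unpack(endian + "I", pcap[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    peers = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap):
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap[offset:offset + 16])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Need Ethernet (14 bytes) plus a minimal IPv4 header (20 bytes).
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
            continue
        ip = frame[14:]
        if ip[9] != IPPROTO_UDP:
            continue
        src = ipaddress.IPv4Address(ip[12:16])
        dst = ipaddress.IPv4Address(ip[16:20])
        if src in local and dst not in local:
            peers[str(dst)] += 1
        elif dst in local and src not in local:
            peers[str(src)] += 1
    return peers


def likely_proxy(pcap: bytes, local_ips: Iterable[str],
                 ignore: Iterable[str] = ()) -> Optional[str]:
    """Guess the proxy as the busiest UDP peer, skipping e.g. STUN servers."""
    skip = set(ignore)
    for ip, _count in udp_peers(pcap, local_ips).most_common():
        if ip not in skip:
            return ip
    return None
```

Passing each probe site's own address in local_ips is what keeps the CA or CN VPS from ever showing up as a "snowflake" in the results.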

comment:13 in reply to: 12; Changed 7 months ago by dcf

Replying to cohosh:

> Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol got to on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems once they blocked those stun servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.

Oh wow. This is really interesting. I wonder if there was any collateral damage as a result of the Google STUN servers being blocked.

comment:14 Changed 6 months ago by gaba

Sponsor: Sponsor28-can

Changed 6 months ago by cohosh

comment:15 Changed 6 months ago by cohosh

Updated the results. As before, my script still has some flaws where the CA probe site records its own IP address for some of the values (which is why you see only CN probes to some snowflakes).

Overall, since switching the tests to the new STUN server, there have been no problems reaching snowflakes. We might get more interesting results after changing the STUN server in the release (#30579).

comment:16 Changed 6 months ago by gaba

Keywords: anti-censorship-roadmap added

comment:17 Changed 6 months ago by cohosh

Status: changed from assigned to accepted

comment:18 Changed 5 months ago by gaba

Keywords: anti-censorship-roadmap-august added; anti-censorship-roadmap removed

comment:19 Changed 3 months ago by gaba

Keywords: anti-censorship-roadmap added; anti-censorship-roadmap-august removed

comment:20 Changed 3 days ago by cohosh

Reviewer: phw
Status: changed from accepted to needs_review

I've made some recent changes to the reachability scripts for #32657. I'll keep this ticket for a review of the correctness of those scripts and whether there are other tests that would be useful for us here.

comment:21 Changed 3 days ago by cohosh

The code currently lives here: https://github.com/cohosh/bridgetest

comment:22 in reply to: 21; Changed 2 days ago by phw

Replying to cohosh:

> The code currently lives here: https://github.com/cohosh/bridgetest


I took a look at the Python code that processes pcaps and left a bunch of comments in the respective commits. I did not review the R and lua scripts. Please let me know if you want me to review these as well.
