Run some tests to check reachability of snowflake proxies

added anti-censorship-roadmap component::circumvention/snowflake owner::cohosh priority::medium resolution::fixed reviewer::phw severity::normal sponsor::28-can status::closed type::task labels

Trac:
Summary: "Run some tests to check reachability of snowflake proxies in China" to "Run some tests to check reachability of snowflake proxies"

Interestingly, the server that runs the standalone proxies is still TCP reachable from the VPS in China. I can telnet into ports 22, 80, and 443.

For these tests I think we'll have to go the full bootstrapping route and send actual STUN traffic to the proxies.

Here's a branch that does probing of snowflake proxies: https://github.com/cohosh/bridgetest/tree/snowflake

Some notes:

We make 10 attempts to connect through snowflake per probe period, to get a reasonable sample of the available proxies. There's no guarantee we'll get all of them, but this seems like a good number of attempts for now.
I cut the Tor connection timeout down to 90 seconds from 180. This is because once a proxy hits the 30-second timeout (as described in #25429), the client will try to find another snowflake, but we only want to measure the bootstrap progress for one snowflake at a time. The downside is that when the proxy is reachable, we sometimes only see a bootstrap up to 25% or 50% because we didn't give it enough time to complete.
Since #21304, the snowflake client doesn't log the IP addresses of the proxies it connects to, so we have to capture traffic with tcpdump and analyze it afterwards to see which proxy we actually got.
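The following back-of-the-envelope estimate shows how far 10 attempts per probe period goes. It assumes, purely for illustration, that the broker hands each attempt a proxy chosen uniformly and independently from a fixed pool. The real broker makes no such promise. The hypothetical helper `prob_all_seen` uses inclusion-exclusion to compute the probability that every proxy is seen at least once in k attempts: sum over j of (-1)^j C(n, j) (1 - j/n)^k.

```python
from math import comb


def prob_all_seen(n_proxies: int, tries: int) -> float:
    """Probability that `tries` random proxy assignments cover all proxies.

    ASSUMPTION (illustrative only): each attempt gets one of `n_proxies`
    proxies uniformly at random, independently of the other attempts.
    Inclusion-exclusion over the set of proxies that are never seen.
    """
    if n_proxies <= 0:
        raise ValueError("n_proxies must be positive")
    n, k = n_proxies, tries
    return sum((-1) ** j * comb(n, j) * (1 - j / n) ** k for j in range(n + 1))


if __name__ == "__main__":
    # Hypothetical pool sizes, with 10 attempts per probe period.
    for n in (2, 4, 8):
        print(n, round(prob_all_seen(n, 10), 3))
```

Under this assumption, 10 attempts are likely to cover a pool of a few proxies. The coverage probability drops off quickly as the pool grows, which matches the caveat that there's no guarantee of seeing every proxy.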

I'll post some graphs here once we have more data.

Added a plot of snowflake reachability from the VPS in China.

Connections at or above the green line (at bootstrap = 11%) indicate that the snowflake proxy IP address was reachable. Connections below the green line indicate they were blocked. As mentioned above, the timeout for bootstrapping is set very low (90 seconds), so not all connections through a reachable proxy were able to bootstrap fully.
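The threshold rule above can be expressed as a small classifier over Tor's log. This sketch assumes the probe keeps Tor's notice-level log lines and that bootstrap messages contain the substring `Bootstrapped N%`. The wording after the percentage differs between Tor versions, so only the number is matched. The function names and the sample log lines are illustrative and are not taken from the bridgetest branch.

```python
import re

# Tor's bootstrap notices contain "Bootstrapped N%"; the wording after the
# percentage differs between Tor versions, so only the number is matched.
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d{1,3})%")

# Per this ticket, every snowflake connection reaches 10% on its own, so
# anything above 10% means the proxy's IP address was reachable.
BASELINE_PERCENT = 10

# Illustrative log lines (made up for this example, not from the tests).
SAMPLE_LOG = [
    "[notice] Bootstrapped 5%: Connecting to directory server",
    "[notice] Bootstrapped 10%: Finishing handshake with directory server",
    "[notice] Bootstrapped 25%: Loading networkstatus consensus",
    "[notice] Opening Socks listener on 127.0.0.1:9050",
]


def max_bootstrap(log_lines):
    """Return the highest bootstrap percentage in the log, or 0 if none."""
    best = 0
    for line in log_lines:
        match = BOOTSTRAP_RE.search(line)
        if match:
            best = max(best, int(match.group(1)))
    return best


def classify(percent):
    """Map a maximum bootstrap percentage to a coarse outcome label."""
    if percent >= 100:
        return "bootstrapped"
    if percent > BASELINE_PERCENT:
        return "reachable"  # may simply have run out of time (90 s timeout)
    # At or below the baseline: treated as blocked in the plots, although a
    # proxy that failed for other reasons would look the same here.
    return "blocked"


if __name__ == "__main__":
    top = max_bootstrap(SAMPLE_LOG)
    print(top, classify(top))
```

Because the timeout caps progress, "reachable" rather than "bootstrapped" is the meaningful signal in these tests.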

snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

snowflake-cohosh is another standalone proxy-go instance I set up as a result of the blocking.

snowflake-proxy-[1,2] are additional proxies that we don't know anything about.

Trac:
snowflake-reachability-2019-05-06.pdf

Replying to cohosh:

snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

Yeah. A >50% success rate, a rate that matches that of the other proxy you set up, doesn't look like IP blocking. Possibly not even blocking at all. It could be that in #30350 the reporter just experienced Snowflake not working very well yet, and wrongly interpreted it as the result of blocking. I know that I've never been able to use Snowflake for more than an hour or so because it always quits working. I suspected #25429 but never tracked it down, and of course there have been plenty of other bugs.

Replying to dcf:

Replying to cohosh:

snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.

Yeah. A >50% success rate, a rate that matches that of the other proxy you set up, doesn't look like IP blocking. Possibly not even blocking at all. It could be that in #30350 the reporter just experienced Snowflake not working very well yet, and wrongly interpreted it as the result of blocking. I know that I've never been able to use Snowflake for more than an hour or so because it always quits working. I suspected #25429 but never tracked it down, and of course there have been plenty of other bugs.

Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.

What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).

I think there's more going on here than just the usual snowflake bugs, but I think #25429 will go a long way toward mitigating the impact of whatever is going on.

Replying to cohosh:

Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.

I believe you. That's good evidence that there is some sort of targeted blocking. According to the tests, it seems to have been less severe since May 3, at least. We don't have tests from before then to know whether it used to be equally unreliable.

What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).

I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less-than-complete bootstrap as complete "success" because to a user it looks like failure. E.g., in comment:16:ticket:30350 the user got to 75% after 13 seconds but then made no further progress. So I'm thinking of it in a "works/doesn't work" way, and in that sense snowflake-bridge and snowflake-cohosh seem to have roughly equal utility according to the data so far. While we know that the GFW sometimes fails open and allows access to blocked IP addresses, this doesn't look like that because the success rate is too high.

Or maybe there really is some kind of protocol detection happening once the WebRTC DataChannel is connected, rather than simple IP blocking. That would be consistent with the evidence. I would not expect it as a first step of blocking, but my intuition has certainly been wrong before.
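The two notions of success discussed above can be computed side by side from per-attempt results. This is a hypothetical sketch. `rates_by_proxy` and its input format, a proxy name paired with the highest bootstrap percentage reached, are assumptions. The example records are made up and are not the ticket's data.

```python
from collections import defaultdict

# Made-up example records: (proxy name, highest bootstrap % reached).
EXAMPLE_RESULTS = [
    ("proxy-a", 25),
    ("proxy-a", 100),
    ("proxy-a", 10),
    ("proxy-b", 50),
    ("proxy-b", 50),
]


def rates_by_proxy(results, baseline=10):
    """Compute two per-proxy success rates.

    "reachable": fraction of attempts past the baseline bootstrap that every
    snowflake connection reaches on its own (10% per this ticket).
    "works": fraction of attempts that bootstrapped fully (100%), closer to
    what a user experiences as success.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # attempts, reachable, works
    for proxy, percent in results:
        c = counts[proxy]
        c[0] += 1
        if percent > baseline:
            c[1] += 1
        if percent >= 100:
            c[2] += 1
    return {
        proxy: {"reachable": r / n, "works": w / n}
        for proxy, (n, r, w) in counts.items()
    }


if __name__ == "__main__":
    for proxy, rates in rates_by_proxy(EXAMPLE_RESULTS).items():
        print(proxy, rates)
```

With the 90-second timeout used in these tests, the "works" rate will understate what a user with a longer timeout would see.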

Trac:
snowflake-reachability-2019-05-09.pdf

Attached an updated reachability graph.

There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.

I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less-than-complete bootstrap as complete "success" because to a user it looks like failure. E.g., in comment:16:ticket:30350 the user got to 75% after 13 seconds but then made no further progress.

That's fair. I had to set the circuit timeout really low to prevent the snowflake client from trying to reconnect to another snowflake after 30 seconds, which would mess with our test results the way I've set them up now. I think the ones that actually got to 75% would have gotten to 100% in a few more seconds, but maybe that doesn't matter because it's taking so long anyway.

Starting this afternoon (it may have been happening before but I was unaware), access to stun.l.google.com:19302 was blocked.

Simply changing the client line to use -ice stun:stunserver.org resulted in a full bootstrap.
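One way to check a STUN server's reachability directly, without going through Tor or snowflake, is to send it a single STUN Binding Request over UDP and look for a Binding Success Response. The sketch below follows the message format in RFC 5389: a 20-byte header with the magic cookie 0x2112A442, and an XOR-MAPPED-ADDRESS attribute in the reply. It handles IPv4 only. The helper names are hypothetical and are not part of the snowflake client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txid: bytes) -> bytes:
    """20-byte STUN header with no attributes (type, length, cookie, txid)."""
    if len(txid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def parse_xor_mapped_address(msg: bytes, txid: bytes):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS, or None."""
    if len(msg) < 20:
        return None
    mtype, mlen, cookie = struct.unpack("!HHI", msg[:8])
    if mtype != BINDING_SUCCESS or cookie != MAGIC_COOKIE or msg[8:20] != txid:
        return None
    pos, end = 20, min(len(msg), 20 + mlen)
    while pos + 4 <= end:
        atype, alen = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + alen]
        # value: reserved byte, family (0x01 = IPv4), X-Port, X-Address
        if atype == ATTR_XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + alen + (-alen % 4)  # attribute values are padded to 4 bytes
    return None


def stun_probe(host: str, port: int, timeout: float = 5.0):
    """Send one Binding Request over UDP; return the mapped address or None."""
    txid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(txid), (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts and DNS resolution failures
            return None
    return parse_xor_mapped_address(data, txid)
```

For example, `stun_probe("stun.l.google.com", 19302)` returns the probe's public address when a valid reply arrives and `None` otherwise. A single `None` is weak evidence on its own, because packet loss or a DNS failure looks the same, so it is worth repeating the probe a few times.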

I'm going to refine the snowflake tests to tell us more precisely where the blocking is occurring, because this appears to be changing. The candidates for blocking I can think of are:

At the ICE gathering stage (the connection to the client's STUN server)*
At the signaling stage (the connection to the domain-fronted broker)
At the connectivity checking stage (the UDP connection to the snowflake proxy)*
At the connected stage (the TCP connection to the snowflake proxy)

The stages with asterisks (*) are where we've seen blocking occur so far.
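The four candidate stages occur in a fixed order, so each attempt can be summarized by the first stage it failed to complete. This sketch assumes the test harness already records which stages an attempt completed, for example from client log messages or a packet capture. The `Stage` names and the `failed_stage`/`tally` helpers are hypothetical and are not taken from snowflaketest or snowflake-stage.lua.

```python
from collections import Counter
from enum import IntEnum


class Stage(IntEnum):
    """Candidate blocking points, in the order a connection passes them."""
    ICE_GATHERING = 1  # client got a server-reflexive address from its STUN server
    SIGNALING = 2      # client got an answer from the domain-fronted broker
    CONNECTIVITY = 3   # ICE connectivity checks with the proxy succeeded
    CONNECTED = 4      # data flowed over the connection to the proxy


def failed_stage(completed):
    """Return the first stage not in `completed`, or None if all completed.

    Stages are checked in order, so a later stage recorded without an
    earlier one (an inconsistent record) is ignored.
    """
    for stage in Stage:
        if stage not in completed:
            return stage
    return None


def tally(attempts):
    """Count attempts by failing stage; the key None means full success."""
    return Counter(failed_stage(a) for a in attempts)


if __name__ == "__main__":
    example = [set(), {Stage.ICE_GATHERING}, set(Stage)]
    print(tally(example))
```

Tracking the first missing stage, rather than a single pass/fail bit, is what makes a shift like the one from STUN blocking to proxy blocking visible over time.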

Replying to cohosh:

There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.

OK, I agree, it looks like a qualitative difference between snowflake-bridge and snowflake-cohosh.

I've added snowflaketest and snowflake-stage.lua to probe and analyze snowflake reachability at a finer level: https://github.com/cohosh/bridgetest/commits/snowflake

I've also noticed while running this test that China now appears to have blocked stunserver.org. That was fast.

Trac:
snowflake-reachability-2019-05-27.pdf

Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol reached on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems that once they blocked those STUN servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.

Some of the plots look weird (only the Chinese location accesses them). This is probably due to an imperfect pcap parsing script that accidentally records the VPS's own IP address as the snowflake's IP.
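One way to avoid recording the probe's own address as the snowflake's is to count only the remote side of packets that cross between a local address and a non-local one. This sketch assumes the capture has already been reduced to (source, destination) pairs for UDP packets, for example with tcpdump or tshark. `guess_proxy_ip` and its parameters are hypothetical. `own_ips` should list every address the probe host uses, including any private interface address if the VPS sits behind NAT.

```python
from collections import Counter


def guess_proxy_ip(packets, own_ips, ignore_ips=()):
    """Guess the snowflake proxy's IP from pre-parsed UDP (src, dst) pairs.

    Only packets with exactly one local endpoint are counted, and only their
    remote endpoint is recorded, so an address in `own_ips` is never
    returned. `ignore_ips` can hold other known peers, such as the STUN
    server. Returns the remote address seen in the most packets, or None.
    """
    own = set(own_ips)
    ignore = set(ignore_ips)
    counts = Counter()
    for src, dst in packets:
        if (src in own) == (dst in own):
            continue  # both or neither endpoint local: not probe<->peer traffic
        remote = dst if src in own else src
        if remote not in ignore:
            counts[remote] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), not real snowflakes.
    pkts = [("10.0.0.5", "198.51.100.7"), ("198.51.100.7", "10.0.0.5")]
    print(guess_proxy_ip(pkts, {"10.0.0.5"}))
```

Picking the busiest remote peer is a heuristic. It could be wrong if, for example, the STUN server exchanged more packets than the proxy and was not listed in `ignore_ips`.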

Replying to cohosh:

Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol reached on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems that once they blocked those STUN servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.

Oh wow. This is really interesting. I wonder if there was any collateral damage as a result of the Google STUN servers being blocked.

Trac:
Sponsor: N/A to Sponsor28-can

Trac:
snowflake-reachability-2019-06-13.pdf

Updated the results. As before, my script has a flaw where the CA probe site records its own IP address for some of the values (which is why you see only CN probes to some snowflakes).

Overall, since switching the tests to the new STUN server, there have been no problems reaching snowflakes. We might get more interesting results after changing the STUN server in the release (#30579).
