Our standalone proxies were recently blocked in China: #30350 (moved)
We should start running probe tests, like the ones we run for obfs4, to see whether this blocking was a one-off event and to detect blocking of new proxy instances.
We make 10 attempts to connect through snowflake per probe period, to ensure we sample a reasonable selection of the available proxies. There's no guarantee we'll get all of them, but this seems like a good number of tries for now.
I cut the Tor connection timeout down from 180 seconds to 90. This is because once one proxy hits the 30 second timeout (as described in #25429 (moved)), the client will attempt to find another snowflake, and we only want to measure the bootstrap progress for one snowflake at a time. The downside is that in cases where the proxy is reachable, we sometimes only see a bootstrap up to 25% or 50% because we didn't give it enough time to complete.
Snowflake doesn't log the IP address of the snowflake proxies it connects to (since #21304 (moved)), so we have to capture the traffic with tcpdump and analyze it afterwards to see which proxy we actually got.
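For illustration, the pcap analysis step could look roughly like the sketch below. It assumes the capture has already been reduced to (source, destination, byte count) tuples for UDP traffic; that input format and the name `pick_proxy_ip` are hypothetical and are not taken from the actual probe scripts. It tallies bytes per remote address and skips the probe host's own addresses as well as known non-proxy endpoints such as the STUN server.

```python
from collections import Counter
from typing import AbstractSet, Iterable, Optional, Tuple

# Hypothetical pre-parsed pcap record: (src_ip, dst_ip, byte_count).
Flow = Tuple[str, str, int]


def pick_proxy_ip(flows: Iterable[Flow],
                  own_ips: AbstractSet[str],
                  ignore_ips: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the remote IP that exchanged the most UDP bytes with this host."""
    totals: Counter = Counter()
    for src, dst, nbytes in flows:
        for ip in (src, dst):
            # Never count the probe host itself or known non-proxy endpoints.
            if ip in own_ips or ip in ignore_ips:
                continue
            totals[ip] += nbytes
    if not totals:
        return None
    return totals.most_common(1)[0][0]
```

Passing the STUN server's address in `ignore_ips` keeps it from being mistaken for a proxy, and `None` is returned when no remote peer was seen at all.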
I'll post some graphs here once we have more data.
Added a plot of snowflake reachability from the VPS in China.
Connections at or above the green line (at bootstrap = 11%) indicate that the snowflake proxy IP address was reachable. Connections below the green line indicate they were blocked. As mentioned above, the timeout for bootstrapping is set very low (90s) so not all connections through a reachable proxy were able to bootstrap fully.
snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.
snowflake-cohosh is another standalone proxy-go instance I set up as a result of the blocking.
snowflake-proxy-[1,2] are additional proxies that we don't know anything about.
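The reachability rule behind the plot can be sketched as follows. This is a minimal sketch that assumes the probe keeps tor's log output and that progress lines contain the text "Bootstrapped N%"; the function names are hypothetical. Any progress beyond 10% is treated as evidence that the proxy's IP address was reachable, since every snowflake connection gets to 10% on its own.

```python
import re
from typing import Iterable

# Assumed shape of tor's progress messages, e.g.
# "[notice] Bootstrapped 25%: Loading networkstatus consensus"
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")

# Every snowflake connection reaches 10% by itself, so only progress
# beyond that says anything about the proxy.
BASELINE_PERCENT = 10


def max_bootstrap(log_lines: Iterable[str]) -> int:
    """Return the highest bootstrap percentage seen in the log lines."""
    best = 0
    for line in log_lines:
        match = BOOTSTRAP_RE.search(line)
        if match:
            best = max(best, int(match.group(1)))
    return best


def proxy_reachable(log_lines: Iterable[str]) -> bool:
    """True if bootstrapping progressed past the 10% baseline."""
    return max_bootstrap(log_lines) > BASELINE_PERCENT
```

Note that this only answers "was the proxy IP reachable"; a run that stalls at 25% or 50% because of the 90 second timeout still counts as reachable here.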
> snowflake-bridge is the location of our fallback standalone proxy-go instances that we know have been blocked. It's interesting that blocking of this proxy comes and goes.
Yeah. A >50% success rate, a rate that matches that of the other proxy you set up, doesn't look like IP blocking. Possibly not even blocking at all. It could be that in #30350 (moved) the reporter just experienced Snowflake not working very well yet, and wrongly interpreted it as the result of blocking. I know that I've never been able to use Snowflake for more than an hour or so because it always quits working--I suspected #25429 (moved) but never tracked it down, and of course there have been plenty of other bugs.
Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.
What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).
I think there's more going on here than just the usual snowflake bugs, but I think #25429 (moved) will go a long way to mitigate the impact of whatever is going on.
> Hm, fwiw, when I was doing manual checks around the time the ticket was filed, the snowflake.bamsoftware.com proxy-go instances were reliable and reachable from the US but definitely not from the VPS in China. At the same time, the additional proxy-go instances I set up on another server were definitely reachable from both places.
I believe you. That's good evidence that there is some sort of targeted blocking. It seems to be less severe, at least, since May 3 according to the tests. We don't have tests from beforehand to know whether it used to be equally unreliable.
> What do you mean by success rate here? The other proxy I set up is reachable 100% of the time (in that it bootstraps past the 10% that all snowflake connections automatically bootstrap to).
I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less than complete bootstrap as complete "success" because to a user it looks like failure. E.g. in comment 16 on #30350 the user got to 75% after 13 seconds but then no further progress. So I'm thinking of it in kind of a "works/doesn't work" way, and in that way, snowflake-bridge and snowflake-cohosh seem to have roughly equal utility according to the data so far. While we know that the GFW sometimes fails open and allows access to blocked IP addresses, this doesn't look like that because the success rate is too high.
Or maybe there really is some kind of protocol detection happening, once the WebRTC DataChannel is connected, and it's not simple IP blocking. That would be consistent with the evidence. I would not expect it as a first step of blocking, but certainly my intuition has been wrong before.
There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.
> I know that anything past 10% means the IP of the proxy was reachable, but mentally I'm not quite thinking of a less than complete bootstrap as complete "success" because to a user it looks like failure. E.g. in comment 16 on #30350 the user got to 75% after 13 seconds but then no further progress.
That's fair. I had to set the circuit timeout really low in order to prevent the snowflake client from trying to reconnect to another snowflake after 30 seconds which would mess with our test results the way I've set them up now. I think the ones that actually got to 75% would have gotten to 100% in a few more seconds, but maybe that doesn't matter because it's taking so long anyway.
Starting this afternoon (it may have been happening before but I was unaware), access to stun.l.google.com:19302 was blocked.
Simply changing the client line to use
-ice stun:stunserver.org
resulted in a full bootstrap.
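A quick way to check a STUN server from the probe host, independent of snowflake, is to send it a bare Binding Request over UDP and see whether a matching success response comes back. The following is a minimal sketch based on the STUN message header defined in RFC 5389; the function names are hypothetical and this is not part of the existing probe scripts.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442   # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(txid: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding Request with no attributes."""
    txid = txid if txid is not None else os.urandom(12)
    if len(txid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: message type, message length (0), magic cookie, transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def is_binding_success(reply: bytes, txid: bytes) -> bool:
    """True if reply is a Binding Success Response for our transaction."""
    if len(reply) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", reply[:8])
    return (msg_type == BINDING_SUCCESS
            and cookie == MAGIC_COOKIE
            and reply[8:20] == txid)


def stun_reachable(host: str, port: int = 3478, timeout: float = 5.0) -> bool:
    """Send one Binding Request over UDP and wait for a matching reply."""
    txid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(txid), (host, port))
            reply, _addr = sock.recvfrom(2048)
        except OSError:  # covers timeouts and name resolution failures
            return False
    return is_binding_success(reply, txid)
```

For the server mentioned above, the call would be `stun_reachable("stun.l.google.com", 19302)`. Since UDP is lossy, a single `False` is weak evidence; repeating the check a few times before concluding the server is blocked would be more reliable.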
I'm going to refine the snowflake tests to tell us more information about precisely where the blocking is occurring because this appears to be changing. The candidates for blocking I can think of are:
At the ICE Gathering stage (the connection to the client's STUN server)*
At the signaling stage (the connection to the domain-fronted broker)
At the Connectivity checking stage (the UDP connection to the snowflake proxy)*
At the connected stage (the TCP connection to the snowflake proxy)
The stages with asterisks (*) are where we've seen blocking occur so far
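To make the refined tests report where a connection attempt stopped, each attempt could record which of these stages completed and then name the first one that did not. A minimal sketch, with hypothetical stage labels and a hypothetical `first_failed_stage` helper, assuming the probe can observe completion of each stage on the client side:

```python
from typing import AbstractSet, Optional

# Stages in the order a snowflake client goes through them.
# The labels are made up for this sketch.
STAGES = (
    "gathering",   # ICE gathering: the client's STUN server answered
    "signaling",   # the domain-fronted broker returned a proxy
    "checking",    # connectivity checks to the snowflake proxy succeeded
    "connected",   # data flowed over the connection to the proxy
)


def first_failed_stage(completed: AbstractSet[str]) -> Optional[str]:
    """Return the earliest stage that did not complete, or None if all did."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```

For example, an attempt that completed only "gathering" and "signaling" yields "checking", which would point at the UDP connection to the proxy.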
> There are some very long periods of time where snowflake-bridge is unreachable, and it's strange that snowflake-proxy-3 seems to be unreachable this entire time.
Ok, I agree, it looks like a qualitative difference between snowflake-bridge and snowflake-cohosh.
Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol got to on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems once they blocked those stun servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.
Some of the plots look odd: certain snowflakes appear to be accessed only from the probe site in China. This is probably due to an imperfect pcap parsing script that accidentally records the VPS's own IP address as the snowflake's IP.
> Updated this ticket with recent test results. I've changed the tests to measure which stage the ICE protocol got to on the client side. I switched to using stun.ekiga.net in China since the Google STUN servers started getting blocked. It seems once they blocked those stun servers, all snowflake bridges became reachable again. I've confirmed that stun.l.google.com is still being blocked.
Oh wow. This is really interesting. I wonder if there was any collateral damage as a result of the Google STUN servers being blocked.
Updated the results. As before, my script still has some flaws where the CA probe site records its own IP address for some of the values (which is why you see only CN probes to some snowflakes).
Overall, since switching the tests to the new STUN server, there have been no problems reaching snowflakes. We might get more interesting results after changing the STUN server in the release (#30579 (moved)).