Snowflake client fails after bootstrap

added component::circumvention/snowflake owner::cohosh priority::medium resolution::fixed severity::normal status::closed type::defect labels

Seems to work fine now, no idea what could've been the issue previously but it was there.

Thanks cypherpunks,

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser?

Trac:
Status: new to needs_information

Replying to cohosh:

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser? I didn't look closely, but I would load a small download and it would suddenly stop after 3-4sec, then nothing can be loaded, and after that no further connection can be established, waiting 4min I don't get any other snowflake (restarting the browser didn't change anything). Maybe it's a broker flaw? I can clarify further if this isn't clear.

Replying to cypherpunks:

Replying to cohosh:

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser? I didn't look closely, but I would load a small download and it would suddenly stop after 3-4sec, then nothing can be loaded, and after that no further connection can be established, waiting 4min I don't get any other snowflake (restarting the browser didn't change anything). Maybe it's a broker flaw? I can clarify further if this isn't clear.

Oh thanks, I misread this and thought it was a webextension problem. This could be due to unreliable snowflakes. We just had a huge influx of snowflakes from webextension users and we're working out some engineering issues that come with a large variety of snowflakes, each with different throughputs and connectivity. It's possible you got a less reliable snowflake at first. Thanks for letting us know.

I'm going to mark this as not a bug for now but feel free to re-open if you think this problem is persisting.

Trac:
Summary: Snowflake fails after bootstrap to Snowflake client fails after bootstrap
Status: needs_information to closed
Resolution: N/A to not a bug

Replying to cohosh:

Oh thanks, I misread this and thought it was a webextension problem. This could be due to unreliable snowflakes. We just had a huge influx of snowflakes from webextension users and we're working out some engineering issues that come with a large variety of snowflakes, each with different throughputs and connectivity. It's possible you got a less reliable snowflake at first. Thanks for letting us know. I'm aware about the status quo, but I found it hard to explain what happened in terms of unreliable snowflakes since the rest of the snowflakes didn't work (even after 4min and 2 browser restars). Maybe an extremely unlucky combination of consecutive unreliable snowflakes + this bug in Tor: #29245 (moved) may explain it. In any case, I'll definitely report back if I notice anything similar and you can leave this bug as closed meanwhile.

Here's what typically happens:

2019/08/10 [scrubbed] WebRTC: OnIceComplete
2019/08/10 [scrubbed] Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1571 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] BrokerChannel Response:
504 Gateway Timeout

2019/08/10 [scrubbed] BrokerChannel Error: Unexpected error, no answer.
2019/08/10 [scrubbed] Failed to retrieve answer. Retrying in 10 seconds
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] BrokerChannel Response:
504 Gateway Timeout

2019/08/10 [scrubbed] BrokerChannel Error: Unexpected error, no answer.
2019/08/10 [scrubbed] Failed to retrieve answer. Retrying in 10 seconds

I wonder if it's due to the recent fix in #31100 (moved)

There's definitely something wrong going on, snowflake has become unusable for me now.

Thanks for the heads up, I'll take a look at this first thing tomorrow.

Trac:
Status: closed to reopened
Resolution: not a bug to N/A

Trac:
Owner: N/A to cohosh
Status: reopened to assigned

I was able to reproduce this after updating Tor Browser. The client worked fine before the update:

2019/08/12 15:25:00 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/12 15:25:01 Traffic Bytes (in|out): 543 | 543 -- (1 OnMessages, 1 Sends)
2019/08/12 15:25:07 Traffic Bytes (in|out): 1086 | 1086 -- (2 OnMessages, 2 Sends)
2019/08/12 15:25:10 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/12 15:25:12 Traffic Bytes (in|out): 543 | 1600 -- (1 OnMessages, 2 Sends)
2019/08/12 15:25:18 Traffic Bytes (in|out): 1086 | 1086 -- (2 OnMessages, 2 Sends)
2019/08/12 15:25:18 WebRTC: closing DataChannel
2019/08/12 15:25:18 ConnectLoop: stopped.
2019/08/12 15:25:18 copy loop ended
2019/08/12 15:25:18 ---- Handler: closed ---
2019/08/12 15:25:18 WebRTC: DataChannel.OnClose [locally]
2019/08/12 15:25:18 WebRTC: closing PeerConnection
2019/08/12 15:25:18 WebRTC: Closing
2019/08/12 15:25:18 WebRTC: melted all 1 snowflakes.
2019/08/12 15:25:18 SOCKS listening...
2019/08/12 15:25:18 snowflake is done.

But afterwards, I got repeated behaviour where data is going out but no data is coming back in:

2019/08/12 [scrubbed] Received Answer.
2019/08/12 [scrubbed] WebRTC: DataChannel.OnOpen
2019/08/12 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 1057 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] WebRTC: No messages received for 30 seconds -- closing stale connection.
2019/08/12 [scrubbed] WebRTC: closing DataChannel
2019/08/12 [scrubbed] WebRTC: DataChannel.OnClose [locally]
2019/08/12 [scrubbed] WebRTC: closing PeerConnection
2019/08/12 [scrubbed] WebRTC: Closing
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 1571 bytes --> WebRTC

This could be either a bug at the client or just by chance that the pre-update version got a good proxy.

I also just made #31403 (moved) because our over-zealous log scrubber fix and the guards against large reads haven't made their way into Tor Browser yet

After more tests, it looks like the client version has nothing to do with it.

Most of my connections (including ones through webextension snowflakes) seem to be doing fine. A few are failing in one of two ways:

the proxy fails to answer in time
the proxy answers and the datachannel is opened but the client never gets downstream bytes

Unfortunately for case 2, if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 (moved) is for. #29206 (moved) will also help alleviate some these symptoms by allowing the 30s timeout window to be lessened so that snowflakes can be gathered more quickly (#25429 (moved)).

I'm currently running the snowflake reachability tests from #30368 (moved) to try to get a sense of how many snowflakes are unreliable and I'll post back here with more info.

I'll also try to have a look at what's going on with case 1 and how often that's happening.

Trac:
snowflake_health_check.pdf

100 probes of snowflake proxies

Okay I bootstrapped 100 Tor connections through snowflake and uploaded the bootstrap progress as an attachment to this ticket. Each bootstrap requests a new snowflake from the broker in the usual way. It looks like the majority of snowflakes are working just fine and the client can bootstrap to 100%. Any bootstrap above 10% and below 100% means the bootstrap didn't finish in 30s. This means that these connections are slower than usual but not necessarily buggy. Bootstraps of only 10% mean that the data probably never reached the bridge.

This doesn't test the bandwidth of the proxies, it only gives us a rough idea of how reachable they are and it looks like for the most part the webextension proxies aren't failing in this way.

Like I mentioned above though, if you are unlucky and get a bad snowflake, all subsequent snowflakes will fail until you make a new connection because of the lost data.

Replying to cohosh:

if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 (moved) is for.

Would it be smart, while we don't have the sequencing layer in place yet, for Snowflake to have some keepalive or timeout feature, where it notices that it's sent its bytes, and things aren't looking good, and it should abort that connection so Tor can try a new one?

If we can do it in a simple way, it might help people in the short term. And in the long term, we might still need some sort of similar "gosh that channel looks like it has failed" feature to know when it's time to launch a new one.

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100 (moved)?

Replying to arma:

Replying to cohosh:

if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 (moved) is for.

Would it be smart, while we don't have the sequencing layer in place yet, for Snowflake to have some keepalive or timeout feature, where it notices that it's sent its bytes, and things aren't looking good, and it should abort that connection so Tor can try a new one?

If we can do it in a simple way, it might help people in the short term. And in the long term, we might still need some sort of similar "gosh that channel looks like it has failed" feature to know when it's time to launch a new one. I'm working on #29206 (moved) this week. I'll keep this in mind, but the answer might be that it's just as fast to go ahead and implement the sequencing layer (which is already a very simple version of what we'll actually want in the future).

Replying to cypherpunks:

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100 (moved)? The reason I state this is that there are some snowflakes with which this problem doesn't happen at all, especially now that the # of WebExt Chrome users has overtaken the # of WebExt Firefox users following the publication of this article which has attracted quiet a bit of Japanese Chrom* users.

Replying to cypherpunks:

Replying to cypherpunks:

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100 (moved)? The reason I state this is that there are some snowflakes with which this problem doesn't happen at all, especially now that the # of WebExt Chrome users has overtaken the # of WebExt Firefox users following the publication of this article which has attracted quiet a bit of Japanese Chrom* users.

Thanks cypherpunks, I'll look into it.

Okay, I finally had time to make some updates to snowbox that will allow us to test the webextension and recent browser proxy changes: https://github.com/cohosh/snowbox/commit/89ffc081c17c90827106f85d382d65feaf2ae683

After testing I found a bug in the proxy deadlock fix where we were closing proxies regardless of connection status. This should help some problems but #29206 (moved) is still going to be the ultimate solution for this and problems of this type. I'm noticing some quality issues still so I'll continue to debug those. Here's the fix: https://github.com/cohosh/snowflake/compare/bug31385

Can someone review this ASAP and I'll push an update?

Trac:
Status: assigned to needs_review

Can someone review this ASAP and I'll push an update?

Yeah, looks like this was the intention in 2521f4f.

Couple of notes, L89 in proxy/snowflake.js is redundant,

if (pair.active) {
  ...
} else {
  return pair.active = false;
}

Also, since the timeout sets up a race, maybe add a this.active check in channel.onopen.

I guess this is another reason to tackle #31310 (moved)

Trac:
Status: needs_review to merge_ready

Replying to arlolra:

Can someone review this ASAP and I'll push an update?

Yeah, looks like this was the intention in 2521f4f.

Couple of notes, L89 in proxy/snowflake.js is redundant,

{{{ if (pair.active) { ... } else { return pair.active = false; } }}}

Also, since the timeout sets up a race, maybe add a this.active check in channel.onopen.

Thanks! I added a fix for both of these comments to the same branch.

I guess this is another reason to tackle #31310 (moved) Yeah. I think there's still some problems here caused by reusing proxy pairs (if the pair is reused the active check in onopen might pass even though it's meant for a different client), but this is probably okay in the short term especially since we have so few clients.

I added a fix for both of these comments to the same branch.

Looks good

Merged in 16a1b69823 and updated the Firefox add-on. I'll update the Chrome web store as well.

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

I deployed webext-0.0.10 to snowflake.torproject.org at 2019-08-26 15:07:00 UTC.

Trac:
Username: amiableclarity2011

torrc-defaults

Trac:
Username: amiableclarity2011

torrc

Trac:
Username: amiableclarity2011

state

Trac:
Username: amiableclarity2011

cached-microdescs.new

Trac:
Username: amiableclarity2011

cached-descriptors

Trac:
Username: amiableclarity2011

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

Trac:
Username: amiableclarity2011

Thank you very much for your help.

Trac:
Username: amiableclarity2011

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file. The addon was just updated in the Mozilla addon store so it will take some time to propagate, I'd say try again in some day or two.

I use Linux version 9.0a4 Tor browser.

Trac:
Username: amiableclarity2011

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

Hi amiableclarity2011,

As cypherpunks said, we've pushed a fix out for this ticket and it will take a while for updates to propagate. Hopefully your connection will improve soon.

Replying to cohosh:

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

Hi amiableclarity2011,

As cypherpunks said, we've pushed a fix out for this ticket and it will take a while for updates to propagate. Hopefully your connection will improve soon.

Thank you very much for your help. I really appreciate it.

Trac:
Username: amiableclarity2011

closed

mentioned in issue #31425 (moved)

mentioned in issue #31503 (moved)

moved to tpo/anti-censorship/pluggable-transports/snowflake#31385 (closed)

Snowflake client fails after bootstrap

Child items 0

Activity