Opened 6 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#31385 closed defect (fixed)

Snowflake client fails after bootstrap

Reported by: cypherpunks Owned by: cohosh
Priority: Medium Milestone:
Component: Circumvention/Snowflake Version:
Severity: Normal Keywords:
Cc: arlolra, cohosh, phw, dcf Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Snowflake seems to fail after bootstrap, you open the browser open a page it starts loading then all connections drop and you can't open more pages.

Child Tickets

Attachments (6)

snowflake_health_check.pdf (10.0 KB) - added by cohosh 5 weeks ago.
100 probes of snowflake proxies
torrc-defaults (1.1 KB) - added by amiableclarity2011 3 weeks ago.
torrc (629 bytes) - added by amiableclarity2011 3 weeks ago.
state (4.4 KB) - added by amiableclarity2011 3 weeks ago.
cached-microdescs.new (1.4 MB) - added by amiableclarity2011 3 weeks ago.
cached-descriptors (2.3 KB) - added by amiableclarity2011 3 weeks ago.

Download all attachments as: .zip

Change History (36)

comment:1 Changed 6 weeks ago by cypherpunks

Seems to work fine now, no idea what could've been the issue previously but it was there.

comment:2 Changed 6 weeks ago by cohosh

Status: newneeds_information

Thanks cypherpunks,

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser?

comment:3 in reply to:  2 ; Changed 6 weeks ago by cypherpunks

Replying to cohosh:

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser?

I didn't look closely, but I would load a small download and it would suddenly stop after 3-4sec, then nothing can be loaded, and after that no further connection can be established, waiting 4min I don't get any other snowflake (restarting the browser didn't change anything). Maybe it's a broker flaw? I can clarify further if this isn't clear.

comment:4 in reply to:  3 ; Changed 6 weeks ago by cohosh

Resolution: not a bug
Status: needs_informationclosed
Summary: Snowflake fails after bootstrapSnowflake client fails after bootstrap

Replying to cypherpunks:

Replying to cohosh:

Just want to clarify the behaviour you're seeing so we can be on the look out for it. Was it the WebRTC connections that were failing, or normal page loads in your browser?

I didn't look closely, but I would load a small download and it would suddenly stop after 3-4sec, then nothing can be loaded, and after that no further connection can be established, waiting 4min I don't get any other snowflake (restarting the browser didn't change anything). Maybe it's a broker flaw? I can clarify further if this isn't clear.

Oh thanks, I misread this and thought it was a webextension problem. This could be due to unreliable snowflakes. We just had a huge influx of snowflakes from webextension users and we're working out some engineering issues that come with a large variety of snowflakes, each with different throughputs and connectivity. It's possible you got a less reliable snowflake at first. Thanks for letting us know.

I'm going to mark this as not a bug for now but feel free to re-open if you think this problem is persisting.

comment:5 in reply to:  4 Changed 6 weeks ago by cypherpunks

Replying to cohosh:

Oh thanks, I misread this and thought it was a webextension problem. This could be due to unreliable snowflakes. We just had a huge influx of snowflakes from webextension users and we're working out some engineering issues that come with a large variety of snowflakes, each with different throughputs and connectivity. It's possible you got a less reliable snowflake at first. Thanks for letting us know.

I'm aware about the status quo, but I found it hard to explain what happened in terms of unreliable snowflakes since the rest of the snowflakes didn't work (even after 4min and 2 browser restars). Maybe an extremely unlucky combination of consecutive unreliable snowflakes + this bug in Tor: #29245 may explain it. In any case, I'll definitely report back if I notice anything similar and you can leave this bug as closed meanwhile.

comment:6 Changed 6 weeks ago by cypherpunks

Here's what typically happens:

2019/08/10 [scrubbed] WebRTC: OnIceComplete
2019/08/10 [scrubbed] Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1571 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] BrokerChannel Response:
504 Gateway Timeout

2019/08/10 [scrubbed] BrokerChannel Error: Unexpected error, no answer.
2019/08/10 [scrubbed] Failed to retrieve answer. Retrying in 10 seconds
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/10 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/10 [scrubbed] BrokerChannel Response:
504 Gateway Timeout

2019/08/10 [scrubbed] BrokerChannel Error: Unexpected error, no answer.
2019/08/10 [scrubbed] Failed to retrieve answer. Retrying in 10 seconds

comment:7 Changed 5 weeks ago by cypherpunks

I wonder if it's due to the recent fix in #31100

comment:8 Changed 5 weeks ago by cypherpunks

There's definitely something wrong going on, snowflake has become unusable for me now.

comment:9 Changed 5 weeks ago by cohosh

Resolution: not a bug
Status: closedreopened

Thanks for the heads up, I'll take a look at this first thing tomorrow.

comment:10 Changed 5 weeks ago by cohosh

Owner: set to cohosh
Status: reopenedassigned

comment:11 Changed 5 weeks ago by cohosh

I was able to reproduce this after updating Tor Browser. The client worked fine before the update:

2019/08/12 15:25:00 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/12 15:25:01 Traffic Bytes (in|out): 543 | 543 -- (1 OnMessages, 1 Sends)
2019/08/12 15:25:07 Traffic Bytes (in|out): 1086 | 1086 -- (2 OnMessages, 2 Sends)
2019/08/12 15:25:10 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/12 15:25:12 Traffic Bytes (in|out): 543 | 1600 -- (1 OnMessages, 2 Sends)
2019/08/12 15:25:18 Traffic Bytes (in|out): 1086 | 1086 -- (2 OnMessages, 2 Sends)
2019/08/12 15:25:18 WebRTC: closing DataChannel
2019/08/12 15:25:18 ConnectLoop: stopped.
2019/08/12 15:25:18 copy loop ended
2019/08/12 15:25:18 ---- Handler: closed ---
2019/08/12 15:25:18 WebRTC: DataChannel.OnClose [locally]
2019/08/12 15:25:18 WebRTC: closing PeerConnection
2019/08/12 15:25:18 WebRTC: Closing
2019/08/12 15:25:18 WebRTC: melted all 1 snowflakes.
2019/08/12 15:25:18 SOCKS listening...
2019/08/12 15:25:18 snowflake is done.

But afterwards, I got repeated behaviour where data is going out but no data is coming back in:

2019/08/12 [scrubbed] Received Answer.
2019/08/12 [scrubbed] WebRTC: DataChannel.OnOpen
2019/08/12 [scrubbed] Buffered 1057 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 1057 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] WebRTC: No messages received for 30 seconds -- closing stale connection.
2019/08/12 [scrubbed] WebRTC: closing DataChannel
2019/08/12 [scrubbed] WebRTC: DataChannel.OnClose [locally]
2019/08/12 [scrubbed] WebRTC: closing PeerConnection
2019/08/12 [scrubbed] WebRTC: Closing
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 543 bytes --> WebRTC
2019/08/12 [scrubbed] Traffic Bytes (in|out): 0 | 543 -- (0 OnMessages, 1 Sends)
2019/08/12 [scrubbed] Buffered 1571 bytes --> WebRTC

This could be either a bug at the client or just by chance that the pre-update version got a good proxy.

I also just made #31403 because our over-zealous log scrubber fix and the guards against large reads haven't made their way into Tor Browser yet

comment:12 Changed 5 weeks ago by cohosh

After more tests, it looks like the client version has nothing to do with it.

Most of my connections (including ones through webextension snowflakes) seem to be doing fine. A few are failing in one of two ways:

  1. the proxy fails to answer in time
  2. the proxy answers and the datachannel is opened but the client never gets downstream bytes

Unfortunately for case 2, if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 is for. #29206 will also help alleviate some these symptoms by allowing the 30s timeout window to be lessened so that snowflakes can be gathered more quickly (#25429).

I'm currently running the snowflake reachability tests from #30368 to try to get a sense of how many snowflakes are unreliable and I'll post back here with more info.

I'll also try to have a look at what's going on with case 1 and how often that's happening.

Changed 5 weeks ago by cohosh

Attachment: snowflake_health_check.pdf added

100 probes of snowflake proxies

comment:13 Changed 5 weeks ago by cohosh

Okay I bootstrapped 100 Tor connections through snowflake and uploaded the bootstrap progress as an attachment to this ticket. Each bootstrap requests a new snowflake from the broker in the usual way. It looks like the majority of snowflakes are working just fine and the client can bootstrap to 100%. Any bootstrap above 10% and below 100% means the bootstrap didn't finish in 30s. This means that these connections are slower than usual but not necessarily buggy. Bootstraps of only 10% mean that the data probably never reached the bridge.

This doesn't test the bandwidth of the proxies, it only gives us a rough idea of how reachable they are and it looks like for the most part the webextension proxies aren't failing in this way.

Like I mentioned above though, if you are unlucky and get a bad snowflake, all subsequent snowflakes will fail until you make a new connection because of the lost data.

comment:14 in reply to:  12 ; Changed 5 weeks ago by arma

Replying to cohosh:

if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 is for.

Would it be smart, while we don't have the sequencing layer in place yet, for Snowflake to have some keepalive or timeout feature, where it notices that it's sent its bytes, and things aren't looking good, and it should abort that connection so Tor can try a new one?

If we can do it in a simple way, it might help people in the short term. And in the long term, we might still need some sort of similar "gosh that channel looks like it has failed" feature to know when it's time to launch a new one.

comment:15 Changed 5 weeks ago by cypherpunks

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100?

comment:16 in reply to:  14 Changed 5 weeks ago by cohosh

Replying to arma:

Replying to cohosh:

if the client has already sent its upstream bytes, those bytes are lost forever and so all subsequent snowflakes will look like they are failing even if their connection to the bridge is fine. This is what the sequencing layer in #29206 is for.

Would it be smart, while we don't have the sequencing layer in place yet, for Snowflake to have some keepalive or timeout feature, where it notices that it's sent its bytes, and things aren't looking good, and it should abort that connection so Tor can try a new one?

If we can do it in a simple way, it might help people in the short term. And in the long term, we might still need some sort of similar "gosh that channel looks like it has failed" feature to know when it's time to launch a new one.

I'm working on #29206 this week. I'll keep this in mind, but the answer might be that it's just as fast to go ahead and implement the sequencing layer (which is already a very simple version of what we'll actually want in the future).

comment:17 in reply to:  15 ; Changed 5 weeks ago by cypherpunks

Replying to cypherpunks:

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100?

The reason I state this is that there are some snowflakes with which this problem doesn't happen *at all*, especially now that the # of WebExt Chrome users has overtaken the # of WebExt Firefox users following the publication of this article which has attracted quiet a bit of Japanese Chrom* users.

comment:18 in reply to:  17 Changed 5 weeks ago by cohosh

Replying to cypherpunks:

Replying to cypherpunks:

@cohosh: did you try to setup a Firefox WebExt 0.0.9 vs 0.0.8 test to determine if it's not due to the recent patch in #31100?

The reason I state this is that there are some snowflakes with which this problem doesn't happen *at all*, especially now that the # of WebExt Chrome users has overtaken the # of WebExt Firefox users following the publication of this article which has attracted quiet a bit of Japanese Chrom* users.

Thanks cypherpunks, I'll look into it.

comment:19 Changed 4 weeks ago by cohosh

Status: assignedneeds_review

Okay, I finally had time to make some updates to snowbox that will allow us to test the webextension and recent browser proxy changes: https://github.com/cohosh/snowbox/commit/89ffc081c17c90827106f85d382d65feaf2ae683

After testing I found a bug in the proxy deadlock fix where we were closing proxies regardless of connection status. This should help some problems but #29206 is still going to be the ultimate solution for this and problems of this type. I'm noticing some quality issues still so I'll continue to debug those. Here's the fix: https://github.com/cohosh/snowflake/compare/bug31385

Can someone review this ASAP and I'll push an update?

comment:20 Changed 4 weeks ago by arlolra

Status: needs_reviewmerge_ready

Can someone review this ASAP and I'll push an update?

Yeah, looks like this was the intention in 2521f4f.

Couple of notes, L89 in proxy/snowflake.js is redundant,

if (pair.active) {
  ...
} else {
  return pair.active = false;
}

Also, since the timeout sets up a race, maybe add a this.active check in channel.onopen.

I guess this is another reason to tackle #31310

comment:21 in reply to:  20 Changed 4 weeks ago by cohosh

Replying to arlolra:

Can someone review this ASAP and I'll push an update?

Yeah, looks like this was the intention in 2521f4f.

Couple of notes, L89 in proxy/snowflake.js is redundant,

if (pair.active) {
  ...
} else {
  return pair.active = false;
}

Also, since the timeout sets up a race, maybe add a this.active check in channel.onopen.

Thanks! I added a fix for both of these comments to the same branch.

I guess this is another reason to tackle #31310

Yeah. I think there's still some problems here caused by reusing proxy pairs (if the pair is reused the active check in onopen might pass even though it's meant for a different client), but this is probably okay in the short term especially since we have so few clients.

comment:22 Changed 3 weeks ago by arlolra

I added a fix for both of these comments to the same branch.

Looks good

comment:23 Changed 3 weeks ago by cohosh

Resolution: fixed
Status: merge_readyclosed

Merged in 16a1b69823 and updated the Firefox add-on. I'll update the Chrome web store as well.

comment:24 Changed 3 weeks ago by dcf

I deployed webext-0.0.10 to snowflake.torproject.org at 2019-08-26 15:07:00 UTC.

Changed 3 weeks ago by amiableclarity2011

Attachment: torrc-defaults added

Changed 3 weeks ago by amiableclarity2011

Attachment: torrc added

Changed 3 weeks ago by amiableclarity2011

Attachment: state added

Changed 3 weeks ago by amiableclarity2011

Attachment: cached-microdescs.new added

Changed 3 weeks ago by amiableclarity2011

Attachment: cached-descriptors added

comment:25 Changed 3 weeks ago by amiableclarity2011

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

comment:26 Changed 3 weeks ago by amiableclarity2011

Thank you very much for your help.

comment:27 in reply to:  25 Changed 3 weeks ago by cypherpunks

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

The addon was just updated in the Mozilla addon store so it will take some time to propagate, I'd say try again in some day or two.

comment:28 Changed 3 weeks ago by amiableclarity2011

I use Linux version 9.0a4 Tor browser.

comment:29 in reply to:  25 ; Changed 3 weeks ago by cohosh

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

Hi amiableclarity2011,

As cypherpunks said, we've pushed a fix out for this ticket and it will take a while for updates to propagate. Hopefully your connection will improve soon.

comment:30 in reply to:  29 Changed 3 weeks ago by amiableclarity2011

Replying to cohosh:

Replying to amiableclarity2011:

Hello, currently, in China, I still can't open any webpage in 9.0a4 version Tor browser through snowflake bridge. I can connect to Tor network through snowflake bridge. I upload my torrc-defaults file, my torrc file, my state file, my cached-microdescs.new file and my cached-descriptors file.

Hi amiableclarity2011,

As cypherpunks said, we've pushed a fix out for this ticket and it will take a while for updates to propagate. Hopefully your connection will improve soon.

Thank you very much for your help. I really appreciate it.

Note: See TracTickets for help on using tickets.