Opened 6 weeks ago

Closed 2 days ago

#33336 closed task (fixed)

Trial deployment of Snowflake with Turbo Tunnel

Reported by: dcf
Owned by: dcf
Priority: Medium
Milestone:
Component: Circumvention/Snowflake
Version:
Severity: Normal
Keywords: turbotunnel
Cc: cohosh, phw, arlolra, dcf
Actual Points:
Parent ID: #19001
Points:
Reviewer:
Sponsor:

Description (last modified by dcf)

We now have a turbotunnel branch of Snowflake that uses an inner transport protocol to migrate a session across multiple proxies.

And some first-draft Tor Browser builds that can use it.

I want to deploy a bridge that supports Turbo Tunnel, then make Tor Browser builds and invite testers to test them.

There's the question of whether to run the Turbo Tunnel code on the existing public bridge, or to set up a second bridge reserved for the Turbo Tunnel experiment. I propose to run the Turbo Tunnel code on the existing public bridge (i.e., snowflake.torproject.net), because (1) the Turbo Tunnel server is backward-compatible with non–Turbo Tunnel clients, and (2) we would need to somehow provide proxy capacity for the second bridge, which our current proxy code cannot easily handle. A separate bridge would have one advantage: because we would have to run our own special proxy-go instances to support it, we could closely control the proxy environment. But part of my goal in an experimental deployment is to see how the Turbo Tunnel code fares with the organic proxies we have now.

I have versions of the code using two different session/reliability protocol libraries: kcp-go and quic-go. The two libraries are basically equivalent in features, but I haven't done much to compare their performance. kcp-go is more mature and stable, while quic-go adds fewer dependencies to the Tor Browser build.

We could make use of this opportunity to compare the two options. We set up a triple-mode bridge: supporting legacy, KCP, and QUIC clients. We make two Tor Browser builds, one with KCP and one with QUIC, and invite testing of both. Based on the results of user testing, we decide which we like better, and finally deploy only that option (and the backward-compatible mode). The only thing is, giving people two options to test is more confusing than giving them one.

Child Tickets

Change History (26)

comment:1 in reply to:  description ; Changed 6 weeks ago by dcf

Replying to dcf:

We could make use of this opportunity to compare the two options. We set up a triple-mode bridge: supporting legacy, KCP, and QUIC clients. We make two Tor Browser builds, one with KCP and one with QUIC, and invite testing of both. Based on the results of user testing, we decide which we like better, and finally deploy only that option (and the backward-compatible mode). The only thing is, giving people two options to test is more confusing than giving them one.

This is a commit for the triple-mode bridge as described. It works by creating two QueuePacketConns, one for KCP and one for QUIC, and using separate magic prefix tokens to distinguish the two protocols.
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&id=d5be0906ffe4ef8de8a9345690713bc362d3bcee
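As a rough illustration of the dispatch idea, here is a minimal sketch (not the commit's code: the token values are made up and channels stand in for the two QueuePacketConns; only the routing logic reflects the description above):

    package main

    import (
        "bytes"
        "fmt"
    )

    // Each Turbo Tunnel packet begins with a fixed 8-byte token identifying
    // the inner protocol; anything else is treated as a legacy client.
    const tokenLen = 8

    var (
        kcpToken  = []byte("kcp-tokn") // hypothetical value
        quicToken = []byte("quic-tok") // hypothetical value
    )

    type queue chan []byte // stand-in for a QueuePacketConn

    func dispatch(p []byte, kcp, quic, legacy queue) {
        switch {
        case bytes.HasPrefix(p, kcpToken):
            kcp <- p[tokenLen:]
        case bytes.HasPrefix(p, quicToken):
            quic <- p[tokenLen:]
        default:
            legacy <- p // backward compatible with non-Turbo Tunnel clients
        }
    }

    func main() {
        kcp, quic, legacy := make(queue, 1), make(queue, 1), make(queue, 1)
        dispatch(append(append([]byte{}, quicToken...), 0x01, 0x02), kcp, quic, legacy)
        fmt.Printf("QUIC payload: % x\n", <-quic)
    }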

I have made branches turbotunnel-kcp and turbotunnel-quic for clients specialized to use one protocol or the other, and I've started Tor Browser builds with them.

comment:2 Changed 6 weeks ago by cohosh

It will be useful here to understand what kind of feedback we would like or are expecting from testers. We've had really good feedback before, for example on #31971 about the performance of Snowflake on Windows. That feedback led us to write tests that verified the performance problems.

Perhaps along with asking for user feedback, we should run some of the tests we've made on the two versions. That way, if there is a performance difference that is significant but too slight for users to notice, we have data to back up our decision.
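For example, one crude way to collect such data, sketched in Go (the SOCKS ports and target URL are placeholders for however the two builds are configured): time an identical fetch through each build's SOCKS listener.

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "net/url"
        "time"
    )

    // timeFetch downloads one page through the given SOCKS5 port and
    // reports how long the whole fetch took.
    func timeFetch(socksAddr string) (time.Duration, error) {
        proxyURL, err := url.Parse("socks5://" + socksAddr)
        if err != nil {
            return 0, err
        }
        client := &http.Client{
            Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
            Timeout:   5 * time.Minute,
        }
        start := time.Now()
        resp, err := client.Get("https://example.com/") // placeholder target
        if err != nil {
            return 0, err
        }
        defer resp.Body.Close()
        if _, err := ioutil.ReadAll(resp.Body); err != nil {
            return 0, err
        }
        return time.Since(start), nil
    }

    func main() {
        // hypothetical: run the kcp and quic bundles with distinct SocksPorts
        for name, addr := range map[string]string{"kcp": "127.0.0.1:9150", "quic": "127.0.0.1:9250"} {
            d, err := timeFetch(addr)
            fmt.Println(name, d, err)
        }
    }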

comment:3 in reply to:  1 Changed 6 weeks ago by cohosh

Replying to dcf:

This is a commit for the triple-mode bridge as described. It works by creating two QueuePacketConns, one for KCP and one for QUIC, and using separate magic prefix tokens to distinguish the two protocols.
https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=turbotunnel&id=d5be0906ffe4ef8de8a9345690713bc362d3bcee

I have made branches turbotunnel-kcp and turbotunnel-quic for clients specialized to use one protocol or the other, and I've started Tor Browser builds with them.

Cool, these look good! I support this idea, and I think now is a good time to do it.

comment:4 Changed 6 weeks ago by dcf

Here are two Tor Browser builds. These are what I hope to announce to testers. They are built from the snowflake-turbotunnel-kcp and snowflake-turbotunnel-quic branches of tor-browser-build.git respectively. In both cases I had the rbm submodule at bug_33283_v2 from #33283 in an attempt to speed up the build.

Both builds have a commit that attempts to disable automatic updates for 60 days. My reasoning is that we don't want our testers to experience an automatic update while they are testing these special builds, because an update would remove the snowflake-turbotunnel features. But also, if someone for some reason decides to keep using an experimental build, we don't want them to be stuck on a non-updating browser forever.

How to try them locally

When we deploy the triple-mode bridge, it will be possible to just select "snowflake" from the menu. But until a Turbo Tunnel–aware bridge is deployed, you have to run a broker, proxy, and bridge locally.

  1. Download the turbotunnel branch and build all but the client.
    git clone https://git.torproject.org/pluggable-transports/snowflake.git
    cd snowflake
    git remote add dcf https://git.torproject.org/user/dcf/snowflake.git
    git fetch dcf
    git checkout d5be0906ffe4ef8de8a9345690713bc362d3bcee # turbotunnel branch
    for d in broker proxy-go server; do (cd $d && go get); done
    # set dependencies to the same versions that Tor Browser uses
    (cd $GOPATH/src/github.com/lucas-clemente/quic-go && git checkout 907071221cf97f75398d9cf8b1174e94f56e8f96)
    (cd $GOPATH/src/github.com/marten-seemann/qtls && git checkout 65ca381cd298d7e0aef0de8ba523a870ec5a96fe)
    for d in broker proxy-go server; do (cd $d && go build); done
    
  2. Run the broker.
    broker/broker --disable-tls --addr 127.0.0.1:8000
    
  3. Run a proxy.
    proxy-go/proxy-go --broker http://127.0.0.1:8000/ --relay ws://127.0.0.1:8080/
    
  4. Run the bridge. Create a file called torrc.server with the contents
    DataDirectory datadir-server
    SocksPort 0
    ORPort 9001
    ExtORPort auto
    BridgeRelay 1
    AssumeReachable 1
    PublishServerDescriptor 0
    ServerTransportListenAddr snowflake 0.0.0.0:8080
    ServerTransportPlugin snowflake exec server/server --disable-tls --log snowflake-server.log
    
    Then run the command
    tor -f torrc.server
    
  5. Unpack the Tor Browser package and edit the file Browser/TorBrowser/Data/Tor/torrc-defaults. Change the ClientTransportPlugin snowflake line to make it use the local broker:
    ClientTransportPlugin snowflake exec ./TorBrowser/Tor/PluggableTransports/snowflake-client -url http://127.0.0.1:8000/ -ice stun:stun.l.google.com:19302
    
  6. Run Tor Browser. Select Configure, then Tor is censored in my country, then Provide a bridge I know. In the box, enter
    snowflake 0.0.3.0:1
    
  7. Click Connect and everything should start working. Keep an eye on the proxy-go output to see if packets are flowing. The Turbo Tunnel feature means you should be able to leave the browser idle for hours and have it still be working later, in the worst case after a wait of 30 seconds.

comment:5 Changed 5 weeks ago by dcf

Description: modified
Status: assigned → needs_review

Shall we deploy the Turbo Tunnel bridge? I can do it as early as today. I was waiting until we had figured out #33367 (and I've added the patch for #33367 to the turbotunnel branch).

To be specific, what I want to do is build the server at commit da37211c74b7d6992f4cb07adb6033a684d56838 and deploy it to the public bridge. Then I'll watch it closely for a few hours to make sure it hasn't broken currently deployed clients. The Tor Browser packages from comment:4 should start working just by selecting "snowflake" from the menu, without extra configuration.

comment:6 in reply to:  5 Changed 5 weeks ago by cohosh

Replying to dcf:

Shall we deploy the Turbo Tunnel bridge? I can do it as early as today. I was waiting until we had figured out #33367 (and I've added the patch for #33367 to the turbotunnel branch).

To be specific, what I want to do is build the server at commit da37211c74b7d6992f4cb07adb6033a684d56838 and deploy it to the public bridge. Then I'll watch it closely for a few hours to make sure it hasn't broken currently deployed clients. The Tor Browser packages from comment:4 should start working just by selecting "snowflake" from the menu, without extra configuration.

Yes, I'd like to go ahead with this. When it's deployed I'll make some trial connections on my side with the three different Snowflake Tor Browser builds (existing alpha, kcp, and quic). Let me know if you want extra eyes on the server.

comment:7 Changed 5 weeks ago by cohosh

Status: needs_review → merge_ready

comment:8 Changed 5 weeks ago by dcf

Status: merge_ready → accepted

I built commit da37211c74b7d6992f4cb07adb6033a684d56838 using go1.13.8, installed it, and started it at 2020-02-19 18:03:30.

I set it up as a symlink so we can easily restore the non–Turbo Tunnel version if needed.

lrwxrwxrwx 1 root root       28 Feb 19 18:03 snowflake-server -> snowflake-server.turbotunnel
-rwxr-xr-x 1 root root  9067083 Feb 18 23:18 snowflake-server.normal
-rwxr-xr-x 1 root root 12459290 Feb 19 18:01 snowflake-server.turbotunnel

I tested with a non–Turbo Tunnel client at 380b133155ad725126bc418d0e66b3c550b4c555 using snowflake/client/torrc, and was able to bootstrap once, but that's all I have tested so far.

comment:9 Changed 5 weeks ago by dcf

All right, both the packages from comment:4 are working for me, with no configuration other than picking "snowflake" from the menu. I had to try bootstrapping the quic one twice, but now it's working, playing video and everything. Let the record show that the first song listened to over Turbo Tunnel Snowflake was "The Happy Monster" and the first video watched was "Starships".

One thing I didn't think about is that if you want a log for debugging, you'll have to manually add a -log option to the ClientTransportPlugin line in Browser/TorBrowser/Data/Tor/torrc-defaults.

ClientTransportPlugin snowflake exec ./TorBrowser/Tor/PluggableTransports/snowflake-client -url https://snowflake-broker.azureedge.net/ -front ajax.aspnetcdn.com -ice stun:stun.l.google.com:19302 -log snowflake-client.log

Here's a tip on how to run multiple Tor Browsers at the same time. This way you can run the experimental Turbo Tunnel bundles alongside your ordinary Tor Browser. It can be helpful to go to the Customize... menu and pick different themes (default/light/dark) to distinguish which is which.

Last edited 5 weeks ago by dcf

comment:10 Changed 5 weeks ago by cohosh

Nice, it's also working for me on a non-Turbo Tunnel 9.5a5 version of Tor Browser.

comment:11 Changed 5 weeks ago by dcf

My observations from running the quic and kcp browsers more or less continuously since yesterday:

  • The experience is still pretty hit or miss. Sometimes you get a good proxy and cruise on it for a while; other times you get delays of several minutes caused by a series of non-working proxies—not slow proxies, but ones that never send any downstream data at all. I don't know why so many proxies should be broken in this way; for me it must be over 50% of them.
  • I got to the point of continuously tailing both snowflake-client logs to get some insight into what was happening.
  • The worst is when a series of bad proxies causes a delay of a few minutes with no data transfer; in that case tor gets into a "No running bridges" state that is hard to coax out of. When this happens it's not evident in the snowflake-client log; you have to go to about:preferences#tor and look at the tor log. It may look like this:
    [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $3F50D11DE55C028B8F3EFC272BB1CD9138C1F9A4~0x616e6f6e at 178.17.171.78. Retrying on a new circuit.
    [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $3F50D11DE55C028B8F3EFC272BB1CD9138C1F9A4~0x616e6f6e at 178.17.171.78. Retrying on a new circuit.
    [NOTICE] Delaying directory fetches: No running bridges
    [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
    [NOTICE] Delaying directory fetches: No running bridges
    [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
    [NOTICE] Delaying directory fetches: No running bridges
    [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
    
    Or this:
    [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $6B062B0FDFEAC3C6F9203FB9584451E295574DAD~idideditTheconfig at 51.15.37.97. Retrying on a new circuit.
    [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $7761DDC7EB1BE26D4155F74A15F12C32A36FE0F2~CalyxInstitute09 at 162.247.74.217. Retrying on a new circuit.
    [NOTICE] Delaying directory fetches: No running bridges
    [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
    
    When this happens, I usually have luck in going to about:preferences#tor, momentarily switching from snowflake to obfs4, then switching back to snowflake. This restarts the snowflake-client process and seems to cause tor to have a fresh look at its bridges.
  • I'm not noticing a ton of subjective difference in the feel of the two browsers. The main difference I have seen is that the quic one occasionally spends a few minutes at 100% CPU: #33401.
  • It may be my imagination, but I get the impression that everything works better while the connection is being used. Initially my impression was positive as I was trying to stress the system by having videos playing in the background. Then the experience became more frustrating as I tried normal text browsing and I encountered the occasional delays mentioned above. It made me think that perhaps there is something in the proxy that drops idle connections, but I didn't find anything like that. It's possible that this is my imagination and that my initial impression was just getting good luck with proxies.

comment:12 in reply to:  11 ; Changed 5 weeks ago by dcf

Summary: Deploy a Turbo Tunnel–aware Snowflake bridge → Trial deployment of Snowflake with Turbo Tunnel

Replying to dcf:

  • It may be my imagination, but I get the impression that everything works better while the connection is being used. Initially my impression was positive as I was trying to stress the system by having videos playing in the background. Then the experience became more frustrating as I tried normal text browsing and I encountered the occasional delays mentioned above. It made me think that perhaps there is something in the proxy that drops idle connections, but I didn't find anything like that. It's possible that this is my imagination and that my initial impression was just getting good luck with proxies.

I think I know why idle browsing seemed to disconnect more, at least in the quic case. It's because the older version of quic-go we are using (2019-04-01) does not send frequent enough keepalives. It sets the keepalive interval to half the idle timeout, which for us is 10 minutes. Keepalives every 5 minutes are not enough to prevent checkForStaleness from killing the connection after 30 seconds of idleness.

The keepalive issue is fixed in a newer version of quic-go (2019-11-10):

Currently, we're sending a keep-alive-PING after half the idle-timeout period. This doesn't work well for long idle timeouts, if we need to keep a NAT binding alive. We should send a PING after min(30s, idle timeout / 2).

The actual commit uses 20s, not 30s, which is low enough to inhibit checkForStaleness as long as the connection is actually working.
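To make the arithmetic concrete, here is a toy version of the two policies (illustrative, not quic-go's code), with our 10-minute idle timeout plugged in:

    package main

    import (
        "fmt"
        "time"
    )

    // old policy: keepalive PING at half the idle timeout
    func oldKeepalive(idle time.Duration) time.Duration { return idle / 2 }

    // new policy: PING at min(20s, idle timeout / 2)
    func newKeepalive(idle time.Duration) time.Duration {
        if half := idle / 2; half < 20*time.Second {
            return half
        }
        return 20 * time.Second
    }

    func main() {
        idle := 10 * time.Minute
        fmt.Println(oldKeepalive(idle)) // 5m0s: checkForStaleness (30 s) kills the connection first
        fmt.Println(newKeepalive(idle)) // 20s: data arrives inside every 30 s staleness window
    }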

I can try doing another Tor Browser build with a more recent version of quic-go, assuming I can find a new enough version of quic-go that is also compatible with pion-quic (which currently specifies the old version from 2019-04-01).

comment:13 in reply to:  12 Changed 5 weeks ago by arma

Replying to dcf:

I think I know why idle browsing seemed to disconnect more, at least in the quic case. It's because the older version of quic-go we are using (2019-04-01) does not send frequent enough keepalives. It sets the keepalive interval to half the idle timeout, which for us is 10 minutes. Keepalives every 5 minutes are not enough to prevent checkForStaleness from killing the connection after 30 seconds of idleness.

Remember that Tor has its own application level (i.e. tor client <=> tor bridge in this case) keepalives.

Which by an odd quirk of fate are also sent and received every 5 minutes: see the KeepalivePeriod torrc option:
https://gitweb.torproject.org/tor.git/tree/src/core/mainloop/mainloop.c#n1274

You could in theory crank this number down to 20 seconds to work around the problem at the quic layer. But it is definitely not the right long-term answer, and it might introduce other weird side effects; for example, apparently we use the Keepalive parameter to decide if we've waited long enough that we should give up on an in-progress-but-not-yet-open OR connection:
https://gitweb.torproject.org/tor.git/tree/src/core/mainloop/mainloop.c#n1236

It is in any case an option to explore if upgrading the quic libs turns out to be messier than expected. :)
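For concreteness, that workaround would be a single torrc line (again, not a recommendation):

KeepalivePeriod 20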

Last edited 5 weeks ago by arma

comment:14 in reply to:  12 Changed 5 weeks ago by dcf

Replying to dcf:

Replying to dcf:

  • It may be my imagination, but I get the impression that everything works better while the connection is being used. Initially my impression was positive as I was trying to stress the system by having videos playing in the background. Then the experience became more frustrating as I tried normal text browsing and I encountered the occasional delays mentioned above. It made me think that perhaps there is something in the proxy that drops idle connections, but I didn't find anything like that. It's possible that this is my imagination and that my initial impression was just getting good luck with proxies.

I think I know why idle browsing seemed to disconnect more, at least in the quic case.

And I think I see what was going wrong with kcp as well. The keepalive interval was fine, but the idle timeout was too low (30 s). Because it takes over 30 s to realize that you have a bad proxy, the first bad proxy would kill your connection. The effect was magnified because the copyLoop function, when the session timed out due to idleness, would only exit the socks←webRTC loop, but would keep running the webRTC←socks loop for about another 2 minutes (might be tor SocksTimeout, not sure). So one bad proxy would knock you out for at least 2.5 minutes, as well as killing all your existing circuits.

I made these commits:

  • a05f5efc Set the smux KeepAliveTimeout (idle timeout) to 10 minutes.
  • 6b902fca Let copyLoop exit when either direction finishes.
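A minimal sketch of the 6b902fca copyLoop fix (illustrative; the branch's code differs in detail): both directions run concurrently, and the function returns as soon as either one finishes, instead of waiting on one particular direction.

    package client

    import "io"

    // copyLoop shuttles data between the SOCKS side and the WebRTC side.
    // The buffered channel lets the slower goroutine exit without blocking.
    func copyLoop(socks, webRTC io.ReadWriter) {
        done := make(chan struct{}, 2)
        go func() {
            io.Copy(socks, webRTC) // webRTC -> socks
            done <- struct{}{}
        }()
        go func() {
            io.Copy(webRTC, socks) // socks -> webRTC
            done <- struct{}{}
        }()
        <-done // either direction ending unblocks the whole loop
    }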
Last edited 5 weeks ago by dcf

comment:15 in reply to:  12 ; Changed 5 weeks ago by dcf

Replying to dcf:

I can try doing another Tor Browser build with a more recent version of quic-go, assuming I can find a new enough version of quic-go that is also compatible with pion-quic (which currently specifies the old version from 2019-04-01).

I have a couple of updated branches and I'm starting on Tor Browser builds with them. They make the kcp idle timeout fix from comment:14 and update to a newer quic-go as mentioned in comment:12.

The upgrade of quic-go was a bit of a gross process. The API changes are mild. pion-quic is unfortunately incompatible with the newer version; but I worked around that with a patch in the tor-browser-build project.

I selected a very specific commit of quic-go to upgrade to: we need at least 6407f5bf because it has the keepalive fix for comment:12 and those in #33401. But I didn't want to use 572ef44c or later, because it adds a huge number of new transitive dependencies that I didn't have the ambition to start packaging for tor-browser-build. (It's a lot of dependencies—go mod graph goes from 59 lines to 283 lines. And one of the dependencies—google.golang.org/api—is over 550 MB!) Upgrading quic-go also requires upgrading go itself to 1.13, because the qtls library is coupled to crypto/tls in the standard library.

The upgraded client was not compatible with the server I deployed in comment:8, so I rebuilt the server at commit 42c07f2c and deployed it at 2020-02-22T04:13:

lrwxrwxrwx 1 root root       37 Feb 22 04:12 snowflake-server -> snowflake-server.turbotunnel.42c07f2c
-rwxr-xr-x 1 root root  9067083 Feb 18 23:18 snowflake-server.normal
-rwxr-xr-x 1 root root 15648527 Feb 22 04:11 snowflake-server.turbotunnel.42c07f2c
-rwxr-xr-x 1 root root 12459290 Feb 19 18:01 snowflake-server.turbotunnel.da37211c

Overall, it's making me feel more and more meh about deploying quic-go; it and QUIC are still changing fast and I foresee maintenance and compatibility difficulties.

In the new Tor Browser builds I'm going to enable snowflake-client logging by default and enable some torrc options to try to make tor more reluctant to give up on its circuits. The latter idea I got from the 2020-02-20 anti-censorship meeting (starting at about 18:10:00).

LearnCircuitBuildTimeout 0
CircuitBuildTimeout 300
CircuitStreamTimeout 300

comment:16 Changed 5 weeks ago by dcf

I was experimenting with performance today and I also want to try disabling the KCP congestion window. I built 47312dd1eccc8456652853bd66f8ed396e9ba6ec and deployed it at 2020-02-22 23:51:15. (Also including 924593615c5a8fca7e0d9b4c0fafbd143db1bb62 which is a stats fix from comment:3:ticket:33385.)

comment:17 in reply to:  15 Changed 5 weeks ago by dcf

Replying to dcf:

I have a couple of updated branches and I'm starting on Tor Browser builds with them. They make the kcp idle timeout fix from comment:14 and update to a newer quic-go as mentioned in comment:12.

Here are second-draft Tor Browser packages. They fix most of the problems I experienced with the first draft, which are summarized in comment:11. They are built from the snowflake-turbotunnel-kcp and snowflake-turbotunnel-quic branches. Both are working well for me, even playing hours-long online videos.

Summary of changes since the first-draft packages in comment:4:

The log appears in Browser/TorBrowser/Data/Tor/pt_state/snowflake-client.log. Some hints on interpreting the log:

BrokerChannel Response: 504 Gateway Timeout
    This means the broker couldn't find a proxy for you. It's a temporary error and the client will try again in 10 seconds.
BrokerChannel Response: 200 OK
    This means that you got matched up with a proxy, but it doesn't necessarily mean the proxy works.
Traffic Bytes (in|out): 0 | 972
    If the number on the left stays at 0, it means the proxy isn't working (you're sending but not receiving anything). If 30 seconds pass without receiving anything, the client will abandon that proxy and contact the broker to get another one (sketched in code after these hints).
Traffic Bytes (in|out): 52457 | 7270 -- (47 OnMessages, 75 Sends)
    When you start getting numbers like this, your proxy is working.
WebRTC: No messages received for 30s -- closing stale connection
    This means the proxy stopped working (or never worked) and the client will try another one.
WebRTC: At capacity [1/1] Retrying in 10s...
    This is normal and means that the client has its desired number of proxies (1).
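For reference, the 30-second rule behind the last two hints looks roughly like this (a sketch; only the checkForStaleness name comes from the real code):

    package client

    import (
        "sync"
        "time"
    )

    const snowflakeTimeout = 30 * time.Second

    type webRTCConn struct {
        mu          sync.Mutex
        lastReceive time.Time     // updated on every OnMessage
        closed      chan struct{} // closing it abandons this proxy
    }

    // checkForStaleness closes the connection once nothing has been received
    // for snowflakeTimeout, producing the "closing stale connection" line.
    func (c *webRTCConn) checkForStaleness() {
        for {
            c.mu.Lock()
            idle := time.Since(c.lastReceive)
            c.mu.Unlock()
            if idle > snowflakeTimeout {
                close(c.closed)
                return
            }
            time.Sleep(time.Second)
        }
    }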

comment:18 in reply to:  15 ; Changed 5 weeks ago by cohosh

Replying to dcf:

The upgrade of quic-go was a bit of a gross process. The API changes are mild. pion-quic is unfortunately incompatible with the newer version; but I worked around that with a patch in the tor-browser-build project. I selected a very specific commit of quic-go to upgrade to: we need at least 6407f5bf because it has the keepalive fix for comment:12 and those in #33401. But I didn't want to use 572ef44c or later, because it adds a huge number of new transitive dependencies that I didn't have the ambition to start packaging for tor-browser-build. (It's a lot of dependencies—go mod graph goes from 59 lines to 283 lines. And one of the dependencies—google.golang.org/api—is over 550 MB!) Upgrading quic-go also requires upgrading go itself to 1.13, because the qtls library is coupled to crypto/tls in the standard library. The upgraded client was not compatible with the server I deployed in comment:8, so I rebuilt the server at commit 42c07f2c and deployed it at 2020-02-22T04:13:

[snip]

Overall, it's making me feel more and more meh about deploying quic-go; it and QUIC are still changing fast and I foresee maintenance and compatibility difficulties.

Ugh, is KCP likely to be more stable?

The dependency problem applies to KCP too, right? Your earlier mail suggested that KCP would add ~16 new dependencies. Though this seems much less of an issue compared to what quic-go now requires.

I wonder whether pion-webrtc will eventually force us to upgrade to this dependency-heavy version of quic-go anyway.

comment:19 in reply to:  18 Changed 5 weeks ago by dcf

Replying to cohosh:

Replying to dcf:

Overall, it's making me feel more and more meh about deploying quic-go; it and QUIC are still changing fast and I foresee maintenance and compatibility difficulties.

Ugh, is KCP likely to be more stable?

It's slower-moving at least. kcp-go and smux together have had 30 commits since January 1, while quic-go has had 181.

kcp-go$ git log --oneline --since 2020-01-01 96f67cd | wc -l
17
smux$ git log --oneline --since 2020-01-01 c6969d8 | wc -l
13
quic-go$ git log --oneline --since 2020-01-01 ca469eb0 | wc -l
181

We still haven't had a ton of experience with either library, but there were 3 API breaks in quic-go that affected our code when I did the upgrade: IdleTimeout → MaxIdleTimeout, removal of Session.Close, and Accept functions taking a Context. On the other hand, smux also broke its import path during that time.
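In calling code, the three breaks look roughly like this (a hedged sketch; exact signatures reconstructed from memory of that era's quic-go):

    package client

    import (
        "context"
        "time"

        quic "github.com/lucas-clemente/quic-go"
    )

    // apiBreaks marks the three call sites that changed.
    func apiBreaks(sess quic.Session) error {
        _ = &quic.Config{MaxIdleTimeout: 10 * time.Minute}                   // was: IdleTimeout
        if _, err := sess.AcceptStream(context.Background()); err != nil { // was: AcceptStream()
            return err
        }
        return sess.CloseWithError(0, "done") // was: sess.Close()
    }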

The dependency problem applies to KCP too, right? Your earlier mail suggested that KCP would add ~16 new dependencies. Though this seems much less of an issue compared to what quic-go now requires.

9 of those (the /x/crypto and /x/net ones) are not completely new dependencies, just new sub-packages in existing projects. Almost all the new dependencies are for features we don't actually use, so it leaves open the possibility of making a fork (ugh) or a tor-browser-build patch that removes the need for them. gogmsm, goxcrypto, go-templexxx-xorsimd, and go-templexxx-cpu are for the optional crypto feature that is symmetric-key only and therefore useless in a shared-bridge environment with untrusted clients. goreedsolomon and gocpuid are only for the forward error-correction feature, which may be useful in certain contexts but which we aren't using now. That would leave only the /x/net ones, which aren't really full new dependencies.
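For what it's worth, disabling both optional features happens at the call site anyway; with kcp-go's stock API (DialWithOptions, per its README) it is just nil crypto and zero FEC shards. A sketch (the address is a placeholder):

    package main

    import (
        "log"

        kcp "github.com/xtaci/kcp-go"
    )

    func main() {
        // nil BlockCrypt: no symmetric-key crypto (useless on a shared bridge);
        // 0 data / 0 parity shards: no forward error correction.
        sess, err := kcp.DialWithOptions("127.0.0.1:12345", nil, 0, 0)
        if err != nil {
            log.Fatal(err)
        }
        defer sess.Close()
        log.Println("dialed", sess.RemoteAddr())
    }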

I wonder whether pion-webrtc will eventually force us to upgrade to this dependency-heavy version of quic-go anyway.

Yeah, possibly. I'm not actually sure what pion-webrtc uses pion-quic for anyway.

The massive dependency increase in quic-go is entirely due to the addition of the GoJay package for faster JSON encoding. It's GoJay's go.mod that brings in all the gunk, including stuff like cloud.google.com/go which brings in its own dependencies. Now it's probable that not all of those new dependencies actually need to be packaged—they may be used only in tests or could be easily hacked out as in the kcp-go case. It was just more than I wanted to deal with, when all I really wanted was a newer quic-go with some bugs fixed.

comment:20 Changed 5 weeks ago by arma

Running dcf's https://people.torproject.org/~dcf/pt-bundle/tor-browser-snowflake-turbotunnel-quic-9.5a5-20200223/tor-browser-linux64-9.5a5_en-US.tar.xz

I replaced tor-browser_en-US/Browser/TorBrowser/Tor/tor with the tor binary made from my Tor git branch debug33336

and in tor-browser_en-US/Browser/TorBrowser/Data/Tor/torrc-defaults I commented out dcf's new lines LearnCircuitBuildTimeout, CircuitBuildTimeout, CircuitStreamTimeout, and added these three of my own:

log info file /tmp/tor-info-log
logtimegranularity 1
safelogging 0

Then I started my tor browser using snowflake:

[...]
Feb 25 14:25:27.557 [notice] Definitely works: recorded success for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:25:27.929 [notice] Bootstrapped 100% (done): Done

I browsed for a while. It worked fine. I disabled my wireless on the laptop, and clicked on a few more things.

Feb 25 14:31:01.385 [notice] We tried for 15 seconds to connect to 'www.gstatic.com' using exit $81B75D534F91BFB7C57AB67DA10BCEF622582AE8~hviv104 at 192.42.116.16. Retrying on a new circuit.
[...]
Feb 25 14:31:25.438 [notice] We tried for 15 seconds to connect to 'pagead2.googlesyndication.com' using exit $578E007E5E4535FBFEF7758D8587B07B4C8C5D06~marylou1 at 89.234.157.254. Retrying on a new circuit.
Feb 25 14:31:25.438 [notice] Our circuit 3717720358 (id: 24) failed to get a response from the first hop (0.0.3.0:1). I'm going to try to rotate to a better connection.
Feb 25 14:31:25.438 [notice] Marking guard down: Recorded failure for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:31:26.436 [notice] Delaying directory fetches: No running bridges
Feb 25 14:31:27.437 [notice] We tried for 15 seconds to connect to 'px.moatads.com' using exit $578E007E5E4535FBFEF7758D8587B07B4C8C5D06~marylou1 at 89.234.157.254. Retrying on a new circuit.
Feb 25 14:31:27.438 [notice] Considering retry for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:31:27.438 [notice] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
[...]
Feb 25 14:33:31.503 [notice] Closing OR conn. Considering blaming guard.
Feb 25 14:33:31.503 [notice] Our circuit 0 (id: 30) died before the first hop with no connection
Feb 25 14:33:31.504 [notice] Marking guard down: Recorded failure for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)

around here I turned the wifi back on, and clicked on a few more things in tor browser.

It thrashed for a while more, with things like

Feb 25 14:35:31.846 [notice] Our circuit 0 (id: 49) died before the first hop with no connection
Feb 25 14:35:31.846 [notice] Marking guard down: Recorded failure for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:35:32.075 [notice] Delaying directory fetches: No running bridges
Feb 25 14:35:58.268 [notice] Considering retry for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:35:58.268 [notice] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
Feb 25 14:36:24.457 [notice] Closing OR conn. Considering blaming guard.
Feb 25 14:36:24.457 [notice] Our circuit 0 (id: 50) died before the first hop with no connection
Feb 25 14:36:24.457 [notice] Marking guard down: Recorded failure for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)

and then came the exciting part:

Feb 25 14:36:24.458 [warn] Pluggable Transport process terminated with status code 512
Feb 25 14:36:25.307 [notice] Delaying directory fetches: No running bridges
Feb 25 14:36:44.753 [notice] Considering retry for primary confirmed guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72)
Feb 25 14:36:44.753 [notice] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
Feb 25 14:36:44.825 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:45527 just failed. Make sure that the proxy server is up and running.
Feb 25 14:36:44.825 [notice] Closing OR conn. Considering blaming guard.

My snowflake client is dead? But Tor just keeps on trying to use it, warning quietly to itself every couple of minutes about how the connection to the socks5 proxy server just failed?

I will set up logging on the snowflake client side too, and see if I can reproduce.

comment:21 in reply to:  20 Changed 5 weeks ago by arma

Replying to arma:

I will set up logging on the snowflake client side too, and see if I can reproduce.

Ok, the package comes with snowflake logging already set up. Here is how my snowflake-client.log ends:

2020/02/25 14:36:15 WebRTC: Done gathering candidates
2020/02/25 14:36:15 WebRTC: ICEGatheringStateComplete
2020/02/25 14:36:15 Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2020/02/25 14:36:16 BrokerChannel Response:
200 OK

2020/02/25 14:36:16 Received Answer.
2020/02/25 14:36:16 ---- Handler: snowflake assigned ----
2020/02/25 14:36:16 Buffered 8 bytes --> WebRTC
2020/02/25 14:36:16 Buffered 8 bytes --> WebRTC
2020/02/25 14:36:16 Traffic Bytes (in|out): 0 | 8 -- (0 OnMessages, 1 Sends)
2020/02/25 14:36:16 Buffered 1202 bytes --> WebRTC
2020/02/25 14:36:16 Buffered 1202 bytes --> WebRTC
2020/02/25 14:36:18 WebRTC: DataChannel.OnOpen
2020/02/25 14:36:18 Flushed 2420 bytes.
2020/02/25 14:36:21 Traffic Bytes (in|out): 0 | 2412 -- (0 OnMessages, 3 Sends)
2020/02/25 14:36:23 Traffic Bytes (in|out): 1088 | 0 -- (1 OnMessages, 0 Sends)
2020/02/25 14:36:24 copying WebRTC to SOCKS resulted in error: write tcp [scrubbed]->[scrubbed]: write: broken pipe
2020/02/25 14:36:24 copy loop ended
2020/02/25 14:36:24 ---- Handler: closed ---
2020/02/25 14:36:24 WebRTC: closing DataChannel
2020/02/25 14:36:24 WebRTC: closing PeerConnection
2020/02/25 14:36:24 Error writing to SOCKS pipe

comment:22 in reply to:  20 ; Changed 4 weeks ago by dcf

Replying to arma:

I browsed for a while. It worked fine. I disabled my wireless on the laptop,

You have a knack for thinking of interesting tests :) Killing the wireless would not only break the WebRTC connection to the proxy, it would also prevent snowflake-client from contacting the broker to get a new one. But my guess is that it should handle even this gracefully, attempting every 10 seconds to contact the broker until it starts working again.
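The graceful behavior I have in mind is just a poll loop (a sketch; the real client's structure differs):

    package client

    import "time"

    // getProxy keeps asking the broker for a proxy until a negotiation
    // succeeds, retrying every 10 seconds; a dead network just means more
    // retries until connectivity returns.
    func getProxy(negotiate func() (answer string, err error)) string {
        for {
            answer, err := negotiate()
            if err == nil {
                return answer
            }
            time.Sleep(10 * time.Second)
        }
    }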

Feb 25 14:36:24.457 [notice] Closing OR conn. Considering blaming guard.
2020/02/25 14:36:24 copying WebRTC to SOCKS resulted in error: write tcp [scrubbed]->[scrubbed]: write: broken pipe
2020/02/25 14:36:24 WebRTC: closing DataChannel
2020/02/25 14:36:24 WebRTC: closing PeerConnection
2020/02/25 14:36:24 Error writing to SOCKS pipe

What I see here is tor closing its SOCKS connection to snowflake-client, and snowflake-client noticing the closed connection and tearing down its own proxy connection. That part all looks fine.

Feb 25 14:36:24.458 [warn] Pluggable Transport process terminated with status code 512

The weird part is that in the same second the snowflake-client process is terminated. It's an abnormal termination; otherwise you would see another log line with "snowflake is done". Failure to write to the SOCKS connection shouldn't cause snowflake-client to exit anyway; its socksAcceptLoop function should keep running and accepting new SOCKS connections.
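That is, the structure is the usual accept loop (a sketch; only the socksAcceptLoop name is from the real code), where a failed connection ends its handler but not the loop:

    package client

    import "net"

    func socksAcceptLoop(ln net.Listener) {
        for {
            conn, err := ln.Accept()
            if err != nil {
                return // only a listener error ends the loop (and then the process)
            }
            go func(c net.Conn) {
                defer c.Close()
                // per-connection handling; "Error writing to SOCKS pipe" is
                // logged here and kills only this handler
            }(conn)
        }
    }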

Two possible explanations for what's happening are

  1. snowflake-client is panicking or crashing in an uncontrolled way.
  2. tor is killing snowflake-client rather than signaling it to exit gracefully.

In case (1), I would expect a stack trace to make its way into the tor log via tor's PT stderr handler. Is there anything in the "Closing OR conn" code path that would make tor kill the PT process?

comment:23 in reply to:  4 Changed 4 weeks ago by dcf

Replying to dcf:

Both builds have a commit that attempts to disable automatic updates for 60 days. My reasoning is that we don't want our testers to experience an automatic update while they are testing these special builds, because an update would remove the snowflake-turbotunnel features. But also, if someone for some reason decides to keep using an experimental build, we don't want them to be stuck on a non-updating browser forever.

The prefs I set to disable automatic updates didn't work. I got updated today. If it happens to you, you need to re-download the snowflake-turbotunnel packages, then set app.update.auto=false in about:config the first time you run it. Alternatively, go to Preferences, General, and select "Check for updates but let you choose to install them." You will still have the option to update manually, but it won't happen automatically.

Apparently setting app.update.interval to a large value doesn't work anymore. I'm pretty sure it was working in #29611. I found a Bugzilla comment that says "The maximum value allowed is 86400 which is 24 hours." I searched the source code at https://dxr.mozilla.org/ for the place where this maximum is enforced, but couldn't find it.

comment:24 in reply to:  22 Changed 9 days ago by arma

Replying to dcf:

The weird part is that in the same second the snowflake-client process is terminated. It's an abnormal termination

I've opened #33669 for having Tor (or Tor Browser) handle this situation better on its side.

I don't currently have any guesses about whether it is a Tor bug or a Snowflake bug that caused the exit.

comment:25 Changed 9 days ago by cohosh

Parent ID: #19001

comment:26 Changed 2 days ago by dcf

Resolution: fixed
Status: accepted → closed

Closing this ticket for the trial deployment. Let's take further discussion and bug reports to #33745 for the merge.
