Opened 2 months ago

Closed 5 weeks ago

Last modified 2 weeks ago

#33745 closed task (fixed)

Merge a turbotunnel branch

Reported by: dcf Owned by: dcf
Priority: Medium Milestone:
Component: Circumvention/Snowflake Version:
Severity: Normal Keywords: turbotunnel
Cc: cohosh, phw, arlolra, dcf Actual Points:
Parent ID: #19001 Points:
Reviewer: cohosh Sponsor:

Description (last modified by dcf)

Snowflake turbo tunnel features have now been through a test deployment (#33336) and a few iterations of Tor Browser packages. There haven't been as many test reports as I'd like, but what testing there has been has been mostly positive. Turbo tunnel–like features are a dependency of some of the tasks for a stable release of Snowflake (#19001). So we should merge it.

Some sub-tasks:

  • Decide between the KCP and QUIC branch.
  • Test without LearnCircuitBuildTimeout 0 and find another workaround, if necessary. See comment:15:ticket:33336.
  • Rebase and clean history of the chosen branch.
  • Redeploy bridge from master.

Summary of turbo tunnel development history till now:

One bug that may or not be snowflake's fault:

Child Tickets

Change History (15)

comment:1 in reply to:  description ; Changed 2 months ago by arma

Replying to dcf:

I did all my testing on quic, on the theory that it's the one I've heard of so it's probably more likely to be around next year. So as we choose, we should learn if kcp has gotten much testing.

I did all my quic tests with those torrc changes commented out (i.e. I used what normal Tor Browser would have given me).

comment:2 in reply to:  description ; Changed 2 months ago by dcf

Description: modified (diff)

Replying to dcf:

On this point, I advocate for the kcp-go branch. I think that quic-go will be the cause of a lot of maintenance difficulties. There's nothing really wrong with QUIC the protocol (apart from it still being a changing standard); the problem is the instability of the quic-go package.

  • quic-go doesn't have a stable API, and the API has broken in the time I've been using it.
  • quic-go is tightly coupled to a specific version of Go and its standard library, because it uses internal data structures of the crypto/tls package. If you compile quic-go with the wrong version of Go, you get an error not at compile time but at runtime. The version of quic-go currently being used requires Go 1.13, which means it cannot be built in Tor Browser until Tor Browser upgrades its own version of Go, which is currently stalled pending changes in rbm (#32027). It cuts the other way as well: any future upgrade of Go will require a simultaneous upgrade of quic-go and work to fix any API breakage.
  • Upgrading quic-go past where it is now pinned will add a lot of weighty dependencies (comment:15:ticket:33336).
  • The quic-go repository is volatile. Not once but twice I encountered significant bugs whose fix was only a few days or weeks old, not yet in any numbered release.
  • Different versions of quic-go do not necessarily interoperate. Specifically, I know that the version of quic-go currently packaged in Tor Browser (as a dependency of pion-webrtc), which is less than one year old, does not interoperate with current versions of quic-go, because the old version does not send ALPN in its TLS handshake, and newer versions require it.

Could I be wrong in my assessment? Of course. kcp-go is not perfect either--in particular I'm thinking about patching it to remove what of its dependencies we don't use. If quic-go were the only option, I would say go ahead with it, and live with the maintenance costs. But my considered opinion is that it is more risky to deploy a system based on quic-go than one based on kcp-go and smux.

Replying to arma:

So as we choose, we should learn if kcp has gotten much testing.

I know you mean well, but I find comments like this demoralizing. You make it sound as if I have not been thinking about this among many considerations for months, and going out of my way to document my reasoning. You should read my initial candidate protocol evaluation if you have not already.

Last edited 2 months ago by dcf (previous) (diff)

comment:3 in reply to:  1 Changed 2 months ago by dcf

Replying to arma:

I did all my quic tests with those torrc changes commented out (i.e. I used what normal Tor Browser would have given me).

Thanks, that's good to know. I added the CircuitBuildTimeout changes in an uncontrolled way at the same time as the idle timeout fixes in #33336, so it's possible they are not necessary. I'll try just removing them and see how it goes. What I was afraid of is a case where the first time you connect, you get lucky and get a working proxy on the first try, and tor learns a CircuitBuildTimeout of 60 seconds. Then then next time you connect, maybe it takes four tries to get a working proxy, but by then tor has given up according to the shorter timeout it learned. But I don't know if it really works like that or how eagerly tor adjusts the learned timeout.

comment:4 in reply to:  2 ; Changed 8 weeks ago by cohosh

Replying to dcf:

Replying to dcf:

On this point, I advocate for the kcp-go branch. I think that quic-go will be the cause of a lot of maintenance difficulties. There's nothing really wrong with QUIC the protocol (apart from it still being a changing standard); the problem is the instability of the quic-go package.

  • quic-go doesn't have a stable API, and the API has broken in the time I've been using it.
  • quic-go is tightly coupled to a specific version of Go and its standard library, because it uses internal data structures of the crypto/tls package. If you compile quic-go with the wrong version of Go, you get an error not at compile time but at runtime. The version of quic-go currently being used requires Go 1.13, which means it cannot be built in Tor Browser until Tor Browser upgrades its own version of Go, which is currently stalled pending changes in rbm (#32027). It cuts the other way as well: any future upgrade of Go will require a simultaneous upgrade of quic-go and work to fix any API breakage.
  • Upgrading quic-go past where it is now pinned will add a lot of weighty dependencies (comment:15:ticket:33336).
  • The quic-go repository is volatile. Not once but twice I encountered significant bugs whose fix was only a few days or weeks old, not yet in any numbered release.
  • Different versions of quic-go do not necessarily interoperate. Specifically, I know that the version of quic-go currently packaged in Tor Browser (as a dependency of pion-webrtc), which is less than one year old, does not interoperate with current versions of quic-go, because the old version does not send ALPN in its TLS handshake, and newer versions require it.

Could I be wrong in my assessment? Of course. kcp-go is not perfect either--in particular I'm thinking about patching it to remove what of its dependencies we don't use. If quic-go were the only option, I would say go ahead with it, and live with the maintenance costs. But my considered opinion is that it is more risky to deploy a system based on quic-go than one based on kcp-go and smux.

As you mentioned earlier, the intent of the quic branch was to take advantage of the fact that pion/webrtc already requires pion/quic to build. pion/quic requires quic-go commit 907071 at the moment, which is an earlier version that we're using here, and before the large dependencies are added. It's possible (maybe even likely) they will eventually move to a more recent version of quic-go so we'll have to include it anyway to keep up to date on the webrtc library and get things like security updates.

Honestly this is where the go ecosystem is causing a lot of pain. We don't even use pion/quic. We only need it to build the project. Digging into this a bit more, it looks like the code in quictransport.go is only ever used in examples, and is an optional feature of ORTC, which we do not use.Looking at the code, a patch to remove this dependency would be quite simple. We'd pretty much need to just remove a few files.

I'm inclined to agree with going with KCP. The only concerns I can find that we had with it are that it added more new depencies (looks like about 16) and most are again for features we don't use.

There seems to be many good opportunities for dependency pruning across the board. I wish there was a way to make go libraries that allowed for the automatic disabling of unused features to avoid this kind of bloat.

comment:5 in reply to:  4 Changed 8 weeks ago by cohosh

Replying to cohosh:

Honestly this is where the go ecosystem is causing a lot of pain. We don't even use pion/quic. We only need it to build the project. Digging into this a bit more, it looks like the code in quictransport.go is only ever used in examples, and is an optional feature of ORTC, which we do not use.Looking at the code, a patch to remove this dependency would be quite simple. We'd pretty much need to just remove a few files.

There seems to be many good opportunities for dependency pruning across the board. I wish there was a way to make go libraries that allowed for the automatic disabling of unused features to avoid this kind of bloat.

Yep, check out this patch in #33761 that removes all quic dependencies of snowflake by deleting three files.

comment:6 Changed 7 weeks ago by dcf

Here's what a dependency-reduction patch for kcp-go looks like:

It deletes the crypt.go and fec.go files, then introduces a new file with shim types for the removed types. My feeling is that this type of patch (only add or remove whole files) will be easier to maintain than one that makes changes within a file.

I've made squashed merge branches for snowflake and tor-browser-build. I've started a testbuild with them. The snowflake branch uses kcp-go only, and removes QUIC support from the server, though the server is still compatible with non-turbotunnel clients. The tor-browser-build branch is based on tbb-9.5a11-build2.

comment:7 Changed 7 weeks ago by dcf

Status: assignedneeds_review

https://people.torproject.org/~dcf/pt-bundle/tor-browser-snowflake-turbotunnel-9.5a11-20200410/

Here are packages built from the branches in comment:6. I'm asking for review of the snowflake turbotunnel-merge changes. If they're accepted then I'll do another ticket for the Tor Browser merge.

These packages have snowflake-client logging disabled by default. To re-enable logging, edit Browser/TorBrowser/Data/Tor/torrc-defaults and add -log snowflake-client.log -logToStateDir to the ClientTransportPlugin snowflake line.

These packages also remove the LearnCircuitBuildTimeout 0 setting that was present in earlier iterations. I'm going to test to see if creates a problem of tor learning too-short timeouts, given the high rate of proxy failures some clients will experience (comment:8:ticket:33666 ff.).

comment:8 Changed 6 weeks ago by cohosh

Reviewer: cohosh

comment:9 Changed 6 weeks ago by arma

Great! I've moved from the a8 with quic to the a11 with kcp. I added the snowflake-client logging, and also added

log info file /tmp/tori-log
logtimegranularity 1
safelogging 0

to the torrc, and also replaced the tor binary with the one from my debug33336-2 branch.

It took a little while to bootstrap initially, but everything is going swimmingly now. Later I will try doing rude things to its network connection and see how it handles it, and let you know about any issues that I see. I am excited to have another iteration of packages. Thanks!

comment:10 Changed 6 weeks ago by cohosh

Status: needs_reviewneeds_revision

Okay, these changes look good to me. Some things I noticed while going through to code again:

  • there's something up with the go.sum file you generated that's causing security errors for me. Specifically the hash for github.com/tjfoc/gmsm seemed wrong. I fixed this by removing go.sum and then regenerating it. Can you take a look at your end and see if you're having the same issues?
  • for the changes in this commit, it looks like we can still end up with a blocking goroutine waiting to read from a KCP stream once the SOCKS connection closes.

In fact, since the KCP stream is kept alive, will this stay open indefinitely (or until Tor Browser restarts/closes)?

  • this is a nitpick, but the comments and variable names in Handler and copyLoop still assume this is a WebRTC session and not a KCP stream here.

I'll put it in needs_revision to hear back about these points but I generally think it's good to go.

comment:11 in reply to:  10 Changed 5 weeks ago by dcf

Replying to cohosh:

Okay, these changes look good to me. Some things I noticed while going through to code again:

  • there's something up with the go.sum file you generated that's causing security errors for me. Specifically the hash for github.com/tjfoc/gmsm seemed wrong. I fixed this by removing go.sum and then regenerating it. Can you take a look at your end and see if you're having the same issues?

Good catch. This was essentially an upstream packaging error which I've reported at https://github.com/tjfoc/gmsm/issues/56. The v1.3.0 you can download from proxy.golang.org and the v1.3.0 you can download from github.com are not the same. I'm guessing that someone once did a go get with the v1.3.0 tag pointing at the wrong place, and now it's impossible to removed that version from proxy.golang.org and sum.golang.org. I use GOPROXY=direct and GOSUMDB=off because I don't like the go command constantly phoning home, which is hy my download had come from github.com and I got a different checksum.

The only reason gmsm@v1.3.0 was in go.mod at all was a single import in in kcp-go that was missing the /v5 suffix, causing it to grab the newest version of all its dependencies instead of the versions in go.mod. I fixed that /v5 error a few weeks ago and now it only depends on gmsm@v1.0.1, which doesn't have a proxy/non-proxy checksum problem.

  • for the changes in this commit, it looks like we can still end up with a blocking goroutine waiting to read from a KCP stream once the SOCKS connection closes.

When one of the two goroutines finishes, control goes back to Handler, which closes both connections, which should cause the other goroutine to finish.

Another way to do this would be to move the Conn closing into the copyLoop function, where each goroutine, when it finishes, closes the Conn that makes the other finish.

In fact, since the KCP stream is kept alive, will this stay open indefinitely (or until Tor Browser restarts/closes)?

webRTC in this case is an smux stream, contained within the KCP stream. The KCP stream lasts forever but smux streams come and go. The smux stream gets closed in a caller here.

  • this is a nitpick, but the comments and variable names in Handler and copyLoop still assume this is a WebRTC session and not a KCP stream here.

Thanks, I changed the comment and variable name in copyLoop. I think the references to WebRTC in Handler are appropriate because they refer to the connection before any KCP or smux has been layered on it.

I made the changes in turbotunnel-merge-2 and merged in 2022496d3b6fc76b7725135758c37d7d49546d3d.

comment:12 Changed 5 weeks ago by dcf

Resolution: fixed
Status: needs_revisionclosed

I redeployed the bridge from commit 2022496d3b6fc76b7725135758c37d7d49546d3d at 2020-04-23 22:51:49.

comment:13 Changed 5 weeks ago by cohosh

Woohoo!

Thanks for answering my questions above.

comment:14 Changed 2 weeks ago by cypherpunks

Sorry I haven't been following Snowflake development for a while now. To take advantage of this feature would it be sufficient for me to get the snowflake-client from the latest Tor Browser alpha?

comment:15 in reply to:  14 Changed 2 weeks ago by dcf

Replying to cypherpunks:

Sorry I haven't been following Snowflake development for a while now. To take advantage of this feature would it be sufficient for me to get the snowflake-client from the latest Tor Browser alpha?

It's not yet in any released alpha. See #34043 for when it will be merged into an alpha. Until then, you can take snowflake-client from https://people.torproject.org/~dcf/pt-bundle/tor-browser-snowflake-turbotunnel-9.5a11-20200410/.

Note: See TracTickets for help on using tickets.