WebRTCPeer has a buffer that holds bytes until the data channel is connected. We should remove it after the turbotunnel changes are merged (#33745 (moved)). What ends up happening is that the reliability layer retransmits packets while they sit in the buffer, and when the data channel is finally established, all those stale, useless packets get sent in a burst. It's better to just drop those packets on the floor before the data channel exists.
It's not enough to simply remove the buffer and drop Writes that occur before the data channel is established. That's because not every Write represents a turbotunnel packet that will be retransmitted if lost—specifically the magic token and ClientID are transmitted first, notionally outside the turbotunnel layer, and can't be dropped.
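To illustrate why a blanket "drop until connected" policy loses the handshake, here is a minimal sketch; the `dropWriter` type and the literal byte strings are invented for illustration and are not the actual Snowflake code:

```go
package main

import (
	"bytes"
	"fmt"
)

// dropWriter models a naive policy that silently discards every Write
// made before the data channel is connected.
type dropWriter struct {
	connected bool
	sent      bytes.Buffer
}

func (w *dropWriter) Write(p []byte) (int, error) {
	if !w.connected {
		// Dropped on the floor: fine for turbotunnel packets (the
		// reliability layer retransmits them), fatal for the one-shot
		// handshake bytes sent outside the turbotunnel layer.
		return len(p), nil
	}
	return w.sent.Write(p)
}

func main() {
	w := &dropWriter{}
	w.Write([]byte("magic token")) // one-shot handshake write (illustrative)
	w.Write([]byte("ClientID"))    // also sent exactly once
	w.connected = true
	w.Write([]byte("kcp packet")) // retransmittable turbotunnel packet
	// The handshake bytes are gone; only the post-connection write survives.
	fmt.Printf("%q\n", w.sent.String())
}
```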
My plan for this ticket is to refactor WebRTCPeer so that NewWebRTCPeer never returns a partially connected object—it always gives you back something with a connected data channel that is ready to directly translate c.Write into c.transport.Send. How it works now is you get an object whose internal transport field (representing the data channel) is initially nil, and only becomes non-nil asynchronously some time later, while you're using it.
I think this design will allow for the removal of some locks and other internal state, for overall simpler code. But it's going to take a little hack-and-slash refactoring. I've already filed #33982 (moved) and #33984 (moved) for some changes that suggested themselves while I was rewriting things.
My overall approach has been to make NewWebRTCPeer return a ready-to-use connection. Formerly, you had to call NewWebRTCPeer, then separately call the Connect method, and then at some time in the future the connection would actually happen (or fail, as the case may be). There was no way to detect a failed connection other than by the connection closing itself through a checkForStaleness check. In contrast, now, NewWebRTCPeer waits until it is connected before returning, or else returns an error or timeout indicating that the data channel couldn't be connected.
Unfortunately I had to delete the tests touching WebRTCPeer. I couldn't make them work, because they relied on the separate create/connect interface that this branch eliminates. 3 of the 5 tests test features that this branch removes; the value of the other 2 is arguable, but I don't think they are critical to have.
The simplified external interface allows simplifying the internal implementation. Besides buffer, I was able to remove or reduce the scope of config, broker, offerChannel, answerChannel, and lock.
There's a new constant DataChannelTimeout that's separate from SnowflakeTimeout (the latter is what checkForStaleness uses). I've left DataChannelTimeout set to the same value as SnowflakeTimeout, 30 seconds, but I think we can set DataChannelTimeout to a lower value, to quickly dispose of proxies that cannot be connected to, while still being conservative in discarding once-working proxies. I'm testing a browser now with DataChannelTimeout set to 5 seconds.
Cool! I like how much this simplifies the client-side code.
The code looks good. My only comment is:
broker.Negotiate() can still return a nil answer and we're no longer checking for that like we used to. At first glance, it looks like it shouldn't return a nil answer and a non-nil error, but util.DeserializeSessionDescription can return a nil value without an error attached to it, which would cause problems.
So we should either modify util.DeserializeSessionDescription to return errors, always returning an error whenever the answer would be nil, or we should add a nil answer check before passing it to the pion library function SetRemoteDescription. I don't want to assume that they've written this function to not segfault on us. We could also check for a nil answer in Negotiate() too.
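A minimal sketch of the second option, a guard before the answer ever reaches SetRemoteDescription; `sessionDescription` is a self-contained stand-in for the pion `webrtc.SessionDescription` type, and `checkAnswer` is a hypothetical helper:

```go
package main

import (
	"errors"
	"fmt"
)

// sessionDescription stands in for webrtc.SessionDescription so this
// sketch is self-contained; the real check would guard the pion type.
type sessionDescription struct {
	SDP string
}

// checkAnswer rejects a nil answer before it reaches the pion library,
// rather than trusting SetRemoteDescription not to crash on it.
func checkAnswer(answer *sessionDescription) error {
	if answer == nil {
		return errors.New("received nil answer from broker")
	}
	return nil
}

func main() {
	fmt.Println(checkAnswer(nil))
	fmt.Println(checkAnswer(&sessionDescription{SDP: "v=0..."}))
}
```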
I think we can set DataChannelTimeout to a lower value, to quickly dispose of proxies that cannot be connected to, while still being conservative in discarding once-working proxies. I'm testing a browser now with DataChannelTimeout set to 5 seconds.
This is great. We can also change the data channel timeout on the proxy side to match the client side once we've settled on a value that works for us.
This should help ease the pain of clients with restrictive NATs. If we can shorten the time it takes for the client to realize it has an incompatible proxy, we can recover more quickly (hopefully).
broker.Negotiate() can still return a nil answer and we're no longer checking for that like we used to. At first glance, it looks like it shouldn't return a nil answer and a non-nil error, but util.DeserializeSessionDescription can return a nil value without an error attached to it, which would cause problems.
That's a great catch. Indeed I looked at broker.Negotiate quickly and wrongly decided that it wouldn't return (nil, nil). Here's a new branch to review. The only difference is this patch to make util.DeserializeSessionDescription and util.SerializeSessionDescription return an error.
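A sketch of the contract that patch establishes, where the returned description is non-nil exactly when the error is nil, so callers never see (nil, nil); the struct fields and validation shown here are illustrative, not the actual util package code:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sessionDescription is a stand-in for webrtc.SessionDescription.
type sessionDescription struct {
	Type string `json:"type"`
	SDP  string `json:"sdp"`
}

// deserializeSessionDescription sketches the patched contract: it returns
// a non-nil description or an error, never (nil, nil).
func deserializeSessionDescription(msg string) (*sessionDescription, error) {
	var desc sessionDescription
	if err := json.Unmarshal([]byte(msg), &desc); err != nil {
		return nil, err
	}
	if desc.Type == "" || desc.SDP == "" {
		return nil, errors.New("cannot deserialize session description")
	}
	return &desc, nil
}

func main() {
	d, err := deserializeSessionDescription(`{"type":"answer","sdp":"v=0"}`)
	fmt.Println(d.Type, err)
	_, err = deserializeSessionDescription(`{"type":"answer"}`)
	fmt.Println(err)
}
```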