WebRTCPeer has a buffer that holds bytes until the data channel is connected. We should remove it after the turbotunnel changes are merged (#33745 (moved)). What ends up happening is that the reliability layer retransmits packets while they sit in the buffer, and when the data channel is finally established, all those stale, useless packets get sent in a burst. It's better to just drop those packets on the floor before the data channel exists.
It's not enough to simply remove the buffer and drop Writes that occur before the data channel is established. That's because not every Write represents a turbotunnel packet that will be retransmitted if lost—specifically the magic token and ClientID are transmitted first, notionally outside the turbotunnel layer, and can't be dropped.
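To illustrate why a blanket "drop until connected" policy loses the handshake, here is a minimal sketch; the `dropWriter` type and the literal byte strings are invented for illustration and are not the actual Snowflake code:

```go
package main

import (
	"bytes"
	"fmt"
)

// dropWriter models a naive policy that silently discards every Write
// made before the data channel is connected.
type dropWriter struct {
	connected bool
	sent      bytes.Buffer
}

func (w *dropWriter) Write(p []byte) (int, error) {
	if !w.connected {
		// Dropped on the floor: fine for turbotunnel packets (the
		// reliability layer retransmits them), fatal for the one-shot
		// handshake bytes sent outside the turbotunnel layer.
		return len(p), nil
	}
	return w.sent.Write(p)
}

func main() {
	w := &dropWriter{}
	w.Write([]byte("magic token")) // one-shot handshake write (illustrative)
	w.Write([]byte("ClientID"))    // also sent exactly once
	w.connected = true
	w.Write([]byte("kcp packet")) // retransmittable turbotunnel packet
	// The handshake bytes are gone; only the post-connection write survives.
	fmt.Printf("%q\n", w.sent.String())
}
```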
My plan for this ticket is to refactor WebRTCPeer so that NewWebRTCPeer never returns a partially connected object—it always gives you back something with a connected data channel that is ready to directly translate c.Write into c.transport.Send. How it works now is you get an object whose internal transport field (representing the data channel) is initially nil, and only becomes non-nil asynchronously some time later, while you're using it.
I think this design will allow for the removal of some locks and other internal state, for overall simpler code. But it's going to take a little hack-and-slash refactoring. I've already filed #33982 (moved) and #33984 (moved) for some changes that suggested themselves while I was rewriting things.
My overall approach has been to make NewWebRTCPeer return a ready-to-use connection. Formerly, you had to call NewWebRTCPeer, then separately call the Connect method, and then at some time in the future the connection would actually happen (or fail, as the case may be). There was no way to detect a failed connection other than by the connection closing itself through a checkForStaleness check. In contrast, now, NewWebRTCPeer waits until it is connected before returning, or else returns an error or timeout indicating that the data channel couldn't be connected.
Unfortunately I had to delete the tests touching WebRTCPeer. I couldn't make them work, because they relied on the separate create/connect interface that this branch eliminates. 3 of the 5 tests test features that this branch removes; the value of the other 2 is arguable, but I don't think they are critical to have.
The simplified external interface allows simplifying the internal implementation. Besides buffer, I was able to remove or reduce the scope of config, broker, offerChannel, answerChannel, and lock.
There's a new constant DataChannelTimeout that's separate from SnowflakeTimeout (the latter is what checkForStaleness uses). I've left DataChannelTimeout set to the same value as SnowflakeTimeout, 30 seconds, but I think we can set DataChannelTimeout to a lower value, to quickly dispose of proxies that cannot be connected to, while still being conservative in discarding once-working proxies. I'm testing a browser now with DataChannelTimeout set to 5 seconds.
Cool! I like how much this simplifies the client-side code.
The code looks good. My only comment is:
broker.Negotiate() can still return a nil answer and we're no longer checking for that like we used to. At first glance, it looks like it shouldn't return a nil answer and a non-nil error, but util.DeserializeSessionDescription can return a nil value without an error attached to it, which would cause problems.
So we should either modify util.DeserializeSessionDescription to return errors, always returning an error whenever the answer would be nil, or we should add a nil answer check before passing it to the pion library function SetRemoteDescription. I don't want to assume that they've written this function to not segfault on us. We could also check for a nil answer in Negotiate() too.
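A minimal sketch of the second option, a guard before the answer ever reaches SetRemoteDescription; `sessionDescription` is a self-contained stand-in for the pion `webrtc.SessionDescription` type, and `checkAnswer` is a hypothetical helper:

```go
package main

import (
	"errors"
	"fmt"
)

// sessionDescription stands in for webrtc.SessionDescription so this
// sketch is self-contained; the real check would guard the pion type.
type sessionDescription struct {
	SDP string
}

// checkAnswer rejects a nil answer before it reaches the pion library,
// rather than trusting SetRemoteDescription not to crash on it.
func checkAnswer(answer *sessionDescription) error {
	if answer == nil {
		return errors.New("received nil answer from broker")
	}
	return nil
}

func main() {
	fmt.Println(checkAnswer(nil))
	fmt.Println(checkAnswer(&sessionDescription{SDP: "v=0..."}))
}
```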
I think we can set DataChannelTimeout to a lower value, to quickly dispose of proxies that cannot be connected to, while still being conservative in discarding once-working proxies. I'm testing a browser now with DataChannelTimeout set to 5 seconds.
This is great. We can also change the data channel timeout on the proxy side to match the client side once we've settled on a value that works for us.
This should help ease the pain of clients with restrictive NATs. If we can shorten the time it takes for the client to realize it has an incompatible proxy, we can recover more quickly (hopefully).
broker.Negotiate() can still return a nil answer and we're no longer checking for that like we used to. At first glance, it looks like it shouldn't return a nil answer and a non-nil error, but util.DeserializeSessionDescription can return a nil value without an error attached to it, which would cause problems.
That's a great catch. Indeed I looked at broker.Negotiate quickly and wrongly decided that it wouldn't return (nil, nil). Here's a new branch to review. The only difference is this patch to make util.DeserializeSessionDescription and util.SerializeSessionDescription return an error.
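A sketch of the contract that patch establishes, where the returned description is non-nil exactly when the error is nil, so callers never see (nil, nil); the struct fields and validation shown here are illustrative, not the actual util package code:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sessionDescription is a stand-in for webrtc.SessionDescription.
type sessionDescription struct {
	Type string `json:"type"`
	SDP  string `json:"sdp"`
}

// deserializeSessionDescription sketches the patched contract: it returns
// a non-nil description or an error, never (nil, nil).
func deserializeSessionDescription(msg string) (*sessionDescription, error) {
	var desc sessionDescription
	if err := json.Unmarshal([]byte(msg), &desc); err != nil {
		return nil, err
	}
	if desc.Type == "" || desc.SDP == "" {
		return nil, errors.New("cannot deserialize session description")
	}
	return &desc, nil
}

func main() {
	d, err := deserializeSessionDescription(`{"type":"answer","sdp":"v=0"}`)
	fmt.Println(d.Type, err)
	_, err = deserializeSessionDescription(`{"type":"answer"}`)
	fmt.Println(err)
}
```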