Opened 5 months ago

Last modified 5 months ago

#30420 new task

Should we recommend that relay operators turn on tcp bbr?

Reported by: arma
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: unspecified
Severity: Normal
Keywords: network-health, performance
Cc: robgjansen

Description

The internet seems to have a growing number of how-tos for switching your kernel to use the "bbr" congestion control mode of TCP:
https://github.com/google/bbr
https://en.wikipedia.org/wiki/TCP_congestion_control#TCP_BBR
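Most of these how-tos come down to two Linux sysctls: `net.core.default_qdisc=fq` and `net.ipv4.tcp_congestion_control=bbr`. Before asking operators to change anything, it helps to know what a host currently runs. The sketch below only reads the corresponding `/proc/sys` files and summarizes them. It assumes a Linux host, and the helper names (`read_sysctl`, `bbr_status`, `check_host`) are hypothetical, not part of Tor or any existing tool.

```python
from pathlib import Path
from typing import Optional

# Linux procfs files backing the sysctls that the BBR how-tos change.
SYSCTL_FILES = {
    "congestion_control": "/proc/sys/net/ipv4/tcp_congestion_control",
    "available": "/proc/sys/net/ipv4/tcp_available_congestion_control",
    "default_qdisc": "/proc/sys/net/core/default_qdisc",
}


def read_sysctl(path: str) -> Optional[str]:
    """Return the stripped contents of a procfs sysctl file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def bbr_status(current: Optional[str], available: Optional[str],
               qdisc: Optional[str]) -> dict:
    """Summarize how a host compares with the typical BBR how-to settings."""
    return {
        # 'available' lists only algorithms currently loaded or built in.
        "bbr_available": available is not None and "bbr" in available.split(),
        "bbr_default": current == "bbr",
        "fq_qdisc": qdisc == "fq",
    }


def check_host() -> dict:
    """Read the live sysctls (Linux only) and summarize them; changes nothing."""
    values = {name: read_sysctl(path) for name, path in SYSCTL_FILES.items()}
    return bbr_status(values["congestion_control"], values["available"],
                      values["default_qdisc"])


if __name__ == "__main__":
    print(check_host())
```

Applying the change itself is a root-level operation (`sysctl -w ...` or an `/etc/sysctl.conf` entry), which this sketch deliberately leaves out, since whether relays should do it is exactly the open question here.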

Thought 1: Doing an experiment where various fractions of Tor relays switch to this congestion control mode would be neat. Maybe it's the sort of thing Shadow could help with, since switching over the real Tor network is both cumbersome and dangerous.

(Though, since Shadow builds its own TCP implementation, it would need its own implementation of the BBR variant in order to run this test. It would also need realistic *non*-Tor background flows to compare against. This would be a great use case for driving Shadow development forward. Cc'ing Rob.)

Thought 2: If God wanted us to be using TCP BBR, we'd be using it by default already. We're not, so we should learn why. For example, the Wikipedia page indicates that it can be unfair to competing flows in some situations, and since Tor relays are often guests on their networks, we might not want to give people more reasons to get angry at them.


Change History (3)

comment:1 Changed 5 months ago by ahf

Milestone: Tor: unspecified
Version: Tor: unspecified

comment:2 Changed 5 months ago by irl

The fairness issue is also a problem for relays, because each relay has multiple connections to other relays. If some of those flows use BBR and some do not, the BBR flows will starve out the others, and will even starve out each other. This would make the available relay->relay bandwidth vary based on *which* other relays are sending traffic, not just on how much other traffic is coming into a relay.

My understanding, based on conversations with knowledgeable researchers, is that enabling BBR would be a really bad idea. Google uses it for YouTube, which is probably why there is hype, tutorials, etc., but it was designed with the YouTube use case in mind: the bottleneck on all of a user's connections is at the user, and a user only watches one YouTube video at a time. This is not our use case.

We might revisit this with BBR2, which is meant to solve the fairness issue.

Other changes that we haven't paid much attention to so far, such as IW10 (a larger initial congestion window of 10 segments, another Google thing), could be negatively affecting users on restricted or impaired networks. That is an issue on the client->guard TCP connection, though, not on relay->relay connections.
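For reference, IW10 corresponds to the initial-window bound in RFC 6928 (an Experimental RFC), compared here with the earlier RFC 3390 bound. The worked values assume a typical Ethernet-path MSS of 1460 bytes; they are illustrative, not measurements from the Tor network.

```latex
\begin{aligned}
\mathrm{IW}_{\text{RFC 3390}} &= \min\bigl(4\cdot\mathrm{MSS},\ \max(2\cdot\mathrm{MSS},\ 4380\ \text{bytes})\bigr) \\
\mathrm{IW}_{\text{RFC 6928}} &= \min\bigl(10\cdot\mathrm{MSS},\ \max(2\cdot\mathrm{MSS},\ 14600\ \text{bytes})\bigr) \\
\mathrm{MSS} = 1460:\quad \mathrm{IW}_{\text{RFC 3390}} &= \min(5840,\ 4380) = 4380\ \text{bytes}\ (3\ \text{segments}) \\
\mathrm{IW}_{\text{RFC 6928}} &= \min(14600,\ 14600) = 14600\ \text{bytes}\ (10\ \text{segments})
\end{aligned}
```

In other words, a sender may put roughly three times as much data on the wire before the first ACK returns, which plausibly matters most on slow or lossy links.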

BBR for client->guard might actually be a really good idea, assuming that there is only one Tor user behind each bottleneck for each client connection.
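Using BBR only on client->guard connections would mean choosing the algorithm per connection rather than system-wide. On Linux, a program can do this with the `TCP_CONGESTION` socket option, which Python exposes as `socket.TCP_CONGESTION` on Linux. The sketch below shows only that mechanism; the helper names are hypothetical, and whether Tor should do this, or how it would tell client connections apart from relay connections, is not addressed. Unprivileged processes can normally select only algorithms listed in `net.ipv4.tcp_allowed_congestion_control`.

```python
import socket


def set_congestion_control(sock: socket.socket, algorithm: str) -> bool:
    """Try to select a TCP congestion control algorithm for one socket.

    Returns True if the kernel accepted it, False if the platform lacks
    TCP_CONGESTION or the kernel refused it (module not available, or the
    algorithm is not allowed for this unprivileged process).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, algorithm.encode())
    except OSError:
        return False
    return True


def get_congestion_control(sock: socket.socket) -> str:
    """Return the algorithm in effect for this socket, or '' if unsupported."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return ""
    # The kernel's algorithm names fit in 16 bytes (TCP_CA_NAME_MAX).
    raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    return raw.split(b"\0", 1)[0].decode()
```

A caller would try BBR on the relevant socket and silently keep the system default if the kernel refuses, for example `if not set_congestion_control(conn, "bbr"): pass`. That matches irl's point below that a failed attempt should leave things no worse than before.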

Google also maintains databases of which IP ranges support which speeds, etc., and then adapts congestion control and other properties to give the best performance for each user. We cannot do that; we have no fallback when something like BBR performs poorly. I think it is more important to raise the baseline performance than to focus on getting the best performance for users who already have the best connections.

This is yet another use case for a WAN testbed, and we should keep that in mind, along with testing TCP extensions like ECN and alternate transports altogether.

comment:3 Changed 5 months ago by LiveChief

Can we test this in the real world?

Last edited 5 months ago by LiveChief