We currently weight our choice of bridges by their advertised bw capacity. This leads to fun bugs, for example: Use just one bridge with capacity 0: bug #1805 (moved) is triggered, but things work. Add another (working) bridge with a capacity != 0 to the set: we never choose the bridge with capacity 0. Make the bridge with capacity != 0 nonworking: the Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because that one has a weight of 0.
This means two things: new bridges don't get used as quickly as they could, because clients won't choose them right after they start up; and if one of two in-theory working bridges gets blocked, we might not try the one that is still good.
One obvious solution would be to just not weight bridges. I think that's probably the sanest choice for situations where some of your bridges will frequently be blocked, and others might not be - we don't really want to keep picking the non-working bridge more than we need to. Not sure if that has other unwanted side-effects though.
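The failure mode in the report can be sketched as follows. This is a minimal illustration under stated assumptions, not Tor's actual selection code: the `bridge_t` struct, `pick_weighted()`, `pick_uniform()` and the caller-supplied random value `r` are all hypothetical. With weighted choice, a bridge advertising 0 is never picked, and once the only weighted bridge is excluded nothing can be picked at all. The uniform variant corresponds to "just not weight bridges".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bridge record (not Tor's real structs). */
typedef struct {
  const char *name;
  uint64_t bw;     /* advertised bandwidth, bytes/sec */
  int is_running;  /* 0 once the client considers the bridge unreachable */
} bridge_t;

/* Bandwidth-weighted choice among running bridges. `r` is a random
 * value in [0, 1). Returns an index, or -1 when the running bridges
 * have a total weight of 0. */
static int
pick_weighted(const bridge_t *b, size_t n, double r)
{
  uint64_t total = 0, target;
  assert(r >= 0.0 && r < 1.0);
  for (size_t i = 0; i < n; i++)
    if (b[i].is_running)
      total += b[i].bw;
  if (total == 0)
    return -1;  /* only 0-weight bridges left: nothing can be chosen */
  target = (uint64_t)(r * (double)total);
  if (target >= total)
    target = total - 1;  /* guard against floating-point rounding */
  for (size_t i = 0; i < n; i++) {
    if (!b[i].is_running)
      continue;
    if (target < b[i].bw)
      return (int)i;  /* a bridge with bw == 0 can never satisfy this */
    target -= b[i].bw;
  }
  return -1;  /* not reached: target < total */
}

/* The "don't weight bridges" alternative: choose uniformly among running
 * bridges, so a bridge advertising 0 is still a candidate. */
static int
pick_uniform(const bridge_t *b, size_t n, double r)
{
  size_t running = 0, k;
  assert(r >= 0.0 && r < 1.0);
  for (size_t i = 0; i < n; i++)
    if (b[i].is_running)
      running++;
  if (running == 0)
    return -1;
  k = (size_t)(r * (double)running);
  if (k >= running)
    k = running - 1;
  for (size_t i = 0; i < n; i++)
    if (b[i].is_running && k-- == 0)
      return (int)i;
  return -1;  /* not reached: k < running */
}
```

For example, with one bridge at 0 and one at 100, `pick_weighted()` always returns the second one; after the second is marked not running it returns -1, while `pick_uniform()` returns the 0-bandwidth bridge.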
"Don't weight bridges" seems like the simplest solution to me for now. We don't have a reliable view of bridge bw from any authority, so looking at any other bw source atm seems futile.
Trac:
Milestone: N/A to Tor: 0.2.2.x-final
Status: new to accepted
Owner: N/A to nickm
How often do bridge descriptors actually advertise a bandwidth of 0? Bridges should do twice-daily bandwidth tests, just like relays do, if their bandwidth is under 50KB:
```c
} else if (time_to_recheck_bandwidth < now) {
  /* If we haven't checked for 12 hours and our bandwidth estimate is
   * low, do another bandwidth test. This is especially important for
   * bridges, since they might go long periods without much use. */
  routerinfo_t *me = router_get_my_routerinfo();
  if (time_to_recheck_bandwidth && me &&
      me->bandwidthcapacity < me->bandwidthrate &&
      me->bandwidthcapacity < 51200) {
    reset_bandwidth_test();
  }
#define BANDWIDTH_RECHECK_INTERVAL (12*60*60)
  time_to_recheck_bandwidth = now + BANDWIDTH_RECHECK_INTERVAL;
}
```
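Restated as a pure function, the condition in that excerpt looks roughly like this. The names `should_retest_bandwidth()`, `next_recheck_time()` and `LOW_BW_THRESHOLD` are hypothetical, and the sketch leaves out the `me != NULL` check as well as the fact that the branch only runs when the preceding `if` (not shown in the excerpt) was false.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define BANDWIDTH_RECHECK_INTERVAL (12*60*60)
/* 51200 bytes/sec (50 KB/s): the "low bandwidth" threshold in the excerpt. */
#define LOW_BW_THRESHOLD 51200

/* Returns 1 if a new bandwidth self-test should start at `now`. */
static int
should_retest_bandwidth(time_t time_to_recheck, time_t now,
                        uint32_t capacity, uint32_t rate)
{
  assert(now > 0);
  if (time_to_recheck == 0 || time_to_recheck >= now)
    return 0;  /* first pass after startup, or 12 hours haven't elapsed */
  return capacity < rate && capacity < LOW_BW_THRESHOLD;
}

/* The excerpt re-arms the timer on every pass through this branch,
 * whether or not a test was started. */
static time_t
next_recheck_time(time_t now)
{
  return now + BANDWIDTH_RECHECK_INTERVAL;
}
```

So a bridge whose measured capacity is still below both its configured rate and 50 KB/s gets re-tested every 12 hours; one that measured at least 50 KB/s, or at least its own rate, does not.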
As for Nick's comment that using any bandwidth source other than an authority's view is futile: the reason we load balance across our bridges by bandwidth is the case where you have a fast bridge and a bridge that rate-limits both rate and burst to 10KB. You want to use the fast one more often, or your Tor experience will suck, especially if somebody else is trying to use that bridge too.
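To put numbers on the fast-versus-10KB case: under bandwidth weighting, each bridge's expected share of picks is its bandwidth divided by the total. The helper `weighted_share()` is hypothetical, and the 500 KB/s figure used below is an assumed example, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Expected fraction of picks bridge i gets under bandwidth weighting:
 * bw[i] / sum(bw). Returns 0.0 if every bandwidth is 0. */
static double
weighted_share(const double *bw, size_t n, size_t i)
{
  double total = 0.0;
  assert(i < n);
  for (size_t j = 0; j < n; j++)
    total += bw[j];
  return total > 0.0 ? bw[i] / total : 0.0;
}
```

With a 500 KB/s bridge and a 10 KB/s bridge, the slow one gets about 2% of picks (10/510); with uniform choice it would get 50%.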
It looks like clients fetch new bridge descriptors every hour, so there's really quite quick feedback here.
So it would be good in any case to make it more rare for a bridge to provide a descriptor that says it has zero bandwidth. When exactly does this happen? If you set up a bridge, then set your client to use it before the bridge has finished its bandwidth test? When else?
Another fix would be to mark bridges down if they're not reachable, so we fall back on the remaining ones. If we're not doing that already, we might want to start, to improve our odds of establishing circuits in the case where the user has 80 bridge entries and only 2 are up.
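A minimal sketch of "mark unreachable bridges down and choose among the rest", assuming a hypothetical `bridge_state_t` record and helper names that are not Tor's. With 80 entries and 2 up, picking uniformly over all entries hits a live bridge 2/80 = 2.5% of the time per attempt; once the dead ones are marked down, every pick goes to one of the 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bridge reachability state. */
typedef struct {
  const char *addr;
  int is_running;  /* cleared when a connection attempt fails */
} bridge_state_t;

/* Record the outcome of one connection attempt to a bridge. */
static void
note_bridge_connect_result(bridge_state_t *b, int succeeded)
{
  assert(b != NULL);
  b->is_running = succeeded ? 1 : 0;
}

/* Return the index of the k-th bridge (0-based) still marked running,
 * or -1 if there are not that many. A chooser that draws k from
 * [0, number running) never spends an attempt on a known-dead entry. */
static int
nth_running_bridge(const bridge_state_t *b, size_t n, size_t k)
{
  for (size_t i = 0; i < n; i++) {
    if (!b[i].is_running)
      continue;
    if (k == 0)
      return (int)i;
    k--;
  }
  return -1;
}
```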
I guess I should distinguish here between the bandwidth from the bridge descriptor (it should basically never be zero after the first bandwidth test), and the bandwidth from the consensus (which is probably always zero because the bridge isn't in the consensus).
For cases where we're trying to choose a bridge but we don't have a descriptor for it, it seems smart to skip that bridge, yes?
We already try pretty hard to get missing descriptors for bridges, but that's on a separate schedule.
We do cap the bandwidth at the max believable rate. We never removed this functionality when looking at values in descriptors; we only removed it implicitly where we use consensus values.
We never try to use consensus bw for bridges: we first check whether the node is in the consensus, and only if it is do we use the consensus value.
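The weight lookup described in the last few comments (skip bridges without a descriptor, use the consensus value only for nodes that are in the consensus, otherwise use the capped descriptor value) can be sketched like this. `bridge_view_t` and `bridge_weight()` are hypothetical, and the cap of 10,000,000 bytes/s is an assumption based on the "we do cap it at 10mb" remark in the IRC log; the real constant may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap (see lead-in); not necessarily Tor's real value. */
#define MAX_BELIEVABLE_BW UINT32_C(10000000)

/* Hypothetical view of what a client knows about one bridge. */
typedef struct {
  int has_descriptor;      /* have we fetched its descriptor? */
  uint32_t descriptor_bw;  /* bandwidth advertised in the descriptor */
  int in_consensus;        /* bridges normally are not */
  uint32_t consensus_bw;
} bridge_view_t;

/* Returns the selection weight, or -1 if the bridge should be skipped
 * because we have no descriptor for it. */
static int64_t
bridge_weight(const bridge_view_t *v)
{
  uint32_t bw;
  assert(v != NULL);
  if (!v->has_descriptor)
    return -1;
  if (v->in_consensus)
    return v->consensus_bw;  /* only reached for nodes in the consensus */
  bw = v->descriptor_bw;
  return bw > MAX_BELIEVABLE_BW ? MAX_BELIEVABLE_BW : bw;  /* cap */
}
```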
I'll run tests in a bit to see how long the bw actually remains at 0. I believe it stays there for quite a while in the bad case, which means your Tor might not work during that time.
I want to close #1912 (moved) as a wontfix, and open a new trac entry, for 0.2.3.x or later, to build a plan for what should happen when you run out of bridges.
But before that: if you have a descriptor for a bridge which is down, shouldn't you be marking it down? The people who have 80 bridges in their torrc must be having a heck of a time connecting, if only 2 out of every 80 circuits are going to a bridge that's up.
The flip side of course is that marking them down increases the odds you'll mark them all down, and then you lose for a while (until your Tor decides it's time to try fetching a new descriptor from one of them).
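One possible way to limit that downside is to treat "every bridge is marked down" as a signal to retry them all rather than give up. This is a hypothetical policy sketch, not a description of what Tor does; `bridge_flag_t` and `maybe_reset_all_down()` are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bridge up/down flag. */
typedef struct {
  const char *addr;
  int is_running;
} bridge_flag_t;

/* If every bridge has been marked down, mark them all up again so the
 * next attempt retries them. Returns 1 if a reset happened, else 0. */
static int
maybe_reset_all_down(bridge_flag_t *b, size_t n)
{
  assert(n == 0 || b != NULL);
  for (size_t i = 0; i < n; i++)
    if (b[i].is_running)
      return 0;  /* at least one bridge still looks usable */
  for (size_t i = 0; i < n; i++)
    b[i].is_running = 1;
  return n > 0;
}
```

The trade-off is that a client whose bridges really are all blocked will keep retrying them, so a real policy would probably also want a delay between resets.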
```
17:35 < nickm> Sebastian: Are we moving at all towards a resolution for 1912 IYO? Is the max-believable-bandwith for bridges low enough?
17:39 < Sebastian> nickm: hm. I still think my branch is what we should merge. I don't really understand why arma thinks we should close as wontfix.
17:41 < nickm> Sebastian: I think that armadev doesn't want unbalanced circuit assignment where high-capacity bridges wind up underused and low-capacity bridges wind up overused.
17:42 < Sebastian> nickm: yeah, I do understand that. I think that's unrealistic though, and actively hurts newly starting bridges that might not be around for a long time.
17:42 < nickm> I am concerned about the lie-about-your-bw attack
17:42 < Sebastian> that too
17:42 < Sebastian> We do cap it at 10mb, but that doesn't mean much
17:42 < Sebastian> because so many bridges have really low bw
17:43 < nickm> Also, since we distribute bridges uniformly (rather than by bandwidth) I am not sure we are doing balancing at all well
17:44 < Sebastian> nickm: yeah. I think our usual bw ranking stuff fails here.
17:45 < nickm> Sebastian: So one solution could be to have a lower MAX_BELIEVABLE_BANDWIDTH for bridges, and also a MIN_BELIEVABLE_BANDWIDTH for bridges.
17:47 < Sebastian> nickm: hm, yeah. Maybe my intuition is way off, but I feel like the case where we have more than very few options for bridges is rare
17:56 < Sebastian> nickm: So I guess what I meant to say with that last comment is that I'm not sure that is worth the complexity.
17:59 < nickm> Sebastian: maybe the right answer is to do the weighting in the bridgedb.
18:00 < Sebastian> yeah, I thought about that too.
18:01 < Sebastian> It does seem like that would be the right approach.
18:01 < nickm> Okay if I copy-and-paste this conversation into the 1912 discussion?
18:03 < Sebastian> Yeah. Feel free to always copy stuff from me from irc.
```
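nickm's suggestion of a bridge-specific MIN_BELIEVABLE_BANDWIDTH alongside a lower MAX_BELIEVABLE_BANDWIDTH amounts to clamping each advertised value into a range. The discussion gives no numbers, so the bounds below (20 KB/s and 1 MB/s) and the `clamp_bridge_bw()` name are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bridge-specific bounds; values are assumed, not proposed. */
#define BRIDGE_MIN_BELIEVABLE_BW UINT32_C(20480)    /* 20 KB/s */
#define BRIDGE_MAX_BELIEVABLE_BW UINT32_C(1048576)  /* 1 MB/s */

/* Clamp an advertised bridge bandwidth into [MIN, MAX] before using it
 * as a selection weight. */
static uint32_t
clamp_bridge_bw(uint32_t advertised)
{
  assert(BRIDGE_MIN_BELIEVABLE_BW <= BRIDGE_MAX_BELIEVABLE_BW);
  if (advertised < BRIDGE_MIN_BELIEVABLE_BW)
    return BRIDGE_MIN_BELIEVABLE_BW;  /* a bridge advertising 0 still gets weight */
  if (advertised > BRIDGE_MAX_BELIEVABLE_BW)
    return BRIDGE_MAX_BELIEVABLE_BW;  /* bounds the lie-about-your-bw gain */
  return advertised;
}
```

With these assumed bounds, no bridge can look more than 1048576/20480 = 51.2 times as attractive as any other, so a 0-bandwidth bridge is still chosen occasionally and a bridge lying about its bandwidth gains a bounded advantage.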
17:43 < nickm> Also, since we distribute bridges uniformly (rather than by bandwidth) I am not sure we are doing balancing at all well
Yes, that's the issue.
Some people have very few bridges -- in that case, any bridge they use should be good enough for them, and this bug isn't relevant for them.
But some people have several bridges, and they should tend to use the ones that claim to be faster.
I think we're mixing together multiple bugs here. One of them is:
Use just one bridge with capacity 0, bug #1805 (moved) is triggered but things work. Add another (working) bridge to the set with a capacity != 0: We never choose the bridge with capacity 0.
So far so good.
Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0.
Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.
Hrm. It just seems that bw weighting is a really bad idea: a bridge can claim to be super fast, then be very slow but not quite slow enough to get marked as down, and create issues for the clients. This is a really easy attack.
Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0.
Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.
I just tried to reproduce this (ran a bridge on moria:9009 with bandwidth 50KB, and bridge2 on moria:9010 with the bandwidth in its descriptor hard-coded to 0). Ran a Tor client with
I started it, and it used 9009 exclusively (since that bridge has bandwidth). Then I killed the first bridge, and it smoothly moved over to making circuits with bridge2.