Opened 9 years ago

Closed 9 years ago

Last modified 7 years ago

#1912 closed defect (fixed)

Choosing bridges by bw is problematic

Reported by: Sebastian
Owned by: nickm
Priority: High
Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor

Description

We currently choose bridges by bw according to their advertised bw capacity. This leads to fun bugs, for example: Use just one bridge with capacity 0, bug #1805 is triggered but things work. Add another (working) bridge to the set with a capacity != 0: We never choose the bridge with capacity 0. Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0.
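To make the failure mode concrete, here is a minimal sketch of bandwidth-weighted selection. It is not Tor's actual code: `bridge_t`, `choose_bridge_by_bw`, and the caller-supplied random value `r` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a configured bridge; not Tor's real struct. */
typedef struct {
  const char *name;
  uint32_t advertised_bw; /* bytes/s, as claimed in the bridge descriptor */
} bridge_t;

/* Pick an index with probability proportional to advertised_bw.
 * `r` is a random value supplied by the caller.
 * Returns -1 if every bridge has weight 0. */
static int
choose_bridge_by_bw(const bridge_t *bridges, size_t n, uint64_t r)
{
  uint64_t total = 0;
  for (size_t i = 0; i < n; ++i)
    total += bridges[i].advertised_bw;
  if (total == 0)
    return -1; /* nothing to weight by */

  uint64_t target = r % total; /* target is in [0, total) */
  for (size_t i = 0; i < n; ++i) {
    /* A bridge with weight 0 can never satisfy this comparison. */
    if (target < bridges[i].advertised_bw)
      return (int)i;
    target -= bridges[i].advertised_bw;
  }
  assert(0 && "unreachable: target is always below total");
  return -1;
}
```

With one bridge at weight 0 and one at any non-zero weight, this sketch returns the non-zero bridge for every value of `r`, which is the behaviour described above.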

This means two things: new bridges don't get usage as quickly as they could, because people won't choose them after starting up; and if one of two in-theory working bridges gets blocked, we might not try the one that is still good.

One obvious solution would be to just not weight bridges. I think that's probably the sanest choice for situations where some of your bridges will frequently be blocked, and others might not be - we don't really want to keep picking the non-working bridge more than we need to. Not sure if that has other unwanted side-effects though.

Change History (18)

comment:1 Changed 9 years ago by Sebastian

Component: - Select a component → Tor Client

grmpf, of course I forgot to add the right component after trying to open this report three times -.-

comment:2 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final
Owner: set to nickm
Status: new → accepted

"Don't weight bridges" seems like the simplest solution to me for now. We don't have a reliable view of bridge bw from any authority, so looking at any other bw source atm seems futile.

comment:3 Changed 9 years ago by Sebastian

Status: accepted → needs_review

Proposed fix in bug1912 in my repository.

comment:4 Changed 9 years ago by arma

How often is it the case that bridge descriptors advertise a bandwidth of 0? Bridges should do twice-daily bandwidth tests just like relays do if their bandwidth is under 50KB:

      } else if (time_to_recheck_bandwidth < now) {
        /* If we haven't checked for 12 hours and our bandwidth estimate is
         * low, do another bandwidth test. This is especially important for
         * bridges, since they might go long periods without much use. */
        routerinfo_t *me = router_get_my_routerinfo();
        if (time_to_recheck_bandwidth && me &&
            me->bandwidthcapacity < me->bandwidthrate &&
            me->bandwidthcapacity < 51200) {
          reset_bandwidth_test();
        }
#define BANDWIDTH_RECHECK_INTERVAL (12*60*60)
        time_to_recheck_bandwidth = now + BANDWIDTH_RECHECK_INTERVAL;
      }

As for Nick's comment about using any other bandwidth source besides the authority's view being futile, the reason we load balance across our bridges by bandwidth is for the case where you have a fast bridge and a bridge that rate limits to 10KB both rate and burst -- you want to use the fast one more often, or your Tor experience will suck, especially if somebody else is trying to use that bridge too.

It looks like clients fetch new bridge descriptors every hour, so there's really quite quick feedback here.

So it would be good in any case to make it more rare for a bridge to provide a descriptor that says it has zero bandwidth. When exactly does this happen? If you set up a bridge, then set your client to use it before the bridge has finished its bandwidth test? When else?

Another fix would be to mark bridges down if they're not reachable, so we fall back on the remaining ones. If we're not doing that, we might want to start anyway, to improve our odds of establishing circuits in the case where the user has 80 bridge entries and only 2 are up.
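A rough sketch of that idea, under the assumption that the client keeps a per-bridge "running" flag: skip bridges marked down, weight the rest by bandwidth, and fall back to a uniform pick when every remaining bridge advertises 0. `bridge_status_t` and `choose_running_bridge` are made-up names, not functions from the Tor source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bridge state; not Tor's real data structure. */
typedef struct {
  uint32_t advertised_bw; /* bytes/s from the descriptor */
  bool is_running;        /* false once the bridge was marked down */
} bridge_status_t;

/* Choose among running bridges only. `r` is a caller-supplied random
 * value. Returns -1 if no bridge is running. */
static int
choose_running_bridge(const bridge_status_t *b, size_t n, uint64_t r)
{
  uint64_t total = 0;
  size_t n_running = 0;
  for (size_t i = 0; i < n; ++i) {
    if (!b[i].is_running)
      continue;
    total += b[i].advertised_bw;
    ++n_running;
  }
  if (n_running == 0)
    return -1;

  if (total == 0) {
    /* All running bridges claim 0: pick uniformly among them. */
    size_t k = (size_t)(r % n_running);
    for (size_t i = 0; i < n; ++i) {
      if (!b[i].is_running)
        continue;
      if (k-- == 0)
        return (int)i;
    }
  } else {
    /* Weighted pick, ignoring bridges that are marked down. */
    uint64_t target = r % total;
    for (size_t i = 0; i < n; ++i) {
      if (!b[i].is_running)
        continue;
      if (target < b[i].advertised_bw)
        return (int)i;
      target -= b[i].advertised_bw;
    }
  }
  assert(0 && "unreachable");
  return -1;
}
```

In this sketch a zero-bandwidth bridge is still used once the only bridge with bandwidth has been marked down, so the client keeps working as long as any bridge is reachable.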

comment:5 Changed 9 years ago by arma

I guess I should distinguish here between the bandwidth from the bridge descriptor (it should basically never be zero after the first bandwidth test), and the bandwidth from the consensus (which is probably always zero because the bridge isn't in the consensus).

For cases where we're trying to choose a bridge but we don't have a descriptor for it, it seems smart to skip that bridge, yes?

We already try pretty hard to get missing descriptors for bridges, but that's on a separate schedule.

comment:6 Changed 9 years ago by nickm

In that case, we should at least reinstate the max-believable-bandwidth logic for bridges? Or did we never remove it?

comment:7 Changed 9 years ago by Sebastian

We do cap the bandwidth at the max believable rate (we never removed this functionality when looking at values in descriptors, we only removed it implicitly by using consensus values).

We're never trying to use consensus bw for bridges (we first check if the node is in the consensus, only if it is we're trying to use the consensus value).

I'll run tests in a bit to see how long the bw actually remains at 0. I believe this is the case for quite a bit in the bad case, which means your Tor might end up not working during that time.

comment:8 Changed 9 years ago by arma

I want to close #1912 as a wontfix, and open a new trac entry, for 0.2.3.x or later, to build a plan for what should happen when you run out of bridges.

But before that: if you have a descriptor for a bridge which is down, shouldn't you be marking it down? The people who have 80 bridges in their torrc must be having a heck of a time connecting, if only 2 out of every 80 circuits are going to a bridge that's up.

The flip side of course is that marking them down increases the odds you'll mark them all down, and then you lose for a while (until your Tor decides it's time to try fetching a new descriptor from one of them).

comment:9 Changed 9 years ago by nickm

17:35 < nickm> Sebastian: Are we moving at all towards a resolution for 1912
               IYO?  Is the max-believable-bandwidth for bridges low enough?
17:39 < Sebastian> nickm: hm. I still think my branch is what we should merge. 
                   I don't really understand why arma thinks we should close as 
                   wontfix.

17:41 < nickm> Sebastian: I think that armadev doesn't want unbalanced circuit 
               assignment where high-capacity bridges wind up underused and 
               low-capacity bridges wind up overused.
17:42 < Sebastian> nickm: yeah, I do understand that. I think that's 
                   unrealistic though, and actively hurts newly starting 
                   bridges that might not be around for a long time.
17:42 < nickm> I am concerned about the lie-about-your-bw attack
17:42 < Sebastian> that too
17:42 < Sebastian> We do cap it at 10mb, but that doesn't mean much
17:42 < Sebastian> because so many bridges have really low bw
17:43 < nickm> Also, since we distribute bridges uniformly (rather than by 
               bandwidth) I am not sure we are doing balancing at all well
17:44 < Sebastian> nickm: yeah. I think our usual bw ranking stuff fails here.
17:45 < nickm> Sebastian: So one solution could be to have a lower 
               MAX_BELIEVABLE_BANDWIDTH for bridges, and also a 
               MIN_BELIEVABLE_BANDWIDTH for bridges.
17:47 < Sebastian> nickm: hm, yeah. Maybe my intuition is way off, but I feel 
                   like the case where we have more than very few options for 
                   bridges is rare
17:56 < Sebastian> nickm: So I guess what I meant to say with that last comment 
                   is that I'm not sure that is worth the complexity.
17:59 < nickm> Sebastian: maybe the right answer is to do the weighting in the 
               bridgedb.
18:00 < Sebastian> yeah, I thought about that too.
18:01 < Sebastian> It does seem like that would be the right approach.
18:01 < nickm> Okay if I copy-and-paste this conversation into the 1912 
               discussion?
18:03 < Sebastian> Yeah. Feel free to always copy stuff from me from irc.

comment:10 Changed 9 years ago by arma

17:43 < nickm> Also, since we distribute bridges uniformly (rather than by
               bandwidth) I am not sure we are doing balancing at all well

Yes, that's the issue.

Some people have very few bridges -- in that case, any bridge they use should be good enough for them, and this bug isn't relevant for them.

But some people have several bridges, and they should tend to use the ones that claim to be faster.

I think we're mixing together multiple bugs here. One of them is:

Use just one bridge with capacity 0, bug #1805 is triggered but things work. Add another (working) bridge to the set with a capacity != 0: We never choose the bridge with capacity 0.

So far so good.

Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0.

Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.

comment:11 Changed 9 years ago by Sebastian

Hrm. It just seems that bw weighting is a really bad idea - the bridge can just claim to be super fast, then be so slow that it *just* doesn't get marked as down, and create issues for the clients. This is a really easy attack.

comment:12 Changed 9 years ago by arma

As for the security issue of lying about bridge bandwidth (well, more of an availability issue really), we should fix that for 0.2.2.x.

I think the idea of a min believable bandwidth and max believable bandwidth is a fine first cut. I suggest 10KB and 100KB respectively.
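As a sketch of what that clamping could look like, assuming 1KB = 1024 bytes and using the 10KB/100KB bounds suggested here (not necessarily the values that end up merged): the macro and function names below are invented for illustration and are not taken from the Tor source.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds proposed in this ticket; hypothetical names, illustrative only. */
#define BRIDGE_MIN_BELIEVABLE_BANDWIDTH (10u * 1024u)   /* 10KB/s */
#define BRIDGE_MAX_BELIEVABLE_BANDWIDTH (100u * 1024u)  /* 100KB/s */

/* Clamp the bandwidth a bridge advertises into the believable range
 * before using it as a selection weight. */
static uint32_t
bridge_believable_bandwidth(uint32_t advertised_bw)
{
  uint32_t bw = advertised_bw;
  if (bw < BRIDGE_MIN_BELIEVABLE_BANDWIDTH)
    bw = BRIDGE_MIN_BELIEVABLE_BANDWIDTH;
  if (bw > BRIDGE_MAX_BELIEVABLE_BANDWIDTH)
    bw = BRIDGE_MAX_BELIEVABLE_BANDWIDTH;
  /* Postcondition: the weight is never 0 and never unbounded. */
  assert(bw >= BRIDGE_MIN_BELIEVABLE_BANDWIDTH &&
         bw <= BRIDGE_MAX_BELIEVABLE_BANDWIDTH);
  return bw;
}
```

With a floor, a bridge that advertises 0 still gets a non-zero weight, and with a low ceiling a bridge that lies about being very fast can attract at most ten times the share of the slowest believable bridge.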

comment:13 Changed 9 years ago by arma

I would also be fine with 20KB and 100KB respectively.

comment:14 Changed 9 years ago by nickm

Priority: normal → major

Bumping to "major" as a security issue. I am fine with either of those ranges.

comment:15 in reply to:  10 Changed 9 years ago by arma

Replying to arma:

Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0.

Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.

I just tried to reproduce (ran bridge on moria:9009 with bandwidth 50KB, and bridge2 on moria:9010 with bandwidth hard-coded in the descriptor to 0). I ran the Tor client with

UseBridges 1
bridge 128.31.0.34:9009
bridge 128.31.0.34:9010

Started it, it used 9009 exclusively (since it has bandwidth). Then I killed the first bridge, and it smoothly moved over to making circuits with bridge2.

Sebastian, can you still reproduce this bug?

comment:16 Changed 9 years ago by Sebastian

No, I can't reproduce that bug anymore. I guess it was fixed in a recent commit. See branch bug1912_v2 for an attempt to implement nickm's idea.

comment:17 Changed 9 years ago by nickm

Resolution: fixed
Status: needs_review → closed

Merged and closing. Thanks!

comment:18 Changed 7 years ago by nickm

Component: Tor Client → Tor