Choosing bridges by bw is problematic

grmpf, of course I forgot to add the right component after trying to open this report three times -.-

Trac:
Component: - Select a component to Tor Client

"Don't weight bridges" seems like the simplest solution to me for now. We don't have a reliable view of bridge bw from any authority, so looking at any other bw source atm seems futile.

Trac:
Milestone: N/A to Tor: 0.2.2.x-final
Status: new to accepted
Owner: N/A to nickm

Proposed fix in bug1912 in my repository.

Trac:
Status: accepted to needs_review

How often is it the case that bridge descriptors advertise a bandwidth of 0? Bridges should do twice-daily bandwidth tests just like relays do if their bandwidth is under 50KB:

      } else if (time_to_recheck_bandwidth < now) {
        /* If we haven't checked for 12 hours and our bandwidth estimate is
         * low, do another bandwidth test. This is especially important for
         * bridges, since they might go long periods without much use. */
        routerinfo_t *me = router_get_my_routerinfo();
        if (time_to_recheck_bandwidth && me &&
            me->bandwidthcapacity < me->bandwidthrate &&
            me->bandwidthcapacity < 51200) {
          reset_bandwidth_test();
        }
#define BANDWIDTH_RECHECK_INTERVAL (12*60*60)
        time_to_recheck_bandwidth = now + BANDWIDTH_RECHECK_INTERVAL;
      }

As for Nick's comment that looking at any bandwidth source other than the authorities' view is futile: the reason we load balance across our bridges by bandwidth is the case where you have one fast bridge and one bridge that limits both its rate and its burst to 10KB. You want to use the fast one more often, or your Tor experience will suck, especially if somebody else is trying to use that bridge too.

It looks like clients fetch new bridge descriptors every hour, so there's really quite quick feedback here.
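To make the load-balancing argument concrete, here is a minimal sketch of bandwidth-weighted selection. It is not Tor's actual code: the function name `pick_bridge_by_bw`, the caller-supplied random number `r`, and the uniform fallback when every weight is zero are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Tor's implementation: return an index in
 * [0, n) with probability proportional to bw[i].  `r` is a random
 * number supplied by the caller, which keeps the function deterministic
 * and easy to test. */
static size_t
pick_bridge_by_bw(const uint32_t *bw, size_t n, uint64_t r)
{
  uint64_t total = 0;
  size_t i;

  assert(n > 0);
  for (i = 0; i < n; ++i)
    total += bw[i];

  /* Assumption: if no bridge advertises any bandwidth, choose uniformly. */
  if (total == 0)
    return (size_t)(r % n);

  /* Walk the cumulative weights until we pass the reduced random value. */
  r %= total;
  for (i = 0; i < n; ++i) {
    if (r < bw[i])
      return i;
    r -= bw[i];
  }
  return n - 1; /* not reached when total > 0 */
}
```

With weights `{10240, 102400}` the fast bridge is picked about ten times as often as the slow one. With weights `{0, 51200}` the zero-weight bridge is never picked at all, however `r` falls.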

So it would be good in any case to make it more rare for a bridge to provide a descriptor that says it has zero bandwidth. When exactly does this happen? If you set up a bridge, then set your client to use it before the bridge has finished its bandwidth test? When else?

Another fix would be to mark bridges down if they're not reachable, so we fall back on the remaining ones. If we're not doing that, we might want to start anyway, to improve our odds of establishing circuits in the case where the user has 80 bridge entries and only 2 are up.

I guess I should distinguish here between the bandwidth from the bridge descriptor (it should basically never be zero after the first bandwidth test), and the bandwidth from the consensus (which is probably always zero because the bridge isn't in the consensus).

For cases where we're trying to choose a bridge but we don't have a descriptor for it, it seems smart to skip that bridge, yes?

We already try pretty hard to get missing descriptors for bridges, but that's on a separate schedule.

In that case, we should at least reinstate the max-believable-bandwidth logic for bridges? Or did we never remove it?

We do cap the bandwidth at the max believable rate (we never removed this functionality when looking at values in descriptors, we only removed it implicitly by using consensus values).

We never try to use the consensus bw for bridges (we first check whether the node is in the consensus, and only if it is do we use the consensus value).

I'll run tests in a bit to see how long the bw actually remains at 0. I believe it stays at 0 for quite a while in the bad case, which means your Tor might end up not working during that time.

I want to close #1912 (moved) as a wontfix, and open a new trac entry, for 0.2.3.x or later, to build a plan for what should happen when you run out of bridges.

But before that: if you have a descriptor for a bridge which is down, shouldn't you be marking it down? The people who have 80 bridges in their torrc must be having a heck of a time connecting, if only 2 out of every 80 circuits are going to a bridge that's up.

The flip side of course is that marking them down increases the odds you'll mark them all down, and then you lose for a while (until your Tor decides it's time to try fetching a new descriptor from one of them).
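The trade-off can be sketched as follows. This is only an illustration, not Tor's code: the `bridge_sketch_t` type, its `is_running` flag, and `pick_running_bridge` are hypothetical names, and the random number `r` is supplied by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bridge record for illustration; not Tor's data structure. */
typedef struct {
  int is_running; /* 0 once the client has marked the bridge down */
} bridge_sketch_t;

/* Choose uniformly among the bridges still marked up.  Returns the
 * chosen index, or -1 if every bridge has been marked down. */
static int
pick_running_bridge(const bridge_sketch_t *b, size_t n, uint64_t r)
{
  size_t n_up = 0, i, k;

  assert(n <= (size_t)INT_MAX);
  for (i = 0; i < n; ++i)
    if (b[i].is_running)
      ++n_up;
  if (n_up == 0)
    return -1; /* the "you lose for a while" case */

  /* Select the k-th running bridge. */
  k = (size_t)(r % n_up);
  for (i = 0; i < n; ++i) {
    if (!b[i].is_running)
      continue;
    if (k == 0)
      return (int)i;
    --k;
  }
  return -1; /* not reached when n_up > 0 */
}
```

With 80 entries of which 2 are marked up, every pick lands on one of those 2 instead of only 2 in 80 attempts. The `-1` return is the flip side: once everything is marked down, nothing can be chosen until some bridge is marked up again.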

17:35 < nickm> Sebastian: Are we moving at all towards a resolution for 1912
               IYO?  Is the max-believable-bandwidth for bridges low enough?
17:39 < Sebastian> nickm: hm. I still think my branch is what we should merge. 
                   I don't really understand why arma thinks we should close as 
                   wontfix.

17:41 < nickm> Sebastian: I think that armadev doesn't want unbalanced circuit 
               assignment where high-capacity bridges wind up underused and 
               low-capacity bridges wind up overused.
17:42 < Sebastian> nickm: yeah, I do understand that. I think that's 
                   unrealistic though, and actively hurts newly starting 
                   bridges that might not be around for a long time.
17:42 < nickm> I am concerned about the lie-about-your-bw attack
17:42 < Sebastian> that too
17:42 < Sebastian> We do cap it at 10mb, but that doesn't mean much
17:42 < Sebastian> because so many bridges have really low bw
17:43 < nickm> Also, since we distribute bridges uniformly (rather than by 
               bandwidth) I am not sure we are doing balancing at all well
17:44 < Sebastian> nickm: yeah. I think our usual bw ranking stuff fails here.
17:45 < nickm> Sebastian: So one solution could be to have a lower 
               MAX_BELIEVABLE_BANDWIDTH for bridges, and also a 
               MIN_BELIEVABLE_BANDWIDTH for bridges.
17:47 < Sebastian> nickm: hm, yeah. Maybe my intuition is way off, but I feel 
                   like the case where we have more than very few options for 
                   bridges is rare
17:56 < Sebastian> nickm: So I guess what I meant to say with that last comment 
                   is that I'm not sure that is worth the complexity.
17:59 < nickm> Sebastian: maybe the right answer is to do the weighting in the 
               bridgedb.
18:00 < Sebastian> yeah, I thought about that too.
18:01 < Sebastian> It does seem like that would be the right approach.
18:01 < nickm> Okay if I copy-and-paste this conversation into the 1912 
               discussion?
18:03 < Sebastian> Yeah. Feel free to always copy stuff from me from irc.

17:43 < nickm> Also, since we distribute bridges uniformly (rather than by bandwidth) I am not sure we are doing balancing at all well

Yes, that's the issue.

Some people have very few bridges -- in that case, any bridge they use should be good enough for them, and this bug isn't relevant for them.

But some people have several bridges, and they should tend to use the ones that claim to be faster.

I think we're mixing together multiple bugs here. One of them is:

"Use just one bridge with capacity 0, bug #1805 (moved) is triggered but things work. Add another (working) bridge to the set with a capacity != 0: We never choose the bridge with capacity 0.

So far so good.

Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0."

Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.

Hrm. It just seems that bw weighting is a really bad idea: the bridge can just claim to be super fast, then be so slow that it only just avoids getting marked as down, and create issues for the clients. This is a really easy attack.

As for the security issue of lying about bridge bandwidth (well, more of an availability issue really), we should fix that for 0.2.2.x.

I think the idea of a min believable bandwidth and max believable bandwidth is a fine first cut. I suggest 10KB and 100KB respectively.

I would also be fine with 20KB and 100KB respectively.
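A minimal sketch of the min/max believable bandwidth idea, using the 10KB and 100KB suggestion. The macro and function names are hypothetical, and the sketch assumes 1KB = 1024 bytes; it is not the code from any branch mentioned in this ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and values for illustration only. */
#define BRIDGE_MIN_BELIEVABLE_BW_SKETCH (10u * 1024u)
#define BRIDGE_MAX_BELIEVABLE_BW_SKETCH (100u * 1024u)

/* Clamp the bandwidth a bridge advertises in its descriptor into the
 * believable range before using it as a selection weight. */
static uint32_t
bridge_believable_bw(uint32_t advertised)
{
  assert(BRIDGE_MIN_BELIEVABLE_BW_SKETCH <= BRIDGE_MAX_BELIEVABLE_BW_SKETCH);
  if (advertised < BRIDGE_MIN_BELIEVABLE_BW_SKETCH)
    return BRIDGE_MIN_BELIEVABLE_BW_SKETCH;
  if (advertised > BRIDGE_MAX_BELIEVABLE_BW_SKETCH)
    return BRIDGE_MAX_BELIEVABLE_BW_SKETCH;
  return advertised;
}
```

With these bounds a bridge that advertises 0 still gets a nonzero weight, and a bridge that lies about being very fast can outweigh the slowest bridge by at most a factor of ten.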

Bumping to "major" as a security issue. I am fine with either of those ranges.

Trac:
Priority: normal to major

Replying to arma:

"Make the bridge with capacity != 0 nonworking: The Tor client breaks, because it wants to use the bridge with capacity. It doesn't try the other bridge, because it has a weight of 0."

Is that really happening? If it is, it's a bug that we should fix. The fix is not to start weighting bridges differently. It's to make Tor notice when a bridge has broken.

I just tried to reproduce (ran bridge on moria:9009 with bandwidth 50KB, and bridge2 on moria:9010 with bandwidth hard-coded in the descriptor to 0). I ran a Tor client with:

UseBridges 1
bridge 128.31.0.34:9009
bridge 128.31.0.34:9010

Started it, it used 9009 exclusively (since it has bandwidth). Then I killed the first bridge, and it smoothly moved over to making circuits with bridge2.

Sebastian, can you still reproduce this bug?

No, I can't reproduce that bug anymore. I guess it was fixed in a recent commit. See branch bug1912_v2 for an attempt to implement nickm's idea.

Merged and closing. Thanks!

Trac:
Resolution: N/A to fixed
Status: needs_review to closed

Trac:
Component: Tor Client to Tor

moved to tpo/core/tor#1912 (closed)
