Opened 14 months ago

Last modified 4 weeks ago

#8240 new defect

Raise our guard rotation period

Reported by: arma
Owned by:
Priority: major
Milestone: Tor: 0.2.???
Component: Tor
Version:
Keywords: tor-client needs-proposal 023-backport
Cc: mikeperry, iang, tariq.ee, rpw, amj703
Actual Points:
Parent ID: #9321
Points:

Description

Tariq's COGS paper from WPES 2012 shows that a significant component of guard churn is due to voluntary rotation, rather than actual network changes:
http://freehaven.net/anonbib/#wpes12-cogs

In short, if the target client makes sensitive connections continuously every day for months, and you (the attacker) run some fast guards, the odds get pretty good that you'll become the client's guard at some point and get to do a correlation attack.

We could argue that the "continuously every day for months" assumption is unrealistic, so in practice we don't know how bad this issue really is. But for hidden services, it could well be a realistic assumption.

There are going to be (at least) two problems with raising the guard rotation period. The first is that we unbalance the network further wrt old guards vs new guards, and I'm not sure by how much, so I'm not sure how much our bwauth measurers will have to compensate. The second (related) problem is that we'll expand the period during which new guards don't get as much load as they will eventually get. This issue already results in confused relay operators trying to shed their Guard flag so they can resume having load.

In sum, if we raise the rotation period enough that it really results in load changes, then we could have unexpected side effects like having the bwauths raise the weights of new (and thus totally unloaded) guards to huge numbers, thus ensuring that anybody who rotates a guard will basically for sure get one of these new ones.

The real plan here needs a proposal, and should be for 0.2.5 or later. I wonder if we can raise it 'some but not too much' in the 0.2.4 timeframe though?

Child Tickets

Ticket | Summary | Owner
#9733 | Generate statistics about compromise due to traffic correlation with different guard selection and rotation parameters |

Change History (39)

comment:1 follow-up: Changed 14 months ago by nickm

  • Keywords 023-backpor added
  • Status changed from new to needs_review

See branch "bug8240". It's against 0.2.3, actually -- I think this is serious and important. Please review and discuss which branches should get this.

comment:2 Changed 14 months ago by arma

  • Keywords 023-backport added; 023-backpor removed

comment:3 Changed 14 months ago by nickm

  • Priority changed from normal to major

comment:4 Changed 14 months ago by arma

  • Cc mikeperry iang tariq.ee rpw added

Nick's patch raises the guard rotation period to ~9.5 months (from ~1.5 months).

If we keep giving out the Guard flag in the same way, and it remains the case that well more than half of the capacity in the network has the Guard flag (~65% on https://metrics.torproject.org/network.html#bwhist-flags), and the median byte of guard capacity has had the Guard flag for at most 4.75 out of the last 9.5 months (I just made that number up, but I bet there exist times when it's plausible), then we basically just threw away >1/3 of our total network capacity by having clients never use it when they could have. Our bwauths might try to compensate by blowing up the weights of those new nodes, but from a security perspective that's exactly what we don't want (especially if they're Exits too, and the same weight inflates their chance of being used as an exit too).
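A rough sketch of the arithmetic behind that estimate. It assumes a guard attracts clients at a uniform rate until it has held the flag for one full rotation period, that all guards have equal capacity, and that flag ages are spread evenly over the period. The function names and the age distribution are made up for illustration; only the 65% and 9.5 month figures come from the comment.

```python
# Back-of-the-envelope estimate of guard capacity left idle by a long
# rotation period. The age distribution and helper names are assumptions.

def ramp_fraction(age_months, rotation_months):
    """Share of its steady-state clients a guard has attracted, assuming
    clients migrate to it at a uniform rate over one rotation period."""
    return min(max(age_months / rotation_months, 0.0), 1.0)

def idle_network_share(guard_share, guard_ages, rotation_months):
    """Fraction of total network capacity that guards cannot yet fill,
    given equal-capacity guards with the listed flag ages."""
    idle = [1.0 - ramp_fraction(age, rotation_months) for age in guard_ages]
    return guard_share * sum(idle) / len(idle)

ROTATION = 9.5      # months, the value in the patch
GUARD_SHARE = 0.65  # share of network capacity carrying the Guard flag

# Hypothetical ages spread evenly over the period, so the median guard
# has had the flag for 4.75 months.
ages = [ROTATION * (i + 0.5) / 100 for i in range(100)]
estimate = idle_network_share(GUARD_SHARE, ages, ROTATION)
print(f"idle share of network capacity: {estimate:.3f}")
```

Under these assumptions the idle share comes out close to one third. The "at most" in the comment, or any skew toward younger guards, would push it higher.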

Our current client weighting in path selection assumes a steady-state where everybody with the Guard flag has had it long enough to attract its fair share of users. This isn't true now, but we've been doing ok pretending it is. I fear we won't be able to pretend once you need to have run your Guard for nine months before you hit steady-state.

I like the idea of putting in a parameter now, so we can teach clients to obey the parameter now, and change it later. But I think clients need to know how close to steady-state a guard is, so they can balance appropriately. Is that a new weight on the w line? Or something else?

I'm cc'ing Mike here, since he started the whole balance-by-position-in-path strategy; and Ian and Tariq, since they worked on the COGS paper; and Ralf, since he touched on this issue in his upcoming Oakland paper.

comment:5 Changed 14 months ago by arma

While we're planning, though: it seems that hidden services are extra vulnerable to this issue, since they don't move and since the adversary can induce them to talk. Should we disable guard rotation for hidden services? Or just crank up its rotation period a lot?

So long as hidden services aren't a big piece of network traffic, such a move shouldn't influence overall network load balancing, and should help the hidden services a lot.

comment:6 follow-ups: Changed 14 months ago by mikeperry

These are hard questions. People already hate the fact that when their relays get the Guard flag, throughput drops off for days. I don't believe the transition takes weeks though (probably thanks to the bw auths), but I have not studied it in detail.

One way to improve this balancing problem might be to adjust the Wxx weights such that the guard ones are dependent on how long you've had the guard flag vs this rotation parameter. If we had a curve to model the migration rate and metadata to record the Guard flag age to create points on this curve, this might not be too hard to do. I suppose a uniform migration rate might be as good an assumption as any...

However, personally, I think that in reality clients are rotating off of their guards much quicker than even the 1.5mo limit. At least, it feels like my Tor clients were doing that when watching path bias counts. I think this might be the same problem you describe when talking about the age of the median byte of the Guard flag (Guards may actually already be going up and down/losing their flags way faster than our limits). For this reason, I'm wondering if simply changing the rotation period to 9.5mos might not actually change the rotation rate in practice.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 14 months ago by arma

Replying to mikeperry:

One way to improve this balancing problem might be to adjust the Wxx weights such that the guard ones are dependent on how long you've had the guard flag vs this rotation parameter. If we had a curve to model the migration rate and metadata to record the Guard flag age to create points on this curve, this might not be too hard to do. I suppose a uniform migration rate might be as good an assumption as any...

I agree that a uniform migration rate is as good as any (I assume by migration you mean from clients with the old behavior to clients with the new behavior). But further, don't forget that another factor here is new users showing up and picking guards. I guess we could assume that those are negligible (not true but hey, maybe it's close enough).

I like the notion of changing the weights, but I feel like inflating the Bandwidth= weight is the wrong way to do it. I increasingly think we need a per-relay thing to say "how much of a guard it is".

comment:8 in reply to: ↑ 6 Changed 14 months ago by arma

Replying to mikeperry:

However, personally, I think that in reality clients are rotating off of their guards much quicker than even the 1.5mo limit. At least, it feels like my Tor clients were doing that when watching path bias counts. I think this might be the same problem you describe when talking about the age of the median byte of the Guard flag (Guards may actually already be going up and down/losing their flags way faster than our limits). For this reason, I'm wondering if simply changing the rotation period to 9.5mos might not actually change the rotation rate in practice.

See Figures 4 and 5 in the COGS paper. At least during the time period of that Tor network dataset in 2011, voluntary rotation was a much bigger risk component than natural churn.

comment:9 in reply to: ↑ 7 ; follow-up: Changed 14 months ago by mikeperry

Replying to arma:

Replying to mikeperry:

One way to improve this balancing problem might be to adjust the Wxx weights such that the guard ones are dependent on how long you've had the guard flag vs this rotation parameter. If we had a curve to model the migration rate and metadata to record the Guard flag age to create points on this curve, this might not be too hard to do. I suppose a uniform migration rate might be as good an assumption as any...

I agree that a uniform migration rate is as good as any (I assume by migration you mean from clients with the old behavior to clients with the new behavior). But further, don't forget that another factor here is new users showing up and picking guards. I guess we could assume that those are negligible (not true but hey, maybe it's close enough).

Actually no, I mean migration rate in terms of how quickly new guards can expect to accumulate their proper fraction of clients actually using them as a Guard node. The problem I'm describing is that giving new relays a Guard flag means the weights from https://gitweb.torproject.org/torspec.git/blob/master:/path-spec.txt#l206 cause fresh guards to get substantially fewer clients until people migrate. Increasing the rotation period would exacerbate this problem. Hence, we might want to use an additional computation on the Wg* and W*g weights.

In fact there may be two rates at work here: the natural rate of migration of clients to your new Guard node, and then later, the fraction of Guard flagged nodes who are of a certain age. Both of these will require some kind of annotation or record keeping on the authority side to compute, as they are likely best represented as points along the (0, 9.5mo] domain of two different curves.

I like the notion of changing the weights, but I feel like inflating the Bandwidth= weight is the wrong way to do it. I increasingly think we need a per-relay thing to say "how much of a guard it is".

Correct, this would be a change to the authority consensus process that computes these weights: https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1482

comment:10 in reply to: ↑ 9 Changed 14 months ago by arma

Replying to mikeperry:

Replying to arma:

I like the notion of changing the weights, but I feel like inflating the Bandwidth= weight is the wrong way to do it. I increasingly think we need a per-relay thing to say "how much of a guard it is".

Correct, this would be a change to the authority consensus process that computes these weights: https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt#l1482

I guess I'm not being clear. Here's an attack, if we continue having only one weight per relay in the consensus. Let's say a new mid-sized adversarial exit relay shows up. It has the Exit flag, no guard flag, and not a very high Bandwidth= number on the w line, since it's being used as an Exit and a Middle, so it isn't super-impressive with its download speeds.

When it earns the Guard flag, clients will back off from using it as the middle hop, and partially back off from using it as the exit hop, since they assume other clients will be using it as a guard. So its usage will go way down.

In this ticket we remark several times that we hope the bwauths will then find it to be much faster, and give it a larger Bandwidth= weight, so clients will more quickly pick it as a Guard.

But as a side effect, we have just inflated the chance that clients will pick it as an Exit too, since it's the same Bandwidth= weight that tells how useful it 'should' be for either position. So an adversary can arrange to run lots of these newly-got-the-Guard-flag relays and get more than his fair share of exit traffic.

Instead I'm suggesting that we have a second weight on that w line, which shows, for each guard, how much of his steady-state client quota we think he should have by now. And then clients would use this number to treat a half-way-there guard as having more available capacity for other path positions than an all-the-way-there guard.
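One way a client might use such a second per-relay value, as a minimal sketch. The `guard_fraction` field is hypothetical and not part of any consensus format discussed in this ticket. Wmg and Wmm are the existing bandwidth-weights for Guard-flagged and unflagged relays in the middle position, written here as plain fractions rather than scaled integers.

```python
# Sketch of a client-side middle-hop weight that depends on how far a
# guard is toward its steady-state client load. guard_fraction is a
# hypothetical per-relay value in [0, 1]; it is an assumption.

def middle_weight(bandwidth, guard_fraction, wmg, wmm):
    """Weight for picking a Guard-flagged relay as a middle hop.

    The share of the relay already busy serving guard clients is
    weighted like a guard (wmg); the rest is weighted like an
    unflagged relay (wmm)."""
    if not 0.0 <= guard_fraction <= 1.0:
        raise ValueError("guard_fraction must be between 0 and 1")
    blended = guard_fraction * wmg + (1.0 - guard_fraction) * wmm
    return bandwidth * blended

# Made-up numbers: the same relay at three stages of acquiring clients.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, middle_weight(1000, fraction, wmg=0.2, wmm=1.0))
```

The Bandwidth= value itself stays untouched here, so a freshly flagged relay gains no extra chance of being picked as an exit.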

I agree that this complicates things. I don't see a way of doing it without having a new parameter, per relay, though.

comment:11 follow-up: Changed 14 months ago by mikeperry

I'm not talking about the "Bandwidth=" weights. I'm talking about the flag weights. In fact, it appears to me that I am suggesting exactly the same thing you are, just using a different mechanism (one that existing clients already obey today).

comment:12 Changed 14 months ago by nickm

Could one/both of you spell out with more exactitude what additional fix you prefer? I'll implement something if I need to, but I'd rather have somebody else figure out *what* to implement. Please don't leave any steps out.

Also, does any of the above militate against actually merging this patch, possibly with the default value a little lower (3 months?), and a plan to move the default value higher once we have the Guard flag/W parameters/whatever working like we'd like?

comment:13 in reply to: ↑ 11 ; follow-up: Changed 14 months ago by arma

Replying to mikeperry:

I'm not talking about the "Bandwidth=" weights. I'm talking about the flag weights. In fact, it appears to me that I am suggesting exactly the same thing you are, just using a different mechanism (one that existing clients already obey today).

Great. What do we change in these weights then? I still don't see with these system-wide weights how we can tell clients to still back off from using a Guard-for-a-long-time relay for other path positions, but not back off so much from using a Guard-for-just-a-short-time relay for other path positions.

comment:14 in reply to: ↑ 1 Changed 14 months ago by arma

  • Status changed from needs_review to needs_revision

Replying to nickm:

See branch "bug8240".

guards_get_lifetime()'s comment says it's about "directory guards", but I think it's about the other kinds of guards too, yes?

I believe I misremembered the code when making some of the above comments -- I now believe setting GuardLifetime to 9 months will make your guards last between 8 and 9 months (as opposed to between 9 and 10 months). Since your consensus param wants to be "minimum lifetime" (which is a fine choice), we should deal with that fencepost issue internally by adding on an extra fencepost or something.

I notice that our time_units array doesn't know what a month is. (I noticed because if we add a month and define it as 30 days, then 2 months, which is the current value, will be under your MIN_GUARD_LIFETIME value.) (That said, see the above fencepost issue -- I think the current value is actually best described as "1 month", meaning "at least 1 month", i.e. when it chooses an expiration time it chooses it up to 30 days in the past, and when it checks for expiration it checks if 60 days have passed.)
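The fencepost issue can be sketched as follows, assuming the behaviour described in this comment: the selection time is backdated by a random amount of up to 30 days, and expiry compares that backdated time against the configured lifetime. The function names are illustrative and are not the ones in the Tor source.

```python
import random

DAY = 24 * 60 * 60
BACKDATE_WINDOW = 30 * DAY  # selection time is backdated by up to 30 days

def pick_chosen_on(now, rng):
    """Record a selection time backdated by a random amount."""
    return now - rng.randrange(BACKDATE_WINDOW)

def is_expired(chosen_on, now, lifetime):
    """Expiry check that treats lifetime as a maximum."""
    return now - chosen_on > lifetime

def is_expired_as_minimum(chosen_on, now, min_lifetime):
    """One possible fix: add the backdating window back on, so that the
    configured value becomes a true minimum lifetime."""
    return now - chosen_on > min_lifetime + BACKDATE_WINDOW

def real_lifetime(chosen_on, picked_at, lifetime):
    """Seconds a guard stays in use after the moment it was picked."""
    return chosen_on + lifetime - picked_at

rng = random.Random(8240)
lifetime = 60 * DAY
spans = [real_lifetime(pick_chosen_on(0, rng), 0, lifetime)
         for _ in range(10000)]
print(min(spans) / DAY, max(spans) / DAY)
```

With a 60 day setting, every simulated guard lasts more than 30 and at most 60 days, which is why the current value is best described as "at least 1 month".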

Your patch doesn't change the two comments in remove_obsolete_entry_guards() that say "2 months".

I'd be fine changing the value to "at least 2 months" while we're discussing how to deal with the weights issue.

comment:15 in reply to: ↑ 13 Changed 14 months ago by mikeperry

Replying to arma:

Replying to mikeperry:

I'm not talking about the "Bandwidth=" weights. I'm talking about the flag weights. In fact, it appears to me that I am suggesting exactly the same thing you are, just using a different mechanism (one that existing clients already obey today).

Great. What do we change in these weights then? I still don't see with these system-wide weights how we can tell clients to still back off from using a Guard-for-a-long-time relay for other path positions, but not back off so much from using a Guard-for-just-a-short-time relay for other path positions.

Ugh, I think I have braindamage from juggling too many things. You're right, individual relays can't be re-weighted in this way currently. We would need client side changes for what I described: we'd need to get the duration that each node has had the Guard flag to the client somehow, and then the client would have to adjust that node's W*g* weights itself.

Either way, it's not something we'd do on the 0.2.4.x timescale. For now, we should avoid raising the new limit too far beyond its current value.

comment:16 Changed 14 months ago by nickm

Okay. Mike, do you agree with arma's list of what to do above? I'd like to get consensus here so we can merge this.

comment:17 Changed 14 months ago by mikeperry

For the branch for 0.2.4.x: I agree we should default to 2 or 3 months instead of 9 months here. I also took a quick look at the branch and it seems weird to clamp the torrc option silently. If we're going to alter user torrc values, it should probably be in options_validate() with a log message. I also think 2 months is a high minimum, especially for torrc. Seems like "10 minutes" is a better minimum there. The consensus I agree we might want to bottom out at a month.
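A minimal sketch of clamping with a log message instead of silently. The floors are the values suggested in this comment. The precedence order (torrc, then consensus, then a built-in default), the 60 day default, and all names are assumptions made for illustration, not the behaviour of the branch.

```python
import logging

log = logging.getLogger("guard_lifetime_sketch")

MINUTE = 60
DAY = 24 * 60 * 60
TORRC_MIN = 10 * MINUTE      # floor suggested above for the torrc option
CONSENSUS_MIN = 30 * DAY     # floor suggested above for the consensus value
DEFAULT_LIFETIME = 60 * DAY  # assumed default, for illustration only

def clamp_with_warning(name, value, floor):
    """Raise value to floor if needed, and tell the operator about it."""
    if value < floor:
        log.warning("%s of %d seconds is below the minimum; using %d",
                    name, value, floor)
        return floor
    return value

def effective_guard_lifetime(torrc_value=None, consensus_value=None):
    """Assumed precedence: torrc, then consensus, then the default."""
    if torrc_value is not None:
        return clamp_with_warning("GuardLifetime", torrc_value, TORRC_MIN)
    if consensus_value is not None:
        return clamp_with_warning("consensus guard lifetime",
                                  consensus_value, CONSENSUS_MIN)
    return DEFAULT_LIFETIME
```

For example, `effective_guard_lifetime(torrc_value=60)` logs a warning and returns the 10 minute floor, while a torrc value above the floor is returned unchanged.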

For 0.2.5.x when we actually change this to a larger value: I thought about the weight discussion a bit more. It of course needs a proposal to make it specific enough to implement, but I think the best option would be to create a new consensus method that allows each relay to optionally have a subset of the bandwidth-weights keyword pairs (the Wxx ones used by compute_weighted_bandwidths() and smartlist_choose_node_by_bandwidth_weights()) on its 'w' line, which would override the values from the consensus footer if present.

We would then compute these W*g* weights for each relay at the authorities depending on how old of a guard a relay is using a scaling function similar to what arma/I mentioned earlier (probably a simple linear function that represents a constant rate of client arrival and migration until we hit an age greater than the rotation period).
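A minimal sketch of that scheme for a single weight, assuming a linear ramp over the rotation period. It handles only Wmg, interpolating from Wmm for a brand-new guard down to the footer's Wmg for a guard older than the rotation period. The function names, the choice of Wmm as the starting point, and the numbers are assumptions; a real proposal would have to cover every W*g* weight.

```python
# Authority side: compute a per-relay override from Guard flag age.
# Client side: prefer the per-relay override over the consensus footer.

def linear_ramp(age_days, rotation_days):
    """Fraction of the way to steady state, assuming a constant rate of
    client arrival and migration until one rotation period has passed."""
    if rotation_days <= 0:
        raise ValueError("rotation period must be positive")
    return min(max(age_days / rotation_days, 0.0), 1.0)

def authority_overrides(footer, age_days, rotation_days):
    """Overrides an authority might put on one guard's 'w' line."""
    ramp = linear_ramp(age_days, rotation_days)
    wmg = footer["Wmm"] + ramp * (footer["Wmg"] - footer["Wmm"])
    return {"Wmg": round(wmg)}

def effective_weight(name, footer, overrides):
    """Client lookup: the relay's own value wins if one is present."""
    return overrides.get(name, footer[name])

# Made-up footer values in the scaled-integer style of the consensus.
footer = {"Wmm": 10000, "Wmg": 4000, "Wgg": 6000}
young = authority_overrides(footer, age_days=30, rotation_days=285)
print(effective_weight("Wmg", footer, young),
      effective_weight("Wgg", footer, young))
```

Because the override lives on the relay's own line, clients that do not understand it would simply keep using the footer values.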

We'd also probably want to alter the bandwidth-weight computation at the end to multiply these overrides by the relay bandwidth, as if that multiplied value were the total bandwidth for that relay for that flag. This would give us more realistic fractions of how much bandwidth actually is being actively used for the Guard vs other positions during any given consensus period.

Once those two changes are made, we should be free to make this value as large as we want without impacting balancing significantly, I think. We should also be able to observe in practice that getting the Guard flag should no longer cause your relay to suddenly drop in traffic volume, so it will hopefully be obvious if it's actually working.

comment:18 Changed 13 months ago by nickm

  • Status changed from needs_revision to needs_review

I've made arma's changes in branch "bug8240_v2", still on 0.2.3. I'm satisfied with clamping the option silently for now, given that I added documentation. 10 minutes is insanely low; I guess we could make it possible for testingtornetwork purposes, but that seems like a feature, and therefore 0.2.5 stuff. Adding a lower consensus-based minimum I kinda want to lump into the same category.

Putting this back in needs_review: shall I forward-port to 0.2.4 and merge?

comment:19 Changed 13 months ago by nickm

(I'm still considering this an 0.2.3 backport candidate)

comment:20 Changed 13 months ago by mikeperry

  • Status changed from needs_review to needs_revision

I think it's a bad idea to change this minimum in a way that we cannot quickly correct later, given that there are current known load balancing issues with the existing Guard rotation duration that will be made worse by this change. Raising the consensus minimum *above* our current default does exactly this. The minimum should be the current default, at least.

We need to do at least this much for anything prior to 0.2.5.x. I'm also nervous about letting 0.2.5.x clients start existing with a new load balancing parameter we have no control over. It makes solving other performance problems harder.

comment:21 Changed 13 months ago by nickm

  • Status changed from needs_revision to needs_review

The minimum should be the current default, at least.

Okay; done in bug8240_v2.

with a new load balancing parameter we have no control over

What's the uncontrollable parameter you're referring to in this case? It's a consensus parameter, after all; we control that. (At least, the authorities do.)

FWIW, I am pushing on this branch because I am worried about letting the status quo of "wait long enough and you'll get a bad guard" sit around indefinitely while we analyze ourselves into oblivion. I hope this doesn't turn into one of those tickets where we can't fix Tor until something else is done, but we never make a plan to do that something else.

I guess that means that once we merge this, we need to make a realistic plan for making it safe to increase the guard lifetime a lot.

comment:22 Changed 13 months ago by mikeperry

Understood. I think 2 months will likely cause us enough load-balancing pain that we'll want to fix the underlying load balancing problem in 0.2.5.x anyway. I think I also want to work on a few performance problems in 0.2.5.x, and flag-weights are included in that. I've created #8453 to remind me of the flag-weight changes.

Also for the record, I am slightly nervous about increasing the Guard node duration to insane levels until we also reduce the amount of influence a Guard's identity key allows over your paths. Hopefully that can also be improved/solved in 0.2.5.x. (I don't like the idea that Guard lifetime might considerably exceed the duration of a predecessor attack that is designed to find your Guards and exploit/coerce/corrupt them.)

comment:23 Changed 13 months ago by andrea

I think this looks okay to me given the latest version with the 1 month default.

comment:24 Changed 13 months ago by nickm

  • Milestone changed from Tor: 0.2.4.x-final to Tor: 0.2.3.x-final

Okay. (To clarify, default is 2 months; it's the minimum that I dropped to one month.)

Merging into 0.2.4, with some trickiness. Tossing ticket into 0.2.3 because of 023-backport. The branch is now "bug8240_v2_squashed"

comment:25 Changed 13 months ago by nickm

When backporting, include the trivial fix for bug #8553 in commit 6196d0e83d78e2e8efff575d490f4cb254415832

comment:26 Changed 12 months ago by arma

I bet weasel could be talked into a trivial patch that just raises a few constants, for 0.2.3.

I bet he'd be pretty sad with all the infrastructure around the new config option, the new consensus param, the defaults and mins and maxes, etc.

comment:27 Changed 12 months ago by nickm

Branch "bug8240_simple" in my public repository now has such a branch.

comment:28 Changed 12 months ago by arma

Looks good to me. It should probably change the comment from '2 months' to '3 months' though.

comment:29 Changed 12 months ago by nickm

Added a fixup commit. Shall we wait for weal, or merge as-is?

comment:30 Changed 12 months ago by nickm

Err, weasel. Should we wait for weasel, or merge as-is?

comment:31 Changed 12 months ago by mikeperry

  • Status changed from needs_review to needs_revision

*sigh* Let me try this again.

I think it's a bad idea to change this minimum in a way that we cannot quickly correct later, given that there are current known load balancing issues with the existing Guard rotation duration that will be made worse by this change. Raising the compiled-in value without consensus parameter support does exactly this.

Please don't let Debian's squeamishness mess up our load balancing without the ability to recover or experiment with it.

comment:32 Changed 12 months ago by nickm

Roger, Mike, Weasel: Please discuss this and form a consensus. I do not care which of these we merge, and I do not want "everybody try to persuade nick" to be the decision-making mechanism.

comment:33 Changed 12 months ago by mikeperry

I think in terms of load balancing issues, I have a slight preference for merging the consensus parameter version of this into 0.2.3.x. At least then the whole network would be doing the same thing sooner. That would actually make #8453 easier to analyze, study, and reason about.

However, I still really, really don't want just a constant update that we have no control over to suddenly appear in an already-stable release. That could easily turn into a nightmare. :/

comment:34 follow-up: Changed 11 months ago by arma

Well, here we are in a deadlock again.

If we merge the consensus parameter version to 0.2.3 (or heck, to 0.2.4), but we don't add any rebalancing code to these versions, then when it comes time to set that consensus parameter, we're going to hesitate and try not to do it, right?

We need a lot of our users to be choosing based on the new (not yet designed, not yet implemented, not yet deployed) weighting strategy, when we tell a lot of our users to lengthen their guard rotation interval. Unless I'm mistaken?

Or is Mike recommending that we roll out the consensus parameter version, have many of our users change to 9 months, then figure we'll do #8453 and then the newer users will slowly upgrade to that newer code and eventually we'll be rebalanced again?

comment:35 in reply to: ↑ 34 Changed 11 months ago by mikeperry

Replying to arma:

Well, here we are in a deadlock again.

If we merge the consensus parameter version to 0.2.3 (or heck, to 0.2.4), but we don't add any rebalancing code to these versions, then when it comes time to set that consensus parameter, we're going to hesitate and try not to do it, right?

The consensus parameter version is in maint-0.2.4.x. The merge commit was 6f20a74d52741cce521cf03b8afee570e3cb367b.

We need a lot of our users to be choosing based on the new (not yet designed, not yet implemented, not yet deployed) weighting strategy, when we tell a lot of our users to lengthen their guard rotation interval. Unless I'm mistaken?


Or is Mike recommending that we roll out the consensus parameter version, have many of our users change to 9 months, then figure we'll do #8453 and then the newer users will slowly upgrade to that newer code and eventually we'll be rebalanced again?

I don't plan to push back against the consensus parameter increase for values like 2 months.

However, anything more I think is unwise for several reasons: We need load balancing fixes; defense-in-depth fixes against path bias and route capture attacks that are possible via Guard node key compromise; and better protection against the ability of the Guard nodes of hidden services to be discovered.

I also don't fully believe that simply raising the Guard lifespan is the best defense against rpw's hidden service paper against all classes of adversary. In the event that you can find the service's Guard nodes so much more quickly than you could find the client after a few months of guard rotation, it seems *worse* to make them stick with those Guard nodes even longer.

For clients, it seems clear to me that you should have your Guard node for a function of the amount of time it takes for the adversary to discover it. Because of this, I think I *do* want to go back to arguing that the minimum accepted torrc value should be an hour for clients (this is the duration of rpw's attack).

I think the risk calculations might be different for hidden services than for clients, but near as I can tell, I think a longer Guard duration means we just dumped a whole lot of risk onto our Guard nodes, and we're going to make it a whole lot worse by making those guard nodes targets for much longer periods of time.

If we can fix all of the attacks that stealing Guard node identity keys enables *and* make it harder to discover Guard nodes in the first place, then I might be talked into values like 9-12 months.

comment:36 Changed 9 months ago by arma

  • Parent ID set to #9321

comment:37 Changed 5 months ago by amj703

  • Cc amj703 added

comment:38 Changed 4 weeks ago by nickm

  • Milestone changed from Tor: 0.2.3.x-final to Tor: 0.2.???

comment:39 Changed 4 weeks ago by nickm

  • Status changed from needs_revision to new