We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps?
We change the code and potentially backport as a security issue.
Okay. Let's first consider all possible consequences before doing so.
What potentially bad consequences did we overlook? How do we find out? Who do we ask?
How does metrics handle a network where some relays report every 6 hours, and others report every 24 hours?
In theory this should be fine, and it already had to handle the case of some relays reporting every 15 minutes and others every 4 (not 6!) hours.
But we can try this out by rewriting some descriptors and feeding them into the various metrics parts.
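One way to do that rewriting could look like the sketch below: it sums consecutive groups of six 4-hour history values into one 24-hour value. The helper name `coarsen_bw_history` is hypothetical and not part of tor or metrics, and the sketch assumes each history value is a per-interval byte total, so that summing is the right way to combine them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of tor or metrics.
 * Sum consecutive groups of `group` values from `in` into `out`.
 * Assumes each input value is a byte total for one interval.
 * Trailing values that do not fill a whole group are dropped.
 * Returns the number of values written to `out`. */
static size_t
coarsen_bw_history(const uint64_t *in, size_t n_in, size_t group,
                   uint64_t *out, size_t out_cap)
{
  size_t n_out = 0;
  if (group == 0)
    return 0;
  for (size_t i = 0; i + group <= n_in && n_out < out_cap; i += group) {
    uint64_t sum = 0;
    for (size_t j = 0; j < group; j++)
      sum += in[i + j];
    out[n_out++] = sum;
  }
  return n_out;
}
```

Calling it with `group` set to 6 turns 4-hour values into 24-hour values; the rewritten descriptors would also need their interval length updated to match.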
Do we need to remove some of the graphs from atlas, because they won't have data any more?
Yes, we'll want to do that as soon as more and more relays switch over to reporting every 24 hours.
I can't imagine any other consequences.
Okay!
teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?
Yes, that seems sensible. Did you want to do that, or should I?
Would you mind doing it? I think you can describe the guard discovery attack better than I could.
What happens in those cases where client and adversary are one and the same? An adversary could create many connections to the service, which could lead to a spike in stats.
To defend against this particular case, onion service operators could use a tool like OnionBalance to spread load across a set of service instances. But this comes with its own security tradeoffs. It's also possible to limit bandwidth at the onion service, but that doesn't stop the traffic being sent as far as the guard.
Would a 24-hour interval be immune to that?
There are multiple ways to determine relay load: using published relay statistics is one of the easiest. We are trying to decrease the usefulness of published relay statistics for this attack, while preserving their utility to relay operators and the network.
No simple change will make tor immune. This is because there is a design tradeoff in tor: clients choose one guard, so they have a low probability of encountering a malicious guard, and so they are less linkable. But using one guard makes inflating its bandwidth easier.
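To put a rough number on the first half of that tradeoff, here is a minimal sketch. It assumes the adversary controls a fraction `f` of the guard selection weight and that each guard is picked independently; both are simplifications, and the function name is hypothetical. It only models the chance of picking a malicious guard, not the bandwidth inflation side.

```c
/* Hypothetical helper: probability that at least one of n_guards
 * independently chosen guards is malicious, when the adversary controls
 * a fraction f of the guard selection weight (simplifying assumption). */
static double
prob_malicious_guard(double f, int n_guards)
{
  double p_all_honest = 1.0;
  for (int i = 0; i < n_guards; i++)
    p_all_honest *= (1.0 - f);
  return 1.0 - p_all_honest;
}
```

Under these assumptions the probability grows with every extra guard, which is why clients stick to one.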
And maybe we should also change MAX_BANDWIDTH_CHANGE_FREQ; otherwise relays will report bandwidth spikes every 20 minutes in their descriptors. We should be careful here, because this affects the bandwidth authority system, but at least an hour seems reasonable:
https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2516
NUM_SECS_BW_SUM_INTERVAL 24 hours
/* similar to the 6 periods * 4 hours we had before */
NUM_SECS_BW_SUM_IS_VALID 5 days
/* descriptor bandwidth changes propagate to clients between:
10 minutes (upload right before vote, then client bootstrap from authority)
3 hours (upload just after vote, then client fetch from mirror as late as possible)
after they are uploaded by relays. There's no point in reporting them much more often
than this, because the feedback loop only runs at this speed. */
MAX_BANDWIDTH_CHANGE_FREQ 3 hours
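For whoever picks this up, the proposed values could be written out in seconds roughly as below. This is a sketch, not the actual patch: the three constant names are the ones from this ticket, the `OLD_` names and the `bw_history_periods` helper are hypothetical and only there for comparison, and the old values (4-hour intervals, 6 periods, 20 minutes) are the ones mentioned in this thread.

```c
#include <stdint.h>

/* Values before the change, as described in this thread.
 * The OLD_ prefix is hypothetical, used only for comparison. */
#define OLD_NUM_SECS_BW_SUM_INTERVAL (4*60*60)
#define OLD_NUM_SECS_BW_SUM_IS_VALID (6*OLD_NUM_SECS_BW_SUM_INTERVAL)
#define OLD_MAX_BANDWIDTH_CHANGE_FREQ (20*60)

/* Proposed values from this ticket. */
#define NUM_SECS_BW_SUM_INTERVAL (24*60*60)
/* 5 periods of 24 hours, similar to the 6 periods * 4 hours before. */
#define NUM_SECS_BW_SUM_IS_VALID (5*24*60*60)
/* Matches the slowest propagation delay of the feedback loop. */
#define MAX_BANDWIDTH_CHANGE_FREQ (3*60*60)

/* Hypothetical helper: how many history periods stay valid. */
static inline int
bw_history_periods(int64_t valid_secs, int64_t interval_secs)
{
  return (int)(valid_secs / interval_secs);
}
```

With these numbers the history keeps 5 periods instead of 6, and bandwidth changes are reported at most once every 10800 seconds instead of every 1200.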
arma believes that relays which already have enough measured bandwidth (that is, relays that are Fast or Guard or Measured or something) could share their bandwidths less often, or only after a larger change. (Strictly, arma said that small relays should share it more often; I inverted his logic and set a minimum instead.)
I agree, but I'm not sure how much more delay or how much more bandwidth we can allow. So let's take this change in 0.3.2, and do some analysis for 0.3.3. I split this part off into #24104 (moved).
Someone needs to write a patch that changes the 3 constants listed above, and a changes file.
It's a nice easy patch, except maybe for the changes file.
Trac: Keywords: N/A deleted; intro, easy added. Status: new to needs_revision