Reduce relay bandwidth stats interval to 24 hours

changed milestone to %Tor: 0.2.5.x-final

added actualpoints::0.5 component::core tor/tor easy guard-discovery-stats intro milestone::Tor: 0.2.5.x-final points::1 priority::high resolution::implemented severity::normal sponsor::Q status::closed type::enhancement labels

Trac:
Cc: N/A to karsten
Priority: Medium to High

Trac:
Sponsor: N/A to SponsorQ

We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps? What potentially bad consequences did we overlook? How do we find out? Who do we ask?

teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?

Replying to karsten:

We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps?

We change the code and potentially backport as a security issue.

What potentially bad consequences did we overlook? How do we find out? Who do we ask?

How does metrics handle a network where some relays report every 6 hours, and others report every 24 hours?

Do we need to remove some of the graphs from atlas, because they won't have data any more?

I can't imagine any other consequences.

teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?

Yes, that seems sensible. Did you want to do that, or should I?

Replying to teor:

Replying to karsten:

We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps?

We change the code and potentially backport as a security issue.

Okay. Let's first consider all possible consequences before doing so.

What potentially bad consequences did we overlook? How do we find out? Who do we ask?

How does metrics handle a network where some relays report every 6 hours, and others report every 24 hours?

In theory this should be fine, and it already had to handle the case of some relays reporting every 15 minutes and others every 4 (not 6!) hours.

But we can try this out by rewriting some descriptors and feeding them into the various metrics parts.
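One way to produce such test descriptors is to take real 4-hour bandwidth history and fold it into the 24-hour intervals a relay running the new code would publish. The sketch below is a minimal illustration of that idea. The function names are hypothetical and are not part of tor or any metrics codebase. It assumes the input history starts on a 24-hour boundary, and it drops any trailing partial group, which is a choice made only for this sketch. The second helper normalizes totals to bytes per second, so relays with different interval lengths can be compared on the same scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum each run of `factor` consecutive history values
 * (bytes per interval) into one value for an interval `factor` times longer.
 * For 4-hour input and 24-hour output, factor = 86400 / 14400 = 6.
 * A trailing partial group is dropped (an assumption of this sketch). */
static size_t
aggregate_history(const uint64_t *in, size_t n_in, size_t factor,
                  uint64_t *out)
{
  assert(factor > 0);
  size_t n_out = n_in / factor;
  for (size_t i = 0; i < n_out; i++) {
    uint64_t sum = 0;
    for (size_t j = 0; j < factor; j++)
      sum += in[i * factor + j];
    out[i] = sum;
  }
  return n_out;
}

/* Hypothetical helper: convert a per-interval byte total into an average
 * rate, so 4-hour and 24-hour reports can be compared directly. */
static double
bytes_per_second(uint64_t bytes, uint32_t interval_secs)
{
  assert(interval_secs > 0);
  return (double)bytes / (double)interval_secs;
}
```

For example, a relay averaging 5 bytes/s reports 72000 bytes per 4-hour interval and 432000 bytes per 24-hour interval; both normalize to the same rate.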

Do we need to remove some of the graphs from atlas, because they won't have data any more?

Yes, we'll want to do that as more and more relays switch over to reporting every 24 hours.
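The reason short-range graphs lose their usefulness is simple arithmetic: a graph window can only show as many points as there are complete history intervals inside it. The sketch below assumes a hypothetical 3-day graph; the window length is chosen for illustration and is not taken from Atlas itself.

```c
#include <assert.h>

/* Number of complete history intervals that fit in a graph window.
 * Both arguments are in seconds. */
static long
points_in_window(long window_secs, long interval_secs)
{
  assert(interval_secs > 0);
  return window_secs / interval_secs;
}
```

With 4-hour intervals, a 3-day window holds 18 points; with 24-hour intervals it holds only 3, which is too coarse to draw a meaningful short-term graph.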

I can't imagine any other consequences.

Okay!

teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?

Yes, that seems sensible. Did you want to do that, or should I?

Would you mind doing it? I think you can describe the guard discovery attack better than I could.

Trac:
Keywords: guard-discovery deleted, guard-discovery-033 added

Trac:
Keywords: guard-discovery-033 deleted, guard-discovery-stats added

See https://lists.torproject.org/pipermail/tor-dev/2017-October/012517.html

What happens in those cases where client and adversary are one and the same? An adversary could create many connections to the service, which could lead to a spike in the stats. Would a 24-hour interval be immune to that?

Replying to jvsg:

What happens in those cases where client and adversary are one and the same? An adversary could create many connections to the service, which could lead to a spike in the stats.

Yes, this is one possible scenario I describe in my tor-dev@ email at https://lists.torproject.org/pipermail/tor-dev/2017-October/012517.html

To defend against this particular case, onion service operators could use a tool like OnionBalance to spread load across a set of service instances. But this comes with its own security tradeoffs. It's also possible to limit bandwidth at the onion service, but that doesn't stop the traffic being sent as far as the guard.

Would a 24-hour interval be immune to that?

There are multiple ways to determine relay load: using published relay statistics is one of the easiest. We are trying to decrease the usefulness of published relay statistics for this attack, while preserving their utility to relay operators and the network.
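To see why a longer interval reduces the usefulness of the statistics without removing the signal, consider how much a short traffic spike inflates a single interval's total. The sketch assumes a constant baseline rate, an extra attack rate lasting `spike_secs`, and a spike that fits entirely inside one interval; the function name and numbers are illustrative only.

```c
#include <assert.h>

/* Ratio of the reported interval total to the baseline-only total.
 * 1.0 means the spike is invisible; larger means more visible.
 * Assumes the whole spike falls inside a single interval. */
static double
spike_ratio(double baseline_bps, double extra_bps,
            double spike_secs, double interval_secs)
{
  assert(baseline_bps > 0.0 && interval_secs > 0.0);
  assert(spike_secs >= 0.0 && spike_secs <= interval_secs);
  double baseline_total = baseline_bps * interval_secs;
  return (baseline_total + extra_bps * spike_secs) / baseline_total;
}
```

A one-hour spike that doubles the rate raises a 4-hour total by 25%, but a 24-hour total by only about 4%. The spike is diluted, not removed.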

No simple change will make tor immune. This is because there is a design tradeoff in tor: clients use a single guard, so they have a low probability of encountering a malicious guard, and so they are less linkable. But using one guard also makes it easier to inflate that guard's bandwidth.
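The first half of that tradeoff can be made concrete with a simplified model: if each guard a client picks is independently malicious with probability f, the chance that at least one of n guards is malicious is 1 - (1 - f)^n. Real guard selection is bandwidth-weighted and not independent, so this is only an illustration, and the value f = 0.1 below is hypothetical.

```c
#include <assert.h>

/* Probability that at least one of n guards is malicious, assuming each
 * is independently malicious with probability f (a simplification). */
static double
p_any_malicious(double f, unsigned n)
{
  assert(f >= 0.0 && f <= 1.0);
  double p_all_honest = 1.0;
  for (unsigned i = 0; i < n; i++)
    p_all_honest *= (1.0 - f);
  return 1.0 - p_all_honest;
}
```

With f = 0.1, one guard gives about 0.10 and three guards give about 0.27, which is why tor prefers a single guard despite the easier bandwidth inflation.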

Just had time to read the [tor-dev] thread. +1 for this ticket.

The relevant constants in tor are:

NUM_SECS_BW_SUM_INTERVAL, and maybe NUM_SECS_BW_SUM_IS_VALID, which we may need to increase so relays remember 2-4 days of daily bandwidth, rather than just one day: https://gitweb.torproject.org/tor.git/tree/src/or/rephist.c#n1242

And maybe we should also change MAX_BANDWIDTH_CHANGE_FREQ; otherwise relays will report bandwidth spikes every 20 minutes in their descriptors. We should be careful here, because this affects the bandwidth authority system, but it seems that at least an hour would be reasonable: https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2516

arma, nickm, and I had an IRC conversation about this today:

Replying to teor:

The relevant constants in tor are:

NUM_SECS_BW_SUM_INTERVAL, and maybe NUM_SECS_BW_SUM_IS_VALID, which we may need to increase so relays remember 2-4 days of daily bandwidth, rather than just one day: https://gitweb.torproject.org/tor.git/tree/src/or/rephist.c#n1242

NUM_SECS_BW_SUM_INTERVAL 24 hours /* similar to the 6 periods * 4 hours we had before */
NUM_SECS_BW_SUM_IS_VALID 5 days
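The "6 periods * 4 hours" remark is a ratio of the two constants: the number of intervals a relay keeps is the validity window divided by the interval length. The sketch below assumes that derivation and uses hypothetical OLD_/NEW_ names for the values discussed in this thread (4-hour intervals remembered for one day, versus 24-hour intervals remembered for 5 days).

```c
#include <assert.h>

/* Values described in this thread, under hypothetical names. */
#define OLD_BW_SUM_INTERVAL (4*60*60)     /* 4 hours */
#define OLD_BW_SUM_IS_VALID (24*60*60)    /* 1 day */
#define NEW_BW_SUM_INTERVAL (24*60*60)    /* 24 hours */
#define NEW_BW_SUM_IS_VALID (5*24*60*60)  /* 5 days */

/* Number of history intervals kept, assuming it is derived as
 * IS_VALID / INTERVAL. */
static int
num_periods(int is_valid_secs, int interval_secs)
{
  assert(interval_secs > 0);
  return is_valid_secs / interval_secs;
}
```

The old values give 6 periods and the new ones give 5, so relays publish a similar number of data points, each covering a much longer time span.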

And maybe we should also change MAX_BANDWIDTH_CHANGE_FREQ; otherwise relays will report bandwidth spikes every 20 minutes in their descriptors. We should be careful here, because this affects the bandwidth authority system, but it seems that at least an hour would be reasonable: https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2516

/* Descriptor bandwidth changes propagate to clients between:
 *   10 minutes (upload right before vote, then client bootstrap from authority)
 *   3 hours (upload just after vote, then client fetch from mirror as late as possible)
 * after they are uploaded by relays. There's no point in reporting them much more often
 * than this, because the feedback loop only runs at this speed. */
MAX_BANDWIDTH_CHANGE_FREQ 3 hours
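A rate limit like this is typically applied when a relay decides whether an observed bandwidth change is worth a new descriptor. The sketch below is a simplified, hypothetical version of such a check, not tor's actual code: the function name and the "significant change" factor of 2 (SKETCH_CHANGE_FACTOR) are assumptions, and only MAX_BANDWIDTH_CHANGE_FREQ comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_BANDWIDTH_CHANGE_FREQ (3*60*60)  /* 3 hours, from this thread */
#define SKETCH_CHANGE_FACTOR 2               /* hypothetical threshold */

/* Hypothetical check: republish only if the bandwidth changed by more than
 * SKETCH_CHANGE_FACTOR (or went to/from zero), and at most once per
 * MAX_BANDWIDTH_CHANGE_FREQ seconds, except when the previous value was 0. */
static bool
should_republish_for_bw(uint64_t prev, uint64_t cur,
                        time_t last_changed, time_t now)
{
  bool significant = ((prev == 0) != (cur == 0)) ||
                     cur > prev * SKETCH_CHANGE_FACTOR ||
                     cur < prev / SKETCH_CHANGE_FACTOR;
  if (!significant)
    return false;
  return prev == 0 || last_changed + MAX_BANDWIDTH_CHANGE_FREQ < now;
}
```

Raising the limit from 20 minutes to 3 hours means a relay can leak at most one bandwidth-driven descriptor update per 3 hours, which matches the slowest propagation path described in the comment.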

arma believes that relays which already have enough measured bandwidth could share their bandwidth less often (for example, once they are Fast, Guard, Measured, or something similar), or only after a larger change. (arma actually said that small relays should share it more often; I inverted his logic and set a minimum instead.)

I agree, but I'm not sure how much more delay or how much more bandwidth we can allow. So let's take this change in 0.3.2, and do some analysis for 0.3.3. I split this part off into #24104 (moved).

I emailed tor-dev@ about this change, including a summary of my comment above: https://lists.torproject.org/pipermail/tor-dev/2017-November/012538.html

The discussions on tor-dev and IRC were positive.

Someone needs to write a patch that changes the three constants listed above, plus a changes file. It's a nice, easy patch, except maybe for the changes file.

Trac:
Keywords: N/A deleted, intro, easy added
Status: new to needs_revision

To summarize the above: are we talking about a patch that changes these constants to the following values?

+#define NUM_SECS_BW_SUM_IS_VALID (5*24*60*60) /* 5 days */
+#define NUM_SECS_BW_SUM_INTERVAL (24*60*60) /* 24 hours */
+#define MAX_BANDWIDTH_CHANGE_FREQ (3*60*60) /* 3 hours */

If so, please see branch: ticket23856_032_01

Trac:
Status: needs_revision to needs_review

Yes, that's exactly the patch!

Let's merge it, and then metrics can update their side.

Trac:
Type: defect to enhancement
Actualpoints: N/A to 0.5
Status: needs_review to merge_ready

Cherry-picked back to older branches as ticket23856_025_01.

Merged to 0.3.2.x and forward; marking for possible backport.

Trac:
Milestone: Tor: 0.3.2.x-final to Tor: 0.3.1.x-final

Backported to 0.2.5 and forward.

Trac:
Milestone: Tor: 0.3.1.x-final to Tor: 0.2.5.x-final
Resolution: N/A to implemented
Status: merge_ready to closed

closed

changed time estimate to 8h

added 4h of time spent

mentioned in issue #24104 (moved)

mentioned in issue #24155 (moved)

mentioned in issue #24248 (moved)

mentioned in issue #24729 (moved)

mentioned in issue #24821 (moved)

mentioned in issue #25192 (moved)

mentioned in issue #26301 (moved)

mentioned in issue #27135 (moved)

mentioned in issue #28984 (moved)

mentioned in issue #28985 (moved)

moved to tpo/core/tor#23856 (closed)

mentioned in issue tpo/core/tor#24104 (closed)

mentioned in issue tpo/core/tor#28984 (closed)

mentioned in issue tpo/network-health/sbws#28985 (moved)

mentioned in issue tpo/network-health/onbasca#91
