Opened 2 months ago

Closed 43 hours ago

#23856 closed enhancement (implemented)

Reduce relay bandwidth stats interval to 24 hours

Reported by: teor
Owned by:
Priority: High
Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: guard-discovery-stats, easy, intro
Cc: karsten
Actual Points: 0.5
Parent ID:
Points: 1
Reviewer:
Sponsor: SponsorQ

Description

We want to do this to reduce the efficiency of guard discovery attacks.

Change History (20)

comment:1 Changed 2 months ago by nickm

Cc: karsten added
Priority: Medium → High

comment:2 Changed 7 weeks ago by nickm

Sponsor: SponsorQ

comment:3 Changed 7 weeks ago by karsten

We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps? What potentially bad consequences did we overlook? How do we find out? Who do we ask?

teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?

comment:4 in reply to:  3 ; Changed 7 weeks ago by teor

Replying to karsten:

> We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps?

We change the code and potentially backport as a security issue.

> What potentially bad consequences did we overlook? How do we find out? Who do we ask?

How does metrics handle a network where some relays report every 6 hours, and others report every 24 hours?

Do we need to remove some of the graphs from atlas, because they won't have data any more?

I can't imagine any other consequences.

> teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?

Yes, that seems sensible. Did you want to do that, or should I?

comment:5 in reply to:  4 Changed 7 weeks ago by karsten

Replying to teor:

> Replying to karsten:
>
> > We briefly talked about this in Montreal and agreed that it's probably a good idea. What are the next steps?
>
> We change the code and potentially backport as a security issue.

Okay. Let's first consider all possible consequences before doing so.

> > What potentially bad consequences did we overlook? How do we find out? Who do we ask?
>
> How does metrics handle a network where some relays report every 6 hours, and others report every 24 hours?

In theory this should be fine, and it already had to handle the case of some relays reporting every 15 minutes and others every 4 (not 6!) hours.

But we can try this out by rewriting some descriptors and feeding them into the various metrics parts.

> Do we need to remove some of the graphs from atlas, because they won't have data any more?

Yes, we'll want to do that as soon as more and more relays switch over to reporting every 24 hours.

> I can't imagine any other consequences.

Okay!

> > teor, do you think it would make sense to send a short summary of our Montreal discussion to tor-dev@, suggest our plan there, and ask for feedback? Or is that too much?
>
> Yes, that seems sensible. Did you want to do that, or should I?

Would you mind doing it? I think you can describe the guard discovery attack better than I could.
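
One way to picture the mixed-interval case discussed above is to convert every history entry to a rate before comparing relays. The following is only a sketch under that assumption: `bw_entry_t` and `bw_entry_rate()` are hypothetical names made up for illustration, not code from tor or from any metrics service.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record for one bandwidth history entry. The field names are
 * illustrative only and are not taken from tor or metrics code. */
typedef struct {
  uint32_t interval_secs; /* length of the reporting interval, in seconds */
  uint64_t bytes;         /* bytes transferred during that interval */
} bw_entry_t;

/* Convert an entry to bytes per second, so that entries reported over
 * 4-hour and 24-hour intervals can be compared on the same scale. */
static double
bw_entry_rate(const bw_entry_t *entry)
{
  assert(entry->interval_secs > 0);
  return (double)entry->bytes / (double)entry->interval_secs;
}
```

With this normalization, a relay reporting 14400000000 bytes over 4 hours and a relay reporting 86400000000 bytes over 24 hours both come out at 1000000 bytes per second.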

comment:6 Changed 7 weeks ago by mikeperry

Keywords: guard-discovery-033 added; guard-discovery removed

comment:7 Changed 7 weeks ago by mikeperry

Keywords: guard-discovery-stats added; guard-discovery-033 removed

comment:9 Changed 7 weeks ago by jvsg

What happens in those cases where client and adversary are one and the same? An adversary could create many connections to the service, which could lead to a spike in stats. Would a 24-hour interval be immune to that?

comment:10 in reply to:  9 Changed 7 weeks ago by teor

Replying to jvsg:

> What happens in those cases where client and adversary are one and the same? An adversary could create many connections to the service, which could lead to a spike in stats.

Yes, this is one possible scenario I describe in my tor-dev@ email at https://lists.torproject.org/pipermail/tor-dev/2017-October/012517.html

To defend against this particular case, onion service operators could use a tool like OnionBalance to spread load across a set of service instances. But this comes with its own security tradeoffs. It's also possible to limit bandwidth at the onion service, but that doesn't stop the traffic being sent as far as the guard.

> Would a 24-hour interval be immune to that?

There are multiple ways to determine relay load: using published relay statistics is one of the easiest. We are trying to decrease the usefulness of published relay statistics for this attack, while preserving their utility to relay operators and the network.

No simple change will make tor immune. This is because there is a design tradeoff in tor: clients choose one guard, so they have a low probability of encountering a malicious guard, and so they are less linkable. But using one guard makes inflating its bandwidth easier.
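
As a rough illustration of why a longer interval makes published statistics less useful for this attack, the sketch below computes how much a short traffic spike raises one reported per-interval total. The helper name and all the numbers are made up for illustration; nothing here is tor code, and it assumes the spike fits entirely inside one reporting interval.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of tor: percentage by which a reported
 * per-interval byte total rises when an attacker adds spike_rate bytes/s
 * for spike_secs seconds on top of a steady base_rate bytes/s.
 * Assumes the whole spike falls inside a single reporting interval. */
static double
spike_visibility_percent(double base_rate, double spike_rate,
                         uint32_t spike_secs, uint32_t interval_secs)
{
  assert(base_rate > 0.0);
  assert(interval_secs > 0);
  assert(spike_secs <= interval_secs);
  double base_total = base_rate * interval_secs;
  double spike_total = spike_rate * spike_secs;
  return 100.0 * spike_total / base_total;
}
```

For example, with an assumed base load of 10 MB/s and a one-hour spike of 5 MB/s, the reported total rises by 12.5% in a 4-hour interval but only by about 2.1% in a 24-hour interval. The spike is diluted, not hidden, which matches the point that no simple change makes tor immune.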

comment:11 Changed 7 weeks ago by asn

Just had time to read the [tor-dev] thread. +1 for this ticket.

comment:12 Changed 6 weeks ago by teor

The relevant constants in tor are:

NUM_SECS_BW_SUM_INTERVAL and maybe we will need to increase NUM_SECS_BW_SUM_IS_VALID to remember 2-4 days of daily bandwidth, rather than just one day:
https://gitweb.torproject.org/tor.git/tree/src/or/rephist.c#n1242

And maybe we should also change MAX_BANDWIDTH_CHANGE_FREQ, otherwise relays will report bandwidth spikes every 20 minutes in their descriptors. But we should be careful here, because this affects the bandwidth authority system. But it seems that at least an hour would be reasonable:
https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2516

comment:13 in reply to:  12 Changed 6 weeks ago by teor

arma and nickm and I had an irc conversation about this today:

Replying to teor:

> The relevant constants in tor are:
>
> NUM_SECS_BW_SUM_INTERVAL and maybe we will need to increase NUM_SECS_BW_SUM_IS_VALID to remember 2-4 days of daily bandwidth, rather than just one day:
> https://gitweb.torproject.org/tor.git/tree/src/or/rephist.c#n1242

NUM_SECS_BW_SUM_INTERVAL 24 hours
/* similar to the 6 periods * 4 hours we had before */
NUM_SECS_BW_SUM_IS_VALID 5 days

> And maybe we should also change MAX_BANDWIDTH_CHANGE_FREQ, otherwise relays will report bandwidth spikes every 20 minutes in their descriptors. But we should be careful here, because this affects the bandwidth authority system. But it seems that at least an hour would be reasonable:
> https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2516

/* descriptor bandwidth changes propagate to clients between:
 *   10 minutes (upload right before vote, then client bootstrap from authority)
 *   3 hours (upload just after vote, then client fetch from mirror as late as possible)
 * after they are uploaded by relays. There's no point in reporting them much more often
 * than this, because the feedback loop only runs at this speed. */

MAX_BANDWIDTH_CHANGE_FREQ 3 hours
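
A minimal sketch of the rate limit this constant implies, assuming a relay simply refuses to publish a bandwidth-triggered descriptor update until the minimum delay has passed. The function name is hypothetical and this is not the check tor actually performs in router.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define MAX_BANDWIDTH_CHANGE_FREQ (3*60*60) /* 3 hours, as proposed above */

/* Hypothetical helper, not tor's real logic: return true if enough time
 * has passed since the last bandwidth-triggered descriptor update for a
 * new bandwidth change to be published. */
static bool
bandwidth_change_may_be_published(time_t now, time_t last_published)
{
  assert(now >= last_published);
  return (now - last_published) >= MAX_BANDWIDTH_CHANGE_FREQ;
}
```

Under this sketch, a spike that starts 20 minutes after the last update is held back until the 3-hour mark instead of showing up in the next descriptor.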

arma believes relays that already have enough measured bandwidth are able to share their bandwidths less often (that is, when they are Fast or Guard or Measured or something), or share them after a larger change. (That is, arma said that small relays should share it more often, and I inverted his logic and set a minimum instead.)

I agree, but I'm not sure how much more delay or how much more bandwidth we can allow. So let's take this change in 0.3.2, and do some analysis for 0.3.3. I split this part off into #24104.

comment:14 Changed 6 weeks ago by teor

I emailed tor-dev@ about this change, including a summary of my comment above:
https://lists.torproject.org/pipermail/tor-dev/2017-November/012538.html

comment:15 Changed 3 weeks ago by teor

Keywords: easy intro added
Status: new → needs_revision

The discussions on tor-dev and IRC were positive.

Someone needs to write a patch that changes the 3 constants listed above, and a changes file.
It's a nice easy patch, except maybe for the changes file.

comment:16 Changed 3 weeks ago by dgoulet

To summarize the above, are we talking about a patch that sets these constants to the following values?

+#define NUM_SECS_BW_SUM_IS_VALID (5*24*60*60) /* 5 days */
+#define NUM_SECS_BW_SUM_INTERVAL (24*60*60) /* 24 hours */
+#define MAX_BANDWIDTH_CHANGE_FREQ (3*60*60) /* 3 hours */

If so, please see branch: ticket23856_032_01
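
To check the arithmetic behind the first two values, here is a small sketch that counts how many reporting intervals fit in the validity window. The new values are the ones listed above; the old values are taken from the "6 periods * 4 hours" description earlier in this ticket rather than copied from tor's source, and the `OLD_` macro names and the helper are hypothetical.

```c
#include <assert.h>

/* New values from the proposed patch. */
#define NUM_SECS_BW_SUM_IS_VALID (5*24*60*60) /* 5 days */
#define NUM_SECS_BW_SUM_INTERVAL (24*60*60)   /* 24 hours */

/* Old values as described in this ticket (6 periods of 4 hours).
 * These macro names are made up for the comparison. */
#define OLD_NUM_SECS_BW_SUM_INTERVAL (4*60*60)
#define OLD_NUM_SECS_BW_SUM_IS_VALID (6*OLD_NUM_SECS_BW_SUM_INTERVAL)

/* Hypothetical helper: number of complete reporting intervals that fit
 * in the window during which bandwidth sums are remembered. */
static int
num_retained_intervals(int valid_secs, int interval_secs)
{
  assert(interval_secs > 0);
  return valid_secs / interval_secs;
}
```

With the new values a relay remembers 5 daily intervals, compared with 6 four-hour intervals (one day in total) before.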

comment:17 Changed 3 weeks ago by dgoulet

Status: needs_revision → needs_review

comment:18 Changed 3 weeks ago by teor

Actual Points: 0.5
Status: needs_review → merge_ready
Type: defect → enhancement

Yes, that's exactly the patch!

Let's merge it, and then metrics can update their side.

comment:19 Changed 2 weeks ago by nickm

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.1.x-final

Cherry-picked backwards as ticket23856_025_01.

Merged to 0.3.2.x and forward; marking for possible backport.

comment:20 Changed 43 hours ago by nickm

Milestone: Tor: 0.3.1.x-final → Tor: 0.2.5.x-final
Resolution: implemented
Status: merge_ready → closed

Backported to 0.2.5 and forward.
