Opened 13 months ago

Closed 5 days ago

#24104 closed defect (fixed)

Delay descriptor bandwidth reporting on established relays

Reported by: teor Owned by: juga
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 034-backport-maybe, 033-backport-maybe, 029-backport-maybe, security-low, guard-discovery-stats, chutney-wants, bwauth-wants, 034-triage-20180328, 034-removed-20180328, tor-bwauth, 031-unreached-backport-maybe, 035-triaged-in-20180711, 032-unreached-backport-maybe
Cc: arma Actual Points:
Parent ID: #25925 Points: 1
Reviewer: ahf Sponsor:

Description

In #23856, we:

  • reduced the bandwidth stats interval from 4 hours to 24 hours, and
  • reduced urgent (2x change) descriptor bandwidth reports from every 20 minutes to every 3 hours.

But we think we can make descriptor bandwidth reports even slower on large relays, because they have less need to ramp up their bandwidth.

Here are our options:

  • don't report until the change is larger, for example, 4x
  • don't report for longer, for example, every 6 hours or 24 hours
  • delay the reporting of the *first* large change, as well as subsequent large changes

Here are the open questions:

  • tor traffic has a daily cycle, so do we need to report large bandwidth changes multiple times a day to cope with this? (I think the answer is "no", because small changes are already reported every 18 hours on the standard descriptor schedule, and that seems to work fine. And large changes are already reported once, then the delay is imposed.)

Child Tickets

TicketStatusOwnerSummaryComponent
#26301closedjugadir-spec: update descriptor on bandwidth changes only when uptime is less than a dayCore Tor/Tor

Change History (54)

comment:1 Changed 13 months ago by teor

Also, relays will re-post their descriptors when any important config items change. But I think that's out of scope for this ticket.

comment:2 Changed 13 months ago by arma

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

comment:3 in reply to:  2 Changed 13 months ago by teor

Replying to arma:

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

I can imagine two pathological cases here:

  • relays that reboot (or hibernate) every 24 hours, and
  • relays that hibernate or go offline, rather than just having started.

We'll need to account for both these cases, but they'll make the relay lose the guard flag, so we don't need to be so concerned about them.

comment:4 Changed 12 months ago by teor

Keywords: chutney-wants bwauth-wants added

If we code this feature so we can make bandwidth reporting faster on non-default networks, that would make it easier to test bandwidth authorities in chutney (see also #16386).

comment:5 Changed 10 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Type: defectenhancement

Label a bunch of (arguable and definite) enhancements as enhancements for 0.3.4.

comment:6 Changed 8 months ago by nickm

Keywords: 034-triage-20180328 added

comment:7 Changed 8 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:8 Changed 8 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:9 in reply to:  2 Changed 7 months ago by teor

Replying to arma:

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

This seems like a nice simple solution to the issue. Let's make it happen.

comment:10 Changed 7 months ago by juga

Cc: juga added

comment:11 Changed 7 months ago by juga

Parent ID: 25925

comment:12 Changed 7 months ago by juga

Parent ID: 25925#25925

comment:13 Changed 6 months ago by juga

Cc: juga removed
Owner: set to juga
Status: newassigned

Is the place to do this change check_descriptor_bandwidth_changed (https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2629)?

If i've the line https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2643 as:
if (get_uptime() < 60*60*24 && last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now || !prev), then the descriptor would still be uploaded when there is not prev.

Should instead i do get_uptime() > 60*60*24 return; at the beginning of the function so that there are not any further checks when the relay is up for more than 1 day?

comment:14 in reply to:  13 Changed 6 months ago by teor

Replying to juga:

Is the place to do this change check_descriptor_bandwidth_changed (https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2629)?

If i've the line https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2643 as:
if (get_uptime() < 60*60*24 && last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now || !prev), then the descriptor would still be uploaded when there is not prev.

Should instead i do get_uptime() > 60*60*24 return; at the beginning of the function so that there are not any further checks when the relay is up for more than 1 day?

Yes, but please make 60*60*24 a named #define with a comment. It should be like MAX_BANDWIDTH_CHANGE_FREQ.

comment:15 Changed 6 months ago by juga

Should i also make the 2 in cur > prev*2 and cur > prev/2 a named #define in case we want to change it to other number in the future?.

And in the end there is no need to check whether the relay is large here?

comment:16 in reply to:  15 Changed 6 months ago by teor

Replying to juga:

Should i also make the 2 in cur > prev*2 and cur > prev/2 a named #define in case we want to change it to other number in the future?.

Yes, that would be helpful.

And in the end there is no need to check whether the relay is large here?

No, we are just checking whether it has been up for a day.
See comment 2.

comment:17 Changed 6 months ago by juga

Keywords: tor-bwauth added
Status: assignedneeds_review

See my branch ticket20104.

test_router.c only contains 1 test and adding a test for this does not seem possible without adding more mocking functions, as get_uptime and bandwidthcapacity

Should we also change the dir-spec line https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n354 to say something like:

- Its uptime is less than 24h and bandwidth has changed...

comment:18 Changed 6 months ago by teor

Keywords: 034-backport-maybe 033-backport-maybe 032-backport-maybe 031-backport-maybe 029-backport-maybe security-low added
Milestone: Tor: unspecifiedTor: 0.3.5.x-final
Type: enhancementdefect
Version: Tor: unspecified

This is a low-severity security issue because bandwidth spike reporting enables guard discovery attacks.

We might want to backport it to 0.2.9, so please base the branch on 0.2.9.

comment:19 in reply to:  17 Changed 6 months ago by teor

Replying to juga:

See my branch ticket20104.

I will review it on GitHub.

Please base the branch on 0.2.9, we might want to backport it like #23856.

test_router.c only contains 1 test and adding a test for this does not seem possible without adding more mocking functions, as get_uptime and bandwidthcapacity

I think it's worth the extra time,

Should we also change the dir-spec line https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n354 to say something like:

- Its uptime is less than 24h and bandwidth has changed...

Yes, but we also need to say that old Tor versions ignored uptime. Please open a child ticket for the dir-spec change.

comment:20 Changed 6 months ago by teor

Status: needs_reviewneeds_revision

The patch looks good to me. Putting it in needs_revision for the rebase to 0.2.9 and the spec patch.

comment:21 Changed 6 months ago by juga

Status: needs_revisionneeds_review

Rebased on 0.2.9 in my branch bug20104_029.

comment:22 Changed 6 months ago by teor

Status: needs_reviewneeds_revision

Looks good to me, but it needs a changes file.

Please add the changes file, run "make check-changes", then put this ticket in merge_ready.

comment:23 Changed 6 months ago by juga

Status: needs_revisionmerge_ready

comment:24 Changed 5 months ago by juga

Status: merge_readyneeds_review

Back to needs_review because i added a test.
I don't know what is missing that makes router/check_descriptor_bandwidth_changed not found.

comment:25 Changed 5 months ago by asn

Reviewer: asn

comment:26 Changed 5 months ago by asn

Reviewer: asnahf

Swapping reviews with ahf.

comment:27 Changed 5 months ago by ahf

Status: needs_reviewneeds_revision

Minor issues that needs to be fixed. Commented in the PR.

comment:28 Changed 5 months ago by juga

Status: needs_revisionneeds_review

Fixed those issues

comment:29 Changed 5 months ago by ahf

Status: needs_reviewmerge_ready

LGTM.

This is now being tracked in https://github.com/torproject/tor/pull/184

comment:30 Changed 5 months ago by nickm

Cc: arma added
Status: merge_readyneeds_revision

One issue I spotted: the router tests won't actually run, since they aren't in the big list at the end of tests.c.

I'd also like Roger to look at this before we merge: this is a good idea, but it reduces the frequency with which we upload descriptors, and that has historically caused weird issues.

comment:31 Changed 5 months ago by arma

I continue to like the general idea of this patch.

Some code review suggestions:

In the changes file, it says "bigger than 24h" but I think we mean "smaller than 24h".

Juga, you might want to add a config line to your editor that complains to you about lines with trailing whitespace. There are some in test_router.c.

This is a lot of fixups (which is fine), so somebody might want to squash it before the merge.

The change as done here will mean we stop publishing a new descriptor when we go hibernating (see how check_descriptor_bandwidth_changed() checks we_are_hibernating() and sets cur to 0 if so). One simple fix would be for the function to still proceed if we_are_hibernating(), so we still let everybody know when we start hibernating. But another fix that I like a bit more would be to add a separate mark_my_descriptor_dirty() call when we decide to start hibernating... on more thought though, that should be a separate fix ("publish a new descriptor explicitly when we begin/end hibernating") that can be done separately from this fix and shouldn't delay it. So I resume liking the "proceed if we_are_hibernating()" idea as the simplest fix.

Speaking of hibernating, I also checked if this change makes us not publish a descriptor when we wake up from hibernating. I think it's ok, because see how hibernate_end() calls reset_uptime(), so now check_descriptor_bandwidth_changed() will be willing to proceed. All the more reason to open the "explicitly mark descriptor dirty when beginning/ending hibernation, rather than implicitly doing it as part of the bandwidth check" ticket.

We also reset_uptime(), and thus become willing to publish descriptors more readily for bandwidth changes, when our external IP address changes. I think that's ok, since it should not be an easy thing for an attacker to induce.

comment:32 in reply to:  31 Changed 5 months ago by arma

Replying to arma:

So I resume liking the "proceed if we_are_hibernating()" idea as the simplest fix.

To be clear (and to save you time), I was thinking something like

diff --git a/src/or/router.c b/src/or/router.c
index 4afba65..f1a9936 100644
--- a/src/or/router.c
+++ b/src/or/router.c
@@ -2440,17 +2440,18 @@ check_descriptor_bandwidth_changed(time_t now)
 {
   static time_t last_changed = 0;
   uint64_t prev, cur;
+  int hibernating = we_are_hibernating();
 
   /* If the relay uptime is bigger than MAX_UPTIME_BANDWIDTH_CHANGE,
    * the next regularly scheduled descriptor update (18h) will be enough */
-  if (get_uptime() > MAX_UPTIME_BANDWIDTH_CHANGE)
+  if (get_uptime() > MAX_UPTIME_BANDWIDTH_CHANGE && !hibernating)
     return;
 
   if (!router_get_my_routerinfo())
     return;
 
   prev = router_get_my_routerinfo()->bandwidthcapacity;
-  cur = we_are_hibernating() ? 0 : rep_hist_bandwidth_assess();
+  cur = hibernating ? 0 : rep_hist_bandwidth_assess();
   if ((prev != cur && (!prev || !cur)) ||
       cur > (prev * BANDWIDTH_CHANGE_FACTOR) ||
       cur < (prev / BANDWIDTH_CHANGE_FACTOR) ) {

And then don't forget to open the ticket for the follow-on (but can be handled separately) work. :)

comment:33 Changed 5 months ago by arma

Summary: Delay descriptor bandwidth reporting on large relaysDelay descriptor bandwidth reporting on established relays

(changing ticket title to reflect the new plan)

comment:34 in reply to:  24 Changed 5 months ago by juga

Replying to juga:

I don't know what is missing that makes router/check_descriptor_bandwidth_changed not found.

router_tests was missing in test.h

comment:35 in reply to:  30 Changed 5 months ago by juga

Replying to nickm:

One issue I spotted: the router tests won't actually run, since they aren't in the big list at the end of tests.c.

What do you mean by this?, maybe it has to do with my previous comment?. After that change, test-suite.log contains: router/check_descriptor_bandwidth_changed: [forking] OK

comment:36 Changed 5 months ago by juga

Status: needs_revisionneeds_review

Thanks arma, i've made the changes you suggested. See the new commits.

comment:37 Changed 5 months ago by teor

Keywords: 031-unreached-backport added

0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.

comment:38 Changed 5 months ago by teor

Keywords: 031-unreached-backport-maybe added; 031-backport-maybe removed

0.3.1 is end of life, there are no more 0.3.1 maybe backports.
Tagging with 031-unreached-backport-maybe.

comment:39 Changed 5 months ago by teor

Keywords: 031-unreached-backport removed

Keywords does substrings in search, but not replace.

comment:40 in reply to:  31 Changed 4 months ago by arma

Replying to arma:

All the more reason to open the "explicitly mark descriptor dirty when beginning/ending hibernation, rather than implicitly doing it as part of the bandwidth check" ticket.

I opened #26713 for this idea.

comment:41 Changed 4 months ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:42 Changed 4 months ago by teor

Version: Tor: unspecified

comment:43 Changed 3 months ago by ahf

Status: needs_reviewmerge_ready

LGTM.

comment:44 Changed 3 months ago by nickm

arma, you okay with this now? I'm planning to merge it soon unless somebody objects

comment:45 Changed 3 months ago by arma

i haven't looked, and have lost track of how to look. i say if ahf likes it, go for it.

comment:46 Changed 3 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.4.x-final

I've squashed this into a branch called bug24104_029_squashed, and merged that branch to master. Marking for possible backport.

comment:47 in reply to:  46 Changed 3 months ago by cypherpunks3

Replying to nickm:

I've squashed this into a branch called bug24104_029_squashed, and merged that branch to master.

Nothing's been pushed to the master branch in the repo yet.

comment:48 Changed 3 months ago by teor

Milestone: Tor: 0.3.4.x-finalTor: 0.3.5.x-final

comment:49 Changed 3 months ago by nickm

whoops, you're right. Looks like I didn't finish resolving conflicts there. Trying again...

comment:50 Changed 3 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.4.x-final

There; merged. Marking for possible backport, again.

comment:51 Changed 6 days ago by teor

Keywords: 032-unreached-backport added

0.3.2 is end of life, so 032-backport is now 032-unreached-backport.

comment:52 Changed 6 days ago by teor

Keywords: 032-unreached-backport-maybe added; 032-backport-maybe removed

Tag 032-backport-maybe with 032-unreached-backport-maybe

comment:53 Changed 6 days ago by teor

Keywords: 032-unreached-backport removed

Remove redundant 032-unreached-backport on 032-unreached-backport-maybe

comment:54 Changed 5 days ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: 0.2.9.x-final
Resolution: fixed
Status: merge_readyclosed

merged to 0.2.9 and forward. There were some small conflicts, but they appeared to be pretty straightforward.

Note: See TracTickets for help on using tickets.