Opened 9 months ago

Last modified 5 days ago

#24104 needs_review defect

Delay descriptor bandwidth reporting on established relays

Reported by: teor Owned by: juga
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 034-backport-maybe, 033-backport-maybe, 032-backport-maybe, 029-backport-maybe, security-low, guard-discovery-stats, chutney-wants, bwauth-wants, 034-triage-20180328, 034-removed-20180328, tor-bwauth, 031-unreached-backport-maybe, 035-triaged-in-20180711
Cc: arma Actual Points:
Parent ID: #25925 Points: 1
Reviewer: ahf Sponsor:

Description

In #23856, we:

  • reduced the bandwidth stats interval from 4 hours to 24 hours, and
  • reduced urgent (2x change) descriptor bandwidth reports from every 20 minutes to every 3 hours.

But we think we can make descriptor bandwidth reports even slower on large relays, because they have less need to ramp up their bandwidth.

Here are our options:

  • don't report until the change is larger, for example, 4x
  • don't report for longer, for example, every 6 hours or 24 hours
  • delay the reporting of the *first* large change, as well as subsequent large changes

Here are the open questions:

  • tor traffic has a daily cycle, so do we need to report large bandwidth changes multiple times a day to cope with this? (I think the answer is "no", because small changes are already reported every 18 hours on the standard descriptor schedule, and that seems to work fine. And large changes are already reported once, then the delay is imposed.)

Child Tickets

TicketStatusOwnerSummaryComponent
#26301closedjugadir-spec: update descriptor on bandwidth changes only when uptime is less than a dayCore Tor/Tor

Change History (42)

comment:1 Changed 9 months ago by teor

Also, relays will re-post their descriptors when any important config items change. But I think that's out of scope for this ticket.

comment:2 Changed 9 months ago by arma

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

comment:3 in reply to:  2 Changed 9 months ago by teor

Replying to arma:

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

I can imagine two pathological cases here:

  • relays that reboot (or hibernate) every 24 hours, and
  • relays that hibernate or go offline, rather than just having started.

We'll need to account for both these cases, but they'll make the relay lose the guard flag, so we don't need to be so concerned about them.

comment:4 Changed 8 months ago by teor

Keywords: chutney-wants bwauth-wants added

If we code this feature so we can make bandwidth reporting faster on non-default networks, that would make it easier to test bandwidth authorities in chutney (see also #16386).

comment:5 Changed 6 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Type: defectenhancement

Label a bunch of (arguable and definite) enhancements as enhancements for 0.3.4.

comment:6 Changed 4 months ago by nickm

Keywords: 034-triage-20180328 added

comment:7 Changed 4 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:8 Changed 3 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:9 in reply to:  2 Changed 3 months ago by teor

Replying to arma:

I am a fan of "publish a new descriptor for sufficiently-changed bandwidth estimates if your uptime is less than 24 hours, else assume that things are sufficiently established and the next regularly scheduled descriptor update will be enough".

This seems like a nice simple solution to the issue. Let's make it happen.

comment:10 Changed 3 months ago by juga

Cc: juga added

comment:11 Changed 3 months ago by juga

Parent ID: 25925

comment:12 Changed 3 months ago by juga

Parent ID: 25925#25925

comment:13 Changed 7 weeks ago by juga

Cc: juga removed
Owner: set to juga
Status: newassigned

Is the place to do this change check_descriptor_bandwidth_changed (https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2629)?

If i've the line https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2643 as:
if (get_uptime() < 60*60*24 && last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now || !prev), then the descriptor would still be uploaded when there is not prev.

Should instead i do get_uptime() > 60*60*24 return; at the beginning of the function so that there are not any further checks when the relay is up for more than 1 day?

comment:14 in reply to:  13 Changed 7 weeks ago by teor

Replying to juga:

Is the place to do this change check_descriptor_bandwidth_changed (https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2629)?

If i've the line https://gitweb.torproject.org/tor.git/tree/src/or/router.c#n2643 as:
if (get_uptime() < 60*60*24 && last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now || !prev), then the descriptor would still be uploaded when there is not prev.

Should instead i do get_uptime() > 60*60*24 return; at the beginning of the function so that there are not any further checks when the relay is up for more than 1 day?

Yes, but please make 60*60*24 a named #define with a comment. It should be like MAX_BANDWIDTH_CHANGE_FREQ.

comment:15 Changed 7 weeks ago by juga

Should i also make the 2 in cur > prev*2 and cur > prev/2 a named #define in case we want to change it to other number in the future?.

And in the end there is no need to check whether the relay is large here?

comment:16 in reply to:  15 Changed 7 weeks ago by teor

Replying to juga:

Should i also make the 2 in cur > prev*2 and cur > prev/2 a named #define in case we want to change it to other number in the future?.

Yes, that would be helpful.

And in the end there is no need to check whether the relay is large here?

No, we are just checking whether it has been up for a day.
See comment 2.

comment:17 Changed 7 weeks ago by juga

Keywords: tor-bwauth added
Status: assignedneeds_review

See my branch ticket20104.

test_router.c only contains 1 test and adding a test for this does not seem possible without adding more mocking functions, as get_uptime and bandwidthcapacity

Should we also change the dir-spec line https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n354 to say something like:

- Its uptime is less than 24h and bandwidth has changed...

comment:18 Changed 7 weeks ago by teor

Keywords: 034-backport-maybe 033-backport-maybe 032-backport-maybe 031-backport-maybe 029-backport-maybe security-low added
Milestone: Tor: unspecifiedTor: 0.3.5.x-final
Type: enhancementdefect
Version: Tor: unspecified

This is a low-severity security issue because bandwidth spike reporting enables guard discovery attacks.

We might want to backport it to 0.2.9, so please base the branch on 0.2.9.

comment:19 in reply to:  17 Changed 7 weeks ago by teor

Replying to juga:

See my branch ticket20104.

I will review it on GitHub.

Please base the branch on 0.2.9, we might want to backport it like #23856.

test_router.c only contains 1 test and adding a test for this does not seem possible without adding more mocking functions, as get_uptime and bandwidthcapacity

I think it's worth the extra time,

Should we also change the dir-spec line https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n354 to say something like:

- Its uptime is less than 24h and bandwidth has changed...

Yes, but we also need to say that old Tor versions ignored uptime. Please open a child ticket for the dir-spec change.

comment:20 Changed 7 weeks ago by teor

Status: needs_reviewneeds_revision

The patch looks good to me. Putting it in needs_revision for the rebase to 0.2.9 and the spec patch.

comment:21 Changed 7 weeks ago by juga

Status: needs_revisionneeds_review

Rebased on 0.2.9 in my branch bug20104_029.

comment:22 Changed 7 weeks ago by teor

Status: needs_reviewneeds_revision

Looks good to me, but it needs a changes file.

Please add the changes file, run "make check-changes", then put this ticket in merge_ready.

comment:23 Changed 7 weeks ago by juga

Status: needs_revisionmerge_ready

comment:24 Changed 6 weeks ago by juga

Status: merge_readyneeds_review

Back to needs_review because i added a test.
I don't know what is missing that makes router/check_descriptor_bandwidth_changed not found.

comment:25 Changed 6 weeks ago by asn

Reviewer: asn

comment:26 Changed 6 weeks ago by asn

Reviewer: asnahf

Swapping reviews with ahf.

comment:27 Changed 3 weeks ago by ahf

Status: needs_reviewneeds_revision

Minor issues that needs to be fixed. Commented in the PR.

comment:28 Changed 3 weeks ago by juga

Status: needs_revisionneeds_review

Fixed those issues

comment:29 Changed 3 weeks ago by ahf

Status: needs_reviewmerge_ready

LGTM.

This is now being tracked in https://github.com/torproject/tor/pull/184

comment:30 Changed 3 weeks ago by nickm

Cc: arma added
Status: merge_readyneeds_revision

One issue I spotted: the router tests won't actually run, since they aren't in the big list at the end of tests.c.

I'd also like Roger to look at this before we merge: this is a good idea, but it reduces the frequency with which we upload descriptors, and that has historically caused weird issues.

comment:31 Changed 3 weeks ago by arma

I continue to like the general idea of this patch.

Some code review suggestions:

In the changes file, it says "bigger than 24h" but I think we mean "smaller than 24h".

Juga, you might want to add a config line to your editor that complains to you about lines with trailing whitespace. There are some in test_router.c.

This is a lot of fixups (which is fine), so somebody might want to squash it before the merge.

The change as done here will mean we stop publishing a new descriptor when we go hibernating (see how check_descriptor_bandwidth_changed() checks we_are_hibernating() and sets cur to 0 if so). One simple fix would be for the function to still proceed if we_are_hibernating(), so we still let everybody know when we start hibernating. But another fix that I like a bit more would be to add a separate mark_my_descriptor_dirty() call when we decide to start hibernating... on more thought though, that should be a separate fix ("publish a new descriptor explicitly when we begin/end hibernating") that can be done separately from this fix and shouldn't delay it. So I resume liking the "proceed if we_are_hibernating()" idea as the simplest fix.

Speaking of hibernating, I also checked if this change makes us not publish a descriptor when we wake up from hibernating. I think it's ok, because see how hibernate_end() calls reset_uptime(), so now check_descriptor_bandwidth_changed() will be willing to proceed. All the more reason to open the "explicitly mark descriptor dirty when beginning/ending hibernation, rather than implicitly doing it as part of the bandwidth check" ticket.

We also reset_uptime(), and thus become willing to publish descriptors more readily for bandwidth changes, when our external IP address changes. I think that's ok, since it should not be an easy thing for an attacker to induce.

comment:32 in reply to:  31 Changed 3 weeks ago by arma

Replying to arma:

So I resume liking the "proceed if we_are_hibernating()" idea as the simplest fix.

To be clear (and to save you time), I was thinking something like

diff --git a/src/or/router.c b/src/or/router.c
index 4afba65..f1a9936 100644
--- a/src/or/router.c
+++ b/src/or/router.c
@@ -2440,17 +2440,18 @@ check_descriptor_bandwidth_changed(time_t now)
 {
   static time_t last_changed = 0;
   uint64_t prev, cur;
+  int hibernating = we_are_hibernating();
 
   /* If the relay uptime is bigger than MAX_UPTIME_BANDWIDTH_CHANGE,
    * the next regularly scheduled descriptor update (18h) will be enough */
-  if (get_uptime() > MAX_UPTIME_BANDWIDTH_CHANGE)
+  if (get_uptime() > MAX_UPTIME_BANDWIDTH_CHANGE && !hibernating)
     return;
 
   if (!router_get_my_routerinfo())
     return;
 
   prev = router_get_my_routerinfo()->bandwidthcapacity;
-  cur = we_are_hibernating() ? 0 : rep_hist_bandwidth_assess();
+  cur = hibernating ? 0 : rep_hist_bandwidth_assess();
   if ((prev != cur && (!prev || !cur)) ||
       cur > (prev * BANDWIDTH_CHANGE_FACTOR) ||
       cur < (prev / BANDWIDTH_CHANGE_FACTOR) ) {

And then don't forget to open the ticket for the follow-on (but can be handled separately) work. :)

comment:33 Changed 3 weeks ago by arma

Summary: Delay descriptor bandwidth reporting on large relaysDelay descriptor bandwidth reporting on established relays

(changing ticket title to reflect the new plan)

comment:34 in reply to:  24 Changed 3 weeks ago by juga

Replying to juga:

I don't know what is missing that makes router/check_descriptor_bandwidth_changed not found.

router_tests was missing in test.h

comment:35 in reply to:  30 Changed 3 weeks ago by juga

Replying to nickm:

One issue I spotted: the router tests won't actually run, since they aren't in the big list at the end of tests.c.

What do you mean by this?, maybe it has to do with my previous comment?. After that change, test-suite.log contains: router/check_descriptor_bandwidth_changed: [forking] OK

comment:36 Changed 3 weeks ago by juga

Status: needs_revisionneeds_review

Thanks arma, i've made the changes you suggested. See the new commits.

comment:37 Changed 3 weeks ago by teor

Keywords: 031-unreached-backport added

0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.

comment:38 Changed 3 weeks ago by teor

Keywords: 031-unreached-backport-maybe added; 031-backport-maybe removed

0.3.1 is end of life, there are no more 0.3.1 maybe backports.
Tagging with 031-unreached-backport-maybe.

comment:39 Changed 3 weeks ago by teor

Keywords: 031-unreached-backport removed

Keywords does substrings in search, but not replace.

comment:40 in reply to:  31 Changed 11 days ago by arma

Replying to arma:

All the more reason to open the "explicitly mark descriptor dirty when beginning/ending hibernation, rather than implicitly doing it as part of the bandwidth check" ticket.

I opened #26713 for this idea.

comment:41 Changed 9 days ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:42 Changed 5 days ago by teor

Version: Tor: unspecified
Note: See TracTickets for help on using tickets.