Opened 5 years ago

Closed 5 years ago

#13000 closed enhancement (implemented)

Should relays wait with publishing a descriptor until they've run a bw self-test?

Reported by: Sebastian Owned by:
Priority: Medium Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-auth tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Otherwise the first descriptor will be published using an observed bw of 0. This at least throws the bwauths off, but maybe more weird stuff happens? Is it useful at all to ever publish a descriptor with observed bw 0?

Child Tickets

Change History (15)

comment:1 Changed 5 years ago by arma

I don't think the current behavior is too bad. Seeing the first descriptor starts the directory authorities doing their reachability tests. And usually the relay publishes a second descriptor pretty soon after. And since directory authorities use the most recent one in their vote, they'll probably vote about the right one for the consensus.

So I guess the question is, how did your bwauth ever get the descriptor with bw 0 in it, if my above theory is right that typically those descriptors get replaced before the vote?

comment:2 Changed 5 years ago by Sebastian

descriptor publication dates according to gabelmoo's cached-descriptor file:

  • uploaded-at 2014-08-28 20:39:04
  • uploaded-at 2014-08-28 21:23:38
  • uploaded-at 2014-08-29 15:24:19

comment:3 in reply to:  2 Changed 5 years ago by arma

Replying to Sebastian:

descriptor publication dates according to gabelmoo's cached-descriptor file:

  • uploaded-at 2014-08-28 20:39:04
  • uploaded-at 2014-08-28 21:23:38
  • uploaded-at 2014-08-29 15:24:19

For posterity: these dates were a bit misleading because Sebastian changed his Tor version and restarted right before the second one.

comment:4 in reply to:  1 ; Changed 5 years ago by arma

Replying to arma:

I don't think the current behavior is too bad. Seeing the first descriptor starts the directory authorities doing their reachability tests. And usually the relay publishes a second descriptor pretty soon after.

This part turns out to have been wrong too:

/** How frequently will we republish our descriptor because of large (factor
 * of 2) shifts in estimated bandwidth? */
#define MAX_BANDWIDTH_CHANGE_FREQ (20*60)

So it'll wait 20 minutes to publish the second descriptor. We might want to speed that up.

One hack would be to special-case "my current descriptor says bw 0 but now I'm going to say non-zero" and make an exception to the MAX_BANDWIDTH_CHANGE_FREQ check.

Another fix might be to not actually publish the first descriptor if it's going to say 0, and then have some point in the code where we decide the bandwidth tests have both sent and received their cells (I think this is not easy to learn currently), and publish then.

Also, since relays write their bandwidth into their state file, and re-use it on restart, this case actually only matters for relays that just started out.

comment:5 in reply to:  4 Changed 5 years ago by arma

Replying to arma:

One hack would be to special-case "my current descriptor says bw 0 but now I'm going to say non-zero" and make an exception to the MAX_BANDWIDTH_CHANGE_FREQ check.

diff --git a/src/or/router.c b/src/or/router.c
index 2cdbb0c..c81ae62 100644
--- a/src/or/router.c
+++ b/src/or/router.c
@@ -2081,7 +2081,7 @@ check_descriptor_bandwidth_changed(time_t now)
   if ((prev != cur && (!prev || !cur)) ||
       cur > prev*2 ||
       cur < prev/2) {
-    if (last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now) {
+    if (last_changed+MAX_BANDWIDTH_CHANGE_FREQ < now || !prev) {
       log_info(LD_GENERAL,
                "Measured bandwidth has changed; rebuilding descriptor.");
       mark_my_descriptor_dirty("bandwidth has changed");

comment:6 Changed 5 years ago by Sebastian

that seems like a good idea in any case

comment:7 Changed 5 years ago by Sebastian

(it still doesn't get rid of the race condition where a descriptor is included in the consensus with no observed bw. Should we just filter these like we filter hibernating relays?)

Like so:

diff --git a/src/or/dirserv.c b/src/or/dirserv.c
index 52258e8..3e94d33 100644
--- a/src/or/dirserv.c
+++ b/src/or/dirserv.c
@@ -1051,7 +1051,8 @@ format_versions_list(config_line_t *ln)
 }
 
 /** Return 1 if <b>ri</b>'s descriptor is "active" -- running, valid,
- * not hibernating, and not too old. Else return 0.
+ * not hibernating, having observed bw greater 0, and not too old. Else
+ * return 0.
  */
 static int
 router_is_active(const routerinfo_t *ri, const node_t *node, time_t now)
@@ -1061,6 +1062,8 @@ router_is_active(const routerinfo_t *ri, const node_t *node, time_t now)
     return 0;
   if (!node->is_running || !node->is_valid || ri->is_hibernating)
     return 0;
+  if (!ri->bandwidthcapacity)
+      return 0;
   return 1;
 }
Last edited 5 years ago by Sebastian (previous) (diff)

comment:8 Changed 5 years ago by Sebastian

hrm. That patch might need a new consensus method? Meh...

comment:9 Changed 5 years ago by Sebastian

arma points out it doesn't. I made a branch with both patches (bug13000 in my tor repo https://git.torproject.org/sebastian/tor.git)

comment:10 Changed 5 years ago by Sebastian

Status: newneeds_review

comment:11 Changed 5 years ago by nickm

Status: needs_reviewneeds_revision

re 0c1a4ad63901e89373290fb29de318e5ef0c6ab2 -- this potentially breaks the purpose of MAX_BANDWIDTH_CHANGE_FREQ by allowing arbitrarily rapid re-uploads, so long as measured bandwidth switches back and forth from zero a lot. Can we rate-limit this?

The other commit looks okay.

comment:12 Changed 5 years ago by Sebastian

Can our measurement ever be 0? Is that a useful value at all? Maybe we should just never publish a descriptor that says 0 at all. What do you think?

comment:13 Changed 5 years ago by Sebastian

Oh. Also I think your analysis is incorrect, because we can only do this once per 20-minute window. After that, the previous value will be non-zero, and we won't upload the new descriptor, right?

comment:14 Changed 5 years ago by Sebastian

Status: needs_revisionneeds_review

Added a comment as per discussion on IRC; branch bug13000_2.

comment:15 Changed 5 years ago by nickm

Keywords: tor-auth tor-relay added
Milestone: Tor: 0.2.6.x-final
Resolution: implemented
Status: needs_reviewclosed
Type: defectenhancement

How did cdfa37c279064a16efe2333340aa5da6cd4173b2 get in there? That looks like it's for a different ticket.

I've cherry-picked the other two commits to master.

Note: See TracTickets for help on using tickets.