Opened 11 years ago

Closed 9 years ago

Last modified 7 years ago

#911 closed defect (fixed)

Authorities assign Running flag to hibernating relays?

Reported by: arma Owned by:
Priority: Low Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: 0.2.0.31
Severity: Keywords:
Cc: arma, nickm, Sebastian Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

Mikeperry noted that it looks like authorities are assigning the Running
flag if a relay has been reachable within the past however many minutes,
even if the relay published its latest descriptor with "hibernating 1"
in it.

Even if so, it isn't a big deal, since those latest descriptors will claim
a bandwidth of 0, and so clients will avoid them. But still, worth fixing
at some point.

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (23)

comment:1 Changed 9 years ago by nickm

Milestone: post 0.2.1.xTor: 0.2.2.x-final

comment:2 Changed 9 years ago by nickm

Description: modified (diff)
Status: newneeds_review

So the fix would be to alter the code that decides how to vote, and never vote "Running" on an server if you have a descriptor for it with is_hibernating set, and the descriptor is published more recently than last_reachable.

I believe the code that matters here is in dirserv_set_router_is_running. I was going to tag this "easy", but the fix is too trivial not to implement right now. See branch "bug911" in my public repository.

comment:3 Changed 9 years ago by Sebastian

Bugfix on xxx and fixes bug 911 missing from the changes entry. Looks good otherwise.

comment:4 Changed 9 years ago by Sebastian

I wonder if we should special-case relays that are affected by bug #919 (detected by Tor version), because they might appear to be Running when they're not?

comment:5 Changed 9 years ago by nickm

Branch updated.

comment:6 Changed 9 years ago by arma

A) Should we put the AssumeReachable check above the router->is_hibernating check?
On the one hand, AssumeReachable is supposed to trump things like whether the relay is reachable. On the other hand, if it publishes a hibernating descriptor, it means it really doesn't want to be used. So I am ok with either way. Therefore I am ok with the current way.

B) Should we allow for any skew here? What if the router has a clock that's 5 minutes off, so it publishes a descriptor for 5 minutes in the future, and we test it three minutes in. In this edge case, we will continue to advertise it as Running for the full 45 minute period. Not the end of the world. On the other hand, if we check with a skew of 5 or 10 minutes, the only false positive is the case where we did the reachability test shortly before the hibernation descriptor showed up. Also not so bad. More generally, if the last descriptor we have from the router is a hibernation descriptor, what do we care about our reachability tests? That approach would resolve Sebastian's point about bug 919 too. Once we get a new descriptor that doesn't claim it's hibernating, we can resume checking then (see commit 4f307e038272).

C) The elses with two lines after them but no { } make me uncomfortable. I know one of them is a comment, so technically it's correct, but still. :)

comment:7 Changed 9 years ago by arma

Speaking of commit 4f307e038272, here's another patch we probably want here:

diff --git a/src/or/routerlist.c b/src/or/routerlist.c
index 8808f56..9a45431 100644
--- a/src/or/routerlist.c
+++ b/src/or/routerlist.c
@@ -3257,6 +3257,11 @@ router_add_to_routerlist(routerinfo_t *router, const char

router->last_reachable = old_router->last_reachable;
router->testing_since = old_router->testing_since;

}

+ if (authdir && !from_cache &&
+ old_router->is_hibernating && !router->is_hibernating) {
+ /* do a new reachability test right now */
+ dirserv_single_reachability_test(time(NULL), router);
+ }

routerlist_replace(routerlist, old_router, router);
if (!from_cache) {

signed_desc_append_to_journal(&router->cache_info,

comment:8 Changed 9 years ago by nickm

Hm. The functionality of the patch is fine, but I think we need to come up with a better place to put it. router_add_to_routerlist() is already a big mess of checks, but at least it doesn't (iirc) launch things elsewhere in the code. Let's not spaghettify our control flow even further.

comment:9 Changed 9 years ago by nickm

Pushed the minor fixes; trying to figure out the best place to put your patch above.

I think that the right thing to do is move the responsibility for launching reachability tests out of router_add_to_routerlist entirely; it is a much better fit for routerlist_descriptors_added(). See the most recent patch in my bug911 branch.

comment:10 Changed 9 years ago by Sebastian

in dirserv_should_launch_reachability_test() shouldn't we test for difference in port or ip, and do the test immediately?

comment:11 Changed 9 years ago by nickm

Arguably, but that's a separate issue from this bug, right?

comment:12 Changed 9 years ago by Sebastian

Yes, but I noticed it when looking at your patch

comment:13 Changed 9 years ago by nickm

Added it as #1899 .

comment:14 Changed 9 years ago by arma

23fdf0b30fd9fdf has contradictory logic between its comment and its code.

I think we should go with what the code says.

Here's a new comment:

/** If we tested a router and found it reachable _at most this long_ after it
 * declared itself hibernating, it probably is still hibernating, but its
 * clock was skewed when it declared its hibernation time. */
#define ALLOW_REACHABILITY_PUBLICATION_SKEW (60*60)

comment:15 in reply to:  14 Changed 9 years ago by arma

Replying to arma:

23fdf0b30fd9fdf has contradictory logic between its comment and its code.

I think we should go with what the code says.

We might also want to pick a better word than ALLOW.

Maybe HIBERNATION_PUBLICATION_SKEW ?

Lots of options.

comment:16 Changed 9 years ago by arma

"we should check wither this router"

comment:17 Changed 9 years ago by arma

+  if (!authdir_mode_handles_descs(get_options(), ri->purpose))
+      return 0;

indenting problem

comment:18 Changed 9 years ago by nickm

Other than that, looks good?

comment:19 Changed 9 years ago by arma

Yes, looks good.

I spent a while looking at routers_update_status_from_consensus_networkstatus() to try to guess if

+      routerinfo_t *old_router =
+        router_get_by_digest(router->cache_info.identity_digest);

will actually do what you want there, but I now believe it will.

comment:20 Changed 9 years ago by nickm

04:00 < nickm> armadev: still around?  I think the comment is actually right.
04:01 < nickm> The comment says IF "at least this long after" THEN "it is 
               probably done hibernating [ and so up]"
04:02 < nickm> The code does IF "no longer than this after" THEN "is is 
               probably still hibernating [and so down]"
04:02 < nickm> these are I think equivalent

Fixed the other issues and merged. Leaving open till you let me know whether I'm right or too sleepy. Please let me know whether to really change, or to close.

comment:21 Changed 9 years ago by arma

Guh. I hate time comparisons.

I just spent a while looking at it. I no longer believe that it is wrong.

That's probably the best you're going to get from me. :) Please do what you think is smart.

comment:22 Changed 9 years ago by nickm

Resolution: Nonefixed
Status: needs_reviewclosed

I'll close the bug. Closing the bug seems smart. :)

comment:23 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.