Mikeperry noted that it looks like authorities are assigning the Running
flag if a relay has been reachable within the past however many minutes,
even if the relay published its latest descriptor with "hibernating 1"
in it.
Even if so, it isn't a big deal, since those latest descriptors will claim
a bandwidth of 0, and so clients will avoid them. But still, worth fixing
at some point.
[Automatically added by flyspray2trac: Operating System: All]
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
So the fix would be to alter the code that decides how to vote, and never vote "Running" on an server if you have a descriptor for it with is_hibernating set, and the descriptor is published more recently than last_reachable.
I believe the code that matters here is in dirserv_set_router_is_running. I was going to tag this "easy", but the fix is too trivial not to implement right now. See branch "bug911" in my public repository.
Trac: Description: Mikeperry noted that it looks like authorities are assigning the Running
flag if a relay has been reachable within the past however many minutes,
even if the relay published its latest descriptor with "hibernating 1"
in it.
Even if so, it isn't a big deal, since those latest descriptors will claim
a bandwidth of 0, and so clients will avoid them. But still, worth fixing
at some point.
[Automatically added by flyspray2trac: Operating System: All]
to
Mikeperry noted that it looks like authorities are assigning the Running
flag if a relay has been reachable within the past however many minutes,
even if the relay published its latest descriptor with "hibernating 1"
in it.
Even if so, it isn't a big deal, since those latest descriptors will claim
a bandwidth of 0, and so clients will avoid them. But still, worth fixing
at some point.
[Automatically added by flyspray2trac: Operating System: All] Status: new to needs_review
I wonder if we should special-case relays that are affected by bug #919 (moved) (detected by Tor version), because they might appear to be Running when they're not?
A) Should we put the AssumeReachable check above the router->is_hibernating check?
On the one hand, AssumeReachable is supposed to trump things like whether the relay is reachable. On the other hand, if it publishes a hibernating descriptor, it means it really doesn't want to be used. So I am ok with either way. Therefore I am ok with the current way.
B) Should we allow for any skew here? What if the router has a clock that's 5 minutes off, so it publishes a descriptor for 5 minutes in the future, and we test it three minutes in. In this edge case, we will continue to advertise it as Running for the full 45 minute period. Not the end of the world. On the other hand, if we check with a skew of 5 or 10 minutes, the only false positive is the case where we did the reachability test shortly before the hibernation descriptor showed up. Also not so bad. More generally, if the last descriptor we have from the router is a hibernation descriptor, what do we care about our reachability tests? That approach would resolve Sebastian's point about bug 919 too. Once we get a new descriptor that doesn't claim it's hibernating, we can resume checking then (see commit 4f307e038272).
C) The elses with two lines after them but no { } make me uncomfortable. I know one of them is a comment, so technically it's correct, but still. :)
Hm. The functionality of the patch is fine, but I think we need to come up with a better place to put it. router_add_to_routerlist() is already a big mess of checks, but at least it doesn't (iirc) launch things elsewhere in the code. Let's not spaghettify our control flow even further.
Pushed the minor fixes; trying to figure out the best place to put your patch above.
I think that the right thing to do is move the responsibility for launching reachability tests out of router_add_to_routerlist entirely; it is a much better fit for routerlist_descriptors_added(). See the most recent patch in my bug911 branch.
23fdf0b30fd9fdf has contradictory logic between its comment and its code.
I think we should go with what the code says.
Here's a new comment:
/** If we tested a router and found it reachable _at most this long_ after it * declared itself hibernating, it probably is still hibernating, but its * clock was skewed when it declared its hibernation time. */#define ALLOW_REACHABILITY_PUBLICATION_SKEW (60*60)
04:00 < nickm> armadev: still around? I think the comment is actually right.04:01 < nickm> The comment says IF "at least this long after" THEN "it is probably done hibernating [ and so up]"04:02 < nickm> The code does IF "no longer than this after" THEN "is is probably still hibernating [and so down]"04:02 < nickm> these are I think equivalent
Fixed the other issues and merged. Leaving open till you let me know whether I'm right or too sleepy. Please let me know whether to really change, or to close.