Tor (relay mode) should check once an hour whether its fingerprint is included in the consensus and, if it is not, log a prominent error-level entry telling the operator about the problem.
In the past I noticed such a log entry, but apparently the check is not done every hour.
For example, a relay dropped out of the consensus more than 4 hours ago, but there is no log entry about it.
Is HeartbeatPeriod (default 6 hours) relevant for that?
What do you think about having a more serious log entry for such "hard failures"? (relay dropped out of consensus)
I think it would be technically easy. For example using 'warning' instead:
--- a/src/or/status.c
+++ b/src/or/status.c
@@ -100,7 +100,7 @@ log_heartbeat(time_t now)
       return -1; /* Something stinks, we won't even attempt this. */
     else if (!node_get_by_id(me->cache_info.identity_digest))
-      log_fn(LOG_NOTICE, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
+      log_fn(LOG_WARN, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
              "in the cached consensus.");
   }
However, I disagree about it being a 'hard failure'. IMO it is not even a failure; the system is working fine.
And wouldn't it make sense to check if we are in the consensus every hour regardless of HeartbeatPeriod? (is that check that expensive?)
Dunno.
Warning every 6 hours seems sufficient to me - relays only update their consensus every 1-3 hours, right?
It's also worth noting that the most common reason that relays aren't in the consensus - an unreachable ORPort or DirPort - is logged at warning level every 20 minutes.
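For reference, the hourly check asked about above (independent of HeartbeatPeriod) could look roughly like the sketch below. This is not Tor code: `maybe_check_consensus`, `consensus_check_state_t`, `in_consensus_fn`, the one-hour interval, and the stub lookup are all hypothetical names chosen for illustration. In Tor itself the lookup would presumably be the same `node_get_by_id(me->cache_info.identity_digest)` test that the heartbeat already performs, and the message would go through the normal logging call rather than `fprintf`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Assumed interval: one hour, regardless of HeartbeatPeriod. */
#define CONSENSUS_CHECK_INTERVAL ((time_t)(60 * 60))

/* Hypothetical callback standing in for the real consensus lookup. */
typedef bool (*in_consensus_fn)(void);

/* Hypothetical state remembering when the check last ran. */
typedef struct {
  bool has_checked;
  time_t last_check;
} consensus_check_state_t;

/* Run the check at most once per interval.
 * Returns -1 if the check was skipped (too soon), 0 if we are in the
 * consensus, and 1 if we are missing and a warning was printed. */
static int
maybe_check_consensus(consensus_check_state_t *st, time_t now,
                      in_consensus_fn in_consensus)
{
  assert(st != NULL && in_consensus != NULL);
  if (st->has_checked && now - st->last_check < CONSENSUS_CHECK_INTERVAL)
    return -1;
  st->has_checked = true;
  st->last_check = now;
  if (!in_consensus()) {
    /* Stand-in for a warn-level log call. */
    fprintf(stderr,
            "[warn] It seems like we are not in the cached consensus.\n");
    return 1;
  }
  return 0;
}

/* Stub lookup for illustration only; toggled by hand. */
static bool stub_present = false;
static bool stub_in_consensus(void) { return stub_present; }
```

The check itself is just a lookup in the cached consensus, so the cost is probably not the issue. The open question from the comments above is whether an hourly check adds anything, since the relay's cached consensus does not change more often than it is refreshed.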
Hm. Are there any helpful pieces of advice we can give in this case? It would be nice to tell the relay operators what to do about the problem.
Check your relay has bootstrapped.
Check your ORPort and DirPort are reachable externally.
Check your relay can reach a majority of directory authorities. (Or, rather, that a majority of directory authorities can do reachability tests on your relay.)
I think that in most of the cases where it's the relay operator's fault, there are other more useful and more frequent log messages.
And for the other cases, where there's a Tor bug, or the relay finds itself reachable but some of the directory authorities can't reach the relay, I don't think a log message is going to be a workable way to guide the operator into working through the issue.
(The original ticket title, to have a log line at severity err when a relay doesn't find itself in the consensus, is a non-starter: that's not what err level severity is for. See https://www.torproject.org/docs/faq#LogLevel )
These needs_revision tickets, tagged with 034-removed-*, are no longer in scope for 0.3.4. We can reconsider any of them if somebody does the necessary revision.
Trac: Milestone: Tor: 0.3.4.x-final to Tor: unspecified