log error level messages if relay (self) is not in consensus

changed milestone to %Tor: unspecified

added 034-removed-20180328 034-triage-20180328 component::core tor/tor easy milestone::Tor: unspecified points::1 priority::medium severity::normal status::needs-revision type::enhancement labels

Replying to cypherpunks:

Is HeartbeatPeriod (default 6 hours) relevant for that?

Yes. The log severity is 'notice'. The manual is very clear.

What do you think about having a more serious log entry for such "hard failures"? (relay dropped out of consensus)

And wouldn't it make sense to check if we are in the consensus every hour regardless of HeartbeatPeriod? (is that check that expensive?)

I'll try to reduce the HeartbeatPeriod to 30 minutes to see if that causes a log entry.

Replying to cypherpunks:

What do you think about having a more serious log entry for such "hard failures"? (relay dropped out of consensus)

I think it would be technically easy. For example, using 'warning' instead:

--- a/src/or/status.c	
+++ b/src/or/status.c	
@@ -100,7 +100,7 @@ log_heartbeat(time_t now)
       return -1; /* Something stinks, we won't even attempt this. */
     else
       if (!node_get_by_id(me->cache_info.identity_digest))
-        log_fn(LOG_NOTICE, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
+        log_fn(LOG_WARN, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
                "in the cached consensus.");
   }

However, I disagree about it being a 'hard failure'. IMO it is not even a failure: the system is working fine.

And wouldn't it make sense to check if we are in the consensus every hour regardless of HeartbeatPeriod? (is that check that expensive?)

Dunno.

Trac:
Milestone: N/A to Tor: 0.2.9.x-final
Status: new to needs_review
Keywords: N/A deleted, easy added

Trac:
Keywords: N/A deleted, review-group-8 added

Warning every 6 hours seems sufficient to me - relays only update their consensus every 1-3 hours, right?

It's also worth noting that the most common reason that relays aren't in the consensus - an unreachable ORPort or DirPort - is logged at warning level every 20 minutes.

Trac:
Status: needs_review to merge_ready

Hm. Are there any helpful pieces of advice we can give in this case? It would be nice to tell the relay operators what to do about the problem.

Replying to nickm:

Hm. Are there any helpful pieces of advice we can give in this case? It would be nice to tell the relay operators what to do about the problem.

Check your relay has bootstrapped. Check your ORPort and DirPort are reachable externally. Check your relay can reach a majority of directory authorities. (Or, rather, that a majority of directory authorities can do reachability tests on your relay.)

Okay, then there are actually 3 changes I need here:

harder:

We should only give this as a warning if we would expect that we would be listed. We would not expect to be listed if:
- We stopped hibernating, or started running, so recently that we haven't had a chance to upload a new descriptor to all the authorities.
- The consensus we have is not recent enough that we'd expect any uploads of ours to have taken effect.

easy:

- The message should explain what to do, even if it's only "look in your logs for other messages that might explain why".
- A changes file.
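
A minimal sketch of the "harder" condition above, in case it helps whoever picks up the revision. Everything here is hypothetical: the struct, its field names, the function name, and both thresholds are invented for illustration and are not existing Tor identifiers or agreed values. It only shows the shape of the check: stay quiet if we became active too recently, or if the cached consensus is too old to reflect our latest descriptor upload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define STARTUP_GRACE_SECS (3 * 60 * 60) /* time allowed to upload everywhere */
#define VOTE_DELAY_SECS    (1 * 60 * 60) /* time for an upload to reach a consensus */

/* Hypothetical snapshot of the relay's state; not a real Tor struct. */
typedef struct relay_state_sketch_t {
  bool in_cached_consensus;      /* are we listed in the cached consensus? */
  time_t became_active_at;       /* last start, or wake-up from hibernation */
  time_t descriptor_uploaded_at; /* last descriptor upload to the authorities */
  time_t consensus_valid_after;  /* valid-after time of the cached consensus */
} relay_state_sketch_t;

/* Return true only when we are missing from the consensus AND we would
 * reasonably expect to be listed by now. */
static bool
should_warn_not_in_consensus(const relay_state_sketch_t *st, time_t now)
{
  assert(st != NULL);

  if (st->in_cached_consensus)
    return false; /* Listed: nothing to warn about. */

  /* We started (or stopped hibernating) too recently to have uploaded
   * a descriptor to all the authorities. */
  if (now - st->became_active_at < STARTUP_GRACE_SECS)
    return false;

  /* The consensus we hold is too old for our upload to have taken effect. */
  if (st->consensus_valid_after <
      st->descriptor_uploaded_at + VOTE_DELAY_SECS)
    return false;

  return true;
}
```

With a check like this, the heartbeat code would log at warning level only when the function returns true, and could keep the existing notice otherwise.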

Trac:
Status: merge_ready to needs_revision

Trac:
Keywords: review-group-8 deleted, review-group-9 added

Moving not-reviewed-by-me tickets in review-group-9, and for-0.2.9/0.2.8 tickets, to review-group-10.

Trac:
Keywords: review-group-9 deleted, review-group-10 added

I am fairly sure that these are neither regressions nor major problems. So, deferring from 0.2.9. Please let me know if I'm wrong.

Trac:
Keywords: N/A deleted, nickm-deferred-20161017 added
Milestone: Tor: 0.2.9.x-final to Tor: 0.3.0.x-final

Trac:
Keywords: review-group-10 deleted, N/A added

Trac:
Points: N/A to 1

Trac:
Milestone: Tor: 0.3.0.x-final to Tor: 0.3.1.x-final

I'm not sure what the goal of this ticket is.

I think that in most of the cases where it's the relay operator's fault, there are other, more useful and more frequent log messages.

And for the other cases, where there's a Tor bug, or the relay finds itself reachable but some of the directory authorities can't reach the relay, I don't think a log message is going to be a workable way to guide the operator into working through the issue.

(The original ticket title, to have a log line at severity err when a relay doesn't find itself in the consensus, is a non-starter: that's not what err level severity is for. See https://www.torproject.org/docs/faq#LogLevel )

Trac:
Milestone: Tor: 0.3.1.x-final to Tor: 0.3.2.x-final

Trac:
Keywords: nickm-deferred-20161017 deleted, N/A added

Defer all needs_revision non-spec enhancements to 0.3.3.

Trac:
Milestone: Tor: 0.3.2.x-final to Tor: 0.3.3.x-final

Mark a lot of assigned/needs_revision tickets as 0.3.4. If you think this should happen in 0.3.3 instead, just let me know?

Trac:
Milestone: Tor: 0.3.3.x-final to Tor: 0.3.4.x-final

Trac:
Keywords: N/A deleted, 034-triage-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

Trac:
Keywords: N/A deleted, 034-removed-20180328 added

These needs_revision tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if somebody does the necessary revision.

Trac:
Milestone: Tor: 0.3.4.x-final to Tor: unspecified

changed time estimate to 8h

moved to tpo/core/tor#18988 (closed)
