Opened 3 years ago

Last modified 8 months ago

#18988 needs_revision enhancement

log error level messages if relay (self) is not in consensus

Reported by: cypherpunks Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: easy, 034-triage-20180328, 034-removed-20180328
Cc: Actual Points:
Parent ID: Points: 1
Reviewer: Sponsor:

Description

Tor (relay mode) should check once an hour if his fingerprint is included in the consensus and if that is not the case log a prominent error level entry telling the operator about the problem.

In the past I noticed such a log but apparently it is not done every hour.

I.e. relay dropped out of consensus >4 hours ago, but there is no log entry about it.

Is HeartbeatPeriod (default 6 hours) relevant for that?

Child Tickets

Change History (24)

comment:1 in reply to:  description Changed 3 years ago by cypherpunks

Replying to cypherpunks:

Is HeartbeatPeriod (default 6 hours) relevant for that?

Yes. The log severity is 'notice'. The manual is very clear.

comment:2 Changed 3 years ago by cypherpunks

What do you think about having a more serious log entry for such "hard failures"? (relay dropped out of consensus)

And wouldn't it make sense to check if we are in the consensus every hour regardless of HeartbeatPeriod? (is that check that expensive?)

I'll try to reduce the HeartbeatPeriod to 30minutes to see if that causes a log entry.

Last edited 3 years ago by cypherpunks (previous) (diff)

comment:3 in reply to:  2 Changed 3 years ago by cypherpunks

Replying to cypherpunks:

What do you think about having a more serious log entry for such "hard failures"? (relay dropped out of consensus)

I think it would be technically easy. For example using 'warning' instead:

--- a/src/or/status.c	
+++ b/src/or/status.c	
@@ -100,7 +100,7 @@ log_heartbeat(time_t now)
       return -1; /* Something stinks, we won't even attempt this. */
     else
       if (!node_get_by_id(me->cache_info.identity_digest))
-        log_fn(LOG_NOTICE, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
+        log_fn(LOG_WARN, LD_HEARTBEAT, "Heartbeat: It seems like we are not "
                "in the cached consensus.");
   }
 

However I disagree about it being a 'hard failure', IMO is not even a failure, the system is working fine.

And wouldn't it make sense to check if we are in the consensus every hour regardless of HeartbeatPeriod? (is that check that expensive?)

Dunno.

comment:4 Changed 2 years ago by nickm

Keywords: easy added
Milestone: Tor: 0.2.9.x-final
Status: newneeds_review

comment:5 Changed 2 years ago by nickm

Keywords: review-group-8 added

comment:6 Changed 2 years ago by teor

Status: needs_reviewmerge_ready

Warning every 6 hours seems sufficient to me - relays only update their consensus every 1-3 hours, right?

It's also worth noting that the most common reason that relays aren't in the consensus - an unreachable ORPort or DirPort - is logged at warning level every 20 minutes.

comment:7 Changed 2 years ago by nickm

Hm. Are there any helpful pieces of advice we can give in this case? It would be nice to tell the relay operators what to do about the problem.

comment:8 in reply to:  7 Changed 2 years ago by teor

Replying to nickm:

Hm. Are there any helpful pieces of advice we can give in this case? It would be nice to tell the relay operators what to do about the problem.

Check your relay has bootstrapped.
Check your ORPort and DirPort are reachable externally.
Check your relay can reach a majority of directory authorities. (Or, rather, that a majority of directory authorities can do reachability tests on your relay.)

comment:9 Changed 2 years ago by nickm

Status: merge_readyneeds_revision

Okay, then there are actually 3 changes I need here:

harder:

  • We should only give this as a warning if we would expect that we would be listed. We would not expect to be listed if:
    • We stopped hibernating, or started running, so recently that we haven't had a chance to upload a new descriptor to all the authorities.
    • The consensus we have is not recent enough that we'd expect any uploads of ours to have taken effect.

easy:

  • the message should explain what to do, even if it's only "look in your logs for other messages that might explain why"
  • a changes file

comment:10 Changed 2 years ago by nickm

Keywords: review-group-9 added; review-group-8 removed

comment:11 Changed 2 years ago by nickm

Keywords: review-group-10 added; review-group-9 removed

Moving not-reviewed-by-me tickets in review-group-9, and for-0.2.9/0.2.8 tickets, to review-group-10.

comment:12 Changed 2 years ago by nickm

Keywords: nickm-deferred-20161017 added
Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

I am fairly sure that these are neither regressions nor major problems. So, deferring from 0.2.9. Please let me know if I'm wrong.

comment:13 Changed 2 years ago by nickm

Keywords: review-group-10 removed

comment:14 Changed 2 years ago by nickm

Points: 1

comment:15 Changed 23 months ago by nickm

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

comment:16 Changed 19 months ago by arma

I'm not sure what the goal of this ticket is.

I think most of the cases where it's the relay operator's fault, there are other more useful and more frequent log messages.

And for the other cases, where there's a Tor bug, or the relay finds itself reachable but some of the directory authorities can't reach the relay, I don't think a log message is going to be a workable way to guide the operator into working through the issue.

comment:17 Changed 19 months ago by arma

(The original ticket title, to have a log line at severity err when a relay doesn't find itself in the consensus, is a non-starter: that's not what err level severity is for. See https://www.torproject.org/docs/faq#LogLevel )

comment:18 Changed 19 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

comment:19 Changed 19 months ago by nickm

Keywords: nickm-deferred-20161017 removed

comment:20 Changed 15 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

Defer all needs_revision non-spec enhancements to 0.3.3.

comment:21 Changed 10 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final

Mark a lot of assigned/needs_revision tickets as 0.3.4. If you think this should happen in 0.3.3 instead, just let me know?

comment:22 Changed 9 months ago by nickm

Keywords: 034-triage-20180328 added

comment:23 Changed 9 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:24 Changed 8 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These needs_revision, tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if somebody does the necessary revision.

Note: See TracTickets for help on using tickets.