Opened 4 years ago

Closed 3 years ago

#13762 closed defect (fixed)

Confusing warn about incompatible ei

Reported by: Sebastian Owned by: nickm
Priority: High Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: nickm-patch
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Back in #9812, we added more logging, which is now being triggered by the dirauths. Unfortunately, the message is pretty useless, it produces warnings like this on the dirauths:

[warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)

without any further information, this isn't actionable. It'd be much better to know which routerdesc and ei are actually a part of the problem, and the exact times that the descriptors we're looking at are claiming.

I'm not calling this a regression because it's a new log message for an issue that existed previously, but dirauth ops are confused by this.

Child Tickets

Change History (24)

comment:1 Changed 4 years ago by nickm

Hm. We warn about publication time mismatches with higher priority than we complain about digest mismatches. So if we have a completely mismatched ei, we will report it as having the wrong publication time. So that's not great.

Also, this warning doesn't take the source into account: it happens even when we're loading the ei from cache, or when we get an ei upload right after an ri upload we rejected, where we would expect the ei not to match.

I think the best thing to do here is to figure out how to warn about where the ei in question came from. If we downloaded it, that's a problem. Any other source is not a problem and we should stop warning about that.

comment:2 Changed 4 years ago by nickm

Priority: normalmajor

But according to #9812, we call this right after connection_dir_client_reached_eof(), which implies that we did download it.

How are we fetching ei documents? Are we doing it by digest? Are the digests right? More logs might help here.

comment:3 Changed 4 years ago by nickm_mobile

Actually, this could be a race condition if it happens because we get a new ri while in the middle of downloading an ei for the old one.

comment:4 Changed 3 years ago by arma

moria1 gets a bazillion of these messages on startup, presumably while reading its cached extrainfo files.

comment:5 Changed 3 years ago by arma

I found moria1's logs useless and unmanageable with this bug.

For now I've fixed ("fixed") it on moria1 with

-    log_warn(LD_DIR,"router info incompatible with extra info (reason: %s)",
+    log_info(LD_DIR,"router info incompatible with extra info (reason: %s)",

comment:6 Changed 3 years ago by nickm

Status: newneeds_review

Branch "ticket13762" makes it so we don't log those when reading from the cache. Does that fix the problem?

comment:7 Changed 3 years ago by nickm

Keywords: nickm-patch added

Add the nickm-patch keyword to a bunch of needs_review tickets.

comment:8 Changed 3 years ago by rl1987

The patch looks correct in a way that it prevents the torrent of warnings when extra info documents are being loaded from cache.

comment:9 Changed 3 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Thanks for the review! Merged to master.

comment:10 Changed 3 years ago by arma

Resolution: fixed
Status: closedreopened

I'm still getting these in moria1's logs, a lot:

Feb 08 07:12:05.015 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
Feb 08 07:20:12.240 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
Feb 08 07:23:15.430 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
Feb 08 07:25:17.504 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
Feb 08 07:26:18.544 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
...

Here's our slightly more useful hint:

Feb 08 07:35:27.836 [info] router_load_extrainfo_from_string(): 1 elements to add
Feb 08 07:35:27.836 [warn] router info incompatible with extra info (reason: Extrainfo published time did not match routerdesc)
Feb 08 07:35:27.851 [info] connection_dir_client_reached_eof(): Received 0/1 extra-info documents requested from 171.25.193.9:443

I wonder if I'm noticing that I'm missing it, trying to fetch it from maatuska, not liking it, and repeating.

comment:11 Changed 3 years ago by nickm

Indeed; if so, that's quite bad. Thanks for reporting!

comment:12 Changed 3 years ago by nickm

hmmm. Another issue: extrainfo_insert checks for compatibility with the router whose identity matches the extrainfo's identity. But update_extrainfo_downloads() fetches based on router descriptors in 'routers' and in 'old_routers'. That can't be right!

comment:13 Changed 3 years ago by nickm

Owner: set to nickm
Status: reopenedaccepted

comment:14 Changed 3 years ago by nickm

I think the better fix here in 0.2.6 is not to download extrainfos from old_routers. Nobody seems to have missed them, and doing the more thorough fix seems likely to expose any weird bugs laying around in the extrainfo storage structures.

comment:15 Changed 3 years ago by nickm

false alarm; the routerinfo_t in extrainfo_insert is only used for its identity key. The sd that we look up should be fine in its context.

comment:16 Changed 3 years ago by nickm

Status: acceptedneeds_review

branch "bug13762_diagnostic_redux" adds more log messages to try to see what's up, and check for a possibly scary bug. Shall I merge it?

comment:17 Changed 3 years ago by yawning

lgtm. It's still a bit spammy but nickm notes that knowing if the same id(s) is/are to blame is important for now, so the verbosity is needed while this is being debugged.

comment:18 Changed 3 years ago by nickm

Status: needs_reviewneeds_information

Merged diagnostic patch; waiting for results.

comment:19 Changed 3 years ago by arma

Feb 18 16:10:18.684 [info] update_consensus_router_descriptor_downloads(): 0 router descriptors downloadable. 157 delayed; 6891 present (228 of those were in old_routers); 1 would_reject; 0 wouldnt_use; 0 in progress.
Feb 18 16:10:18.691 [info] update_extrainfo_downloads(): Extrainfo download status: 9 router with no ei, 15280 with present ei, 3 delaying, 0 pending, 0 downloadable.
Feb 18 16:10:18.691 [warn] update_extrainfo_downloads(): Bug: While downloading extrainfo documents, I found 0 inconsistencies in routers and 1 inconsistencies in old_routers.
[...]
Feb 18 16:40:48.835 [info] update_consensus_router_descriptor_downloads(): 0 router descriptors downloadable. 157 delayed; 6891 present (460 of those were in old_routers); 1 would_reject; 0 wouldnt_use; 0 in progress.
Feb 18 16:40:48.842 [info] update_extrainfo_downloads(): Extrainfo download status: 11 router with no ei, 15631 with present ei, 0 delaying, 0 pending, 3 downloadable.
Feb 18 16:40:48.842 [warn] update_extrainfo_downloads(): Bug: While downloading extrainfo documents, I found 0 inconsistencies in routers and 1 inconsistencies in old_routers. [29 similar message(s) suppressed in last 1800 seconds]

comment:20 Changed 3 years ago by arma

Quote from my sentence on irc: "wherever tor computes how many extrainfos are in a fetchable state, it should skip over these type because they are not in a fetchable state." "these type" being the ones that don't have a corresponding relay descriptor and whatever else is going to be needed to verify them once we get them.

comment:21 Changed 3 years ago by nickm

So, the only warning you're seeing now is "Bug: While downloading extrainfo documents, I found 0 inconsistencies in routers and 1 inconsistencies in old_routers."? If that's true, think this tells us more about this bug and what to do about it.

comment:22 Changed 3 years ago by nickm

14:13 < armadev> re 13762 i think the old warning is gone

comment:23 Changed 3 years ago by nickm

Status: needs_informationneeds_review

In that case, bug13762_quiet in my public repo is probably the branch we want.

comment:24 Changed 3 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Merged after review and fixup by like, everybody on #tor-dev.

Note: See TracTickets for help on using tickets.