Opened 7 months ago

Closed 3 months ago

#24991 closed defect (fixed)

relay frequently claiming "missing descriptors for 1/2 of our primary entry guards"

Reported by: alecmuffett Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.3.0.1-alpha
Severity: Minor Keywords: single-onion, guards, logging, easy, 034-triage-20180328, 034-removed-20180328
Cc: starlight@… Actual Points:
Parent ID: #23863 Points: 0.5
Reviewer: Sponsor:

Description

This is the first of a pair of bugs with the same logfiles, so unless requested I'll only attach the logs once, here, unless there's a real need to duplicate them.

I have an EOTK SingleOnion, config attached.

Per the logfile, it is both HiddenServiceSingleHopMode and HiddenServiceNonAnonymousMode.

It's from "dropsafezeahmyho" which is the onion for my personal blog (dropsafe.crypticide.com) which is very low traffic as an Onion, in fact it exists mostly as a test onion.

Issue #1 - after an extended period of uptime, tor claims:

Jan 21 08:45:04.000 [notice] Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 5975/6006).
Jan 21 08:45:04.000 [notice] I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 5975/6006).

...which should not be happening, because (as a single onion) there are no Guards.

Issue 2 will be described in the next ticket, and linked back to this one.

Attachments: logfile and configfile.

Child Tickets

Attachments (2)

tor.conf (849 bytes) - added by alecmuffett 7 months ago.
config
tor.log (20.9 KB) - added by alecmuffett 7 months ago.
log

Download all attachments as: .zip

Change History (20)

Changed 7 months ago by alecmuffett

Attachment: tor.conf added

config

Changed 7 months ago by alecmuffett

Attachment: tor.log added

log

comment:1 Changed 7 months ago by alecmuffett

Followup is #24992

comment:2 Changed 6 months ago by teor

Keywords: single-onion added; singleonion removed
Milestone: Tor: 0.3.3.x-final
Parent ID: #23863

comment:3 Changed 6 months ago by teor

Keywords: logging easy added
Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Points: 0.5
Severity: NormalMinor

Single onion services use multi-hop paths to upload their descriptors, but these paths do not share a guard. (EntryGuards is 0.)

So I think this is an annoying logging bug, rather than anything serious.
If they ever stop building circuits, let us know. That is a problem we should fix.

comment:4 Changed 5 months ago by nickm

Keywords: 034-triage-20180328 added

comment:5 Changed 5 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:6 Changed 4 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:7 Changed 3 months ago by starlight

Version: Tor: 0.3.2.9Tor: 0.3.4.1-alpha

Seeing this with 0.3.4.1-alpha on a guard relay that has no client activity. Something seems wrong about it because the shortage of descriptors is minuscule yet the message indicates a guard appears in the missing set. Performed a DROPGUARDS thinking one of the guards was bum somehow, and the message came back the next day with the fresh set. I wonder if this is a bug in the new differential descriptor logic; why would 50 relays go missing every day as an exceptional event?

DownloadExtraInfo 1

Removed cached-* and diff-cache/* before start.

Tor 0.3.4.1-alpha (git-deb8970a29ef7427) running on Linux with Libevent x.x.x, OpenSSL x.x.x, Zlib x.x.x, Liblzma x.x.x, and Libzstd x.x.x.
Bootstrapped 0%: Starting
Starting with guard context "default"
Bootstrapped 10%: Finishing handshake with directory server
I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Bootstrapped 45%: Asking for relay descriptors
I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6493, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of exit bw = 0% of path bw.)
I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6493, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of exit bw = 0% of path bw.)
Bootstrapped 50%: Loading relay descriptors
Bootstrapped 55%: Loading relay descriptors
Bootstrapped 60%: Loading relay descriptors
Bootstrapped 65%: Loading relay descriptors
Bootstrapped 70%: Loading relay descriptors
Problem bootstrapping. Stuck at 73%: Loading relay descriptors. (No route to host; NOROUTE; count 1; recommendation warn; host X at x.x.x.x:x)
Bootstrapped 75%: Loading relay descriptors
Bootstrapped 80%: Connecting to the Tor network
Bootstrapped 85%: Finishing handshake with first hop
Bootstrapped 90%: Establishing a Tor circuit
Tor has successfully opened a circuit. Looks like client functionality is working.
Bootstrapped 100%: Done
Self-testing indicates your DirPort is reachable from the outside. Excellent. Publishing server descriptor.
Performing bandwidth self-test...done.
Heartbeat: Tor's uptime is 6:00 hours,
Heartbeat: Tor's uptime is 12:00 hours,
Heartbeat: Tor's uptime is 18:00 hours,
Heartbeat: Tor's uptime is 1 day 0:00 hours,
Heartbeat: Tor's uptime is 1 day 6:00 hours,
Heartbeat: Tor's uptime is 1 day 12:00 hours,
Heartbeat: Tor's uptime is 1 day 18:00 hours,
Heartbeat: Tor's uptime is 2 days 0:00 hours,
Heartbeat: Tor's uptime is 2 days 6:00 hours,
Heartbeat: Tor's uptime is 2 days 12:00 hours,
Heartbeat: Tor's uptime is 2 days 18:00 hours,
Heartbeat: Tor's uptime is 3 days 0:00 hours,
Heartbeat: Tor's uptime is 3 days 6:00 hours,
Heartbeat: Tor's uptime is 3 days 12:00 hours,
Heartbeat: Tor's uptime is 3 days 18:00 hours,
Heartbeat: Tor's uptime is 4 days 0:00 hours,
Channel padding timeout scheduled 209531ms in the past. 
Heartbeat: Tor's uptime is 4 days 6:00 hours,
Heartbeat: Tor's uptime is 4 days 12:00 hours,
Heartbeat: Tor's uptime is 4 days 18:00 hours,
Heartbeat: Tor's uptime is 5 days 0:00 hours,
Heartbeat: Tor's uptime is 5 days 6:00 hours,
Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
We now have enough directory information to build circuits.
Heartbeat: Tor's uptime is 5 days 12:00 hours,
Heartbeat: Tor's uptime is 5 days 18:00 hours,
Heartbeat: Tor's uptime is 6 days 0:00 hours,
Heartbeat: Tor's uptime is 6 days 6:00 hours,
Heartbeat: Tor's uptime is 6 days 12:00 hours,
Heartbeat: Tor's uptime is 6 days 18:00 hours,
Heartbeat: Tor's uptime is 7 days 0:00 hours,
Heartbeat: Tor's uptime is 7 days 6:00 hours,
Heartbeat: Tor's uptime is 7 days 12:00 hours,
Heartbeat: Tor's uptime is 7 days 18:00 hours,
Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6307/6352).
I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6307/6352).
We now have enough directory information to build circuits.

DROPGUARDS is dangerous; make sure you understand the risks before using it. It may be removed in a future version of Tor.
Starting with guard context "default"

Heartbeat: Tor's uptime is 8 days 0:00 hours,
Heartbeat: Tor's uptime is 8 days 6:00 hours,
Heartbeat: Tor's uptime is 8 days 12:00 hours,
Heartbeat: Tor's uptime is 8 days 18:00 hours,
Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6317/6353).
I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6317/6353).
We now have enough directory information to build circuits.

comment:8 Changed 3 months ago by starlight

Cc: starlight@… added

comment:9 Changed 3 months ago by starlight

Summary: SingleOnion claims "missing descriptors for 1/2 of our primary entry guards", has no guards, makes no senserelay claims "missing descriptors for 1/2 of our primary entry guards", has no guards, makes no sense

comment:10 Changed 3 months ago by starlight

Summary: relay claims "missing descriptors for 1/2 of our primary entry guards", has no guards, makes no senserelay frequently claiming "missing descriptors for 1/2 of our primary entry guards"

comment:11 Changed 3 months ago by starlight

Perhaps the messages result from a race bug between the microdescriptor refresh tail-end and network reachable evaluation logic.

comment:12 Changed 3 months ago by teor

Status: newneeds_information
Version: Tor: 0.3.4.1-alphaTor: 0.3.0.1-alpha

(This is a bug on 0.3.0.1-alpha, the earliest version with this issue.)

You've stripped the times and the log levels from your log, so it's hard to analyse this bug.

Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
We now have enough directory information to build circuits.

These logs show that Tor expires the microdescs for one of its guards, then fetches some more microdescs, then finally gets enough to use that guard again. How long did it take to recover?

If it only took a minute or so, then that's ok. Maybe we should downgrade the log levels. But they are useful if the descriptor download fails.

If it took a long time, then maybe we need to make the downloads more aggressive.

comment:13 Changed 3 months ago by starlight

Appears to me recovery is near instantaneous, which provoked the idea it might be a race condition in processing logic:

May 26 19:23:34 Tor[]: Heartbeat: Tor's uptime is 5 days 6:00 hours
May 26 19:25:42 Tor[]: Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
May 26 19:25:42 Tor[]: I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6346/6383).
May 26 19:25:42 Tor[]: We now have enough directory information to build circuits.
May 29 07:23:34 Tor[]: Heartbeat: Tor's uptime is 7 days 18:00 hours
May 29 09:18:44 Tor[]: Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6307/6352).
May 29 09:18:44 Tor[]: I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6307/6352).
May 29 09:18:45 Tor[]: We now have enough directory information to build circuits.
May 30 07:23:34 Tor[]: Heartbeat: Tor's uptime is 8 days 18:00 hours
May 30 09:28:45 Tor[]: Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6317/6353).
May 30 09:28:45 Tor[]: I learned some more directory information, but not enough to build a circuit: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6317/6353).
May 30 09:28:46 Tor[]: We now have enough directory information to build circuits.

Could log debug to disk with millisecond resolution if that helps. Is full debug required or would a subsystem or two work? If the latter please indicate which subsystem(s) to log.

comment:14 Changed 3 months ago by starlight

Will log this for tomorrow around the expected time (24hrs + 10min) with

SETCONF LogMessageDomains=1 LogTimeGranularity=1 Log="notice syslog" Log="[*]notice [*]info [general,dir,dirserv,guard,heartbeat,consdiff]debug file logfile$(date '+%M%S')"

If a category is missing or SafeLogging=0 should be set please advise.

comment:15 Changed 3 months ago by teor

That log config looks fine.

Please don't strip the log levels from your logs. Some of those messages have different log levels depending on internal state, and I need the log levels to debug this issue.

I think that directory_info_has_arrived() is being called twice, probably for the consensus and for descriptors. Unless something else is going wrong, that's not a bug.

But we can improve the log message by saying what kind of directory documents we just downloaded.

I'll know when I see the debug logs.

comment:16 Changed 3 months ago by starlight

Problem was no-show today.

Removed [general] from the logging (98.5% of the log size) and will leave it running for a few days to see if it pops. Will update.

comment:17 Changed 3 months ago by starlight

Issue arrived, EDT:

Jun 1 08:04:48 Tor[]: Our directory information is no longer up-to-date enough. . .
Jun 1 08:04:48 Tor[]: I learned some more directory information. . .
Jun 1 08:04:48 Tor[]: We now have enough directory information to build circuits.

Provided debug log off-ticket due to sensitive content and difficulty of obfuscation. Will provide to any interested dev.

Looking it over, seems to me the authorities prepare consensus diff documents a couple of times each hour and about 50 or so generally arrive. Appears some descriptors are overlooked (rather odd odd, perhaps a bug), and curiously the handful of missing descriptors contains one of the relay's guard nodes about once a day. The relay requests the missing/out-of-date descriptors which arrive shortly and repair the deficit. What's strange is after performing DROPGUARDS it kept happening, implying that a significant percentage of guards fall into the "overlooked" group any given day. This is casual observation and could be quite wrong--did not delve into it deeply.

comment:18 Changed 3 months ago by teor

Resolution: fixed
Status: needs_informationclosed

It's not perfect, but this code is operating sufficiently well for our purposes.

Note: See TracTickets for help on using tickets.