Opened 11 years ago

Last modified 7 years ago

#888 closed defect (Fixed)

Notice log flood on directory authority.

Reported by: ioerror Owned by:
Priority: Low Milestone: 0.2.1.x-final
Component: Core Tor/Tor Version: 0.2.1.8-alpha
Severity: Keywords:
Cc: ioerror, nickm, Sebastian Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

When attempting to start my new V3DirectoryAuthority, I have a flood of messages. I get about 700 lines that look like this (~10 per second):

Dec 17 03:14:05.954 [notice] I learned some more directory information, but not enough to build a circuit: We have only 217/1091 usable descriptors.

[Automatically added by flyspray2trac: Operating System: BSD]

Child Tickets

Change History (6)

comment:1 Changed 11 years ago by Sebastian

I just set up fluxe3 as a directory authority, and I can confirm this with removed state and cached* files. Without removing them, Tor starts normally.

When running as relay only, everything is fine.

comment:2 Changed 11 years ago by nickm

08:15 < killerchicken_> something really likes to call

directory_info_has_arrived quite often on directories :)

08:24 < nickm> It would be neat if directory_info_has_arrived had an argument

that told it what, exactly, had arrived, so it could report this
to the user.

08:25 < killerchicken_> I think what's happening sometimes is that it got a new

descriptor, but rejects it because of a timeout or
something, but the function is still called

08:26 < nickm> Plausible, and shouldn't be too hard to fix. Want to give it a

try?

08:28 < killerchicken_> I've been looking at it for a bit now
08:28 < killerchicken_> I get a lot of "I learned some more directory

information, but not enough to build a circuit: We have
no network-status consensus." messages, too.

08:28 < killerchicken_> like, 5-10 per second
08:29 < killerchicken_> at the beginning of startup
08:29 < killerchicken_> I have some debug level logs that I'm investigating

right now.

08:34 < nickm> So, it would be easy enough to fix the code in directory.c so

that it only calls directory_info_has_arrived() if a router
really did show up. But there's a deeper problem that we need
to answer: _why_ are we requesting directory info we don't
actually want to add?

08:35 < killerchicken_> right, just silencing the messages is not what I had in

mind.

08:35 < killerchicken_> So what I'm trying to figure out now is how I a

directoryi differs from an ordinary relay

08:36 < killerchicken_> erm... s/howI am directoryi/a directory/
08:37 < killerchicken_> So the really weird thing is this: I just reran the

test, and all problems went away.

08:37 < killerchicken_> I wonder whether this is related to the CCC's DA being

slow.

08:40 < nickm> Did you re-run it with the same contents of your caches?
08:40 < killerchicken_> with erased caches
08:41 < nickm> interesting.
08:42 < killerchicken_> Jake called me a liar, so I'll recheck :)
08:43 < nickm> So, if this only happens sometimes, the short term answer is

probably to log better.

08:43 < killerchicken_> I think that's the case.
08:43 < nickm> did you have debug or info logs from your first run,where you

saw the strange behavior?

08:43 < killerchicken_> Yes, and I now have a run with the same strange

behaviour again

Looking at killerchicken's logs, it seems that most of the errors he's seeing are "I learned some more directory
information, but not enough to build a circuit: We have no network-status consensus." These are mostly preceded by
messages like:

Dec 17 14:45:24.051 [info] connection_dir_client_reached_eof(): Received server
info (size 1558) from server '86.59.21.38:80'
Dec 17 14:45:24.051 [debug] router_parse_list_from_string(): Read router 'dfkdnf
sdklnf', purpose 'general'
Dec 17 14:45:24.051 [info] router_load_routers_from_string(): 1 elements to add
Dec 17 14:45:24.051 [debug] dirserv_get_status_impl(): 1 fingerprints, 1 digests

known.

Dec 17 14:45:24.051 [info] authdir_wants_to_reject_router(): Publication time for

router with nickname 'dfkdnfsdklnf' is too far (1210 minutes) in the past. Not
adding (Contact , Platform "Tor 0.2.0.31 (r16744) on Windows \"Longhorn\"
Service Pack 2, v.113 [domain controller] {datacenter} {terminal services, single
user} {terminal services}")

Dec 17 14:45:24.052 [notice] I learned some more directory information, but not

enough to build a circuit: We have no network-status consensus.

So it looks like we're downloading a decriptor that we then immediately reject because it was
published too long ago.

Interestingly, we're requesting and rejecting the same old routers over and over!
[238]% grep -i authdir_wants ~/sh_888_debuglog | perl -ne '/with nickname ([^ ]+)/ && print "$1\n";' | sort -n |uniq -c

80 'BassAckwards'

152 'CryptonHooligan'
109 'Dukeman'
542 'Unnamed'

83 'dfkdnfsdklnf'
82 'gamer'

119 'server'

Discounting the Unnamed router, it looks like we are requesting, failing, retrying, ad infinitum. This means
we're failing to treat getting a descriptor that we then could not add as a sign to not retry that descriptor.

comment:3 Changed 11 years ago by nickm

I checked in killerchicken's fix as r17756. It may be better now.

comment:4 Changed 11 years ago by nickm

The symptom hasn't recurred since we fixed one possible cause of it, and I can't find any other possible cause.

Closing as fixed; reopen if this recurs.

comment:5 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:6 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.