Since I've upgraded to .7-alpha I am having my log file filled with the above message, which is generated a couple of times a second!
I have, so far, accumulated over 14,000 lines with this crap, which seems to be generated when tor was left idle (unused) overnight. I had to increase my log level (to "warn") in order to get rid of it temporarily, but that isn't good enough, since I cannot see my other (very useful) messages appearing at that level (notice) without having my log filled rapidly.
Here's the logline from the original bug title:
"No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live."
Trac: Username: mr-4
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
I've slapped a one-hour rate-limit on the message in 127cb39ffcacbcd99633671e92d299ef53bbac46. That's not the real fix; this message really shouldn't be triggering that much. Mike, any ideas?
Hrmm.. The bug submitter seems to be hitting the rageahol a little too hard this holiday season. A little investigation reveals a few things are incorrect in the bug report's description.
First: The reported logline is info level, not notice. It's possible that the notice line was also triggered, but that's not immediately clear based on the situation description.
Second: I strongly suspect that Tor wasn't idle, but was actually disconnected from the network. This is probably the real bug+repro case here.
So the remaining question is: Does this actually happen for idle clients as reported, or do they have to be disconnected from the network in addition/instead?
Trac: Summary: remove (or scale down a level) obnoxious "No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live." message to Tor seems to build+timeout circuits for hours when offline/idle Description: Since I've upgraded to .7-alpha I am having my log file filled with the above message, which is generated a couple of times a second!
I have, so far, accumulated over 14,000 lines with this crap, which seems to be generated when tor was left idle (unused) overnight. I had to increase my log level (to "warn") in order to get rid of it temporarily, but that isn't good enough, since I cannot see my other (very useful) messages appearing at that level (notice) without having my log filled rapidly.
to
Since I've upgraded to .7-alpha I am having my log file filled with the above message, which is generated a couple of times a second!
I have, so far, accumulated over 14,000 lines with this crap, which seems to be generated when tor was left idle (unused) overnight. I had to increase my log level (to "warn") in order to get rid of it temporarily, but that isn't good enough, since I cannot see my other (very useful) messages appearing at that level (notice) without having my log filled rapidly.
Here's the logline from the original bug title:
"No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live."
First: The reported logline is info level, not notice. It's possible that the notice line was also triggered, but that's not immediately clear based on the situation description.
Are you sure? In master I have:
log_notice(LD_CIRC, "No circuits are opened. Relaxed timeout for " "a circuit with channel state %s to %ldms. " "However, it appears the circuit has timed out anyway. " "%d guards are live. %s",
vs
log_info(LD_CIRC, "No circuits are opened. Relaxing timeout for " "a circuit with channel state %s. %d guards are live.",
The "However, a circuit has timed out anyway" line in the original makes it look like the notice is getting hit. The info one didn't have that. Are you thinking of a different log message?
Doh. I misread it in the title. I guess then the only question that remains is if it is really being idle that causes it. If it's only disconnect, then the fix can be as simple as not logging if your entry guards are not live, but if it is also triggerable via being idle, then the fix is not clear to me yet.
Upgraded to 0.2.4.7-alpha-dev (git-127cb39ffcacbcd9) and I too am seeing those messages, among other things:
Dec 26 23:08:54.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy.[28 similar message(s) suppressed in last 60 seconds]Dec 26 23:08:54.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state opento 60000ms. However, it appears the circuit has timed out anyway. 4 guards are live. [5893 similar message(s) suppressed in last 3600 seconds]
The first message was kinda expected because at the time there were really a lot Tor connections open and the box (powerpc) was somewhay busy. I just put the message in here because it appeared at the same time as the "No circuits are opened" message.
Doh. I misread it in the title. I guess then the only question that remains is if it is really being idle that causes it. If it's only disconnect, then the fix can be as simple as not logging if your entry guards are not live, but if it is also triggerable via being idle, then the fix is not clear to me yet.
As far as I can tell, judging by my iptables logs (number of packets passed through the various chains etc), it is as idle as it could be. There is certainly no other traffic, other than tor's own, which is active.
The period I referred to in my submission was overnight when nothing is using this machine (this is on my dmz zone and I control all the traffic passing to/from it).
Today I conducted a little test and manually shut down all other services which run on that machine and left only tor, plus its internal dns service running.
Surprise, surprise - I started seeing tor's own log filling out pretty rapidly with that message (I changed the destination from being the syslog as I noticed that the level and the code group gets reported when I use tor's own log file - like /var/log/tor/tor.log - compared to syslog where neither the loglevel, nor the code group gets included).
Over to you - let me know if there is anything I can help you with...
For the non-believers (yes, I am looking at you, Mike) I am going to try and attach a log (compressed gz archive), which covers the idle period from last night (~12 hours) so that you can see these messages and the rate at which they are generated (roughly 23,000 lines spewed out by tor in total during that period), so, hopefully, it would help diagnose this problem. Please note that there are also messages related to Bug 7472 in it.
The only details "sanitised" from that log are my bridge details (yes, I connect to tor using them, for various reasons).
Later tonight I am going to apply Nick's rate-limiting patch then reinstall tor and see how it goes.
(Does this still happen with bufferevents disabled? I'm inferring that bufferevents are enabled because one of the warning messages refers to connection.c:3152, which is bufferevent-only code. I don't know how bufferevents could cause this bug, but it would be good to rule them out.)
Do you mean this bug or #7472 (moved) (the connection.c:3152 one)? I am, indeed, building tor with bufferevents enabled. Would you like me to build it (and apply your patch) with bufferevents disabled and report back?
Nick's time-limiting patch fails to apply against .7-alpha so I had to modify it a bit. I am going to attach the modified patch at the end.
Even with the modified patch I couldn't build tor as I was getting the following error during compilation:
src/or/circuituse.c: In function 'circuit_expire_building':
src/or/circuituse.c:484: error: too many arguments to function 'num_live_entry_guards'
make[1]: *** [src/or/circuituse.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/builddir/build/BUILD/tor-0.2.4.7-alpha'
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.5zPBZe (%build)
It seems the problem was with "num_live_entry_guards" which in .7-alpha doesn't take any arguments, but in Nick's patch it is executed as "num_live_entry_guards(0)". I did change that as well.
After rebuilding tor with bufferevents disabled, the situation is much more manageable - I am attaching a short log, covering the overnight period. There are no Bug #7472 (moved) messages in the log either (understandable, since bufferevents has been disabled). If you need anything else from me let me know...
I never disbelieved this was happening, I only stated that the symptoms were unclear. What I still don't understand is why an idle-but-still-connected client would actually be attempting to build tens of thousands of circuits and fail at all of them for seemingly no reason. That makes me think the code I added and the log message here only made another bug visible, and is not the actual cause of the issue...
I never disbelieved this was happening, I only stated that the symptoms were unclear.
Oh, yeah?
Quoting from your topmost comment above:
"The bug submitter seems to be hitting the rageahol a little too hard this holiday season. A little investigation reveals a few things are incorrect in the bug report's description".
Care to explain that comment and what exactly in my bug submission is "incorrect"?
What I still don't understand is why an idle-but-still-connected client would actually be attempting to build tens of thousands of circuits and fail at all of them for seemingly no reason.
You "don't understand"? Hang on a minute...
...this is what you started off with:
"First: The reported logline is info level, not notice. It's possible that the notice line was also triggered, but that's not immediately clear based on the situation description.
Second: I strongly suspect that Tor wasn't idle, but was actually disconnected from the network."
That doesn't read as something you "misunderstood" to me - you were pretty damn certain, though as was later established, pretty wrong on both counts, so I don't know who is the one "hitting the rageahol a little too hard this holiday season"...
That makes me think the code I added and the log message here only made another bug visible, and is not the actual cause of the issue...
A word of advice - next time, take a step back, count to 10 and then post before you go off with all your guns blazing.
Fine people: the important thing here is fixing this bug. I don't care who was right about which stuff when. I care about diagnosing and fixing this bug. Does anyone have a plan for tracking its causes down?
Mike, if the current log info isn't adequate to show what the problem is, what additional info would help.
Who else here has tried reproducing the bug? Did it work? Mr-4, are the repro instructions as simple as "leave a tor client with default config options running overnight" or is it harder than that?
Mr-4, are the repro instructions as simple as "leave a tor client with default config options running overnight" or is it harder than that?
No idea as I am running a single instance of tor on one of my dmz machines and have no real basis of comparison with standard/other tor configurations.
My own torrc isn't the default one, obviously, but I'd like to think that my config doesn't deviate from a "standard" configuration too much, with the exception of, maybe, my ExcludeNodes (which is quite extensive as I block places like US, China and the like) and the fact that I run bridges instead of relying on "normal" tor connections.
I also have a good few MapAddress statements in my torrc as well (15-20 of them). I don't run hidden services and my tor isn't used as a bridge relay or exit node either.
OK, I've just looked at torrc.sample, taking into account all default torrc (client) settings (as they appear in the man page) and compared them to my existing torrc. The differences, as far as this bug submission goes, are:
Hope this helps. I also realised I could have different Log statements to include/exclude specific domains and specify different log levels (didn't know I could do that), so if you require something more specific, just let me know.
I've attached my torrc to this ticket. When these messages appeared I just upgraded to 0.2.4.7-alpha-dev and for some reason my little tor machine was really busy, much more busy than usual, hence the "Your computer is too slow..." messages right before the "No circuits are opened..." message. The "No circuits are opened..." message appeared only 3 times in my log so far:
Dec 26 22:08:54.000
Dec 26 23:08:54.000 [5893 similar message(s) suppressed in last 3600 seconds]
Dec 27 00:08:54.000 [1013 similar message(s) suppressed in last 3600 seconds]
...and has not reappeared ever since. This tor machine has returned to normal workload around that time, so I suspect this message was/will only be triggered during high loads.
I tried running an idle Tor 51c2dd8f8954d52fbaef29d2afb44d5eda47f4da overnight with and empty .torrc, and got some of a similar messages, though not quite so many, and with "2 guards are live" as opposed to 0. The first such message appeared about 2 hours after bootstrapping; then they came about 5-13 per hour.
I've started another idle Tor with a couple of bridges from bridges.torproject.org and no other torrc options; I'll see if this gets me a flood of the messages.
Weird. The one with two bridges in the .torrc gives me:
Jan 01 10:44:26.000 [notice] Bootstrapped 100%: Done.Jan 01 12:29:06.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 4 guards are live.
Four guards? What's up with that? I've only got two bridges.
Weird. The one with two bridges in the .torrc gives me:
[...]
Four guards? What's up with that? I've only got two bridges.
From experience, sometimes tor creates more than 1 "instance" (or connection) to a particular bridge. That usually happens when there is a peak load and there are many requests from the client (i.e. loading a web page containing lots of pictures), so the existing (single) bridge connection isn't enough to cope, so tor creates another one to the same bridge.
If so, I suspect for each opened connection a new guard is needed, though I might be wrong as my understanding of the inner tor workings is close to nil.
On a separate note and commenting on what Christian posted above - the (dmz) machine my tor is running on doesn't have any load issues - even though it is an old beat-up Pentium-based machine, it still takes quite a lot to slow it down, particularly when everything, including tor, is in "idle" state.
Ah, here we go. I left the Tor instance with two bridges running overnight, and eventually got:
Jan 02 08:15:56.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [669 similar message(s) suppressed in last 3600 seconds]Jan 02 09:15:56.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [1753 similar message(s) suppressed in last 3600 seconds]
Getting info logs now to see if the situation becomes clear.
Mike, this is reproducible (see above). Can you start a tor instance with two bridges and info-level logging, leave it idle for a couple of days, and see if you don't get the same messages? If so, it would be cool if either you can figure out what's causing them, OR figure out what log messages to add to help figure out what's causing them.
Trac: Status: new to assigned Owner: N/Ato mikeperry Keywords: N/Adeleted, tor-client 024-deferrable added
I've got the same problem on my Tor 0.2.5.0-alpha-dev, Linux based "000000000000myTOR" middleman relay:
"Mar 14 06:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [135218 similar message(s) suppressed in last 3600 seconds]
Mar 14 07:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [146378 similar message(s) suppressed in last 3600 seconds]
Mar 14 08:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [149349 similar message(s) suppressed in last 3600 seconds]
Mar 14 09:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [142360 similar message(s) suppressed in last 3600 seconds]"
etc.
SocksPort is set to 0 ("SocksPort 0") so i think it's normal that there are no open clients circuits.
[notice] Tor 0.2.5.0-alpha-dev (git-331e4dcb460b5c91) opening log file....[notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [5 similar message(s) suppressed in last 3600 seconds][notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [3 similar message(s) suppressed in last 3600 seconds]...No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [8 similar message(s) suppressed in last 3600 seconds]
The attached log diff should tell us more information about the type and length of these circuits. Could they perhaps be tunneled directory circuits for fetching the consensus or some other automatic circuit that we've been (accidentally?) relying on the timeout code to clean up for us instead of closing? My hope is that that patch will let us know that much, at least..
Note I added the circuit identifier to the log line in this version of the patch, so we can tell if this is just one circuit hitting the condition over and over again for hours or if there are multiple circuits involved.
I've applied log-circuit-info.diff to yesterday's git checkout and restarted Tor at 23:44 local time (see my tor-notices.log). So far this "No circuits are opened" message occured 11 times in the log (with thousands of messages supressed).
Aha! The damn thing is an already opened 1-hop circuit. It's probably a tunneled dirconn, as I guessed earlier. But this helps us greatly: We certainly shouldn't be trying to relax the timeout on already opened circuits.
I pushed a commit to mikeperry/bug7799 (160a7b73715230c4509e9ca1088bd31ff1f7abaa) to fix this case. More may remain.
Hrmm. Trac doesn't send email about attachments.. If you want to test this without git, apply logfix.diff and log-circuit-info.diff and paste any log messages you still see.
Please also let us know if you test those patches and the problem seems to go away.
I used your git branch (thanks for the mini-howto!) and a newly built Tor is already running for 30min now - I'll report back if there are any log messages.