Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#7799 closed defect (fixed)

Tor seems to build+timeout circuits for hours when offline/idle

Reported by: mr-4 Owned by: mikeperry
Priority: Medium Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.7-alpha
Severity: Keywords: tor-client 024-deferrable
Cc: mikeperry, tor@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by mikeperry)

Since I've upgraded to .7-alpha I am having my log file filled with the above message, which is generated a couple of times a *second*!
\
I have, so far, accumulated over 14,000 lines with this crap, which seems to be generated when tor was left idle (unused) overnight. I had to increase my log level (to "warn") in order to get rid of it temporarily, but that isn't good enough, since I cannot see my other (very useful) messages appearing at that level (notice) without having my log filled rapidly.

Here's the logline from the original bug title:
"No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live."

Child Tickets

Attachments (8)

tor-2.4.7-bug7799.log.gz (52.8 KB) - added by mr-4 7 years ago.
Tor logs covering the idle period from last night
tor-0.2.4.7-mod-bug7799.patch (1.9 KB) - added by mr-4 7 years ago.
Modified time-limiting patch against tor 0.2.4.7-alpha
tor-0.4.7-bug7799-20121230.log (2.3 KB) - added by mr-4 7 years ago.
Overnight tor log
torrc.txt (1.3 KB) - added by Christian 7 years ago.
My torrc
log-circuit-info.diff (1.8 KB) - added by mikeperry 7 years ago.
Log more circuit info for relaxed timeouts
tor-notices.log (40.2 KB) - added by Christian 7 years ago.
logfix.diff (895 bytes) - added by mikeperry 7 years ago.
Alternate patch to use instead of git. Please also use the other patch if you use this directly.
tor_notices_fd10.log (9.1 KB) - added by Christian 7 years ago.
16 hours notices.log running Tor git#bug7799

Download all attachments as: .zip

Change History (45)

comment:1 Changed 7 years ago by nickm

Cc: mikeperry added
Milestone: Tor: 0.2.4.x-final

I've slapped a one-hour rate-limit on the message in 127cb39ffcacbcd99633671e92d299ef53bbac46. That's not the real fix; this message really shouldn't be triggering that much. Mike, any ideas?

comment:2 Changed 7 years ago by mikeperry

Description: modified (diff)
Summary: remove (or scale down a level) obnoxious "No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live." messageTor seems to build+timeout circuits for hours when offline/idle

Hrmm.. The bug submitter seems to be hitting the rageahol a little too hard this holiday season. A little investigation reveals a few things are incorrect in the bug report's description.

First: The reported logline is info level, not notice. It's possible that the notice line was *also* triggered, but that's not immediately clear based on the situation description.

Second: I strongly suspect that Tor wasn't idle, but was actually disconnected from the network. This is probably the real bug+repro case here.

So the remaining question is: Does this actually happen for idle clients as reported, or do they have to be disconnected from the network in addition/instead?

comment:3 in reply to:  2 Changed 7 years ago by nickm

Replying to mikeperry:

[...]

First: The reported logline is info level, not notice. It's possible that the notice line was *also* triggered, but that's not immediately clear based on the situation description.

Are you sure? In master I have:

          log_notice(LD_CIRC,
                 "No circuits are opened. Relaxed timeout for "
                 "a circuit with channel state %s to %ldms. "
                 "However, it appears the circuit has timed out anyway. "
                 "%d guards are live. %s",

vs

          log_info(LD_CIRC,
                 "No circuits are opened. Relaxing timeout for "
                 "a circuit with channel state %s. %d guards are live.",

The "However, a circuit has timed out anyway" line in the original makes it look like the notice is getting hit. The info one didn't have that. Are you thinking of a different log message?

comment:4 Changed 7 years ago by mikeperry

Doh. I misread it in the title. I guess then the only question that remains is if it is really being idle that causes it. If it's only disconnect, then the fix can be as simple as not logging if your entry guards are not live, but if it is also triggerable via being idle, then the fix is not clear to me yet.

comment:5 Changed 7 years ago by Christian

Cc: tor@… added

Upgraded to 0.2.4.7-alpha-dev (git-127cb39ffcacbcd9) and I too am seeing those messages, among other things:

Dec 26 23:08:54.000 [warn] Your computer is too slow to handle this many circuit creation requests! 
Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy.
[28 similar message(s) suppressed in last 60 seconds]

Dec 26 23:08:54.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open
to 60000ms. However, it appears the circuit has timed out anyway. 4 guards are live. 
[5893 similar message(s) suppressed in last 3600 seconds]

The first message was kinda expected because at the time there were really a lot Tor connections open and the box (powerpc) was somewhay busy. I just put the message in here because it appeared at the same time as the "No circuits are opened" message.

comment:6 in reply to:  4 Changed 7 years ago by mr-4

Replying to mikeperry:

Doh. I misread it in the title. I guess then the only question that remains is if it is really being idle that causes it. If it's only disconnect, then the fix can be as simple as not logging if your entry guards are not live, but if it is also triggerable via being idle, then the fix is not clear to me yet.

As far as I can tell, judging by my iptables logs (number of packets passed through the various chains etc), it is as idle as it could be. There is certainly no other traffic, other than tor's own, which is active.

The period I referred to in my submission was overnight when nothing is using this machine (this is on my dmz zone and I control all the traffic passing to/from it).

Today I conducted a little test and manually shut down all other services which run on that machine and left only tor, plus its internal dns service running.

Surprise, surprise - I started seeing tor's own log filling out pretty rapidly with that message (I changed the destination from being the syslog as I noticed that the level and the code group gets reported when I use tor's own log file - like /var/log/tor/tor.log - compared to syslog where neither the loglevel, nor the code group gets included).

Over to you - let me know if there is anything I can help you with...

comment:7 Changed 7 years ago by mr-4

For the non-believers (yes, I am looking at you, Mike) I am going to try and attach a log (compressed gz archive), which covers the idle period from last night (~12 hours) so that you can see these messages and the rate at which they are generated (roughly 23,000 lines spewed out by tor in total during that period), so, hopefully, it would help diagnose this problem. Please note that there are also messages related to Bug 7472 in it.

The only details "sanitised" from that log are my bridge details (yes, I connect to tor using them, for various reasons).

Later tonight I am going to apply Nick's rate-limiting patch then reinstall tor and see how it goes.

Changed 7 years ago by mr-4

Attachment: tor-2.4.7-bug7799.log.gz added

Tor logs covering the idle period from last night

comment:8 Changed 7 years ago by nickm

[505]$ wc -c tor-2.4.7-bug7799.log.gz 
   54016 tor-2.4.7-bug7799.log.gz
[506]$ gzcat tor-2.4.7-bug7799.log.gz  |wc -c
 4308541

Thank goodness for compression. :/

comment:9 Changed 7 years ago by nickm

(Is anybody seeing this who *isn't* building with bufferevents?)

comment:10 Changed 7 years ago by nickm

(Does this still happen with bufferevents disabled? I'm inferring that bufferevents are enabled because one of the warning messages refers to connection.c:3152, which is bufferevent-only code. I don't know how bufferevents could cause _this_ bug, but it would be good to rule them out.)

comment:11 Changed 7 years ago by mr-4

Do you mean this bug or #7472 (the connection.c:3152 one)? I am, indeed, building tor with bufferevents enabled. Would you like me to build it (and apply your patch) with bufferevents disabled and report back?

comment:12 Changed 7 years ago by mr-4

Nick's time-limiting patch fails to apply against .7-alpha so I had to modify it a bit. I am going to attach the modified patch at the end.

Even with the modified patch I couldn't build tor as I was getting the following error during compilation:

src/or/circuituse.c: In function 'circuit_expire_building':
src/or/circuituse.c:484: error: too many arguments to function 'num_live_entry_guards'
make[1]: * [src/or/circuituse.o] Error 1
make[1]:
* Waiting for unfinished jobs....
make[1]: Leaving directory `/builddir/build/BUILD/tor-0.2.4.7-alpha'
make: * [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.5zPBZe (%build)

It seems the problem was with "num_live_entry_guards" which in .7-alpha doesn't take any arguments, but in Nick's patch it is executed as "num_live_entry_guards(0)". I did change that as well.

After rebuilding tor with bufferevents disabled, the situation is much more manageable - I am attaching a short log, covering the overnight period. There are no Bug #7472 messages in the log either (understandable, since bufferevents has been disabled). If you need anything else from me let me know...

Changed 7 years ago by mr-4

Modified time-limiting patch against tor 0.2.4.7-alpha

Changed 7 years ago by mr-4

Overnight tor log

comment:13 Changed 7 years ago by mikeperry

I never disbelieved this was happening, I only stated that the symptoms were unclear. What I still don't understand is why an idle-but-still-connected client would actually be attempting to build tens of thousands of circuits and fail at all of them for seemingly no reason. That makes me think the code I added and the log message here only made another bug visible, and is not the actual cause of the issue...

comment:14 in reply to:  13 Changed 7 years ago by mr-4

Replying to mikeperry:

I never disbelieved this was happening, I only stated that the symptoms were unclear.

Oh, yeah?

Quoting from your topmost comment above:

"The bug submitter seems to be hitting the rageahol a little too hard this holiday season. A little investigation reveals a few things are incorrect in the bug report's description".

Care to explain that comment and what exactly in my bug submission is "incorrect"?

What I still don't understand is why an idle-but-still-connected client would actually be attempting to build tens of thousands of circuits and fail at all of them for seemingly no reason.

You "don't understand"? Hang on a minute...

...this is what you started off with:

"First: The reported logline is info level, not notice. It's possible that the notice line was *also* triggered, but that's not immediately clear based on the situation description.

Second: I strongly suspect that Tor wasn't idle, but was actually disconnected from the network."

That doesn't read as something you "misunderstood" to me - you were pretty damn certain, though as was later established, pretty wrong on both counts, so I don't know who is the one "hitting the rageahol a little too hard this holiday season"...

That makes me think the code I added and the log message here only made another bug visible, and is not the actual cause of the issue...

A word of advice - next time, take a step back, count to 10 and then post before you go off with all your guns blazing.

comment:15 Changed 7 years ago by nickm_mobile

Fine people: the important thing here is fixing this bug. I don't care who was right about which stuff when. I care about diagnosing and fixing this bug. Does anyone have a plan for tracking its causes down?

Mike, if the current log info isn't adequate to show what the problem is, what additional info would help.

Who else here has tried reproducing the bug? Did it work? Mr-4, are the repro instructions as simple as "leave a tor client with default config options running overnight" or is it harder than that?

comment:16 in reply to:  15 Changed 7 years ago by mr-4

Replying to nickm_mobile:

Mr-4, are the repro instructions as simple as "leave a tor client with default config options running overnight" or is it harder than that?

No idea as I am running a single instance of tor on one of my dmz machines and have no real basis of comparison with standard/other tor configurations.

My own torrc isn't the default one, obviously, but I'd like to think that my config doesn't deviate from a "standard" configuration too much, with the exception of, maybe, my ExcludeNodes (which is quite extensive as I block places like US, China and the like) and the fact that I run bridges instead of relying on "normal" tor connections.

I also have a good few MapAddress statements in my torrc as well (15-20 of them). I don't run hidden services and my tor isn't used as a bridge relay or exit node either.

comment:17 Changed 7 years ago by mr-4

OK, I've just looked at torrc.sample, taking into account all default torrc (client) settings (as they appear in the man page) and compared them to my existing torrc. The differences, as far as this bug submission goes, are:

ClientOnly 1
BridgeRelay 0
ORPort 0
DirPort 0
ExitPolicy reject *:*
NewCircuitPeriod 600
MaxCircuitDirtiness 600
usemicrodescriptors 1
FascistFirewall 1
AllowInvalidNodes
ProtocolWarnings 1
RefuseUnknownExits 1
StrictNodes 1
FastFirstHopPK 0
ReachableAddresses <bridges_ip_and_port_only>

Hope this helps. I also realised I could have different Log statements to include/exclude specific domains and specify different log levels (didn't know I could do that), so if you require something more specific, just let me know.

comment:18 Changed 7 years ago by nickm

Hm. Christian, did you have similar options set when you saw this behavior?

If I had to guess what's going on here, I'd suspect that using bridges might be necessary to reproduce this, but I'm not too sure.

Changed 7 years ago by Christian

Attachment: torrc.txt added

My torrc

comment:19 Changed 7 years ago by Christian

I've attached my torrc to this ticket. When these messages appeared I just upgraded to 0.2.4.7-alpha-dev and for some reason my little tor machine was _really_ busy, much more busy than usual, hence the "Your computer is too slow..." messages right before the "No circuits are opened..." message. The "No circuits are opened..." message appeared only 3 times in my log so far:

Dec 26 22:08:54.000
Dec 26 23:08:54.000 [5893 similar message(s) suppressed in last 3600 seconds]
Dec 27 00:08:54.000 [1013 similar message(s) suppressed in last 3600 seconds]

...and has not reappeared ever since. This tor machine has returned to normal workload around that time, so I suspect this message was/will only be triggered during high loads.

comment:20 Changed 7 years ago by nickm

I tried running an idle Tor 51c2dd8f8954d52fbaef29d2afb44d5eda47f4da overnight with and empty .torrc, and got some of a similar messages, though not quite so many, and with "2 guards are live" as opposed to 0. The first such message appeared about 2 hours after bootstrapping; then they came about 5-13 per hour.

I've started another idle Tor with a couple of bridges from bridges.torproject.org and no other torrc options; I'll see if this gets me a flood of the messages.

comment:21 Changed 7 years ago by nickm

Weird. The one with two bridges in the .torrc gives me:

Jan 01 10:44:26.000 [notice] Bootstrapped 100%: Done.
Jan 01 12:29:06.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 4 guards are live. 

Four guards? What's up with that? I've only got two bridges.

comment:22 in reply to:  21 Changed 7 years ago by mr-4

Replying to nickm:

Weird. The one with two bridges in the .torrc gives me:
[...]
Four guards? What's up with that? I've only got two bridges.

From experience, sometimes tor creates more than 1 "instance" (or connection) to a particular bridge. That usually happens when there is a peak load and there are many requests from the client (i.e. loading a web page containing lots of pictures), so the existing (single) bridge connection isn't enough to cope, so tor creates another one to the same bridge.

If so, I suspect for each opened connection a new guard is needed, though I might be wrong as my understanding of the inner tor workings is close to nil.

On a separate note and commenting on what Christian posted above - the (dmz) machine my tor is running on doesn't have any load issues - even though it is an old beat-up Pentium-based machine, it still takes quite a lot to slow it down, particularly when everything, including tor, is in "idle" state.

comment:23 Changed 7 years ago by nickm

Ah, here we go. I left the Tor instance with two bridges running overnight, and eventually got:

Jan 02 08:15:56.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live.  [669 similar message(s) suppressed in last 3600 seconds]
Jan 02 09:15:56.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live.  [1753 similar message(s) suppressed in last 3600 seconds]

Getting info logs now to see if the situation becomes clear.

comment:24 Changed 7 years ago by nickm

Keywords: tor-client 024-deferrable added
Owner: set to mikeperry
Status: newassigned

Mike, this is reproducible (see above). Can you start a tor instance with two bridges and info-level logging, leave it idle for a couple of days, and see if you don't get the same messages? If so, it would be cool if either you can figure out what's causing them, OR figure out what log messages to add to help figure out what's causing them.

comment:25 Changed 7 years ago by papillon

I've got the same problem on my Tor 0.2.5.0-alpha-dev, Linux based "000000000000myTOR" middleman relay:

"Mar 14 06:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [135218 similar message(s) suppressed in last 3600 seconds]
Mar 14 07:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [146378 similar message(s) suppressed in last 3600 seconds]
Mar 14 08:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [149349 similar message(s) suppressed in last 3600 seconds]
Mar 14 09:26:02.000 [notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live. [142360 similar message(s) suppressed in last 3600 seconds]"

etc.

SocksPort is set to 0 ("SocksPort 0") so i think it's normal that there are no open clients circuits.

I've build it with:

libevent-2.0.21-stable / openssl-1.0.1e / zlib-1.2.7

comment:26 Changed 7 years ago by cypherpunks

Same here , but tor configured as a client only

[notice] Tor 0.2.5.0-alpha-dev (git-331e4dcb460b5c91) opening log file.
...
[notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [5 similar message(s) suppressed in last 3600 seconds]
[notice] No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [3 similar message(s) suppressed in last 3600 seconds]
...
No circuits are opened. Relaxed timeout for a circuit with channel state open to 60000ms. However, it appears the circuit has timed out anyway. 13 guards are live. [8 similar message(s) suppressed in last 3600 seconds]

Changed 7 years ago by mikeperry

Attachment: log-circuit-info.diff added

Log more circuit info for relaxed timeouts

comment:27 Changed 7 years ago by mikeperry

The attached log diff should tell us more information about the type and length of these circuits. Could they perhaps be tunneled directory circuits for fetching the consensus or some other automatic circuit that we've been (accidentally?) relying on the timeout code to clean up for us instead of closing? My hope is that that patch will let us know that much, at least..

comment:28 Changed 7 years ago by mikeperry

Status: assignedneeds_information

comment:29 Changed 7 years ago by nickm

Status: needs_informationneeds_revision

That's mergeable IMO if it gets a changes file.

comment:30 Changed 7 years ago by mikeperry

Status: needs_revisionneeds_review

mikeperry/bug7799 (e6de8fb0251392c98276a9d32f201497149965bf).

Note I added the circuit identifier to the log line in this version of the patch, so we can tell if this is just one circuit hitting the condition over and over again for hours or if there are multiple circuits involved.

Changed 7 years ago by Christian

Attachment: tor-notices.log added

comment:31 Changed 7 years ago by Christian

I've applied log-circuit-info.diff to yesterday's git checkout and restarted Tor at 23:44 local time (see my tor-notices.log). So far this "No circuits are opened" message occured 11 times in the log (with thousands of messages supressed).

comment:32 Changed 7 years ago by mikeperry

Aha! The damn thing is an *already* opened 1-hop circuit. It's probably a tunneled dirconn, as I guessed earlier. But this helps us greatly: We certainly shouldn't be trying to relax the timeout on already opened circuits.

I pushed a commit to mikeperry/bug7799 (160a7b73715230c4509e9ca1088bd31ff1f7abaa) to fix this case. More may remain.

Here's how you run that git branch to find out:

git clone https://git.torproject.org/tor.git tor.git
cd tor.git
git remote add mikeperry https://git.torproject.org/mikeperry/tor.git
git fetch mikeperry
git branch --track bug7799 mikeperry/bug7799
git checkout bug7799
./configure
make && make install

Changed 7 years ago by mikeperry

Attachment: logfix.diff added

Alternate patch to use instead of git. Please also use the other patch if you use this directly.

comment:33 Changed 7 years ago by mikeperry

Hrmm. Trac doesn't send email about attachments.. If you want to test this without git, apply logfix.diff and log-circuit-info.diff and paste any log messages you still see.

Please also let us know if you test those patches and the problem seems to go away.

comment:34 Changed 7 years ago by Christian

I used your git branch (thanks for the mini-howto!) and a newly built Tor is already running for 30min now - I'll report back if there are any log messages.

Changed 7 years ago by Christian

Attachment: tor_notices_fd10.log added

16 hours notices.log running Tor git#bug7799

comment:35 Changed 7 years ago by Christian

Tor built from mikeperry/bug7799 runs for 16h now (still with the torrc attached to this ticket) and the messages are gone, yay! Log attached.

comment:36 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Rebased and merged. Thanks, everybody!

comment:37 Changed 7 years ago by Christian

For the record: this Tor instance is still running since March 28 and 2 (two) "No circuits are opened" messages appeared - but only 2 and no messages were suppressed in the log:

Mar 30 09:54:22.000 [notice] No circuits are opened. Relaxed timeout for circuit 295 (a General-purpose client 1-hop circuit
in state doing handshakes with channel state open) to 60000ms. However, it appears the circuit has timed out anyway. 1 guards are live.
Mar 31 05:03:48.000 [notice] No circuits are opened. Relaxed timeout for circuit 383 (a General-purpose client 1-hop circuit
in state doing handshakes with channel state open) to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live.
Note: See TracTickets for help on using tickets.