[...] [notice] Tor 0.2.4.9-alpha-dev (git-ee421e68d5231e39) opening log file.Jan 27 20:13:39.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.Jan 27 20:13:43.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.Jan 27 20:13:48.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:13:53.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:14:32.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:14:37.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:14:42.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:14:47.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:15:22.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:15:26.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:15:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:15:36.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:28:38.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)Jan 27 20:33:24.000 [err] connection_ap_expire_beginning(): Bug: src/or/connection_edge.c:649: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed; aborting.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Hm. Is it reproducible, or does it only happen some of the time? If it's reproducible, do you think you could get a stack trace, plus the contents of "*circ" at the point where the assertion occurs? (Just ask if you need some help using gdb)
Hm. Is it reproducible, or does it only happen some of the time?
I'm trying to reproduce it but with no luck atm.
When the bug occurred tor was running since ~ 8 days 6h :
... [notice] Heartbeat: Tor's uptime is 8 days 6:00 hours, with 40 circuits open. ...
I got a look to the entire log file and I found other occurrences of the following messages but tor just continued to work :
... [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.... [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
I found also other interesting messages :
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [3617 similar message(s) suppressed in last 600 seconds]...[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [3149 similar message(s) suppressed in last 600 seconds]...[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [11390 similar message(s) suppressed in last 600 seconds]...[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [9166 similar message(s) suppressed in last 600 seconds]...[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [10384 similar message(s) suppressed in last 600 seconds]...[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [10097 similar message(s) suppressed in last 600 seconds]
And I 'completely' forgot I added IsolateDestAddr flag to the DNS listener :
I'm adding myself to Cc because this could be due to the purpose-based timeout changes in circuit_expire_building() (#7341 (moved)) or the path bias probes (which also change circuit purpose, but normally only at circuit_mark_for_close()) in 0.2.4.9. A stack trace would still be helpful, though.
Err no, I lied. Backtrace probably won't help. We need to get the contents of *circ or the actual value of circ->purpose. connection_ap_expire_building() is only called from run_scheduled_events().
#0 0x00007fa7b328e475 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fa7b32916f0 in *__GI_abort () at abort.c:92
#2 (closed) 0x00000000004cf03c in connection_ap_expire_beginning () at src/or/connection_edge.c:649
#3 (closed) 0x000000000040c418 in run_scheduled_events (now=1360793331) at src/or/main.c:1479
#4 (closed) 0x000000000040cb2f in second_elapsed_callback (timer=0x14ee8e0, arg=0x0) at src/or/main.c:1693
#5 (closed) 0x0000000000574ed7 in periodic_timer_cb (fd=-1, what=1, arg=0x14ee8e0) at src/common/compat_libevent.c:599
#6 (closed) 0x00007fa7b4044fcc in event_base_loop () from /tmp/tor-browser_en-US/Lib/libevent-2.0.so.5
#7 (closed) 0x000000000040d370 in do_main_loop () at src/or/main.c:1989
#8 (closed) 0x000000000040e93e in tor_main (argc=15, argv=0x7fff841a6e58) at src/or/main.c:2701
#9 (closed) 0x0000000000408f4c in main (argc=15, argv=0x7fff841a6e58) at src/or/tor_main.c:30
Plain tor built from tor-0.2.4.9-alpha.tar.gz
on Debian Wheezy.
Problem previously experienced using the binary shipped with
tor-browser-gnu-linux-x86_64-2.4.9-alpha-1-dev-en-US.tar.gz
but replaced with the plain self build binary due to a lack of symbols
in that binary.
Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.
I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.
To be safe, we might want to expand this assert to allow CIRCUIT_PURPOSE_PATH_BIAS_TESTING as well as CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT? I could imagine cases where circuits in both purposes might end up down this codepath. I'm surprised we haven't seen it before for CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT circs, actually.
Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.
I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.
If you're worried about this, I'd suggest replacing the assert with a helpful LD_BUG log message and making the code recover from this error condition sanely, rather than trying to replace the assert with another assert.
On my computer Tor 0.2.4.9-alpha crashes with the same error message a few hours after it was started. But I don't have the "outgoing relay cell has n_chan==NULL. Dropping." message in my logs. (I am not the other anonymous user who started the bug report.)
I have logged all events via the control port. The only relevant messages seem to be the following ones which happen less than one second before the crash.
xxxxxxxxx3.16222 < 650 DEBUG circuit_change_purpose(): changing purpose of origin circ n1 from "Hidden service client: Active rendezvous point" (12) to "UNKNOWN_20" (20)xxxxxxxxx3.16225 < 650 CIRC_MINOR n1 PURPOSE_CHANGED xxx;yyy;zzz BUILD_FLAGS=IS_INTERNAL,NEED_CAPACITY PURPOSE=PATH_BIAS_TESTING REND_QUERY=xxx TIME_CREATED=xxx OLD_PURPOSE=HS_CLIENT_REND OLD_HS_STATE=HSCR_JOINEDxxxxxxxxx3.16228 < 650 INFO pathbias_send_usable_probe(): Sending pathbias testing cell to xxx on stream xxx for circ n1. xxxxxxxxx4.12455 < 650 ERR connection_ap_expire_beginning(): Bug: src/or/connection_edge.c:654: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed; aborting.
If you're worried about this, I'd suggest replacing the assert with a helpful LD_BUG log message and making the code recover from this error condition sanely, rather than trying to replace the assert with another assert.
Done in branch "bug8065" in my public repository; please review?
Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.
I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.
To be safe, we might want to expand this assert to allow CIRCUIT_PURPOSE_PATH_BIAS_TESTING as well as CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT? I could imagine cases where circuits in both purposes might end up down this codepath. I'm surprised we haven't seen it before for CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT circs, actually.
Hi mike,
you may be interested in my reply to #8136 (moved) - the crash happened on 0.2.4.10-alpha.