Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#8065 closed defect (fixed)

Bug: src/or/connection_edge.c:649: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed

Reported by: cypherpunks Owned by:
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.9-alpha
Severity: Keywords: tor-client
Cc: mikeperry Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Tor client crashed with the following message :

[...] [notice] Tor 0.2.4.9-alpha-dev (git-ee421e68d5231e39) opening log file.


Jan 27 20:13:39.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.
Jan 27 20:13:43.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.
Jan 27 20:13:48.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:13:53.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:14:32.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:14:37.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:14:42.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:14:47.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:15:22.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:15:26.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:15:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:15:36.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:28:38.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
Jan 27 20:33:24.000 [err] connection_ap_expire_beginning(): Bug: src/or/connection_edge.c:649: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed; aborting.

Child Tickets

Change History (17)

comment:1 Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-final
Priority: normalmajor

What options did you use to build Tor? Can you get a stack trace on this? Were you opening any connections and/or using Tor at the time?

comment:2 in reply to:  1 Changed 7 years ago by cypherpunks

Replying to nickm:

What options did you use to build Tor? Can you get a stack trace on this? Were you opening any connections and/or using Tor at the time?

CFLAGS += -O3 -fno-delete-null-pointer-checks

	./configure \
		--prefix=/usr \
		--mandir=\$${prefix}/share/man \
		--infodir=\$${prefix}/share/info \
		--localstatedir=/var \
		--sysconfdir=/etc \
		--disable-upnp \
		--disable-nat-pmp \
		--enable-openbsd-malloc 

I was using tor when the crash occurred.

comment:3 Changed 7 years ago by cypherpunks

I forgot to mention I'm using Tor also as local name resolver ( DnsPort 127.0.0.1:53 ) : the crash occurred while trying to resolve a name

comment:4 Changed 7 years ago by nickm

Hm. Is it reproducible, or does it only happen some of the time? If it's reproducible, do you think you could get a stack trace, plus the contents of "*circ" at the point where the assertion occurs? (Just ask if you need some help using gdb)

comment:5 in reply to:  4 Changed 7 years ago by cypherpunks

Replying to nickm:

Hm. Is it reproducible, or does it only happen some of the time?

I'm trying to reproduce it but with no luck atm.

When the bug occurred tor was running since ~ 8 days 6h :

... [notice] Heartbeat: Tor's uptime is 8 days 6:00 hours, with 40 circuits open. ...

I got a look to the entire log file and I found other occurrences of the following messages but tor just continued to work :

... [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up.
... [notice] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)

I found also other interesting messages :

[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [3617 similar message(s) suppressed in last 600 seconds]
...
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [3149 similar message(s) suppressed in last 600 seconds]
...
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [11390 similar message(s) suppressed in last 600 seconds]
...
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [9166 similar message(s) suppressed in last 600 seconds]
...
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [10384 similar message(s) suppressed in last 600 seconds]
...
[notice] We'd like to launch a circuit to handle a connection, but we already have 32 general-purpose client circuits pending. Waiting until some finish. [10097 similar message(s) suppressed in last 600 seconds]

And I 'completely' forgot I added IsolateDestAddr flag to the DNS listener :

DNSPort 127.0.0.1:53 IsolateDestAddr

comment:6 Changed 7 years ago by mikeperry

Cc: mikeperry added

I'm adding myself to Cc because this could be due to the purpose-based timeout changes in circuit_expire_building() (#7341) or the path bias probes (which also change circuit purpose, but normally only at circuit_mark_for_close()) in 0.2.4.9. A stack trace would still be helpful, though.

comment:7 Changed 7 years ago by mikeperry

Err no, I lied. Backtrace probably won't help. We need to get the contents of *circ or the actual value of circ->purpose. connection_ap_expire_building() is only called from run_scheduled_events().

comment:8 Changed 7 years ago by rransom

See also #8136 (probable duplicate, possibly with more information).

comment:9 Changed 7 years ago by murble

I can reproduce this bug on demand.

I have a network with 2 independent gateways to the internet.

Running the causing some tor traffic then switching gateways
and back again on the host

for instance
ip route change default 10.23.23.1
# wait some time
ip route change default 10.23.23.254

causes:

Feb 13 23:08:41.568 [Warning] circuit_package_relay_cell(): Bug: outgoing relay cell has n_chan==NULL. Dropping.

which a few seconds later results in:

Feb 13 23:08:51.445 [Error] connection_ap_expire_beginning(): Bug: src/or/connection_edge.c:649: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed; aborting.

backtrace:

#0 0x00007fa7b328e475 in *GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fa7b32916f0 in *
GI_abort () at abort.c:92
#2 0x00000000004cf03c in connection_ap_expire_beginning () at src/or/connection_edge.c:649
#3 0x000000000040c418 in run_scheduled_events (now=1360793331) at src/or/main.c:1479
#4 0x000000000040cb2f in second_elapsed_callback (timer=0x14ee8e0, arg=0x0) at src/or/main.c:1693
#5 0x0000000000574ed7 in periodic_timer_cb (fd=-1, what=1, arg=0x14ee8e0) at src/common/compat_libevent.c:599
#6 0x00007fa7b4044fcc in event_base_loop () from /tmp/tor-browser_en-US/Lib/libevent-2.0.so.5
#7 0x000000000040d370 in do_main_loop () at src/or/main.c:1989
#8 0x000000000040e93e in tor_main (argc=15, argv=0x7fff841a6e58) at src/or/main.c:2701
#9 0x0000000000408f4c in main (argc=15, argv=0x7fff841a6e58) at src/or/tor_main.c:30

I was unable to reproduce with 0.2.4.10

comment:10 Changed 7 years ago by mikeperry

I assume that backtrace is from within gdb? Can you run the following in gdb once you get the assert:

up
up
print circ->purpose
print ((origin_circuit_t*)circ)->path_state
print ((origin_circuit_t*)circ)->pathbias_probe_id

Maybe throw in a 'print *circ' and/or 'print *((origin_circuit_t*)circ)', but those may contain sensitive data (such as your guard node info).

I'll also need the tor version you're crashing on. Git hash too, if you randomly built from origin/master.

comment:11 Changed 7 years ago by murble

(gdb) up                                                                                                                                                      
#1  0x00007fa7b32916f0 in *__GI_abort () at abort.c:92
92      abort.c: No such file or directory.
(gdb) up
#2  0x00000000004cf03c in connection_ap_expire_beginning () at src/or/connection_edge.c:649
649         tor_assert(circ->purpose == CIRCUIT_PURPOSE_C_GENERAL);
(gdb) print circ->purpose
$1 = 20 '\024'
(gdb) print ((origin_circuit_t*)circ)->path_state
$2 = PATH_STATE_BUILD_SUCCEEDED
(gdb) print ((origin_circuit_t*)circ)->pathbias_probe_id
$3 = 36985
(gdb) print *circ
$4 = {magic = 892424771, n_chan = 0x0, n_circ_id = 0, n_mux = 0x0, n_chan_cells = {head = 0x0, tail = 0x0, n = 0, insertion_times = 0x0}, 
  n_hop = 0x0, streams_blocked_on_n_chan = 0, streams_blocked_on_p_chan = 0, state = 3 '\003', purpose = 20 '\024', package_window = 1000, 
  deliver_window = 1000, n_chan_create_cell = 0x0, timestamp_began = {tv_sec = 1360793321, tv_usec = 553414}, timestamp_created = {
    tv_sec = 1360793230, tv_usec = 441237}, timestamp_dirty = 1360793321, marked_for_close = 0, marked_for_close_file = 0x0, dirreq_id = 0, 
  next = 0x0, next_active_on_n_chan = 0x0, prev_active_on_n_chan = 0x0}
(gdb) print *((origin_circuit_t*)circ)
$5 = {base_ = {magic = 892424771, n_chan = 0x0, n_circ_id = 0, n_mux = 0x0, n_chan_cells = {head = 0x0, tail = 0x0, n = 0, 
      insertion_times = 0x0}, n_hop = 0x0, streams_blocked_on_n_chan = 0, streams_blocked_on_p_chan = 0, state = 3 '\003', purpose = 20 '\024', 
    package_window = 1000, deliver_window = 1000, n_chan_create_cell = 0x0, timestamp_began = {tv_sec = 1360793321, tv_usec = 553414}, 
    timestamp_created = {tv_sec = 1360793230, tv_usec = 441237}, timestamp_dirty = 1360793321, marked_for_close = 0, 
    marked_for_close_file = 0x0, dirreq_id = 0, next = 0x0, next_active_on_n_chan = 0x0, prev_active_on_n_chan = 0x0}, p_streams = 0x1776b20, 
  build_state = 0x11343a0, cpath = 0x11344a0, rend_data = 0x0, remaining_relay_early_cells = 0, is_ancient = 0, has_opened = 1, 
  path_state = PATH_STATE_BUILD_SUCCEEDED, pathbias_probe_id = 36985, pathbias_probe_nonce = 6881800, hs_circ_has_timed_out = 0, 
  relaxed_timeout = 0, hs_service_side_rend_circ_has_been_relaunched = 0, relay_early_commands = "\006\006\001\001\001\001\001\001", 
  relay_early_cells_sent = 8, next_stream_id = 36986, intro_key = 0x0, global_identifier = 8, isolation_values_set = 1, 
  isolation_any_streams_attached = 1, isolation_flags_mixed = 2 '\002', client_proto_type = 6 '\006', client_proto_socksver = 5 '\005', 
  dest_port = 80, client_addr = {family = 2, addr = {dummy_ = 16777343, in_addr = {s_addr = 16777343}, in6_addr = {__in6_u = {
          __u6_addr8 = "\177\000\000\001", '\000' <repeats 11 times>, __u6_addr16 = {127, 256, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {16777343, 0, 0, 
            0}}}}}, dest_address = 0x1138810 "static.bbci.co.uk", session_group = -4, nym_epoch = 0, socks_username_len = 0, 
  socks_password_len = 0 '\000', socks_username = 0x0, socks_password = 0x0, associated_isolated_stream_global_id = 147}

Plain tor built from tor-0.2.4.9-alpha.tar.gz
on Debian Wheezy.

Problem previously experienced using the binary shipped with
tor-browser-gnu-linux-x86_64-2.4.9-alpha-1-dev-en-US.tar.gz
but replaced with the plain self build binary due to a lack of symbols
in that binary.

comment:12 Changed 7 years ago by mikeperry

Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.

I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.

To be safe, we might want to expand this assert to allow CIRCUIT_PURPOSE_PATH_BIAS_TESTING as well as CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT? I could imagine cases where circuits in both purposes might end up down this codepath. I'm surprised we haven't seen it before for CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT circs, actually.

comment:13 in reply to:  12 ; Changed 7 years ago by nickm

Replying to mikeperry:

Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.

I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.

If you're worried about this, I'd suggest replacing the assert with a helpful LD_BUG log message and making the code recover from this error condition sanely, rather than trying to replace the assert with another assert.

comment:14 Changed 7 years ago by cypherpunks

On my computer Tor 0.2.4.9-alpha crashes with the same error message a few hours after it was started. But I don't have the "outgoing relay cell has n_chan==NULL. Dropping." message in my logs. (I am not the other anonymous user who started the bug report.)

I have logged all events via the control port. The only relevant messages seem to be the following ones which happen less than one second before the crash.

xxxxxxxxx3.16222 < 650 DEBUG circuit_change_purpose(): changing purpose of origin circ n1 from "Hidden service client: Active rendezvous point" (12) to "UNKNOWN_20" (20)
xxxxxxxxx3.16225 < 650 CIRC_MINOR n1 PURPOSE_CHANGED xxx;yyy;zzz BUILD_FLAGS=IS_INTERNAL,NEED_CAPACITY PURPOSE=PATH_BIAS_TESTING REND_QUERY=xxx TIME_CREATED=xxx OLD_PURPOSE=HS_CLIENT_REND OLD_HS_STATE=HSCR_JOINED
xxxxxxxxx3.16228 < 650 INFO pathbias_send_usable_probe(): Sending pathbias testing cell to xxx on stream xxx for circ n1. 
xxxxxxxxx4.12455 < 650 ERR connection_ap_expire_beginning(): Bug: src/or/connection_edge.c:654: connection_ap_expire_beginning: Assertion circ->purpose == CIRCUIT_PURPOSE_C_GENERAL failed; aborting. 

comment:15 in reply to:  13 Changed 7 years ago by nickm

Status: newneeds_review

Replying to nickm:

If you're worried about this, I'd suggest replacing the assert with a helpful LD_BUG log message and making the code recover from this error condition sanely, rather than trying to replace the assert with another assert.

Done in branch "bug8065" in my public repository; please review?

comment:16 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Mike likes it; Merged it with a revised changes file from mike. Thanks, all!

comment:17 in reply to:  12 Changed 7 years ago by cypherpunks

Replying to mikeperry:

Ok. In your case, the assert no longer happens because in 0.2.4.10-alpha we fixed the preceeding NULL by not sending path bias probes on NULL/closed channels, which is what you're inducing by changing gateways.

I am wondering if there are other cases where we could hit this assert, though. Basically any time a stream attempt is made to a slow/unresponsive host where we then give up on the circuit and try to close it (or time it out) before detaching the stream, we could hit this.

To be safe, we might want to expand this assert to allow CIRCUIT_PURPOSE_PATH_BIAS_TESTING as well as CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT? I could imagine cases where circuits in both purposes might end up down this codepath. I'm surprised we haven't seen it before for CIRCUIT_PURPOSE_C_MEASURE_TIMEOUT circs, actually.

Hi mike,
you may be interested in my reply to #8136 - the crash happened on 0.2.4.10-alpha.

Note: See TracTickets for help on using tickets.