Opened 2 years ago

Closed 19 months ago

Last modified 13 months ago

#16013 closed defect (fixed)

Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed in circuit_get_open_circ_or_launch at circuituse.c:1835

Reported by: cypherpunks Owned by: nickm
Priority: Medium Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Keywords: regression, 026-backport, 2016-bug-retrospective
Cc: adrelanos@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

[Τετ 13. Μαϊ 20:41:28 2015] Tor Software Error - The Tor software encountered an internal bug. Please report the following error message to the Tor developers at bugs.torproject.org: "tor_assertion_failed_(): Bug: circuituse.c:1835: circuit_get_open_circ_or_launch: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed; aborting.
"
[Τετ 13. Μαϊ 20:41:28 2015] Tor Software Error - The Tor software encountered an internal bug. Please report the following error message to the Tor developers at bugs.torproject.org: "Bug: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed in circuit_get_open_circ_or_launch at circuituse.c:1835. (Stack trace not available)
"
[Τετ 13. Μαϊ 20:43:13 2015] The Tor Software is not Running - Click "Start Tor" in the Vidalia Control Panel to restart the Tor software. If Tor exited unexpectedly, select the "Advanced" tab above for details about any errors encountered.

Child Tickets

Attachments (5)

messages-obfus.txt (1.9 KB) - added by starlight 23 months ago.
log messages
gdb-trace-obfus.txt (4.1 KB) - added by starlight 23 months ago.
stack trace
logfile_bug16013-obfus.xz (542.1 KB) - added by starlight 23 months ago.
log from 0.2.7.1-alpha Win32 client-only reproduce
try_this_one_for_16013.diff (645 bytes) - added by nickm 23 months ago.
tor-0.2.6.0-bug16013-cand01.patch (646 bytes) - added by starlight 23 months ago.
tweak fixes to patch canidate

Download all attachments as: .zip

Change History (42)

comment:1 follow-up: Changed 2 years ago by yawning

  • Component changed from - Select a component to Tor
  • Status changed from new to needs_information
  • Summary changed from bugs to Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed in circuit_get_open_circ_or_launch at circuituse.c:1835
  • Version set to Tor: unspecified

What version of Tor are you running, on what operating system, and what were you doing at the time?

comment:2 Changed 2 years ago by nickm

Is this with Vidalia? If so, does it happen outside of vidalia?

(The line number seems to match up with tor 0.2.6.latest)

comment:3 Changed 2 years ago by arma

I think people are still waiting for details here.

(Meaning, you seem to be spamming trac and generally making a nuisance of yourself, which makes people uninterested in being helpful. Please consider being helpful instead.)

comment:4 Changed 2 years ago by asn

Καλησπέρα! Μπορείς να διαβάσεις αυτά τα μηνύματα?

Αν ναι, σταμάτα να ανοίγεις νέα bug reports για το ίδιο πράγμα κάθε μέρα! Ευχαριστώ!
Αν προσέξεις, στα άλλα comments σου έχουν κάνει κάποιες ερωτήσεις που θα μας φέρουν πίο κοντά στο να λύσουμε το πρόβλημα σου!

Σε παρακαλώ απάντησε τις εξής ερωτήσεις:

  • Αν έχεις την τελευταία έκδοση και το πρόβλημα συνεχίζεται, τοτε πες μας ποιά έκδοση έχεις, σε ποιο λειτουργικό σύστημα το τρέχεις (Windows, Mac, Linux, ?) και τι έκανες όταν εμφανίστηκε το πρόβλημα.

Μην αγνοήσεις αυτές τις ερωτήσεις! Θέλουμε να λύσουμε το πρόβλημα σου!

Ευχαριστώ!

here is a localized attempt

comment:5 Changed 23 months ago by cypherpunks

I hit this same assert fail, output here:

Jun 04 XX:39:28 host Tor[310]: Received reload signal (hup). Reloading config and resetting internal state.
Jun 04 XX:39:28 host Tor[310]: Read configuration file "/etc/tor/torrc".
Jun 04 XX:45:03 host Tor[310]: New control connection opened from 127.0.0.1.
Jun 04 XX:50:41 host Tor[310]: New control connection opened from 127.0.0.1.
Jun 04 XX:50:42 host Tor[310]: Query '[scrubbed]' didn't have valid rend desc in cache. Failing.
Jun 04 XX:50:42 host Tor[310]: tor_assertion_failed_(): Bug: src/or/circuituse.c:1835: circuit_get_open_circ_or_launch: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed; aborting.
Jun 04 XX:50:42 host Tor[310]: Bug: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed in circuit_get_open_circ_or_launch at src/or/circuituse.c:1835. Stack trace:
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(log_backtrace+0x41) [0x2661fa8631]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(tor_assertion_failed_+0x8c) [0x2661fb62fc]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(+0xb7a8f) [0x2661f45a8f]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(connection_ap_handshake_attach_circuit+0x2d9) [0x2661f46529]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(connection_ap_attach_pending+0x79) [0x2661f64159]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(circuit_build_needed_circs+0x38) [0x2661f45448]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(+0x360d8) [0x2661ec40d8]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/lib/libevent-2.0.so.5(event_base_loop+0x806) [0x2badaa5dca6]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(do_main_loop+0x17d) [0x2661ec4bed]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(tor_main+0x16a5) [0x2661ec79f5]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/lib/libc.so.6(__libc_start_main+0xf0) [0x2bad9b8b790]
Jun 04 XX:50:42 host Tor[310]: Bug:     /usr/bin/tor(_start+0x29) [0x2661ec1139]
Jun 04 XX:50:42 host systemd[1]: tor.service: main process exited, code=killed, status=6/ABRT
Jun 04 XX:50:42 host systemd[1]: Unit tor.service entered failed state.
Jun 04 XX:50:42 host systemd[1]: tor.service failed.

When it occurred:

added a new HidServAuth torrc config line for a stealth hs (foo.onion)
ran `kill -SIGHUP $TOR_PID` to force a config reload
ran `torsocks ssh foo.onion`

Arch Linux, x86_64
Tor v0.2.6.8 (git-5c8440b13bde6c88) running on Linux with Libevent 2.0.22-stable, OpenSSL 1.0.2a and Zlib 1.2.8.

edit: should note, possibly I hit the old .onion address because I forgot to update my shell envvar that I store the onion address in first, I don't recall exactly, sorry.
Last edited 23 months ago by cypherpunks (previous) (diff)

comment:6 Changed 23 months ago by yawning

  • Reporter changed from slrsab to cypherpunks

Note: Changing the reporter to cypherpunks so I can obliterate the original submitter since they have chosen to consistently spam trac with duplicate tickets for weeks instead of replying with useful information.

Also thanks for another case that triggers the assert, that should be a starting point.

comment:7 Changed 23 months ago by nickm

  • Keywords regression 026-backport added
  • Milestone set to Tor: 0.2.7.x-final

comment:8 Changed 23 months ago by proper

  • Cc adrelanos@… added

Same issue.

  • Debian wheezy.
  • tor 0.2.6.8-5~d70.wheezy+1
  • Tor v0.2.6.8 (git-69b46cf274f35955) running on Linux with Libevent 2.0.19-stable, OpenSSL 1.0.1e and Zlib 1.2.7.

comment:9 in reply to: ↑ 1 Changed 23 months ago by proper

Replying to yawning:

what were you doing at the time?

Get new circuit using arm.

Replying to nickm:

Is this with Vidalia? If so, does it happen outside of vidalia?

Vidalia not involved.

comment:10 Changed 23 months ago by nickm

I added a log message in commit aab7d666cd51fdccdb735f7fa2c59f93d0c450e9 that should log a little more info if this error recurs. To see it, reproduce the bug with git master (or with 0.2.7.2-alpha when it's out).

Looking at the code and the stack trace, I see that we seem to be calling circuit_get_open_circ_or_launch() from connection_ap_handshake_attach_circuit(). But connection_ap_handshake_attach_circuit() already did tor_assert(base_conn->state == AP_CONN_STATE_CIRCUIT_WAIT);. So all I can figure is that either we're passing a different connection (!?), or the state has changed between the start of connection_ap_handshake_attach_circuit() and the call to circuit_get_open_circ_or_launch().

comment:11 Changed 23 months ago by nickm

Or conceivably, this is in one of the calls to circuit_get_open_circ_or_launch() from within circuit_get_open_circ_or_launch(). But if that's happening, there should be an info-level log message. Maybe also try reproducing the bug with info-level logging?

comment:12 Changed 23 months ago by starlight

I hit it too.

Just started relay with 0.2.6.9 two days ago.
No hidden services locally, bug occurred on client
browsing activity concurrent with moderate relay
activity. Client and Videlia running on
different systems.

Log messages and gdb stack traces attached in
obfuscated form. Have a complete intact core
file and supporting set sysroot bundle
and am willing to provide that privately.
Please contact me at starlight dot 2015q2
at binnacle dot cx.

First reported as #16365 and yawing
identified it as this issue.

Changed 23 months ago by starlight

log messages

Changed 23 months ago by starlight

stack trace

comment:13 Changed 23 months ago by starlight

This bug _might_ be reproducible.

It depends on accessing a particular site
with a somewhat uncommon client configuration.

Same config and site access gives 0.2.4.27
and 0.2.5.12 fits, though eventually it
works. Have fallen back to old version
because 0.2.6.9 fails consistently,
though in a different manner than the
bug-trap so far.

Possibly the site in question is currently
under DDOS and so the opportunity to reproduce
may be time-limited to the attack.

Contact me privately at starlight dot 2015q2
at binnacle dot cx for the details.

comment:14 Changed 23 months ago by nickm

Actually, what would help most would be reproducing it using latest git master, or with 'aab7d666cd51fdccdb735f7fa2c59f93d0c450e9 ' backported to 0.2.6.9, and noting the info-level log messages.

comment:15 Changed 23 months ago by starlight

Sorry but I can afford to put any more
time into this. Don't see any reason
why it can't be reproduced by someone
else with the current code.

Won't post the link triggering the
bug here, but there's nothing problematic
about it for residents of liberal
democracies. Likewise, nothing
troublesome appears in a
strings -a of the core file, which
naturally has comprehensive details
of the state of the daemon when
the failure occurred.

Have gone back to 0.2.5.12 and plan
on staying with at for some time.

comment:16 Changed 23 months ago by starlight

Occurred to me how to characterize the
traffic causing the problem and how it
might be reproduced in test:

Arrange for HS to fulfill 500 simultaneous
requests for about 50-100KB, each via
a separate connection. Set up the
client to use only two hard-coded
guard nodes. Maybe run this a couple
of times and mix in other traffic.

This seems to kick in some anti-DDOS
measure on one or both guards and
the messages

Your Guard is failing more circuits than usual. . .
We'd like to launch a circuit . . . already have 128 . . . pending. Waiting until some finish.

might appear. . .or might not appear.
Might require NumCPUs be set 2 or greater, might not.
Might require MaxClientCircuitsPending 128.

comment:17 follow-up: Changed 23 months ago by starlight

one might also see

Closing stream for '[scrubbed].onion': hidden service is unavailable (try again later).

Changed 23 months ago by starlight

log from 0.2.7.1-alpha Win32 client-only reproduce

comment:18 Changed 23 months ago by starlight

Ran into the problem again under
different setup.

Uploaded info level log file of the bug
reproduced with Win32 0.2.7.1-alpha
in client-only operation. No assertion
trap, but starting at line 132673 (69%)
of the log file several

[warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.

messages appear and the relay fails
to function or recover from that point
forward. The above message appears
roughly the same number of times in
log immediately preceding the assertion
trap.

comment:19 Changed 23 months ago by starlight

100% reproducible

[err] circuit_get_open_circ_or_launch(): Bug: Connection state mismatch: wanted AP_CONN_STATE_CIRCUIT_WAIT, but got 6 (waiting for rendezvous desc)
Jun 21 19:22:52.000 [debug] conn_write_callback(): socket 12 wants to write.
Jun 21 19:22:53.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:53.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 7 sec old)
Jun 21 19:22:53.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:53.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 7 sec old)
Jun 21 19:22:53.000 [debug] conn_write_callback(): socket 12 wants to write.
Jun 21 19:22:54.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:54.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 8 sec old)
Jun 21 19:22:54.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:54.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 8 sec old)
Jun 21 19:22:54.000 [debug] conn_write_callback(): socket 12 wants to write.
Jun 21 19:22:55.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:55.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 9 sec old)
Jun 21 19:22:55.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:55.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3044306830 present and awaiting ack (rend 3615842360). Stalling. (stream 9 sec old)
Jun 21 19:22:55.000 [debug] conn_write_callback(): socket 12 wants to write.
Jun 21 19:22:56.000 [debug] conn_read_callback(): socket 12 wants to read.
Jun 21 19:22:56.000 [debug] read_to_chunk(): Read 15 bytes. 15 on inbuf.
Jun 21 19:22:56.000 [info] rend_cache_purge(): Purging HS descriptor cache
Jun 21 19:22:56.000 [info] rend_client_purge_last_hid_serv_requests(): Purging client last-HS-desc-request-time table
Jun 21 19:22:56.000 [debug] conn_write_callback(): socket 12 wants to write.
Jun 21 19:22:56.000 [debug] circuit_find_to_cannibalize(): Hunting for a circ to cannibalize: purpose 9, uptime 0, capacity 1, internal 1
Jun 21 19:22:56.000 [info] circuit_launch_by_extend_info(): Cannibalizing circ 'AAAPolicy' for purpose 9 (Hidden service client: Establishing rendezvous point)
Jun 21 19:22:56.000 [debug] circuit_change_purpose(): changing purpose of origin circ 48 from "General-purpose client" (5) to "Hidden service client: Establishing rendezvous point" (9)
Jun 21 19:22:56.000 [info] rend_client_rendcirc_has_opened(): rendcirc is open
Jun 21 19:22:56.000 [info] rend_client_send_establish_rendezvous(): Sending an ESTABLISH_RENDEZVOUS cell
Jun 21 19:22:56.000 [debug] pathbias_count_use_attempt(): Marked circuit 48 (7.000000/15.000000) as used for guard Pascal8 ($7B2553AEA4A3FC9E1B513D772ECA652C8CC15337).
Jun 21 19:22:56.000 [debug] relay_send_command_from_edge_(): delivering 33 cell forward.
Jun 21 19:22:56.000 [debug] relay_send_command_from_edge_(): Sending a RELAY_EARLY cell; 4 remaining.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] append_cell_to_circuit_queue(): Made a circuit active.
Jun 21 19:22:56.000 [debug] scheduler_channel_has_waiting_cells(): Channel 0 at 0x614000087240 went from waiting_for_cells to pending
Jun 21 19:22:56.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:56.000 [warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.
Jun 21 19:22:56.000 [info] circuit_get_open_circ_or_launch(): No intro points for '[scrubbed]': re-fetching service descriptor.
Jun 21 19:22:56.000 [debug] rend_client_refetch_v2_renddesc(): Fetching v2 rendezvous descriptor for service [scrubbed]
Jun 21 19:22:56.000 [debug] directory_initiate_command_rend(): anonymized 1, use_begindir 1.
Jun 21 19:22:56.000 [debug] directory_initiate_command_rend(): Initiating hidden-service v2 descriptor fetch
Jun 21 19:22:56.000 [info] connection_ap_make_link(): Making internal anonymized tunnel to [scrubbed]:9001 ...
Jun 21 19:22:56.000 [debug] connection_add_impl(): new conn type Socks, socket -1, address (Tor_internal), n_conns 10.
Jun 21 19:22:56.000 [debug] extend_info_from_node(): using 141.255.189.161:9001 for OxylEpsilon
Jun 21 19:22:56.000 [debug] circuit_find_to_cannibalize(): Hunting for a circ to cannibalize: purpose 5, uptime 0, capacity 1, internal 1
Jun 21 19:22:56.000 [info] circuit_launch_by_extend_info(): Cannibalizing circ 'redjohn1' for purpose 5 (General-purpose client)
Jun 21 19:22:56.000 [debug] circuit_send_next_onion_skin(): starting to send subsequent skin.
Jun 21 19:22:56.000 [info] circuit_send_next_onion_skin(): Sending extend relay cell.
Jun 21 19:22:56.000 [debug] relay_send_command_from_edge_(): delivering 14 cell forward.
Jun 21 19:22:56.000 [debug] relay_send_command_from_edge_(): Sending a RELAY_EARLY cell; 5 remaining.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] circuit_package_relay_cell(): crypting a layer of the relay cell.
Jun 21 19:22:56.000 [debug] append_cell_to_circuit_queue(): Made a circuit active.
Jun 21 19:22:56.000 [debug] scheduler_channel_has_waiting_cells(): Channel 1 at 0x614000086440 went from waiting_for_cells to pending
Jun 21 19:22:56.000 [info] connection_ap_make_link(): ... application connection created and linked.
Jun 21 19:22:56.000 [debug] connection_add_impl(): new conn type Directory, socket -1, address 141.255.189.161, n_conns 11.
Jun 21 19:22:56.000 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'v4sciq74tvcydqip' with descriptor ID 'soxxwg56bxmusmbapm2uc2xay527nole', auth type 0, and descriptor cookie '[none]' to hidden service directory $93C3E9A094D9B3F01785DB7580101F93D8D19A89~OxylEpsilon at 141.255.189.161
Jun 21 19:22:56.000 [info] connection_ap_handshake_attach_circuit(): Intro (0) and rend (3216239652) circs are not both ready. Stalling conn. (10 sec old)
Jun 21 19:22:56.000 [debug] circuit_get_open_circ_or_launch(): one on the way!
Jun 21 19:22:56.000 [warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.
Jun 21 19:22:56.000 [info] circuit_get_open_circ_or_launch(): No intro points for '[scrubbed]': re-fetching service descriptor.
Jun 21 19:22:56.000 [debug] rend_client_refetch_v2_renddesc(): Fetching v2 rendezvous descriptor for service [scrubbed]
Jun 21 19:22:56.000 [debug] directory_initiate_command_rend(): anonymized 1, use_begindir 1.
Jun 21 19:22:56.000 [debug] directory_initiate_command_rend(): Initiating hidden-service v2 descriptor fetch
Jun 21 19:22:56.000 [info] connection_ap_make_link(): Making internal anonymized tunnel to [scrubbed]:9001 ...
Jun 21 19:22:56.000 [debug] connection_add_impl(): new conn type Socks, socket -1, address (Tor_internal), n_conns 12.
Jun 21 19:22:56.000 [debug] extend_info_from_node(): using 85.228.195.5:9001 for hashmaps
Jun 21 19:22:56.000 [debug] circuit_find_to_cannibalize(): Hunting for a circ to cannibalize: purpose 5, uptime 0, capacity 1, internal 1
Jun 21 19:22:56.000 [debug] new_route_len(): Chosen route length 4 (6629/6629 routers suitable).
Jun 21 19:22:56.000 [info] onion_pick_cpath_exit(): Using requested exit node '$943CE107D30C653BC4CAF3840006DB8D1BCA6B29~hashmaps at 85.228.195.5'
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Path is 0 long; we want 4
Jun 21 19:22:56.000 [debug] extend_info_from_node(): using 23.254.166.222:80 for Pascal8
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Chose router $7B2553AEA4A3FC9E1B513D772ECA652C8CC15337~Pascal8 at 23.254.166.222 for hop 1 (exit is hashmaps)
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Path is 1 long; we want 4
Jun 21 19:22:56.000 [debug] choose_good_middle_server(): Contemplating intermediate hop 1: random choice.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We found 5770 running nodes.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We removed 0 excludednodes, leaving 5770 nodes.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We removed 16 excludedsmartlist, leaving 5757 nodes.
Jun 21 19:22:56.000 [debug] compute_weighted_bandwidths(): Generated weighted bandwidths for rule weight as middle node based on weights Wg=0.408700 Wm=1.000000 We=0.000000 Wd=0.000000 with total bw 0
Jun 21 19:22:56.000 [debug] extend_info_from_node(): using 5.9.158.68:9001 for freespeech4thedumb1
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Chose router $7E2F94EA9214EC6CD05A0FC1BA51A0FA30FC83C7~freespeech4thedumb1 at 5.9.158.68 for hop 2 (exit is hashmaps)
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Path is 2 long; we want 4
Jun 21 19:22:56.000 [debug] choose_good_middle_server(): Contemplating intermediate hop 2: random choice.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We found 5770 running nodes.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We removed 0 excludednodes, leaving 5770 nodes.
Jun 21 19:22:56.000 [debug] router_choose_random_node(): We removed 62 excludedsmartlist, leaving 5714 nodes.
Jun 21 19:22:56.000 [debug] compute_weighted_bandwidths(): Generated weighted bandwidths for rule weight as middle node based on weights Wg=0.408700 Wm=1.000000 We=0.000000 Wd=0.000000 with total bw 0
Jun 21 19:22:56.000 [debug] extend_info_from_node(): using 213.136.71.21:9001 for maltimore
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Chose router $78C4FCE0851C99A4E05563C6F520C68F28BD916E~maltimore at 213.136.71.21 for hop 3 (exit is hashmaps)
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Path is 3 long; we want 4
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Chose router $943CE107D30C653BC4CAF3840006DB8D1BCA6B29~hashmaps at 85.228.195.5 for hop 4 (exit is hashmaps)
Jun 21 19:22:56.000 [debug] onion_extend_cpath(): Path is complete: 4 steps long
Jun 21 19:22:56.000 [debug] circuit_handle_first_hop(): Looking for firsthop '23.254.166.222:80'
Jun 21 19:22:56.000 [debug] circuit_handle_first_hop(): Conn open. Delivering first onion skin.
Jun 21 19:22:56.000 [debug] circuit_send_next_onion_skin(): First skin; sending create cell.
Jun 21 19:22:56.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() found nothing for circ_id 2423611392, channel ID 0 (0x614000087240)
Jun 21 19:22:56.000 [debug] circuit_deliver_create_cell(): Chosen circID 2423611392.
Jun 21 19:22:56.000 [debug] circuitmux_attach_circuit(): Attaching circuit 2423611392 on channel 0 to cmux 0x6080000d7c20
Jun 21 19:22:56.000 [debug] append_cell_to_circuit_queue(): Made a circuit active.
Jun 21 19:22:56.000 [info] circuit_send_next_onion_skin(): First hop: finished sending CREATE cell to '$7B2553AEA4A3FC9E1B513D772ECA652C8CC15337~Pascal8 at 23.254.166.222'
Jun 21 19:22:56.000 [info] connection_ap_make_link(): ... application connection created and linked.
Jun 21 19:22:56.000 [debug] connection_add_impl(): new conn type Directory, socket -1, address 85.228.195.5, n_conns 13.
Jun 21 19:22:56.000 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'v4sciq74tvcydqip' with descriptor ID 'soxxwg56bxmusmbapm2uc2xay527nole', auth type 0, and descriptor cookie '[none]' to hidden service directory $943CE107D30C653BC4CAF3840006DB8D1BCA6B29~hashmaps at 85.228.195.5
Jun 21 19:22:56.000 [info] connection_ap_handshake_attach_circuit(): Intro (0) and rend (3216239652) circs are not both ready. Stalling conn. (10 sec old)
Jun 21 19:22:56.000 [err] circuit_get_open_circ_or_launch(): Bug: Connection state mismatch: wanted AP_CONN_STATE_CIRCUIT_WAIT, but got 6 (waiting for rendezvous desc)
Jun 21 19:22:56.000 [err] tor_assertion_failed_(): Bug: src/or/circuituse.c:1841: circuit_get_open_circ_or_launch: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed; aborting.
Jun 21 19:22:56.000 [err] Bug: Assertion ENTRY_TO_CONN(conn)->state == AP_CONN_STATE_CIRCUIT_WAIT failed in circuit_get_open_circ_or_launch at src/or/circuituse.c:1841. Stack trace:
Jun 21 19:22:56.000 [err] Bug:     /usr/local/lib64/libasan.so.1(__interceptor_backtrace+0x3a) [0x2b1b9500712a]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(log_backtrace+0x43) [0x2b1b945aac9d]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(tor_assertion_failed_+0xf6) [0x2b1b945d15df]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b944a488b]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(connection_ap_handshake_attach_circuit+0x9bf) [0x2b1b944a6ac5]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(connection_ap_attach_pending+0x1d2) [0x2b1b944f9680]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(circuit_build_needed_circs+0x30) [0x2b1b944a450b]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94343218]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94343a52]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94615215]
Jun 21 19:22:56.000 [err] Bug:     /usr/local/lib64/libevent-2.0.so.5(event_base_loop+0x7e6) [0x2b1b96145066]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94339d24]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94339ec9]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(do_main_loop+0x356) [0x2b1b943445a8]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(tor_main+0x1aa) [0x2b1b94345c8e]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor(main+0x19) [0x2b1b94339289]
Jun 21 19:22:56.000 [err] Bug:     /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b1b97f8d9f4]
Jun 21 19:22:56.000 [err] Bug:     /w/gpl/tor-0.2.6.9-asus/src/or/tor [0x2b1b94339149]

comment:20 Changed 23 months ago by starlight

(gdb) info backtrace
#0  raise () from /lib64/libc.so.6
#1  abort () from /lib64/libc.so.6
#2  circuit_get_open_circ_or_launch (conn=conn@entry=0x612000246dc0, desired_circuit_purpose=desired_circuit_purpose@entry=7 '\a', circp=circp@entry=0x7fffed22fe30) at src/or/circuituse.c:1841
#3  connection_ap_handshake_attach_circuit (conn=conn@entry=0x612000246dc0) at src/or/circuituse.c:2425
#4  connection_ap_attach_pending () at src/or/connection_edge.c:729
#5  circuit_build_needed_circs (now=now@entry=1434928976) at src/or/circuituse.c:1126
#6  run_scheduled_events (now=now@entry=1434928976) at src/or/main.c:1562
#7  second_elapsed_callback (timer=<optimized out>, arg=<optimized out>) at src/or/main.c:1755
#8  periodic_timer_cb (fd=<optimized out>, what=<optimized out>, arg=0x603000250870) at src/common/compat_libevent.c:553
#9  event_base_loop () from /usr/local/lib64/libevent-2.0.so.5
#10 run_main_loop_once () at src/or/main.c:2115
#11 run_main_loop_until_done () at src/or/main.c:2156
#12 do_main_loop () at src/or/main.c:2087
#13 tor_main (argc=<optimized out>, argv=0x7fffed2302d8) at src/or/main.c:3105
#14 main (argc=<optimized out>, argv=<optimized out>) at src/or/tor_main.c:30

comment:21 Changed 23 months ago by starlight

  • Status changed from needs_information to needs_revision

comment:22 Changed 23 months ago by nickm

  • Owner set to nickm
  • Status changed from needs_revision to accepted

This is key:

Jun 21 19:22:56.000 [err] circuit_get_open_circ_or_launch(): Bug: Connection state mismatch: wanted AP_CONN_STATE_CIRCUIT_WAIT, but got 6 (waiting for rendezvous desc)

And so is this:

#2  circuit_get_open_circ_or_launch (conn=conn@entry=0x612000246dc0, desired_circuit_purpose=desired_circuit_purpose@entry=7 '\a', circp=circp@entry=0x7fffed22fe30) at src/or/circuituse.c:1841
#3  connection_ap_handshake_attach_circuit (conn=conn@entry=0x612000246dc0) at src/or/circuituse.c:2425

Note that when we first entered connection_ap_handshake_attach_circuit, conn->state was CIRCUIT_WAIT, since otherwise the assertion at the start of connection_ap_handshake_attach_circuit would have failed. Also, the second invocation of circuit_get_open_circ_or_launch is hitting the assertion. The first invocation must have returned 0. Finally, somewhere between the beginning of the first invocation of circuit_get_open_circ_or_launch and the second, we changed the state of the conn to DESC_WAIT.

comment:23 Changed 23 months ago by starlight

should have noted that the trigger appears to be

[info] rend_cache_purge(): Purging HS descriptor cache
[info] rend_client_purge_last_hid_serv_requests(): Purging client last-HS-desc-request-time table

which is caused by a manually issued
control channel signal newnym request

The bug is now easily reproduced; am willing to
add additional debug code and re-test.
Testing is with version 0.2.6.9.

comment:24 Changed 23 months ago by nickm

Interesting. So we purge state, and so we don't have any cached information, but we do have an in-progress circuit and connection.

I bet that this is message is also crucial:

Jun 21 19:22:56.000 [info] circuit_get_open_circ_or_launch(): No intro points for '[scrubbed]': re-fetching service descriptor.

Aha, here's the culprit!

      extend_info = rend_client_get_random_intro(rend_data);
      if (!extend_info) {
        log_info(LD_REND,
                 "No intro points for '%s': re-fetching service descriptor.",
                 safe_str_client(rend_data->onion_address));
        rend_client_refetch_v2_renddesc(rend_data);
        ENTRY_TO_CONN(conn)->state = AP_CONN_STATE_RENDDESC_WAIT;
        return 0;
      }

Changed 23 months ago by nickm

comment:25 follow-up: Changed 23 months ago by nickm

  • Status changed from accepted to needs_review

I have attached a possible patch. Does this fix the issue for you?

Changed 23 months ago by starlight

tweak fixes to patch canidate

comment:26 Changed 23 months ago by starlight

seems to

manged to trigger the message three times with no crash

still seeing several

[warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.

immediately before the new message

patched needed a space in the message and
two characters transposed--revision attached

comment:27 Changed 23 months ago by starlight

triggered it two more times in alternate abusive scenario, no problem

comment:28 Changed 23 months ago by starlight

put patched 0.2.6.9 up on the
full-configuration relay

will update ticket in 10 days
unless it blows up before then

comment:29 in reply to: ↑ 17 Changed 22 months ago by teor

Replying to starlight:

one might also see

Closing stream for '[scrubbed].onion': hidden service is unavailable (try again later).

For the record, this appears to be #8902, reproducible by making 60+ client connections to a single HS using the changes to chutney in #14175.

comment:30 in reply to: ↑ 25 Changed 22 months ago by dgoulet

Replying to nickm:

I have attached a possible patch. Does this fix the issue for you?

That snippet of code, shouldn't be at the top of the function because it sounds like if weren't in the WAIT state, there is no point to try to attach and do anymore work? or do we?

I am asking because prior to this check, conn can get modified like for instance conn->chosen_exit_optional = 0. If it doesn't need to be at the top, I would like to know why the added check is valid at this point in the function not before (comment?).

comment:31 Changed 22 months ago by nickm

Good point; I'll add a comment + merge.

comment:32 Changed 22 months ago by nickm

  • Milestone changed from Tor: 0.2.7.x-final to Tor: 0.2.5.x-final

Branch bug16013_025 has the improved code and a nice comment. I've merged it to 0.2.6 and forward; considering for possible backport to 0.2.5.

comment:33 Changed 22 months ago by mMojo

just to add to this bug report, i have used 0.2.6.9 and was getting c.1835 crash bug, switched to v0.2.6.7 and there were 30+ open circuits in my network map, old circuits expiring and new ones creating quickly. my browser was connected only to one server from the beginning of tor session. server owner confirmed that he is unexpectedly receiving 30+ connections from each client. i could connect to that server using tor2web proxies but not directly without using 'new identity' or restarting tor. i have noticed only one server that is 'causing' this problem.

Last edited 22 months ago by mMojo (previous) (diff)

comment:34 Changed 22 months ago by starlight

Note this bug is present in all 0.2.6.x versions
and evidently 0.2.5.x as well (though it never
triggered here when running 0.2.5.12).

You would be best served by building 0.2.6.9 + patch,
building from mainline git 0.2.7.x, or by picking
up 0.2.6.10 when it arrives.

The connection flood you are experiencing is a
separate issue, though it is triggering this
one indirectly by inspiring the invocation
of "new identity" (aka NEWNYM).

comment:35 Changed 22 months ago by mMojo

i have noticed that users on some tor chat servers get disconnected all at the same time or near the same time (in a matter of maybe few minutes).

since this bug appeared just recently but also affecting old v0.2.5.x versions, this makes me to think this bug is somehow exploited-triggered by someone because otherwise how this bug could affect chat users all at once? well, if this is a client issue only

chat server was working fine when i reloaded tor (10-15 seconds), so i guess this is not affecting server.

Last edited 22 months ago by mMojo (previous) (diff)

comment:36 Changed 19 months ago by nickm

  • Milestone changed from Tor: 0.2.5.x-final to Tor: 0.2.6.x-final
  • Resolution set to fixed
  • Status changed from needs_review to closed

Marking as fixed-in-0.2.6; not planning to fix in 0.2.5. Upgrade to 0.2.6 or later if this affects you.

comment:37 Changed 13 months ago by nickm

  • Keywords 2016-bug-retrospective added

Mark more tickets for bug retrospective based on hand-review of changelogs from 0.2.5 onwards.

Note: See TracTickets for help on using tickets.