Opened 5 months ago

Last modified 2 months ago

#28962 new defect

circuits are not both ready. Stalling conn.

Reported by: traumschule Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, postfreeze-ok, 040-deferred-20190220
Cc: asn, dgoulet Actual Points:
Parent ID: #17835 Points:
Reviewer: Sponsor:

Description (last modified by traumschule)

Using an unstable (wifi) connection my Tor 0.4.0.0-alpha-dev client shows circuits are not both ready. Stalling conn. and most connections hang for some minutes until it let‘s through a bunch and hangs again.

It‘s confusing to see netflow padding while waiting for connections, looks like an instance of #23681.

Relevant code is around src/core/or/circuituse.c:3025.
Log shows a lot of timeouts like:

Dec 30 10:48:14.000 [info] circuit_build_times_get_xm(): Xm mode #0: 325 75
Dec 30 10:48:14.000 [info] circuit_build_times_get_xm(): Xm mode #1: 375 47
Dec 30 10:48:14.000 [info] circuit_build_times_get_xm(): Xm mode #2: 325 75
Dec 30 10:48:14.000 [info] circuit_build_times_set_timeout_worker(): Circuit build measurement period of 1337373ms is more than twice the maximum build time we have ever observed. Capping it to 1143450ms.
Dec 30 10:48:14.000 [info] circuit_build_times_set_timeout(): Set circuit build timeout to 29s (28864.874645ms, 1143450.000000ms, Xm: 336, a: 0.361406, r: 0.282000) based on 1000 circuit times
Dec 30 10:48:14.000 [info] entry_guards_note_guard_success(): Recorded success for primary confirmed guard <scrubbed>
Dec 30 10:48:14.000 [info] circuit_build_no_more_hops(): circuit built!
Dec 30 10:48:14.000 [info] pathbias_count_build_success(): Got success count 209.532891/219.352138 for guard <scrubbed>
Dec 30 10:48:14.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 4177057486 (id: 2028) present and awaiting ACK. Rend circuit 3511196698 (id: 2011). Stalling. (stream 41 sec old)
Dec 30 10:48:14.000 [info] or_state_save(): Saved state to "/var/lib/tor/state"
Dec 30 10:48:14.000 [info] rend_client_introduction_acked(): Got nack for [scrubbed] from [scrubbed]...
Dec 30 10:48:14.000 [info] rend_client_report_intro_point_failure(): 5 options left for [scrubbed].
Dec 30 10:48:14.000 [info] hs_client_reextend_intro_circuit(): Re-extending circ 4177057486, this time to [scrubbed].
Dec 30 10:48:14.000 [info] circuit_send_intermediate_onion_skin(): Sending extend relay cell.
Dec 30 10:48:14.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 41 sec old)
Dec 30 10:48:14.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (41 sec old)
...
Dec 30 10:49:01.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (88 sec old)
Dec 30 10:49:02.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 89 sec old)
Dec 30 10:49:02.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (89 sec old)
Dec 30 10:49:03.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 90 sec old)
Dec 30 10:49:03.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (90 sec old)
Dec 30 10:49:04.000 [info] channelpadding_send_padding_cell_for_callback(): Sending netflow keepalive on 608 to [scrubbed] ([scrubbed]) after 8689 ms. Delta 1ms
Dec 30 10:49:04.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 91 sec old)
Dec 30 10:49:04.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (91 sec old)
Dec 30 10:49:05.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 92 sec old)
Dec 30 10:49:05.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (92 sec old)
Dec 30 10:49:07.000 [info] connection_ap_handshake_attach_circuit(): ready rend circ 3511196698 (id: 2011) already here. Nointro-ack yet on intro 4177057486 (id: 2028). (stream 93 sec old)
Dec 30 10:49:07.000 [info] connection_ap_handshake_attach_circuit(): Intro 4177057486 (id: 2028) and rend circuit 3511196698 (id: 2011) circuits are not both ready. Stalling conn. (93 sec old)

Child Tickets

Attachments (1)

info.scrubbed.log.xz (135.1 KB) - added by traumschule 5 months ago.
info log for Dec 30 with tweaked scrubbing

Download all attachments as: .zip

Change History (14)

comment:1 Changed 5 months ago by traumschule

Above info log also shows Network is unreachable a lot as proof for using a poor wireless connection. On the same computer Tor Browser with Tor 0.3.5.5-alpha using obfs4 shows more tolerance to connection issues.

Comparing notice logs of both they share

Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)

while system tor also has the waiting for rendezvous desc variant and TB‘s tor instead quite often shows:

[WARN] Proxy Client: unable to connect to $address ("general SOCKS server failure")

Is it possible that latest code changes (0.3.5.5-alpha vs 0.4.0-alpha) made tor less robust or are PT connections better in handling dropped packets?

system tor

Dec 30 07:02:12.000 [notice] Tor 0.4.0.0-alpha-dev opening new log file.
Dec 30 07:04:03.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:04:03.000 [notice] Our circuit 0 (id: 1257) died due to an invalid selected path, purpose General-purpose client. This may be a torrc configuration issue, or a bug.
Dec 30 07:05:40.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 07:08:49.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:11:40.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:42. Giving up. (waiting for circuit)
Dec 30 07:13:02.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:14:21.000 [notice] Heartbeat: Tor's uptime is 17:59 hours, with 2 circuits open. I've sent 15.65 MB and received 177.38 MB.
Dec 30 07:14:21.000 [notice] Average packaged cell fullness: 16.099%. TLS write overhead: 5%
Dec 30 07:15:48.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 07:17:29.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:17:36.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:19:30.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:21:45.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:21:48.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:42. Giving up. (waiting for circuit)
Dec 30 07:25:54.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:25:55.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 07:26:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:28:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:30:33.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:31:55.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:42. Giving up. (waiting for circuit)
Dec 30 07:34:41.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:35:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:36:02.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 07:37:31.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:39:49.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:42:02.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:42. Giving up. (waiting for circuit)
Dec 30 07:44:00.000 [warn] Failed to find node for hop #1 of our path. Discarding this circuit.
Dec 30 07:44:32.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 07:50:46.000 [notice] Your system clock just jumped 288 seconds forward; assuming established circuits no longer work.
Dec 30 07:50:46.000 [notice] Tried for 397 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 07:50:46.000 [notice] Tried for 374 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for rendezvous desc)
Dec 30 09:32:02.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address. Retrying on a new circuit.
Dec 30 10:28:24.000 [notice] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address. Retrying on a new circuit.
Dec 30 10:29:49.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit)
Dec 30 10:30:19.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:30:24.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:30:29.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:30:39.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:37:59.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:38:05.000 [notice] Rend stream is 120 seconds late. Giving up on address '[scrubbed].onion'.
Dec 30 10:38:19.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 10:49:33.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
Dec 30 11:10:19.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)

tor browser

12/30/18, 12:37:31.860 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 12:37:31.860 [WARN] Proxy Client: unable to connect to $address:13764 ("$relay) 
12/30/18, 12:37:32.198 [WARN] Proxy Client: unable to connect to $address:10527 ("$relay) 
12/30/18, 12:37:32.198 [WARN] Proxy Client: unable to connect to $address:443 ("$relay) 
12/30/18, 12:37:32.638 [NOTICE] Delaying directory fetches: No running bridges 
12/30/18, 12:38:02.637 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit) 
12/30/18, 12:38:02.742 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 12:38:02.751 [WARN] Proxy Client: unable to connect to $address:13764 ("$relay) 
12/30/18, 12:38:03.852 [WARN] Proxy Client: unable to connect to $address:10527 ("$relay) 
12/30/18, 12:38:04.850 [WARN] Proxy Client: unable to connect to $address:443 ("$relay) 
12/30/18, 12:38:05.638 [NOTICE] Delaying directory fetches: No running bridges 
12/30/18, 12:39:31.637 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for circuit) 
12/30/18, 12:40:02.636 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit) 
12/30/18, 12:42:02.345 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 12:42:02.775 [WARN] Proxy Client: unable to connect to $address:13764 ("$relay) 
12/30/18, 12:42:03.358 [WARN] Proxy Client: unable to connect to $address:10527 ("$relay) 
12/30/18, 12:42:03.358 [WARN] Proxy Client: unable to connect to $address:443 ("$relay) 
12/30/18, 12:42:03.637 [NOTICE] Delaying directory fetches: No running bridges 
12/30/18, 12:44:02.644 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:42. Giving up. (waiting for circuit) 
12/30/18, 12:50:46.689 [NOTICE] Your system clock just jumped 289 seconds forward; assuming established circuits no longer work. 
12/30/18, 12:50:46.842 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 12:50:47.116 [WARN] Proxy Client: unable to connect to $address:13764 ("$relay) 
12/30/18, 12:50:47.849 [WARN] Proxy Client: unable to connect to $address:443 ("$relay) 
12/30/18, 12:50:47.849 [WARN] Proxy Client: unable to connect to $address:10527 ("$relay) 
12/30/18, 12:50:48.636 [NOTICE] Delaying directory fetches: No running bridges 
12/30/18, 12:52:46.632 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:22. Giving up. (waiting for circuit) 
12/30/18, 12:52:46.726 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 12:52:47.728 [NOTICE] Tried for 120 seconds to get a connection to [scrubbed]:993. Giving up. (waiting for circuit) 
12/30/18, 13:01:36.187 [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address Retrying on a new circuit. 
12/30/18, 13:04:45.187 [NOTICE] Delaying directory fetches: No running bridges 
12/30/18, 13:05:13.455 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 
12/30/18, 14:43:59.659 [NOTICE] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up. 
12/30/18, 14:58:54.968 [NOTICE] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up. 
12/30/18, 14:59:12.218 [NOTICE] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up. 
12/30/18, 14:59:26.990 [WARN] Your application (using socks5 to port 42) gave Tor a malformed hostname: [scrubbed]. Rejecting the connection. 
12/30/18, 15:01:49.203 [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address Retrying on a new circuit. 
12/30/18, 15:02:04.276 [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address Retrying on a new circuit. 
12/30/18, 15:02:19.256 [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address Retrying on a new circuit. 
12/30/18, 15:02:36.288 [WARN] Your application (using socks5 to port 42) gave Tor a malformed hostname: [scrubbed]. Rejecting the connection. 
12/30/18, 15:02:40.167 [WARN] Your application (using socks5 to port 42) gave Tor a malformed hostname: [scrubbed]. Rejecting the connection. 
12/30/18, 15:07:27.238 [NOTICE] new bridge descriptor 'frosty' (fresh): $relay at $address 
12/30/18, 15:20:32.719 [WARN] Proxy Client: unable to connect to $address:443 ("$relay) 
12/30/18, 15:27:10.202 [NOTICE] Heartbeat: Tor's uptime is 4 days 6:01 hours, with 5 circuits open.
12/30/18, 15:27:10.203 [NOTICE] Average packaged cell fullness: 27.161%. TLS write overhead: 5% 
12/30/18, 15:35:28.328 [NOTICE] We tried for 15 seconds to connect to '[scrubbed]' using exit $relay at $address Retrying on a new circuit. 
12/30/18, 15:35:54.328 [NOTICE] Tor has not observed any network activity for the past 92 seconds. Disabling circuit build timeout recording. 
12/30/18, 15:35:55.562 [NOTICE] Tor now sees network activity. Restoring circuit build timeout recording. Network was down for 93 seconds during 1 circuit attempts. 
12/30/18, 15:39:47.558 [NOTICE] new bridge descriptor 'ndnop3' (fresh): $relay at $address 
12/30/18, 15:47:26.543 [WARN] Proxy Client: unable to connect to $address:80 ("$relay) 

(timestamps differ by 5 hours)

comment:2 Changed 5 months ago by nickm

Cc: asn dgoulet added
Keywords: tor-hs added
Milestone: Tor: unspecifiedTor: 0.4.0.x-final

Changed 5 months ago by traumschule

Attachment: info.scrubbed.log.xz added

info log for Dec 30 with tweaked scrubbing

comment:3 Changed 5 months ago by traumschule

Description: modified (diff)

Removing wrong assumption.

comment:4 Changed 5 months ago by cypherpunks2

Using an unstable (wifi) connection
info.scrubbed.log.xz​

no route to host/net 576 times

comment:5 in reply to:  4 Changed 5 months ago by traumschule

Replying to cypherpunks2:

Using an unstable (wifi) connection
info.scrubbed.log.xz​

no route to host/net 576 times

Yes, my guess of what's happening is that packets are lost through interference because of the distances to the router. While system tor got into a strange state, at the same time Tor Browser using a PT is usable without major issues. Either this is a regression in tor or PTs (or obgs4 in particular) are just more fault tolerant.
Currently trying to reproduce it with the branch in comment:4:issue:28669 (not entirely this is related though).

comment:6 Changed 5 months ago by cypherpunks3

packets are lost through interference because of the distances to the router.

I don't think so.
"no route to host" is about link loss (wifi, isp, uplink, ..)

comment:7 in reply to:  6 Changed 5 months ago by traumschule

Thanks for checking the log!

Replying to cypherpunks3:

packets are lost through interference because of the distances to the router.

I don't think so.
"no route to host" is about link loss (wifi, isp, uplink, ..)

Let me try again: The connection to the wireless router was lost and has been reestablished by network-manager quite often because of lost packets during that time.
All that happened while i was constantly using the Browser and was connected via ssh.

Unfortunately i have no info log for Tor Browser of that day but above log shows to me that Tor Browser has less issues and it was able to open all webpages during the period in question.

Does that make more sense to you? What do you suggest that could be done here?

comment:8 Changed 5 months ago by cypherpunks3

tor set non-blocking sockets
obfs4proxy set blocking sockets

comment:9 Changed 5 months ago by cypherpunks2

What do you suggest that could be done here?

Try option

Schedulers Vanilla

comment:10 Changed 4 months ago by nickm

Keywords: postfreeze-ok added

Mark some tickets as postfreeze-ok, to indicate that I think they are okay to accept in 0.4.0 post-freeze. Does not indicate that they are all necessary to do postfreeze.

comment:11 Changed 3 months ago by nickm

Keywords: 040-deferred-20190220 added
Milestone: Tor: 0.4.0.x-finalTor: unspecified

Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.

comment:12 Changed 2 months ago by teor

Parent ID: #17835

This is a known bug in Tor's guard code: it does not check that guards are worse than the average before blocking them.
We might fix it as a consequence of #17835.

comment:13 Changed 2 months ago by teor

The v2 case of #29697 might be fixed if we fix this issue, but it may also need some other fixes.

Note: See TracTickets for help on using tickets.