#28669 closed defect (fixed)

Bug: ../src/feature/hs/hs_client.c:280: retry_all_socks_conn_waiting_for_desc

Reported by: traumschule Owned by:
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs
Cc: Actual Points:
Parent ID: Points:
Reviewer: asn Sponsor:

Description

After suspend tor 0.4.0.0-alpha-dev-20181118T011041Z-1~d99.buster+1 had this issue:

Nov 26 00:00:07.000 [notice] Tor 0.4.0.0-alpha-dev opening new log file.
Nov 26 10:37:43.000 [notice] {GENERAL} Your system clock just jumped 35475 seconds forward; assuming established circuits no longer work.
Nov 26 10:38:38.000 [warn] {CIRC} Failed to find node for hop #1 of our path. Discarding this circuit.
Nov 26 10:38:39.000 [notice] {CIRC} Our circuit 0 (id: 202) died due to an invalid selected path, purpose Hidden service client: Fetching HS descriptor. This may be a torrc configuration issue, or a bug.
Nov 26 10:41:00.000 [warn] {BUG} tor_bug_occurred_(): Bug: ../src/feature/hs/hs_client.c:280: retry_all_socks_conn_waiting_for_desc: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed. (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed in retry_all_socks_conn_waiting_for_desc at ../src/feature/hs/hs_client.c:280. Stack trace: (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(log_backtrace_impl+0x5e) [0x6d4c4e] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_bug_occurred_+0xbd) [0x6d042d] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(hs_client_dir_info_changed+0xce) [0x5da00e] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(router_dir_info_changed+0x34) [0x5ffbc4] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(networkstatus_set_current_consensus+0x41f) [0x5f7fef] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(connection_dir_reached_eof+0xf7f) [0x5c006f] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(connection_handle_read+0x829) [0x53c679] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(+0x568f9) [0x5428f9] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/lib/i386-linux-gnu/libevent-2.1.so.6(+0x2091b) [0xb7e3291b] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/lib/i386-linux-gnu/libevent-2.1.so.6(event_base_loop+0x4d1) [0xb7e333b1] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_libevent_run_event_loop+0x30) [0x672ab0] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(do_main_loop+0xc0) [0x543d40] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_run_main+0x141a) [0x530f9a] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_main+0x3f) [0x52e35f] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(main+0x32) [0x52ded2] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0xb774c9a1] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:01.000 [warn] {BUG} Bug:     /usr/bin/tor(+0x41f2e) [0x52df2e] (on Tor 0.4.0.0-alpha-dev )
<repeated once>
Nov 26 10:41:11.000 [notice] {APP} Tried for 120 seconds to get a connection to 5gdvpfoh6kb2iqbizb37lzk2ddzrwa47m6rpdueg2m656fovmbhoptqd:993. Giving up. (waiting for rendezvous desc)
Nov 26 10:41:36.000 [warn] {BUG} tor_bug_occurred_(): Bug: ../src/feature/hs/hs_client.c:280: retry_all_socks_conn_waiting_for_desc: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed. (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed in retry_all_socks_conn_waiting_for_desc at ../src/feature/hs/hs_client.c:280. Stack trace: (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(log_backtrace_impl+0x5e) [0x6d4c4e] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_bug_occurred_+0xbd) [0x6d042d] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(hs_client_dir_info_changed+0xce) [0x5da00e] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(router_dir_info_changed+0x34) [0x5ffbc4] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(microdescs_add_list_to_cache+0x2bc) [0x5f1fec] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(microdescs_add_to_cache+0x2ce) [0x5f23ce] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(connection_dir_reached_eof+0x1666) [0x5c0756] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(connection_handle_read+0x829) [0x53c679] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(+0x568f9) [0x5428f9] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/lib/i386-linux-gnu/libevent-2.1.so.6(+0x2091b) [0xb7e3291b] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/lib/i386-linux-gnu/libevent-2.1.so.6(event_base_loop+0x4d1) [0xb7e333b1] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_libevent_run_event_loop+0x30) [0x672ab0] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(do_main_loop+0xc0) [0x543d40] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_run_main+0x141a) [0x530f9a] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(tor_main+0x3f) [0x52e35f] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(main+0x32) [0x52ded2] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0xb774c9a1] (on Tor 0.4.0.0-alpha-dev )
Nov 26 10:41:36.000 [warn] {BUG} Bug:     /usr/bin/tor(+0x41f2e) [0x52df2e] (on Tor 0.4.0.0-alpha-dev )
<repeated once>
Nov 26 10:42:38.000 [notice] {APP} Tried for 120 seconds to get a connection to jukrlvyhgguiedqswc5lehrag2fjunfktouuhi4wozxhb6heyzvshuyd:5222. Giving up. (waiting for rendezvous desc)
  • jukrlvyhgguiedqswc5lehrag2fjunfktouuhi4wozxhb6heyzvshuyd is Riseup's XMPP server
  • 5gdvpfoh6kb2iqbizb37lzk2ddzrwa47m6rpdueg2m656fovmbhoptqd is Riseup's mail server

Didn't have the chance to reproduce it yet.

Child Tickets

Attachments (2)

28669-info.scrubbed.log.xz (32.3 KB) - added by traumschule 11 months ago.
It happened again, this time without clock jump. Probably worth to think it together with my wifi issues in #28962.
28669-patched.info.scrubbed.log.xz (41.7 KB) - added by traumschule 10 months ago.
Great, branch ticket28669_035_01 fixes the bug. Still 562 appearenses of circuits are not both ready. Stalling conn. (#28962)

Download all attachments as: .zip

Change History (13)

comment:1 Changed 12 months ago by traumschule

Related: #23588, #24610, #27410

comment:2 Changed 12 months ago by dgoulet

Keywords: tor-hs added
Milestone: Tor: 0.4.0.x-final

Seems #27410 is back, we didn't get it right (or maybe we did partially).

comment:3 Changed 12 months ago by dgoulet

I've stated that theory in part in #27410 but now I do think it is indeed the code path creating this issue:

  1. Tor gets a descriptor and tries to connect to it but intro points all fail so the descriptor ends up unusable that is hs_client_any_intro_points_usable() returns false.
  1. A SOCKS request is made to the .onion and because the intro points aren't usuable, a refetch is triggered, see connection_ap_handle_onion() which put the connection in AP_CONN_STATE_RENDDESC_WAIT
  1. While (2) is happening, computer goes in suspend mode for more than 30 minutes.
  1. When waking up, the rend_cache_failure_clean() callback is triggered which makes all the intro point usable all the sudden.
  1. Everything is stalled until we get a live consensus which, when it arrives, the HS subsystem is informed with hs_client_dir_info_changed() that calls retry_all_socks_conn_waiting_for_desc() (from the stacktrace we have in the ticket).
  1. And boom, for hs_client_refetch_hsdesc() to return HS_CLIENT_FETCH_HAVE_DESC, we need a descriptor in the cache with usable intro points. We have that because (4) made the intro points usable and the descriptor from (1) is still in the cache. The BUG() is hit because we still have that pending SOCKS connection that is waiting for its descriptor from (2).

The solution is that for this particular condition where we do have a descriptor in our cache but we have a connection waiting for a descriptor, instead of BUG(), we need to mark it as "pending for a circuit" like we do in hs_client_desc_has_arrived() which will trigger the connection to be attached to a circuit and thus the HS dance to start.

Last edited 12 months ago by dgoulet (previous) (diff)

comment:4 Changed 11 months ago by dgoulet

Status: newneeds_review

Branch: ticket28669_035_01
PR: https://github.com/torproject/tor/pull/563

comment:5 Changed 11 months ago by dgoulet

Reviewer: asn

comment:6 Changed 11 months ago by asn

Patch LGTM in principle. However, the described scenario in comment:3 seems pretty implausible, particularly since the original poster gets the BUG with two onion services. What's the probability that points 1 and 2 happened for two onion services before the laptop went into suspend? Maybe if the internet went down for some time before the suspend.

As I said, the patch LGTM, but it would be great if we could reproduce this to make sure that the bug occurs and the fix fixes it, since this is the second time we are trying to fix this bug. We could repro it in theory with shutting down the net, and using faketime or something.

Changed 11 months ago by traumschule

Attachment: 28669-info.scrubbed.log.xz added

It happened again, this time without clock jump. Probably worth to think it together with my wifi issues in #28962.

comment:7 in reply to:  3 Changed 11 months ago by traumschule

Replying to dgoulet:

I've stated that theory in part in #27410 but now I do think it is indeed the code path creating this issue:

  1. Tor gets a descriptor and tries to connect to it but intro points all fail so the descriptor ends up unusable that is hs_client_any_intro_points_usable() returns false.
  1. A SOCKS request is made to the .onion and because the intro points aren't usuable, a refetch is triggered, see connection_ap_handle_onion() which put the connection in AP_CONN_STATE_RENDDESC_WAIT
  1. While (2) is happening, computer goes in suspend mode for more than 30 minutes.
  1. When waking up, the rend_cache_failure_clean() callback is triggered which makes all the intro point usable all the sudden.
  1. Everything is stalled until we get a live consensus which, when it arrives, the HS subsystem is informed with hs_client_dir_info_changed() that calls retry_all_socks_conn_waiting_for_desc() (from the stacktrace we have in the ticket).
  1. And boom, for hs_client_refetch_hsdesc() to return HS_CLIENT_FETCH_HAVE_DESC, we need a descriptor in the cache with usable intro points. We have that because (4) made the intro points usable and the descriptor from (1) is still in the cache. The BUG() is hit because we still have that pending SOCKS connection that is waiting for its descriptor from (2).

The solution is that for this particular condition where we do have a descriptor in our cache but we have a connection waiting for a descriptor, instead of BUG(), we need to mark it as "pending for a circuit" like we do in hs_client_desc_has_arrived() which will trigger the connection to be attached to a circuit and thus the HS dance to start.

Contrary to above case this one isn't caused by a clock jump but a weak connection (#28962) causes Bugs around 00:16:05.000, 00:16:59.000 and 00:17:22.000.
can_client_refetch_desc() gives a new hint which additionally relates it to #24610 and #23764.
rend_cache_failure_clean() isn't triggered and connection_ap_handle_onion() appears after the first bug at 00:16:38.000.

Dec 29 00:15:55.000 [info] can_client_refetch_desc(): Can't fetch descriptor for service [scrubbed] because we are missing a live consensus. Stalling connection.
...
Dec 29 00:15:56.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16132 (a General-purpose client 3-hop circuit in state doing handshakes with channel state open).
Dec 29 00:15:56.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16140 (a Hidden service client: Establishing rendezvous point 3-hop circuit in state doing handshakes with channel state open).
...
Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16123 (a Hidden service client: Connecting to intro point 4-hop circuit in state doing handshakes with channel state open).
Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16137 (a General-purpose client 3-hop circuit in state doing handshakes with channel state open).
Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16143 (a Hidden service client: Establishing rendezvous point 3-hop circuit in state doing handshakes with channel state open).
Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16144 (a Hidden service client: Establishing rendezvous point 3-hop circuit in state doing handshakes with channel state open).
Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are opened. Relaxing timeout for circuit 16145 (a Hidden service client: Establishing rendezvous point 3-hop circuit in state doing handshakes with channel state open).
...
Dec 29 00:16:05.000 [info] can_client_refetch_desc(): We would fetch a v3 hidden service descriptor but we already have a usable descriptor.
Dec 29 00:16:05.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/hs/hs_client.c:280: retry_all_socks_conn_waiting_for_desc: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed. (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug: Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed in retry_all_socks_conn_waiting_for_desc at ../src/feature/hs/hs_client.c:280. Stack trace: (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(log_backtrace_impl+0x65) [0x643225] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xbd) [0x63ea4d] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(hs_client_dir_info_changed+0xad) [0x54581d] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(router_dir_info_changed+0x34) [0x56cc84] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(networkstatus_set_current_consensus+0x3f4) [0x563f14] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(connection_dir_reached_eof+0xcc0) [0x52b510] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(connection_handle_read+0x808) [0x4aa0b8] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(+0x5a629) [0x4b0629] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/lib/i386-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7d1) [0xb769f681] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_libevent_run_event_loop+0x30) [0x5df3d0] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0xc5) [0x4b1a95] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(run_tor_main_loop+0x1ce) [0x49d43e] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x11c5) [0x49e7f5] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3f) [0x49b97f] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(main+0x35) [0x49b4d5] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf6) [0xb708d286] (on Tor 0.4.0.0-alpha-dev )
Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(+0x45531) [0x49b531] (on Tor 0.4.0.0-alpha-dev )
...
Dec 29 00:16:38.000 [info] connection_ap_handle_onion(): Got a hidden service request for ID '[scrubbed]'
Dec 29 00:16:38.000 [info] rep_hist_note_used_internal(): New port prediction added. Will continue predictive circ building for 2851 more seconds.
Dec 29 00:16:38.000 [info] connection_ap_handle_onion(): Descriptor is here. Great.
Dec 29 00:16:38.000 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Dec 29 00:16:38.000 [info] connection_ap_handshake_attach_circuit(): Intro circ 3827669459 (id: 16167) present and awaiting ACK. Rend circuit 3158169261 (id: 16158). Stalling. (stream 100 sec old)
Dec 29 00:16:38.000 [info] connection_ap_handshake_attach_circuit(): pending-join circ 3976548034 (id: 16156) already here, with intro ack. Stalling. (stream 0 sec old)
...
Dec 29 00:17:55.000 [info] connection_ap_handle_onion(): Found usable descriptor in cache for [scrubbed]. Not fetching..
Dec 29 00:17:55.000 [info] rep_hist_note_used_internal(): New port prediction added. Will continue predictive circ building for 2851 more seconds.
Dec 29 00:17:55.000 [info] connection_ap_handle_onion(): Descriptor is here. Great.

/** Launch a connection to a hidden service directory to fetch a hidden
 * service descriptor using <b>identity_pk</b> to get the necessary keys.
 *
 * A hs_client_fetch_status_t code is returned. */
int
hs_client_refetch_hsdesc(const ed25519_public_key_t *identity_pk)
{
  hs_client_fetch_status_t status;

  tor_assert(identity_pk);

  if (!can_client_refetch_desc(identity_pk, &status)) {
    return status;
  }
/* Return true iff the client can fetch a descriptor for this service public
 * identity key and status_out if not NULL is untouched. If the client can
 * _not_ fetch the descriptor and if status_out is not NULL, it is set with
 * the fetch status code. */
static unsigned int
can_client_refetch_desc(const ed25519_public_key_t *identity_pk,
                        hs_client_fetch_status_t *status_out)
{
...
  /* Without a live consensus we can't do any client actions. It is needed to
   * compute the hashring for a service. */
  if (!networkstatus_get_live_consensus(approx_time())) {
    log_info(LD_REND, "Can't fetch descriptor for service %s because we "
                      "are missing a live consensus. Stalling connection.",
             safe_str_client(ed25519_fmt(identity_pk)));
    status = HS_CLIENT_FETCH_MISSING_INFO;
    goto cannot;
  }
...
  /* Check if fetching a desc for this HS is useful to us right now */
  {
    const hs_descriptor_t *cached_desc = NULL;
    cached_desc = hs_cache_lookup_as_client(identity_pk);
    if (cached_desc && hs_client_any_intro_points_usable(identity_pk,
                                                         cached_desc)) {
      log_info(LD_GENERAL, "We would fetch a v3 hidden service descriptor "
                           "but we already have a usable descriptor.");
      status = HS_CLIENT_FETCH_HAVE_DESC;
      goto cannot;
    }
  }
...
 cannot:
  if (status_out) {
    *status_out = status;
  }
  return 0;

Also nearly hidden and probably unrelated the pathbias Bug (#27228) is lingering:

Dec 29 00:16:26.000 [info] pathbias_should_count(): Bug: Circuit 16109 is now being counted despite being ignored in the past. Purpose is Measuring circuit timeout, path state is new (on Tor 0.4.0.0-alpha-dev )

comment:8 Changed 11 months ago by asn

Given the latest information from traumschule, I have more doubts that the comment:3 theory is what's causing the issue here.

Here is another theory from combining the two log files:

0) Everything was going great with Tor. It was connecting to a few HSes (both v2 and v3s) just fine until 14:04:28.000. And then the clock skew happened: Dec 29 00:10:43.000 [notice] Your system clock just jumped 36372 seconds forward; assuming established circuits no longer work..

1) After the clock skew, Tor no longer has a live consensus, but it still wants to connect to the HSv3s it has the descriptors for.

2) However, hs_cache_lookup_as_client() cannot see the cached descriptors because it calls cached_client_descriptor_has_expired which claims that the descriptor is expired if there is no live consensus! [info] can_client_refetch_desc(): Can't fetch descriptor for service $onion_service. Stalling connection

2) The SOCKS connection is now stalled with AP_CONN_STATE_RENDDESC_WAIT because of (1).

4) Then after a bit we get a live consensus and retry_all_socks_conn_waiting_for_desc() gets called, this brings up the SOCKS connection from (2) which is still blocked at AP_CONN_STATE_RENDDESC_WAIT. However, now the descriptor is no longer expired because we have a live consensus, and hence there is a valid HS desc in the cache and hence the bug triggers.


The same issue which is that a descriptor cannot be seen when the consensus is not live, is probably also the issue of the original poster, so perhaps we can simplify David's patch by not introducing mark_conn_as_waiting_for_circuit() and calling it when this branch triggers. But we still want to remove the Bug() from there.

Or is there something more elegant to do here?

Removing from needs_revision to figure out next steps.

comment:9 Changed 10 months ago by traumschule

Follow-up: #28992

Changed 10 months ago by traumschule

Great, branch ticket28669_035_01 fixes the bug. Still 562 appearenses of circuits are not both ready. Stalling conn. (#28962)

comment:10 Changed 10 months ago by asn

Status: needs_reviewmerge_ready

Let's call this merge_ready since it fixes the issue and the patch looks legit.

comment:11 Changed 10 months ago by nickm

Milestone: Tor: 0.4.0.x-finalTor: 0.3.5.x-final
Resolution: fixed
Status: merge_readyclosed

Merged to 0.3.5 and forward.

Note: See TracTickets for help on using tickets.