Opened 6 years ago

Closed 6 years ago

#9946 closed defect (fixed)

Bootstrapping Tor client hangs up on its chosen guards

Reported by: arma
Owned by:
Priority: Medium
Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-client
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Here are some log snippets from a freshly bootstrapping Tor client:

Sep 27 03:24:11.369 [info] add_an_entry_guard(): Chose $3DD2523F1B241F01D54818F327714CDA7F54238B~wannabe2 at 144.76.110.103 as new entry guard.
Sep 27 03:24:11.369 [info] log_entry_guards(): wannabe2 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted)
Sep 27 03:24:11.371 [info] add_an_entry_guard(): Chose $821FAFA24651419FAE76A5C80BA79EA7E0743C4D=servbr3 at 212.83.158.171 as new entry guard.
Sep 27 03:24:11.371 [info] log_entry_guards(): wannabe2 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted),servbr3 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up never-contacted)
Sep 27 03:24:11.373 [info] add_an_entry_guard(): Chose $CFB4CB595C2504444E8F8F76CAC42F0C97200734=SECxFreeBSD64 at 195.169.125.226 as new entry guard.
Sep 27 03:24:11.373 [info] log_entry_guards(): wannabe2 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up never-contacted),servbr3 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up never-contacted),SECxFreeBSD64 [CFB4CB595C2504444E8F8F76CAC42F0C97200734] (up never-contacted)

So now I have my three guards, and they're all considered up (the consensus said so) and never-contacted (that's how they start out).
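
(For reference, the per-guard state behind "up" and "never-contacted" is kept in Tor's entry_guard_t. The field names below follow the real struct, but this is an abridged paraphrase for illustration, not the literal definition from entrynodes.h:)

#include <time.h>

/* Abridged paraphrase of Tor's per-guard bookkeeping, for illustration only. */
typedef struct entry_guard_t {
  char nickname[20];            /* e.g. "wannabe2" (array size here is illustrative) */
  char identity[20];            /* identity digest of the guard */
  unsigned made_contact : 1;    /* have we ever successfully connected to it? */
  time_t unreachable_since;     /* 0, or when we started failing to reach it */
  time_t bad_since;             /* 0, or when the consensus stopped listing it as
                                 * suitable; "up" in the log means this is 0 */
} entry_guard_t;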

Sep 27 03:24:11.362 [info] launch_descriptor_downloads(): Launching 45 requests for 4084 microdescs, 92 at a time

Then choose_random_entry_impl() runs this code:

      if (!entry->made_contact) {
        /* Always start with the first not-yet-contacted entry
         * guard. Otherwise we might add several new ones, pick
         * the second new one, and now we've expanded our entry
         * guard list without needing to. */
        goto choose_and_finish;
      }

so all 45 of those microdescriptor requests decide to ask wannabe2, and the client queues all 45 directory attempts on it.

So it connects to wannabe2:

Sep 27 03:24:11.373 [debug] directory_initiate_command_rend(): Initiating microdescriptor fetch
Sep 27 03:24:11.373 [info] connection_ap_make_link(): Making internal direct tunnel to 144.76.110.103:443 ...
[...]
Sep 27 03:24:11.373 [debug] circuit_handle_first_hop(): Looking for firsthop '144.76.110.103:443'
Sep 27 03:24:11.373 [info] circuit_handle_first_hop(): Next router is $3DD2523F1B241F01D54818F327714CDA7F54238B~3DD2523F1B241F01D54 at 144.76.110.103: Not connected. Connecting.
[...]
Sep 27 03:24:11.373 [debug] connection_connect(): Connecting to "144.76.110.103":443.
Sep 27 03:24:11.373 [debug] connection_connect(): Connection to "144.76.110.103":443 in progress (sock 13).
Sep 27 03:24:11.373 [debug] connection_add_impl(): new conn type OR, socket 13, address 144.76.110.103, n_conns 7.
[...]
Sep 27 03:24:11.373 [debug] circuit_handle_first_hop(): connecting in progress (or finished). Good.
[...]
Sep 27 03:24:11.484 [debug] connection_or_finished_connecting(): OR connect() to router at 144.76.110.103:443 finished.
Sep 27 03:24:11.485 [debug] connection_tls_start_handshake(): starting TLS handshake on fd 13
[...]
Sep 27 03:24:11.937 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 144.76.110.103:443; Waiting for CERTS cell
Sep 27 03:24:11.937 [debug] connection_or_process_cells_from_inbuf(): 13: starting, inbuf_datalen 1473 (0 pending in tls object).
Sep 27 03:24:11.938 [info] channel_tls_process_certs_cell(): Got some good certificates from 144.76.110.103:443: Authenticated it.
[...]
Sep 27 03:24:11.938 [debug] circuit_n_chan_done(): chan to NULL/144.76.110.103:443, status=1
Sep 27 03:24:11.938 [debug] circuit_n_chan_done(): Found circ, sending create cell.
[...]
Sep 27 03:24:11.938 [info] circuit_send_next_onion_skin(): First hop: finished sending CREATE_FAST cell to '$3DD2523F1B241F01D54818F327714CDA7F54238B~wannabe2 at 144.76.110.103'
Sep 27 03:24:11.938 [info] channel_tls_process_netinfo_cell(): Got good NETINFO cell from 144.76.110.103:443; OR connection is now open, using protocol version 4. Its ID digest is 3DD2523F1B241F01D54818F327714CDA7F54238B. Our address is apparently 76.99.61.6.

But wait, what's that? We send our CREATE_FAST cells before we get our NETINFO cell? Not sure if that is a bug.

In any case, time passes, and we give up on all those circuits:

Sep 27 03:25:12.150 [notice] No circuits are opened. Relaxed timeout for circuit 3 (a General-purpose client 1-hop circuit in state doing handshakes with channel state open) to 60000ms. However, it appears the circuit has timed out anyway. 0 guards are live.
Sep 27 03:25:12.150 [info] circuit_expire_building(): Abandoning circ 3 144.76.110.103:443:-2147451209 (state 0,0:doing handshakes, purpose 5, len 1)
Sep 27 03:25:12.150 [info] internal circ (length 1, last hop 3DD2523F1B241F01D54818F327714CDA7F54238B): $3DD2523F1B241F01D54818F327714CDA7F54238B(waiting for keys)
[...]
Sep 27 03:25:12.150 [info] circuit_build_failed(): Our circuit failed to get a response from the first hop (144.76.110.103:443). I'm going to try to rotate to a better connection.
Sep 27 03:25:12.150 [info] entry_guard_register_connect_status(): Unable to connect to entry guard 'wannabe2' (3DD2523F1B241F01D54818F327714CDA7F54238B). Marking as unreachable.

and then we give up on the 45 directory requests:

Sep 27 03:25:12.150 [info] connection_ap_fail_onehop(): Closing one-hop stream to '$3DD2523F1B241F01D54818F327714CDA7F54238B/144.76.110.103' because the OR conn just failed.
Sep 27 03:25:12.150 [info] connection_ap_fail_onehop(): Closing one-hop stream to '$3DD2523F1B241F01D54818F327714CDA7F54238B/144.76.110.103' because the OR conn just failed.
[and 43 more]

So let's try again! We marked our first guard down so now we'll ask our second guard:

Sep 27 03:25:14.151 [debug] launch_descriptor_downloads(): There are enough downloadable microdescs to launch requests.
Sep 27 03:25:14.151 [info] launch_descriptor_downloads(): Launching 45 requests for 4084 microdescs, 92 at a time
Sep 27 03:25:14.159 [info] onion_pick_cpath_exit(): Using requested exit node '$821FAFA24651419FAE76A5C80BA79EA7E0743C4D~821FAFA24651419FAE7 at 212.83.158.171'
[...]
Sep 27 03:25:14.160 [info] circuit_handle_first_hop(): Next router is $821FAFA24651419FAE76A5C80BA79EA7E0743C4D~821FAFA24651419FAE7 at 212.83.158.171: Not connected. Connecting.
[...]
Sep 27 03:25:14.160 [debug] connection_connect(): Connecting to "212.83.158.171":443.
Sep 27 03:25:14.160 [debug] connection_connect(): Connection to "212.83.158.171":443 in progress (sock 13).
Sep 27 03:25:14.160 [debug] connection_add_impl(): new conn type OR, socket 13, address 212.83.158.171, n_conns 6.
[...]
Sep 27 03:25:14.261 [debug] connection_or_finished_connecting(): OR connect() to router at 212.83.158.171:443 finished.
[...]
Sep 27 03:25:14.683 [info] channel_tls_process_versions_cell(): Negotiated version 4 with 212.83.158.171:443; Waiting for CERTS cell
[...]
Sep 27 03:25:14.683 [info] channel_tls_process_certs_cell(): Got some good certificates from 212.83.158.171:443: Authenticated it.

And here's where it gets good:

Sep 27 03:25:14.683 [info] entry_guard_register_connect_status(): Connected to new entry guard 'servbr3' (821FAFA24651419FAE76A5C80BA79EA7E0743C4D). Marking earlier entry guards up. 0/3 entry guards usable/new.
Sep 27 03:25:14.683 [info] log_entry_guards(): wannabe2 [3DD2523F1B241F01D54818F327714CDA7F54238B] (up made-contact),servbr3 [821FAFA24651419FAE76A5C80BA79EA7E0743C4D] (up made-contact),SECxFreeBSD64 [CFB4CB595C2504444E8F8F76CAC42F0C97200734] (up never-contacted)
Sep 27 03:25:14.683 [debug] channel_do_open_actions(): New entry guard was reachable, but closing this connection so we can retry the earlier entry guards.
Sep 27 03:25:14.683 [debug] circuit_n_chan_done(): chan to NULL/212.83.158.171:443, status=0
Sep 27 03:25:14.683 [info] circuit_n_chan_done(): Channel failed; closing circ.
Sep 27 03:25:14.683 [info] circuit_build_failed(): Our circuit died before the first hop with no connection
Sep 27 03:25:14.683 [info] entry_guard_register_connect_status(): Unable to connect to entry guard 'servbr3' (821FAFA24651419FAE76A5C80BA79EA7E0743C4D). Marking as unreachable.

And then we close all the directory requests that we'd queued for it because we closed the connection:

Sep 27 03:25:14.683 [info] connection_ap_fail_onehop(): Closing one-hop stream to '$821FAFA24651419FAE76A5C80BA79EA7E0743C4D/212.83.158.171' because the OR conn just failed.
[and 44 more]

And then my client launches 45 new directory requests, which spread over the three available guards, and things bootstrap ok.

It's the "New entry guard was reachable, but closing this connection so we can retry the earlier entry guards." behavior that is the bug here. I think the fix might be to mark newly selected guards as already-contacted when you first pick them at the beginning of time.

Child Tickets

Change History (13)

comment:1 Changed 6 years ago by arma

I think we should fix this by adding another flag to add_an_entry_guard(), where if we're adding guards because we don't have enough, we set it to made-contact, whereas if we're adding guards because all of ours are down, we set it to never-contacted.
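
(A toy sketch of that idea; the flag name and helper below are illustrative, not the actual patch:)

/* Toy sketch, not the actual patch: pre-mark guards that we add merely
 * because our list is too short, so their first successful connection
 * isn't treated as "a new guard came up; close it and retry the old ones". */
typedef struct {
  int made_contact;
} toy_guard_t;

void
add_guard(toy_guard_t *g, int adding_because_list_too_short)
{
  /* Fresh bootstrap / topping up the list: behave as if we'd already
   * contacted this guard.  Adding because all our guards look down:
   * leave it never-contacted so the normal retry logic still applies. */
  g->made_contact = adding_because_list_too_short ? 1 : 0;
}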

comment:2 Changed 6 years ago by arma

(We'll actually be spreading our directory fetches over all our directory guards at bootstrap this way. So it is probably less secure, since talking to more than one guard is bad. But a) we quite quickly start to use the other guards anyway, so it's not like we actually keep that security benefit, and b) if we want to use fewer guards, we should do that in #9273.)

comment:3 Changed 6 years ago by arma

Nick doesn't like overloading the meaning of an existing flag whose name makes it sound like you can guess what it means.

New suggestion: add a new flag, exploratory, which is set when we would rather use our previous guards.

comment:4 Changed 6 years ago by nickm

I'm fine with that idea (suggestion: is_exploratory), I think. Can you be explicit about when we set it, when we clear it, how it persists, and what effects it has?

Are you really thinking this is simple enough for 0.2.4?

comment:5 Changed 6 years ago by arma

Well, the reason I picked 0.2.4 is because this is a regression introduced in 0.2.4 (because of the directory guard thing).

comment:6 Changed 6 years ago by nickm

Is there a way to do this simply in 0.2.4, and right in 0.2.5?

comment:7 Changed 6 years ago by arma

Status: new → needs_review

See my bug9946 branch. It does it the simpler way in 0.2.4, by overloading the made_contact field as I originally suggested.

I'm beginning to think the better fix after this (for 0.2.4 or 0.2.5, I don't think I care) is to rename made_contact to something like for_discovery, and be done with it.

comment:8 Changed 6 years ago by nickm

Looks okay to me. How did you test this?

As for a better name for the field, "for_discovery" doesn't make sense to me. Words like "provisional" or "probationary" would make more sense. That's fine as a separate patch for 0.2.5.

(This function is approaching the critical threshold of boolean arguments where all code using it will be unreadable without care. Not something to fix in 0.2.4 though.)

comment:9 Changed 6 years ago by arma

I tested it by bootstrapping a Tor client a few times. Seems to work better than before.

(I agree about the function arguments.)

comment:10 Changed 6 years ago by nickm

Okay. Did you specifically observe that the failure mode in the description of this bug is not happening any more with this patch?

comment:11 Changed 6 years ago by arma

Yes -- all the guards are marked made-contact, so we don't abandon them when we connect to them.

That said, there are still ways to improve the situation here.

A) If one of our three guards is a bust, it will take 120 seconds for requests to it to time out, and even if the other two guards are working great, that typically isn't enough progress to reach 80% bootstrapped. So there will remain cases where we wait two minutes to bootstrap -- and maybe this patch even increases the chance of those, since any of the three guards can cause them.

B) Our choice of how many microdescs to fetch per directory request is really totally irrational since the network has grown so much since we chose those parameters.

Oct 11 13:32:44.779 [info] launch_descriptor_downloads(): Launching 50 requests for 4574 microdescs, 92 at a time

Seriously, we're making 50 requests and spreading them over 3 guards? I guess it's nice that we can get partial answers, but generally partial answers don't help us.
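
(A quick sanity check on the arithmetic: 4574 missing microdescs at 92 per request is ceil(4574/92) = 50 requests, or roughly 17 requests per guard when spread over 3 guards; the earlier figure of 45 requests for 4084 microdescs is the same calculation.)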

comment:12 Changed 6 years ago by nickm

Those two sound like other bugs. Shall I merge this one?

comment:13 Changed 6 years ago by arma

Resolution: fixed
Status: needs_review → closed

I opened #9968 and #9969 for the two other bugs. And I opened #9971 for the 0.2.5 cleanup for this bug. And I merged it. Closing. Thanks!
