Opened 6 months ago

Last modified 21 hours ago

#27308 assigned enhancement

report bootstrap phase when we actually start, not just unblock something

Reported by: catalyst Owned by: catalyst
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: usability, ux, ux-team, bootstrap, 035-roadmap-subtask, 035-triaged-in-20180711, s8-bootstrap, bootstrap-arch, 040-deferred-20190220
Cc: brade, mcs, linda, isabela, iry, intrigeri Actual Points:
Parent ID: #28018 Points:
Reviewer: Sponsor: Sponsor19-can

Description (last modified by catalyst)

Right now many bootstrap events get reported when the preceding task has completed. This makes it somewhat harder to tell what has gone wrong if bootstrap progress stalls.

[edit: The following isn't necessarily the best way to fix this. It might be better to figure out how to report starting something when actually starting it.]

We should add completion milestones to bootstrap reporting. This makes bootstrap reporting more future-proof. If in the future we add a time-consuming task with (no bootstrap reporting) between two existing bootstrap tasks, it will be a little more obvious what's going on.

For example, say we have task X followed by task Z, but then we add a lengthy task Y without adding bootstrap reporting to it. In the old scheme without completion milestones, if Y stalls, the user sees:

  • starting X
  • starting Z
  • [hang]

The user thinks Z has already started when no such thing has happened because Y is still in progress. If we add completion milestones, the user will see:

  • starting X
  • finished X
  • starting Z
  • finishing Z

in a normal bootstrap. If something gets stuck in task Y, the user will see:

  • starting X
  • finished X
  • [hang]

This will make it more clear that something got stuck in between tasks.

In a one-line display like Tor Launcher, the completion milestones will normally flash by quickly and not be very visible to users. Completion milestones might make the NOTICE logs a bit more verbose.

Child Tickets

TicketTypeStatusOwnerSummary
#28926enhancementnewimprove reporting of directory bootstrap progress

Change History (12)

comment:1 Changed 6 months ago by catalyst

Type: defectenhancement

comment:2 Changed 6 months ago by catalyst

The following bootstrap milestones (maybe not an exhaustive list) get reported by the preceding step, not by code that actually starts the next milestone:

  • BOOTSTRAP_STATUS_REQUESTING_STATUS (20)
  • BOOTSTRAP_STATUS_HANDSHAKE (-2)
  • BOOTSTRAP_STATUS_REQUESTING_DESCRIPTORS (45)
  • BOOTSTRAP_STATUS_CONN_OR (80)

BOOTSTRAP_STATUS_REQUESTING_STATUS is reported by circuit_build_no_more_hops() when it observes that a single-hop circuit has completed.

BOOTSTRAP_STATUS_HANDSHAKE is reported by connection_or_finished_connecting() when it probably should be reported by connection_tls_start_handshake().

BOOTSTRAP_STATUS_REQUESTING_DESCRIPTORS is reported by update_router_have_minimum_dir_info(), which doesn't actually initiate the descriptor downloads.

BOOTSTRAP_STATUS_CONN_OR is reported by update_router_have_minimum_dir_info() when it probably should actually be reported by the connection launch code in circuit_handle_first_hop().

comment:3 Changed 5 months ago by catalyst

Milestone: Tor: 0.3.5.x-finalTor: 0.3.6.x-final

Move my post-freeze 0.3.5 items to 0.3.6.

comment:4 Changed 4 months ago by nickm

Parent ID: #22266#28018

comment:5 Changed 4 months ago by catalyst

Sponsor: Sponsor8Sponsor8-can

comment:6 Changed 4 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:7 Changed 2 months ago by catalyst

rough notes -- phases marked with * are in some way implicit, probably in a way that gives a misleading idea of what's going on or what's gone wrong.

-1 undef

-2 handshake
   this gets decoded to handshake_dir or handshake_or depending on
   whether we have a consensus yet. we should fix this, because it
   causes usability problems.

00 starting

05 conn_dir *
   circuit_handle_first_hop in circuitbuild.c, if it's about to launch
   a one-hop circuit

10 handshake_dir *
   decoded in control_event_bootstrap from handshake -- implicit in
   that it doesn't correspond to the actual TCP connect

15 onehop_create
   circuit_send_first_onion_skin in circuitbuild.c -- assumes
   directory lookups are the only reason to build one-hop circuits

20 requesting_status *
   circuit_build_no_more_hops in circuitbuild.c
   -- this implicit and doesn't mean the request actually got sent; it
   just means the circuit is built. the begin_dir might not have been
   sent yet, etc.

25 loading_status *
   connection_edge_process_relay_cell_not_open -- probably actually
   the completion of a begin_dir command

40 loading_keys *
   connection_edge_process_relay_cell_not_open -- same as
   loading_status but infers from consensus_is_waiting_for_certs

45 requesting_descriptors *
   update_router_have_minimum_dir_info in nodelist.c -- not the actual
   request but actually that we noticed that we need more descriptors

50 loading_descriptors
   connection_edge_process_relay_cell_not_open in relay.c -- similar
   to loading_status and loading_keys

   load_downloaded_routers in dirclient.c -- if
   router_load_routers_from_string says it added new routers

   handle_response_fetch_certificate in dirclient.c --
   microdescs_add_to_cache says it added new ones

80 conn_or *
   update_router_have_minimum_dir_info in nodelist.c -- basicallly
   once we have the minimum amount of directory info to build
   circuits, but it doesn't mean we've necessarily launched any yet

85 handshake_or *
   decoded in control_event_bootstrap from handshake

90 circuit_create
   circuit_send_first_onion_skin in circuitbuild.c, when it's building
   a circuit that's not one-hop

100 done
   circuit_build_no_more_hops in circuitbuild.c

comment:8 Changed 2 months ago by catalyst

We probably should scope this a bit better. I think there are two categories of "previous phase reports progress on starting the following phase":

  1. reporting our best guess as to what the next asynchronous progress will occur next that's been unblocked by what just happened
  2. reporting progress of the next phase when we're about to call into code that almost certainly begins the phase we're reporting that we're starting

(1) is more important to fix, and #27167 fixes some of them.
(2) is less important to fix, because the distinction is only important to someone who's familiar with the code. In these cases, it's very unlikely that progress will be interrupted or blocked before the actual work begins on the next phase.

comment:9 Changed 2 months ago by catalyst

Description: modified (diff)
Summary: add completion milestones to bootstrapreport bootstrap phase when we actually start, not just unblock something

I'm no longer convinced that adding completion milestones is the best way to fix this class of problems. I think we want to report starting something when we've actually started it, not when we think we've unblocked it from starting.

I've been functionally using this as a ticket for tracking where we report misleading implicit progress in bootstrap phases.

Edited the summary and description accordingly.

comment:10 Changed 6 weeks ago by catalyst

Sponsor: Sponsor8-canSponsor19-can

comment:11 Changed 6 weeks ago by catalyst

Keywords: bootstrap-arch added

comment:12 Changed 21 hours ago by nickm

Keywords: 040-deferred-20190220 added
Milestone: Tor: 0.4.0.x-finalTor: unspecified

Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.

Note: See TracTickets for help on using tickets.