Right now many bootstrap events get reported when the preceding task has completed. This makes it somewhat harder to tell what has gone wrong if bootstrap progress stalls.
[edit: The following isn't necessarily the best way to fix this. It might be better to figure out how to report starting something when actually starting it.]
We should add completion milestones to bootstrap reporting. This makes bootstrap reporting more future-proof. If in the future we add a time-consuming task (with no bootstrap reporting) between two existing bootstrap tasks, it will be a little more obvious what's going on.
For example, say we have task X followed by task Z, but then we add a lengthy task Y without adding bootstrap reporting to it. In the old scheme without completion milestones, if Y stalls, the user sees:
starting X
starting Z
[hang]
The user thinks Z has already started when no such thing has happened because Y is still in progress. If we add completion milestones, the user will see:
starting X
finished X
starting Z
finished Z
in a normal bootstrap. If something gets stuck in task Y, the user will see:
starting X
finished X
[hang]
This will make it clearer that something got stuck in between tasks.
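To make the X/Y/Z example concrete, here is a toy model in Python. It is not tor code: the function name `visible_log` and the task tuples are made up for illustration. It assumes the old scheme announces the next reporting task as soon as the previous reporting task completes, and that with completion milestones each reporting task announces its own start and finish.

```python
def visible_log(tasks, stalls_in=None, completion_milestones=False):
    """Return the lines a user would see until bootstrap finishes or hangs.

    tasks is an ordered list of (name, reports_progress) pairs.
    stalls_in names the task that never completes, or None for a normal run.
    """
    reporting = [name for name, reports in tasks if reports]
    lines = []
    # Assumption: in the old scheme the first phase is announced up front.
    if not completion_milestones and reporting:
        lines.append(f"starting {reporting[0]}")
    for name, reports in tasks:
        if completion_milestones and reports:
            lines.append(f"starting {name}")
        if name == stalls_in:
            lines.append("[hang]")
            return lines
        if reports and completion_milestones:
            lines.append(f"finished {name}")
        elif reports:
            # Old scheme: completing this task announces the next reporting task,
            # even if an unreported task still has to run in between.
            nxt = reporting.index(name) + 1
            if nxt < len(reporting):
                lines.append(f"starting {reporting[nxt]}")
    return lines


# Y was added later and has no bootstrap reporting of its own.
TASKS = [("X", True), ("Y", False), ("Z", True)]

if __name__ == "__main__":
    for milestones in (False, True):
        print("completion milestones:", milestones)
        for line in visible_log(TASKS, stalls_in="Y", completion_milestones=milestones):
            print("  " + line)
```

With `stalls_in="Y"`, the old scheme's last visible line before the hang is "starting Z", while the completion-milestone scheme stops at "finished X".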
In a one-line display like Tor Launcher, the completion milestones will normally flash by quickly and not be very visible to users. Completion milestones might make the NOTICE logs a bit more verbose.
The following bootstrap milestones (maybe not an exhaustive list) get reported by the preceding step, not by code that actually starts the next milestone:
BOOTSTRAP_STATUS_REQUESTING_STATUS (20)
BOOTSTRAP_STATUS_HANDSHAKE (-2)
BOOTSTRAP_STATUS_REQUESTING_DESCRIPTORS (45)
BOOTSTRAP_STATUS_CONN_OR (80)
BOOTSTRAP_STATUS_REQUESTING_STATUS is reported by circuit_build_no_more_hops() when it observes that a single-hop circuit has completed.
BOOTSTRAP_STATUS_HANDSHAKE is reported by connection_or_finished_connecting() when it probably should be reported by connection_tls_start_handshake().
BOOTSTRAP_STATUS_REQUESTING_DESCRIPTORS is reported by update_router_have_minimum_dir_info(), which doesn't actually initiate the descriptor downloads.
BOOTSTRAP_STATUS_CONN_OR is reported by update_router_have_minimum_dir_info() when it probably should actually be reported by the connection launch code in circuit_handle_first_hop().
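For tracking purposes, the four cases above can be written down as a small table. This is only a bookkeeping sketch in Python: the `Milestone` type and `without_suggestion` helper are made up, and the "suggested" column is filled in only where the comment above actually names a better place to report from.

```python
from typing import NamedTuple, Optional


class Milestone(NamedTuple):
    name: str
    progress: int
    reported_by: str                   # function that reports it today
    suggested_reporter: Optional[str]  # function named above as a better place, if any


MISREPORTED = [
    Milestone("BOOTSTRAP_STATUS_REQUESTING_STATUS", 20,
              "circuit_build_no_more_hops", None),
    Milestone("BOOTSTRAP_STATUS_HANDSHAKE", -2,
              "connection_or_finished_connecting", "connection_tls_start_handshake"),
    Milestone("BOOTSTRAP_STATUS_REQUESTING_DESCRIPTORS", 45,
              "update_router_have_minimum_dir_info", None),
    Milestone("BOOTSTRAP_STATUS_CONN_OR", 80,
              "update_router_have_minimum_dir_info", "circuit_handle_first_hop"),
]


def without_suggestion(rows):
    """Names of milestones for which no better reporting site has been proposed yet."""
    return [m.name for m in rows if m.suggested_reporter is None]


if __name__ == "__main__":
    for name in without_suggestion(MISREPORTED):
        print("still needs a proposed reporting site:", name)
```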
rough notes -- phases marked with * are in some way implicit, probably in a way that gives a misleading idea of what's going on or what's gone wrong.
-1 undef
-2 handshake -- this gets decoded to handshake_dir or handshake_or depending on whether we have a consensus yet. we should fix this, because it causes usability problems.
00 starting
05 conn_dir * circuit_handle_first_hop in circuitbuild.c, if it's about to launch a one-hop circuit
10 handshake_dir * decoded in control_event_bootstrap from handshake -- implicit in that it doesn't correspond to the actual TCP connect
15 onehop_create circuit_send_first_onion_skin in circuitbuild.c -- assumes directory lookups are the only reason to build one-hop circuits
20 requesting_status * circuit_build_no_more_hops in circuitbuild.c -- this is implicit and doesn't mean the request actually got sent; it just means the circuit is built. the begin_dir might not have been sent yet, etc.
25 loading_status * connection_edge_process_relay_cell_not_open -- probably actually the completion of a begin_dir command
40 loading_keys * connection_edge_process_relay_cell_not_open -- same as loading_status but infers from consensus_is_waiting_for_certs
45 requesting_descriptors * update_router_have_minimum_dir_info in nodelist.c -- not the actual request but actually that we noticed that we need more descriptors
50 loading_descriptors
   connection_edge_process_relay_cell_not_open in relay.c -- similar to loading_status and loading_keys
   load_downloaded_routers in dirclient.c -- if router_load_routers_from_string says it added new routers
   handle_response_fetch_certificate in dirclient.c -- microdescs_add_to_cache says it added new ones
80 conn_or * update_router_have_minimum_dir_info in nodelist.c -- basically once we have the minimum amount of directory info to build circuits, but it doesn't mean we've necessarily launched any yet
85 handshake_or * decoded in control_event_bootstrap from handshake
90 circuit_create circuit_send_first_onion_skin in circuitbuild.c, when it's building a circuit that's not one-hop
100 done circuit_build_no_more_hops in circuitbuild.c
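The decoding of the placeholder handshake phase (-2) mentioned in these notes can be sketched roughly as follows. This is a Python toy, not the code in control_event_bootstrap: the function `decode_phase` is made up, the `_DIR` and `_OR` constant names just follow the naming pattern of the other constants with the values 10 and 85 from the notes, and the direction of the mapping (no consensus means handshake_dir, consensus means handshake_or) is my reading of "depending on whether we have a consensus yet".

```python
BOOTSTRAP_STATUS_HANDSHAKE = -2       # placeholder reported by the connection code
BOOTSTRAP_STATUS_HANDSHAKE_DIR = 10   # value from the notes above
BOOTSTRAP_STATUS_HANDSHAKE_OR = 85    # value from the notes above


def decode_phase(status, have_consensus):
    """Map the placeholder handshake phase to a concrete one (assumed behaviour).

    Any other phase is passed through unchanged.
    """
    if status != BOOTSTRAP_STATUS_HANDSHAKE:
        return status
    if have_consensus:
        return BOOTSTRAP_STATUS_HANDSHAKE_OR
    return BOOTSTRAP_STATUS_HANDSHAKE_DIR


if __name__ == "__main__":
    # The same low-level event is shown as a different phase depending only on
    # directory state, not on what the connection is actually for.
    print(decode_phase(BOOTSTRAP_STATUS_HANDSHAKE, have_consensus=False))
    print(decode_phase(BOOTSTRAP_STATUS_HANDSHAKE, have_consensus=True))
```

The point of the sketch is that the reported phase is inferred from unrelated state rather than from the code path that started the handshake, which is why both handshake_dir and handshake_or are marked with * above.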
We probably should scope this a bit better. I think there are two categories of "previous phase reports progress on starting the following phase":
(1) reporting our best guess at which asynchronous step will make progress next, now that what just happened has unblocked it
(2) reporting the start of the next phase when we're about to call into code that almost certainly begins that phase
(1) is more important to fix, and #27167 (moved) fixes some of them.
(2) is less important to fix, because the distinction is only important to someone who's familiar with the code. In these cases, it's very unlikely that progress will be interrupted or blocked before the actual work begins on the next phase.
I'm no longer convinced that adding completion milestones is the best way to fix this class of problems. I think we want to report starting something when we've actually started it, not when we think we've unblocked it from starting.
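As a rough illustration of "report when started, not when unblocked", here is a toy Python model of the requesting_status case, where the one-hop circuit completing only queues the begin_dir rather than sending it. Everything here (the `Bootstrap` class, its method names, the `report_on_unblock` flag) is hypothetical and only mirrors the shape of the problem, not tor's actual control flow.

```python
from collections import deque


class Bootstrap:
    def __init__(self, report_on_unblock):
        self.report_on_unblock = report_on_unblock
        self.events = []        # phases the user has been told about
        self.pending = deque()  # requests that are unblocked but not yet sent
        self.sent = []          # requests that actually went out

    def onehop_circuit_built(self):
        """Previous phase completes; the directory request is merely unblocked."""
        self.pending.append("begin_dir")
        if self.report_on_unblock:
            # Implicit report: nothing has been sent at this point.
            self.events.append("requesting_status")

    def flush(self):
        """The request really goes out; this may happen much later, or never."""
        while self.pending:
            self.sent.append(self.pending.popleft())
            if not self.report_on_unblock:
                self.events.append("requesting_status")


if __name__ == "__main__":
    for flag in (True, False):
        b = Bootstrap(report_on_unblock=flag)
        b.onehop_circuit_built()  # suppose we stall here, before flush()
        print("report_on_unblock =", flag, "events:", b.events, "sent:", b.sent)
```

If bootstrap stalls between `onehop_circuit_built()` and `flush()`, the first variant has already told the user that the status request is in progress, while the second has not reported anything it hasn't done.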
I've been functionally using this as a ticket for tracking where we report misleading implicit progress in bootstrap phases.
Edited the summary and description accordingly.
Trac:
Description: added the bracketed edit note (beginning "[edit: The following isn't necessarily the best way to fix this.") after the first paragraph; the description is otherwise unchanged.
Summary: add completion milestones to bootstrap to report bootstrap phase when we actually start, not just unblock something
Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.