Opened 10 months ago

Closed 5 months ago

#28281 closed task (implemented)

outline of high-level bootstrap tracker abstractions

Reported by: catalyst
Owned by: catalyst
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: s8-bootstrap, bootstrap-arch, 040-deferred-20190220
Cc: brade, mcs
Actual Points: 2
Parent ID: #28018
Points: 0.5
Reviewer:
Sponsor: Sponsor19

Description

This is a placeholder to summarize the high-level bootstrap tracking abstractions I talked about with Nick.

Change History (14)

comment:1 Changed 10 months ago by catalyst

Working from the list in #27103, this is a hopefully useful breakdown of the high-level phases of bootstrapping:

  1. making the initial OR_CONN to any relay or bridge (see #27103)
    • this should track the farthest progress that any individual attempt has made so far
    • "farthest progress" should probably be reset under some circumstances (see #27691)
  2. directory info
    1. one-hop circuit, if needed?
    2. bridge descriptor, if bridges are used? (see #11966)
    3. consensus
    4. descriptors (usually microdescs for clients)
  3. building a useful application circuit
    1. first OR_CONN to a guard if we're not using bridges
    2. intermediate progress such as noting when each hop gets built (see #27104)

In a pubsub framework, (1) will need to subscribe to events from connections and keep track of the maximum progress that any one connection has made.
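
A minimal sketch of the tracker described for (1), in illustrative Python rather than Tor's C. The state names loosely follow the OR_CONN_STATE_* names used elsewhere in this ticket; the `ConnProgressTracker` class, its `on_conn_state` callback, and the numeric ordering are assumptions made for illustration only.

```python
from enum import IntEnum


class ConnState(IntEnum):
    """Hypothetical ordering of connection progress, lowest to highest."""
    CONNECTING = 1
    PROXY_HANDSHAKING = 2
    TLS_HANDSHAKING = 3
    OPEN = 4


class ConnProgressTracker:
    """Tracks the farthest progress made by any single connection attempt."""

    def __init__(self):
        self.per_conn = {}    # connection id -> latest state seen
        self.farthest = None  # maximum state seen across all connections

    def on_conn_state(self, conn_id, state):
        """Subscriber callback; returns True if overall progress advanced."""
        self.per_conn[conn_id] = state
        if self.farthest is None or state > self.farthest:
            self.farthest = state
            return True
        return False

    def reset(self):
        """Forget all progress (when to call this is left open, see #27691)."""
        self.per_conn.clear()
        self.farthest = None
```

Only a `True` return would need to be turned into a bootstrap progress report; a slower connection catching up to a state that was already reported changes nothing.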

There should be an abstraction that tracks circuit-building progress. We can use it for (2)(a) and (3)(b).

We could make separate trackers for (2)(c) and (2)(d). As a bonus, those trackers could handle the scaling of incremental progress for downloads.

If we make a tracker that subscribes to both circuit and connection events, we could cleanly solve bugs such as #25061. It would also work for (3)(a), which needs to know both circuit type (application circuit) and connection state.
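
As a rough illustration of a tracker that listens to both event sources, here is a Python sketch. It only shows the idea of joining circuit type with connection state for (3)(a); the class, the callback names, and the event shapes are all hypothetical and do not correspond to existing Tor code.

```python
class GuardConnTracker:
    """Reports connection progress only for connections carrying app circuits."""

    def __init__(self):
        self.app_conns = set()  # connection ids used by application circuits
        self.conn_state = {}    # connection id -> latest state name
        self.reported = []      # (conn_id, state) pairs surfaced as progress

    def on_circ_event(self, conn_id, is_app_circuit):
        """Circuit subscriber: note which connections matter for (3)(a)."""
        if is_app_circuit:
            self.app_conns.add(conn_id)
            # Surface a state that arrived before the circuit was known.
            if conn_id in self.conn_state:
                self._report(conn_id, self.conn_state[conn_id])

    def on_conn_event(self, conn_id, state):
        """Connection subscriber: record state, report it if relevant."""
        self.conn_state[conn_id] = state
        if conn_id in self.app_conns:
            self._report(conn_id, state)

    def _report(self, conn_id, state):
        """Record a progress report once per (connection, state) pair."""
        if (conn_id, state) not in self.reported:
            self.reported.append((conn_id, state))
```

The point of the design is that neither subscriber can decide on its own whether an event counts as guard progress; the join of the two is what makes the report meaningful.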

comment:2 Changed 10 months ago by nickm

Milestone: Tor: 0.3.6.x-final → Tor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:3 Changed 9 months ago by catalyst

After chatting some with ahf, I thought it might be a good idea to write down here a proposed new set of bootstrap phases. The numbering of the new phases is yet to be determined, but they're meant to be in order. (Some phases might get skipped, and that's OK.)

Some design considerations include the spacing between phases. Right now many of them seem separated by 5%, which seems to be a decent amount of progress as seen by the user's eye. Any increments smaller than this aren't necessarily meaningful to show to the user, but we could use the smaller increments to add phase names that could give a more accurate picture about where something is broken than the user currently gets.

There are two gaps in the existing phases, one of which corresponds to incremental progress downloading descriptors. (The other one doesn't seem to currently be used to display incremental progress downloading a consensus.)

undef
    shouldn't be visible to controllers or users
starting
    can stay the same

The following high-level grouping of phases should deal with the first outbound connection to a Tor relay. This might be to a directory cache, a proxy, or a guard/bridge. Here we use "first" to mean whichever one has made the most progress so far, in case we open multiple connections before any one is fully open.

connecting
    the initial outbound TCP connection toward the Tor network, for any purpose, which might include a firewall-bypassing proxy, or a pluggable transport; corresponds to OR_CONN_STATE_CONNECTING
proxy_handshake
    the initial handshake with a firewall-bypass proxy or PT; corresponds to OR_CONN_STATE_PROXY_HANDSHAKING; might be skipped if not using proxies or PTs

Maybe insert additional phases here for intermediate proxy handshaking steps?

tls_handshake
    the TLS handshake with the first relay; corresponds to OR_CONN_STATE_TLS_HANDSHAKING or related ORCONN states (some of these involve TLS protocol renegotiations to deal with older link protocol versions)
open
    the Tor link protocol is open to the first relay and can send and receive cells

The following high-level grouping of phases should deal with receiving and verifying directory information. Some of these might get skipped if we're starting from cached info.

dir_circ_create
    corresponds to the CREATE command opening the first circuit to a directory server; maybe reuse the existing onehop_create tag, because it already mostly means this? it might be better to have the more normalized naming though
dir_circ_created
    corresponds to the CREATED response that means the first directory circuit is created
dir_stream_begin
    corresponds to the BEGIN_DIR command
dir_stream_connected
    corresponds to the CONNECTED response to the BEGIN_DIR command; the existing requesting_status phase actually gets sent here instead of where the corresponding work actually begins
requesting_bridge_desc
    start downloading the bridge descriptor, if we're connected to a bridge; this is related to #11966
requesting_status
    this can stay the same
loading_status
    this can stay the same

Right now there is a gap (from 20 to 40) between these two phases, but we don't currently fill it in with incremental progress in downloading the consensus. Maybe we should?

loading_keys
    this can stay the same
requesting_descriptors
    this can stay the same
loading_descriptors
    this can stay the same

Right now there is a gap between loading_descriptors and the next phase (from 50 to 80), which we fill in with incremental progress. Maybe we should retain this gap and the incremental progress display?
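
To make the incremental-progress idea concrete, here is a small Python sketch of scaling a download fraction into a gap between two phases. The 50 and 80 endpoints come from the gap described above; the function name, its parameters, and the round-down behaviour are assumptions for illustration.

```python
def scale_progress(start, end, done, total):
    """Map done/total onto the integer percentage range [start, end]."""
    if total <= 0:
        # Nothing to download yet: stay at the start of the gap.
        return start
    # Clamp so that over- or under-counting never leaves the gap.
    fraction = min(max(done / total, 0.0), 1.0)
    return start + int((end - start) * fraction)
```

For example, `scale_progress(50, 80, 50, 100)` gives 65: half of the descriptors fetched lands halfway through the gap.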

The next high-level grouping of phases corresponds to connecting to a guard, if bridges aren't in use. Similarly to the connecting grouping, these represent the furthest progress that any one attempt has made so far.

guard_connecting
    same as connecting but for a guard
guard_proxy_handshake
    same as proxy_handshake but for a guard
guard_tls_handshake
    same as tls_handshake but for a guard
guard_open
    same as open but for a guard
circ_create
    same as dir_circ_create except for an application circuit
circ_created
    same as dir_circ_created except for an application circuit
circ_extend
    corresponds to EXTEND command for the second hop
circ_extended
    corresponds to EXTENDED response for the second hop
circ_exit_extend
    corresponds to EXTEND command for the exit
circ_exit_extended
    corresponds to EXTENDED response for the exit
done
    same as existing phase

comment:4 in reply to: 3 Changed 9 months ago by arma

Good stuff!

Replying to catalyst:

> Right now there is a gap (from 20 to 40) between these two phases, but we don't currently fill it in with incremental progress in downloading the consensus. Maybe we should?

A thought for this part in particular: incremental progress at fetching descriptors can be confirmed as we go (we verify that we got the bytes we wanted). But if we try to show incremental progress at fetching a consensus, but then we get it and we don't like it, we'll find ourselves going backwards in bootstrap progress. Not the end of the world but maybe something to avoid getting ourselves into if we can.

But most importantly of all: this particular incremental-progress dilemma can be totally deferred until everything else is done and in place. :)

comment:5 Changed 9 months ago by catalyst

I thought about this a bit more, and I think we might want to disambiguate the connection progress messages a bit. We probably shouldn't always report the first TCP connection the same way, because it means something different to the user if the TCP connection to the first proxy fails, compared to if the TCP connection to the first relay fails. So we shouldn't use the raw connection progress indications from the ORCONN code without decoding them first a bit.

I think if we know we're connecting through a proxy, we should report the first TCP connection as something like proxy_connecting and proxy_connected. But then this gets confusingly named with the proxy handling code in connection.c that talks the proxy protocol and makes connection requests to the proxy. Maybe we should report the progress of asking the proxy to make the relay connection as connecting and connected?
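
One possible reading of this decoding step, sketched in Python. The raw state strings abbreviate the OR_CONN_STATE_* names; the function name, the `using_proxy` flag, and the exact mapping are assumptions, and the mapping follows the tentative suggestion above rather than any settled design.

```python
def decode_conn_phase(raw_state, using_proxy):
    """Translate a raw ORCONN-style state into a user-facing phase tag."""
    if raw_state == "CONNECTING":
        # The TCP connection goes to the proxy when one is configured.
        return "proxy_connecting" if using_proxy else "connecting"
    if raw_state == "PROXY_HANDSHAKING":
        # The proxy is being asked to reach the relay on our behalf.
        return "connecting"
    if raw_state == "TLS_HANDSHAKING":
        return "tls_handshake"
    if raw_state == "OPEN":
        return "open"
    raise ValueError(f"unknown state: {raw_state}")
```

With this mapping, a failure while the raw state is CONNECTING is reported as `proxy_connecting` when a proxy is in use, so the user can tell a proxy problem from a relay problem.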

comment:6 Changed 9 months ago by catalyst

We might need to further disambiguate between PT proxies and firewall bypass proxies.

I think we have a terminology quirk we need to be mindful of: Tor Browser refers to PT bridges as simply "bridges". It also uses "proxy" to refer to only firewall bypass proxies.

comment:7 in reply to: 4 Changed 9 months ago by mcs

Replying to arma:

> A thought for this part in particular: incremental progress at fetching descriptors can be confirmed as we go (we verify that we got the bytes we wanted). But if we try to show incremental progress at fetching a consensus, but then we get it and we don't like it, we'll find ourselves going backwards in bootstrap progress. Not the end of the world but maybe something to avoid getting ourselves into if we can.

We would like to avoid showing "backwards" progress in the Tor Launcher UI, but there are ways the situation you described could be addressed. For example, if each bootstrap phase that reports incremental progress were delineated clearly with "start" and "end" messages, then one could create a "checkmark" based UI that showed a second attempt rather than moving a simple progress bar backwards. For example:

  Connecting to the Tor network
  ...
  x Fetching relay information                [######           ]  FAILED; will retry
  ✓ Fetching relay information (attempt #2)   [#################]  
  ...

A general principle that I mentioned before is that the tor daemon itself should enable many different types of user interfaces. Making a rich set of info available that includes start and end milestones will make that possible.
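
A small Python sketch of how a UI might consume such start and end milestones to produce the checkmark display above. The event tuples and the `render_attempts` function are hypothetical, not an existing control-port format, and the sketch assumes each "end" event directly follows its own "start" event.

```python
def render_attempts(events):
    """Build one display line per attempt from (kind, phase, ok) events.

    kind is "start" or "end"; ok is only meaningful for "end" events.
    """
    lines = []
    attempts = {}  # phase -> number of attempts started so far
    for kind, phase, ok in events:
        if kind == "start":
            attempts[phase] = attempts.get(phase, 0) + 1
            n = attempts[phase]
            label = phase if n == 1 else f"{phase} (attempt #{n})"
            lines.append(f"  {label}")
        elif kind == "end":
            mark = "✓" if ok else "x"
            suffix = "" if ok else "  FAILED; will retry"
            # Rewrite the most recent line, which belongs to this attempt.
            lines[-1] = f"{mark} {lines[-1].strip()}{suffix}"
    return lines
```

Because a retry adds a new line instead of rewinding an old one, nothing already shown to the user ever moves backwards.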

comment:8 in reply to: 6 Changed 9 months ago by mcs

Replying to catalyst:

> We might need to further disambiguate between PT proxies and firewall bypass proxies.
>
> I think we have a terminology quirk we need to be mindful of: Tor Browser refers to PT bridges as simply "bridges". It also uses "proxy" to refer to only firewall bypass proxies.

Terminology used in the browser UI has evolved over time and will almost certainly do so again (especially if user testing shows that changing it will help users better understand what is going on). Also, Brave, Firefox, and other clients may choose to present completely different terminology to their end-users. To me this just means that tor log and control port messages should use whatever terminology will make the most sense to developers and experts.

+1 on making sure clients can disambiguate between messages related to PT proxies vs. firewall bypass proxies (and other similar components). It will be a big win if we can show users exactly where things went wrong. Today, in many cases, people need to guess at the next step to take if they are unable to connect to the network. Will a local proxy help? Will using a different PT bridge help? Maybe my device lacks a working Internet connection? Wait, what time is it?

comment:9 Changed 8 months ago by catalyst

Keywords: bootstrap-arch added

comment:10 Changed 8 months ago by catalyst

Keywords: s8-bootstrap added; s8-boostrap removed

comment:11 Changed 8 months ago by catalyst

Sponsor: Sponsor8 → Sponsor19

comment:12 Changed 6 months ago by nickm

Keywords: 040-deferred-20190220 added
Milestone: Tor: 0.4.0.x-final → Tor: unspecified

Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.

comment:13 Changed 5 months ago by catalyst

I'll probably close this soon, under the assumption that #28928 took care of recording this knowledge somewhere more stable. People should please let me know if they think there's still stuff to document.

comment:14 Changed 5 months ago by catalyst

Actual Points: 2
Resolution: implemented
Status: assigned → closed

I believe all the important information in this ticket is now captured in control-spec.txt.
