Opened 8 weeks ago

Last modified 5 days ago

#29201 needs_information defect

Tor bootstrap hangs when offline

Reported by: atagar Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 040-deferred-20190220
Cc: catalyst, gk Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Hi Nick. When launching a tor process stem uses bootstrap messages to determine when the instance we launch is available. Recently-ish tor changed such that when offline tor bootstrapping does not progress past 0%, printing hundreds of...

Jan 29 11:36:27.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 11; recommendation warn; host 322C6E3A973BC10FC36DE3037AD27BC89F14723B at 212.83.154.33:8443)
Jan 29 11:36:28.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 12; recommendation warn; host F741E5124CB12700DA946B78C9B2DD175D6CD2A1 at 163.172.154.162:9001)
Jan 29 11:36:28.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 14; recommendation warn; host D71B1CA1C9DC7E8CA64158E106AD770A21160FEE at 185.34.33.2:31415)
Jan 29 11:36:29.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 13; recommendation warn; host F2DFE5FA1E4CF54F8E761A6D304B9B4EC69BDAE8 at 129.13.131.140:443)
Jan 29 11:36:30.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 14; recommendation warn; host 47C42E2094EE482E7C9B586B10BABFB67557030B at 185.220.101.34:20034)
Jan 29 11:36:30.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 14; recommendation warn; host B06F093A3D4DFAD3E923F4F28A74901BD4F74EB1 at 178.17.174.14:9001)
Jan 29 11:36:31.000 [warn] Problem bootstrapping. Stuck at 0% (starting): Starting. (Network is unreachable; NOROUTE; count 15; recommendation warn; host CF6D0AAFB385BE71B8E111FC5CFF4B47923733BC at 154.35.175.225:443)

There's a few issues with this...

  1. Poor experience from a user perspective. Deluging the user with hundreds of warnings is pretty unhelpful.
  1. Stem's ability to launch tor processes no longer works when offline.
  1. Stem's integ tests no longer pass when offline. I can sidestep this but first I'd like to confirm if this is the desired behavior from tor or not.

Child Tickets

TicketStatusOwnerSummaryComponent
#29777newRate-limit "Problem bootstrapping" warnings to one every 5 secondsCore Tor/Tor
#29778newatagarStem's ability to launch tor processes no longer works when offlineCore Tor/Stem
#29779needs_informationatagarStem's integ tests no longer pass when offlineCore Tor/Stem

Change History (6)

comment:1 Changed 7 weeks ago by nickm

Cc: catalyst added
Milestone: Tor: 0.4.0.x-final

Going to pass this over to catalyst, since they've done most of the backup work here.

What do you expect to happen when Tor tries to bootstrap when it is offline? Is it just the number of warnings that is unexpected, or is it supposed to get past 0%?

comment:2 Changed 5 weeks ago by nickm

Keywords: 040-deferred-20190220 added
Milestone: Tor: 0.4.0.x-finalTor: unspecified

Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.

comment:3 Changed 12 days ago by teor

Status: newneeds_information

What do you expect to happen when Tor tries to bootstrap when it is offline? Is it just the number of warnings that is unexpected, or is it supposed to get past 0%?

I opened #29777 to rate-limit the tor warning.

I would expect 0% when tor is offline: any other number suggests progress when there is none.

I'll open child tickets for the remaining bugs, and then we can work out whether they need changes in Tor or Stem? I will do my best to guess and explain my guess.

comment:4 Changed 11 days ago by atagar

Thanks teor. We've been changing our bootstrap behavior quite a bit of late (first message format in #28731, and now our offline behavior). This is fine, and I'd be happy to work with the network team on the long term bootstrapping behavior we want but this feels a bit ad-hoc. Why did we change this, and are there more bootstrapping changes coming down the pipe?

Nick added a DROPOWNERSHIP command so in tor versions going forward I can avoid parsing stdout logs (#28843). Taking advantage of that is pretty high on my todo list, and will let us change bootstrap log formatting at will. However, behavioral changes like this will still impact Stem.

In this particular case I do not mind the behavior change. Stem's tests already intend to run when we're 0% bootstrapped so I need to investigate this more on my side...

https://gitweb.torproject.org/stem.git/tree/test/runner.py#n629
https://gitweb.torproject.org/stem.git/tree/stem/process.py#n170

Stepping back, maybe we should rethink bootstrapping more fundamentally? In particular...

  1. Bootstrap behavior is undocumented. Traditionally this has been fine because it never changed.

Stem and Txtorcon use bootstrapping to determine "Is my tor process ready to do X?" where X are things like "make a circuit" or "run a hidden service". We don't care about the bootstrap percentage (see below), but we use their implication that tor has become ready to do things.

Our usage is causing you friction. We should use another mechanism or formalize this by documenting tor's bootstrapping behavior including what the various percentages (or messages) mean.

  1. If we're going to rethink bootstrapping maybe we should go further? Why do we even present bootstrapping as percentages? Does this make sense?

Said another way, if "tor is 13% boostrapped" what does that mean? It isn't a time estimate (we're not saying that we're 13% done). It's also not saying that tor is capable of 13% of its capabilities.

I suspect bootstrap percentages might be a holdover from Vidalia's progress bar, which from a user perspective always stunk (descriptor downloads usually dominate bootstrapping so the bar jumped to ~55%, hangs for a minute, then jumps to 100%).

Obligatory video: https://www.youtube.com/watch?v=1V2SHW6Qx_E

Just spitballing, but how about the following?

  1. Drop the bootstrap percentage.
  2. Define bootstrap logging as purely informational (ie. controllers like Stem should not use it).
  3. Add GETINFO commands that answer any "Is tor ready to do X?" inquiries.

comment:5 in reply to:  4 Changed 7 days ago by teor

Replying to atagar:

In this particular case I do not mind the behavior change. Stem's tests already intend to run when we're 0% bootstrapped so I need to investigate this more on my side...

https://gitweb.torproject.org/stem.git/tree/test/runner.py#n629
https://gitweb.torproject.org/stem.git/tree/stem/process.py#n170

Stepping back, maybe we should rethink bootstrapping more fundamentally? In particular...

  1. Bootstrap behavior is undocumented. Traditionally this has been fine because it never changed.

Bootstrap has been documented in the control spec since before Tor 0.2.6:
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n3773

The CLIENT_STATUS events remain the same:
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2481

There is also new bootstrap documentation for 0.4.0 and later:
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n3496

Stem and Txtorcon

and now Chutney, see #22132

use bootstrapping to determine "Is my tor process ready to do X?" where X are things like "make a circuit" or "run a hidden service". We don't care about the bootstrap percentage (see below), but we use their implication that tor has become ready to do things.

Our usage is causing you friction. We should use another mechanism or formalize this by documenting tor's bootstrapping behavior including what the various percentages (or messages) mean.

If the existing CLIENT_STATUS event or bootstrap tags don't do what you need, or the documentation is incomplete, let's open separate tickets for each issue.

  1. If we're going to rethink bootstrapping maybe we should go further? Why do we even present bootstrapping as percentages? Does this make sense?

Said another way, if "tor is 13% boostrapped" what does that mean? It isn't a time estimate (we're not saying that we're 13% done). It's also not saying that tor is capable of 13% of its capabilities.

I suspect bootstrap percentages might be a holdover from Vidalia's progress bar, which from a user perspective always stunk (descriptor downloads usually dominate bootstrapping so the bar jumped to ~55%, hangs for a minute, then jumps to 100%).

Obligatory video: https://www.youtube.com/watch?v=1V2SHW6Qx_E

Just spitballing, but how about the following?

  1. Drop the bootstrap percentage.

The bootstrap percentage is used by Tor Browser.

  1. Define bootstrap logging as purely informational (ie. controllers like Stem should not use it).

Controllers should use CLIENT_STATUS events or the bootstrap tags, as documented:
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2500

  1. Add GETINFO commands that answer any "Is tor ready to do X?" inquiries.

I think the CLIENT_STATUS events do what you're asking for?
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2481

comment:6 Changed 5 days ago by gk

Cc: gk added
Note: See TracTickets for help on using tickets.