Opened 7 weeks ago

Closed 4 weeks ago

Last modified 4 weeks ago

#27849 closed defect (fixed)

Bootstrapping hangs with 'SocksPort 0'

Reported by: pabs
Owned by: dgoulet
Priority: Very High
Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.4.8
Severity: Normal
Keywords: config, regression, backport-034, 035-must
Cc:
Actual Points:
Parent ID:
Points:
Reviewer: nickm
Sponsor: Sponsor8-can

Description

Debian uses a minimal torrc on onion.debian.org (our onionbalance node) and tor from Debian backports. We recently upgraded from 0.3.3.9-1~bpo9+1 to 0.3.4.8-1~bpo9+1 and discovered that our onionbalance services were no longer working, but the backend onion services and our normal onion services were still working. The issue was that the tor daemon on onion.debian.org could not achieve bootstrap. tor 0.3.3.9 worked with the existing config, but 0.3.4.8 did not. Changing the SocksPort parameter from 0 to 6666 allowed tor 0.3.4.8 to achieve bootstrap.

The config does not contain any onion services, because those are set up by onionbalance via the tor control port.

So the issue appears to be that tor does not try to contact the tor network unless it has a socks port, an onion service, or some other feature that requires the tor network configured.

SocksPort 0
Log notice syslog

#HiddenServiceSingleHopMode 1
#HiddenServiceNonAnonymousMode 1


ControlPort 9051

Change History (27)

comment:1 Changed 7 weeks ago by arma

For context, this is our better guess for what was going wrong in #27826

comment:2 Changed 7 weeks ago by pabs

arma mentioned on IRC that our configuration gives him this warning, though I couldn't find it in our systemd logs:

Sep 24 22:49:15.149 [warn] SocksPort, TransPort, NATDPort, DNSPort, and ORPort are all undefined, and there aren't any hidden services configured.  Tor will still run, but probably won't do anything.

comment:3 Changed 7 weeks ago by arma

Confirmed, if I start my Tor 0.3.5.2-alpha with this torrc:

log notice stdout
log debug file /tmp/tord-small-log

socksport 0

I get the expected log line

Sep 24 22:49:48.166 [warn] SocksPort, TransPort, NATDPort, DNSPort, and ORPort are all undefined, and there aren't any hidden services configured.  Tor will still run, but probably won't do anything.

and Tor just sits there.

The weird thing is that Tor bootstraps fine if it has enough dir stuff cached -- that is, it'll go ahead and make my circuits if it's ready to try -- but if it doesn't have enough dir stuff cached, it will just sit there at

Sep 24 22:49:48.327 [notice] Bootstrapped 0%: Starting
Sep 24 22:49:48.327 [notice] Starting with guard context "default"

Next step is to compare the behavior in Tor 0.3.3.

comment:4 Changed 7 weeks ago by pabs

Not sure what the correct behaviour should be, maybe it should trigger bootstrap when onionbalance sets up a new onion service and block replying to onionbalance until that is done?

Or maybe having a ControlPort set in the config should also trigger bootstrap?

comment:5 Changed 7 weeks ago by arma

Keywords: regression backport-034 added
Milestone: Tor: 0.3.5.x-final

Confirmed, Tor 0.3.3 with that same torrc file will go fetch dir stuff when it doesn't have it cached.

comment:6 Changed 7 weeks ago by arma

We could say "we told you so, there was a warn message", but (a) it's kind of weird that Tor will build circuits if there is dir info available but won't if there isn't, and (b) it's definitely not good that Tor won't change its mind once you add_onion some new onion services via the control port.

comment:7 in reply to:  6 Changed 7 weeks ago by dgoulet

Priority: Medium → Very High

Wow that is a pretty epic bug that went totally unnoticed in 034...

This is related to our mainloop refactoring where we created "Roles" and only enable callbacks if you are configured for that "Role". For example, if you are configured to be a hidden service, we'll enable the hs_service callback or if you are a relay we enable rotate_onion_key. See periodic_events in mainloop.c

Now, here lies the issue. With this configuration, which sets nothing except ControlPort, tor has basically no roles (see get_my_roles()). So rescan_periodic_events() doesn't enable anything, because it thinks your tor has nothing to do.

But that isn't true since ControlPort is defined thus some external party can interact with tor and thus it should be working properly.
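To make the mechanism concrete, here is a minimal, self-contained sketch of role-gated periodic events. It is a toy model, not Tor's code: every type, field, and function name below (toy_options_t, toy_get_roles(), toy_rescan_events(), and so on) is hypothetical and only mirrors the behaviour described above, where roles are derived from the configuration and the control port is never consulted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical role bits; the names are illustrative, not Tor's. */
enum {
  TOY_ROLE_CLIENT = 1 << 0,
  TOY_ROLE_RELAY  = 1 << 1,
  TOY_ROLE_HS     = 1 << 2
};

/* Hypothetical, heavily simplified view of the configuration. */
typedef struct {
  bool any_client_port_set; /* SocksPort, TransPort, NATDPort or DNSPort */
  bool or_port_set;         /* configured as a relay */
  bool hs_configured;       /* at least one onion service in the config */
  bool control_port_set;    /* ControlPort; deliberately unused below */
} toy_options_t;

/* A periodic event runs only if one of its roles is active. */
typedef struct {
  const char *name;
  uint32_t roles;
  bool enabled;
} toy_event_t;

/* Derive the active roles from the configuration alone.
 * Note that control_port_set plays no part here. */
static uint32_t toy_get_roles(const toy_options_t *o)
{
  uint32_t roles = 0;
  if (o->any_client_port_set) roles |= TOY_ROLE_CLIENT;
  if (o->or_port_set)         roles |= TOY_ROLE_RELAY;
  if (o->hs_configured)       roles |= TOY_ROLE_HS;
  return roles;
}

/* Enable or disable every event according to the active roles.
 * Returns the number of events left enabled. */
static size_t toy_rescan_events(toy_event_t *events, size_t n,
                                const toy_options_t *o)
{
  const uint32_t roles = toy_get_roles(o);
  size_t enabled = 0;
  for (size_t i = 0; i < n; i++) {
    events[i].enabled = (events[i].roles & roles) != 0;
    if (events[i].enabled)
      enabled++;
  }
  return enabled;
}
```

With "SocksPort 0" and only ControlPort set, toy_get_roles() returns 0 in this model, so toy_rescan_events() leaves every event disabled, including whichever one would fetch directory information.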

I believe the issue lies with the "client" detection role where we use the following but that function ignores the control port entirely (and extra points, this is my fault: 67a41b63063370c2 - #26062).

  int is_client = options_any_client_port_set(options);

Apart from the mainloop issue, I don't think not having the control port considered in that function is the problem. Instead, what we should do is enable our periodic events for the "ALL" role if the control port is opened as in "tor will do the basics".
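One way to read this proposal is sketched below, under stated assumptions: events tagged with every role ("ALL") are allowed to run as soon as a control port is open, without the control port granting any specific role. The SK_* macros, sk_options_t, and both functions are hypothetical names for illustration only; this is not the patch that was eventually merged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical role bits for this sketch only. */
#define SK_ROLE_CLIENT (1u << 0)
#define SK_ROLE_RELAY  (1u << 1)
#define SK_ROLE_HS     (1u << 2)
/* "ALL": an event tagged with this mask belongs to the basics. */
#define SK_ROLE_ALL    (SK_ROLE_CLIENT | SK_ROLE_RELAY | SK_ROLE_HS)

/* Hypothetical, simplified configuration. */
typedef struct {
  bool any_client_port_set;
  bool or_port_set;
  bool hs_configured;
  bool control_port_set;
} sk_options_t;

/* Roles derived from the configuration alone (control port ignored). */
static uint32_t sk_configured_roles(const sk_options_t *o)
{
  uint32_t roles = 0;
  if (o->any_client_port_set) roles |= SK_ROLE_CLIENT;
  if (o->or_port_set)         roles |= SK_ROLE_RELAY;
  if (o->hs_configured)       roles |= SK_ROLE_HS;
  return roles;
}

/* Assumed rule: an open control port is enough to run the events
 * tagged ALL; role-specific events still need their own role. */
static bool sk_event_should_run(uint32_t event_roles, const sk_options_t *o)
{
  if (event_roles == SK_ROLE_ALL && o->control_port_set)
    return true;
  return (event_roles & sk_configured_roles(o)) != 0;
}
```

In this model a control-port-only tor runs the basics but still skips relay-only or onion-service-only events until the corresponding role appears.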

comment:8 Changed 7 weeks ago by pabs

Personally, I think the current behaviour is almost correct, after all just having a control port does not mean that someone will issue a command that will require a connection to the Tor network. The main issue is that commands sent over the control port do not affect the roles and trigger bootstrap when needed.

That said, I do not mind if you prefer to make the control port config trigger the client role since it would work for Debian's situation.

comment:9 Changed 7 weeks ago by nickm

Sponsor: Sponsor8-can

Noting some tickets in 0.3.5 milestone as 8-can. These include tickets that are bugfixes on bugs caused by earlier sponsor8 work.

comment:10 Changed 7 weeks ago by atagar

This nipped me too. tor-prompt now hangs when it spawns a tor instance (#27863).

comment:11 Changed 7 weeks ago by atagar

Summary: incompatibility between tor 0.3.4.8 and onion.debian.org torrc (SocksPort 0) → Bootstrapping hangs with 'SocksPort 0'

comment:12 Changed 7 weeks ago by arma

I was going to agree with pabs that the right fix is that roles should look at Tor's config, and if somebody does an add_onion then Tor's config changed, so the role changed ("now we're an onion service role"). But then I realized there are controller commands like "resolve" that expect to be able to use the network. They aren't changing Tor's config or role, they're just trying to *use* Tor.

Also, every role (onion service, relay, client, directory authority, etc) starts by bootstrapping directory information and launching client circuits to make sure things are working. So putting that as part of "ALL" makes a lot of sense.

In fact, I think there's a good argument for having us do "the basics" as dgoulet called them even when there's no controlport -- first because there might be some other exception that we didn't think of that will bite us here again, and second because a person who configures their Tor this way probably expects it to bootstrap (and also now it'll be closer to ready for whatever they ask it to do next). If we think it's a stupid config that nobody should use, we should refuse to start with it, not let it run but then have it do surprising things.

comment:13 in reply to: 12 Changed 7 weeks ago by dgoulet

Replying to arma:

> I was going to agree with pabs that the right fix is that roles should look at Tor's config, and if somebody does an add_onion then Tor's config changed, so the role changed ("now we're an onion service role").

FYI, in theory, ADD_ONION should add the "HS" role to your tor and thus enable the hs_service callback from the mainloop and in turn enable all other callbacks needed to properly run tor. If an ADD_ONION command failed to bootstrap tor, we have another problem. I'll investigate.

> But then I realized there are controller commands like "resolve" that expect to be able to use the network. They aren't changing Tor's config or role, they're just trying to *use* Tor.

Yes, considering the amount of things we can do through the control port, it gets complicated quickly to select which one can enable things or not.

> Also, every role (onion service, relay, client, directory authority, etc) starts by bootstrapping directory information and launching client circuits to make sure things are working. So putting that as part of "ALL" makes a lot of sense.
>
> In fact, I think there's a good argument for having us do "the basics" as dgoulet called them even when there's no controlport -- first because there might be some other exception that we didn't think of that will bite us here again, and second because a person who configures their Tor this way probably expects it to bootstrap (and also now it'll be closer to ready for whatever they ask it to do next). If we think it's a stupid config that nobody should use, we should refuse to start with it, not let it run but then have it do surprising things.

Agree. I'm currently aiming at considering the ControlPort set to be part of enabling all basic roles since one can almost do most of the roles through that port...

comment:14 Changed 7 weeks ago by dgoulet

Owner: set to dgoulet
Status: new → accepted

comment:15 Changed 7 weeks ago by dgoulet

Status: accepted → needs_review

See branch ticket27849_034_01 (based on 034 for backport).

There are a whole lot of "merge commit" entries in that PR, and I have no idea why ... but the diff only shows the fixes at least.

https://github.com/torproject/tor/pull/374

comment:16 Changed 7 weeks ago by nickm

Does this also work for __OwningControllerFD?

comment:17 in reply to:  16 Changed 7 weeks ago by dgoulet

Status: needs_review → needs_revision

Replying to nickm:

> Does this also work for __OwningControllerFD?

Probably not... options->ControlPort_set doesn't seem to be affected by __OwningControllerFD. So a change is needed that checks whether we have a control port and also whether this option is set.

comment:18 in reply to:  13 Changed 7 weeks ago by arma

Replying to dgoulet:

> FYI, in theory, ADD_ONION should add the "HS" role to your tor and thus enable the hs_service callback from the mainloop and in turn enable all other callbacks needed to properly run tor. If an ADD_ONION command failed to bootstrap tor, we have another problem. I'll investigate.

I believe this was the original symptom ("my tor just sits there even after onionbalance set up its onion service"), so it's worth investigating. Maybe onionbalance doesn't do the add_onion if tor hasn't bootstrapped. Maybe...maybe lots of things. :)

I guess if it *is* a bug, we'll hide it by turning "has a control port set" into "is client role". Maybe that's fine, or maybe it isn't -- depends what kind of bug it is.

comment:19 Changed 6 weeks ago by nickm

Keywords: 035-must added

Add the 035-must tag to some assertion failures, hangs, ci-blockers, etc.

comment:20 Changed 6 weeks ago by nickm

Mark all 035-must tickets as "very high"

comment:21 Changed 6 weeks ago by meejah

Currently, when txtorcon launches a new Tor subprocess, it expects it to bootstrap successfully (e.g. waits for it to get to 100% bootstrapped or errors out). I'm currently trying to add a feature to support single-hop/non-anonymous onions -- I could certainly special-case (further) this behavior and *not* wait for the bootstrap, but it mostly makes sense to me to have some way to say "please bootstrap".

I guess looking at it the other way, why *wouldn't* I want to bootstrap right away?

One reason txtorcon looks for bootstrapping to complete is for error-cases (e.g. if bootstrapping errors out or pauses for too long).

comment:22 Changed 5 weeks ago by dgoulet

Status: needs_revision → needs_review

Added the __OwningControllerFD check so if that is on, we consider tor to have the client role.
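As a rough illustration of the rule described here, the sketch below treats a tor instance as having the client role when a client port is set, a control port is set, or an owning controller FD was supplied. The struct, its field names, demo_is_client(), and the -1 "unset" sentinel are all assumptions made for this example; see the branch itself for the real check.

```c
#include <stdbool.h>

/* Assumption for this sketch: -1 means no owning controller FD was given.
 * The real option may use a different sentinel value. */
#define DEMO_FD_UNSET (-1)

/* Hypothetical, simplified configuration. */
typedef struct {
  bool any_client_port_set;  /* SocksPort, TransPort, NATDPort or DNSPort */
  bool control_port_set;     /* stands in for options->ControlPort_set */
  int owning_controller_fd;  /* stands in for __OwningControllerFD */
} demo_options_t;

/* Client role detection that also counts any way a controller can
 * reach this tor: a listening control port or an inherited FD. */
static bool demo_is_client(const demo_options_t *o)
{
  return o->any_client_port_set ||
         o->control_port_set ||
         o->owning_controller_fd != DEMO_FD_UNSET;
}
```

Under this rule the Debian torrc from the description (SocksPort 0 plus ControlPort 9051) counts as a client, and so does a tor that a controller library spawned and handed a control FD without any ControlPort line.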

Branch: ticket27849_034_01
PR: https://github.com/torproject/tor/pull/374

comment:23 Changed 4 weeks ago by dgoulet

Reviewer: nickm

comment:24 Changed 4 weeks ago by teor

#17359 is a similar issue where DisablePredictedCircuits causes bootstrap to hang. But I don't think it is fixed by this patch.

comment:25 Changed 4 weeks ago by nickm

Status: needs_review → merge_ready

Code looks good but I'm not sure about the travis/rust failure. I've rebased and squashed as "bug27849_redux" and made a fresh PR as https://github.com/torproject/tor/pull/407 .

comment:26 Changed 4 weeks ago by nickm

Resolution: fixed
Status: merge_ready → closed

CI passed; merging!

comment:27 Changed 4 weeks ago by nickm

Had to add e97adaf8dc13a4f500fab3d70c9c31400a01954f to fix 0.3.5 here.
