Opened 3 years ago

Closed 5 months ago

Last modified 5 months ago

#17873 closed defect (implemented)

replacing 0.0.0.0 listeners at runtime fails

Reported by: cypherpunks Owned by: rl1987
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-client, port, bind, switching, 035-triaged-in-20180711
Cc: rl1987@… Actual Points:
Parent ID: Points:
Reviewer: ahf Sponsor:

Description

I had DNSPort 0.0.0.0:53 set. Some jerks on the internet started flooding me with DNS requests so I reconfigured tor to only have the DNSPort on my LAN interface. I sent tor a HUP to reload the config, and it exited because it tried to bind the new listener before unbinding the old one:

Dec 16 16:19:51.000 [notice] Opening DNS listener on 172.16.0.1:53
Dec 16 16:19:51.000 [warn] Could not bind to 172.16.0.1:53: Permission denied
Dec 16 16:19:51.000 [notice] Closing no-longer-configured DNS listener on 0.0.0.0:53
Dec 16 16:19:51.000 [warn] Failed to parse/validate config: Failed to bind one of the listener ports.
Dec 16 16:19:51.000 [err] Reading config failed--see warnings above. For usage, try -h.
Dec 16 16:19:51.000 [warn] Restart failed (config error?). Exiting.

Child Tickets

TicketStatusOwnerSummaryComponent
#531closedTransition from 0.0.0.0 listener to specific listener disallowedCore Tor/Tor

Change History (51)

comment:1 Changed 3 years ago by teor

Component: - Select a componentTor
Keywords: easy added
Milestone: Tor: 0.2.8.x-final
Priority: MediumLow
Severity: NormalMinor

We could take a patch that swaps the unbind/bind in 0.2.8.
I suspect that even with the patch, tor might fall afoul of OS port close/open timeouts on some platforms.

Marking as low severity, because a restart is ok.

comment:2 Changed 3 years ago by cypherpunks

Requiring a restart to change this setting might be OK, but exiting because of an "invalid config" doesn't seem OK to me. For remote machines that are only reachable via hidden service SSH, it can actually be extremely inconvenient/expensive.

Is there any ticket about making it safer to reload the config, eg, falling back to the previously-read config if the new one fails?

Regardless of that, loading carefully-tested known-to-be-valid configs should not cause tor to exit!

comment:3 in reply to:  2 Changed 3 years ago by teor

Priority: LowMedium
Severity: MinorNormal

Replying to cypherpunks:

Requiring a restart to change this setting might be OK, but exiting because of an "invalid config" doesn't seem OK to me. For remote machines that are only reachable via hidden service SSH, it can actually be extremely inconvenient/expensive.

I see your point.

Is there any ticket about making it safer to reload the config, eg, falling back to the previously-read config if the new one fails?

I don't think so, please feel free to open one.

I'm not entirely sure how this would work, I wonder if it would be bad for security/privacy in some cases to not update to the latest config - like the one you reported in this ticket.

Regardless of that, loading carefully-tested known-to-be-valid configs should not cause tor to exit!

We'd appreciate a patch that swaps the close and bind steps in retry_all_listeners/retry_listener_ports.

The current code assumes that if the old and new addresses don't match, then binding to the new address before closing the old won't cause any conflicts. This assumption isn't true for:

  • 0.0.0.0
  • [::]
  • Binding to both IPv4 and IPv6 (is this even possible?)

comment:4 Changed 3 years ago by nickm

Actually, I don't think switching the order of the bind and the close is a good idea. The important observation about the current behavior is that (except in the cases noted on this ticket), it is reversible: we can't get stuck in a state where we closed the old ports but can't open the new ones.

comment:5 Changed 3 years ago by teor

One method of resolving this general issue is to use "SO_REUSEPORT" on recent Linux kernels.

However, this may not be a good idea:

  • there are security implications to port reuse, and
  • not all platforms have this feature.

comment:7 Changed 3 years ago by teor

One way to do this patch:

In retry_all_listeners/retry_listener_ports we match up old and new sockets that have the same address and port, and leave them alone.

Then we open any new sockets that weren't already open.
Then we close any old sockets that no logner need to be open.

This doesn't work if one of the addresses is 0.0.0.0 or [::], and the other is not - those addresses don't match (there needs to be a close/open), but they do conflict (we can't do the open first).

So, if either of the addresses for a *Port is 0.0.0.0 or [::], and the port numbers match, we need to close the old port first, then open the new port.

comment:8 Changed 3 years ago by rl1987

Cc: rl1987@… added

comment:9 Changed 3 years ago by rl1987

Status: newneeds_review

https://github.com/rl1987/tor/compare/bug17873

Some experimentation with changing DNSPort over control interface and launching DNS queries with dig(1) indicates that this branch is switching between listener addresses and ports properly.

comment:10 in reply to:  4 Changed 3 years ago by teor

Replying to rl1987:

https://github.com/rl1987/tor/compare/bug17873

Some experimentation with changing DNSPort over control interface and launching DNS queries with dig(1) indicates that this branch is switching between listener addresses and ports properly.

This patch switches the order of close and launch for all connections, but we want to keep "launch then close" as much as possible:

Replying to nickm:

Actually, I don't think switching the order of the bind and the close is a good idea. The important observation about the current behavior is that (except in the cases noted on this ticket), it is reversible: we can't get stuck in a state where we closed the old ports but can't open the new ones.

Can we only switch the order to "close then launch" if either the old or new connection is 0.0.0.0 or [::], and the bind() in connection_listener_new() fails with ERRNO_IS_EADDRINUSE(s)?

(I apologise for tagging this task "easy".)

As far as I can tell, this should only happen on Linux and Windows[0], and only for:

  • 0.0.0.0, which conflicts with any IPv4 address
  • [::], which conflicts with any IPv6 address

Let's make this fix very specific:

  • only on Linux and Windows,
    • (We use SO_REUSEADDR, which avoids this issue on every unix-based system except for Linux.)
  • only if the first bind() fails with ERRNO_IS_EADDRINUSE(s),
  • and only for:
    • 0.0.0.0 replacing IPv4, or IPv4 replacing 0.0.0.0, or
    • [::] replacing IPv6, or IPv6 replacing [::].

[0]: https://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t

comment:11 Changed 3 years ago by nickm

Status: needs_reviewneeds_revision

comment:12 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.???

It is impossible that we will fix all 261 currently open 028 tickets before 028 releases. Time to move some out. This is my first pass through the "needs_revision" and "needs_information" tickets, looking for things to move to ???.

Note that in most cases, if these tickets get the requested revisions done in time for the 0.2.8 merge window, they could get considered for review and merge in 0.2.8.

comment:13 Changed 3 years ago by nickm

Points: medium

comment:14 Changed 3 years ago by AVee

It's not just DNSPort:
Apr 11 20:37:37.000 [notice] Opening OR listener on xx.xx.xx.xx:443
Apr 11 20:37:37.000 [warn] Could not bind to xx.xx.xx.xx:443: Permission denied
Apr 11 20:37:37.000 [notice] Closing no-longer-configured OR listener on 0.0.0.0:443

As a fix, what's wrong with just closing the old port before opening the new one?

comment:15 in reply to:  14 Changed 3 years ago by teor

Replying to AVee:

It's not just DNSPort:
Apr 11 20:37:37.000 [notice] Opening OR listener on xx.xx.xx.xx:443
Apr 11 20:37:37.000 [warn] Could not bind to xx.xx.xx.xx:443: Permission denied
Apr 11 20:37:37.000 [notice] Closing no-longer-configured OR listener on 0.0.0.0:443

As a fix, what's wrong with just closing the old port before opening the new one?

See https://trac.torproject.org/projects/tor/ticket/17873#comment:4

comment:16 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:17 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:18 Changed 2 years ago by cypherpunks

related: #21818: tor's handling of SIGHUP considered harmful

comment:19 Changed 21 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:20 Changed 20 months ago by nickm

Keywords: tor-client port bind switching added; easy removed

comment:21 Changed 10 months ago by rl1987

Status: needs_revisionneeds_review

comment:22 Changed 10 months ago by teor

Milestone: Tor: unspecifiedTor: 0.3.4.x-final
Points: medium

comment:23 Changed 10 months ago by dgoulet

Reviewer: ahf

comment:24 Changed 10 months ago by ahf

Status: needs_reviewneeds_revision

Please create a PR since reviewing this seems impossible with GH's UI otherwise.

A few comments here:

  • I think the style with using if (x) foo = bar on the same line is uncommon on the Tor source code. Please move the assignment to its own line.
  • Shouldn't we lift replacement_s out of the .c file or at least document it since it's referred to in the doc strings? A typedef to get rid of the 'struct' might also be a good idea.
  • In replacement->new_port = (port_cfg_t *)wanted; where constness is removed? Do we actually need to remove the constness here, or? Maybe the loop could be made non-const if we need to?
  • Have this been tested on any other platforms than Windows and Linux? Maybe FreeBSD & OpenBSD could have ENABLE_LISTENER_REBIND too?

comment:25 Changed 10 months ago by teor

We need to test this on macOS, too - it's one of our Tor Browser platforms.

comment:26 Changed 10 months ago by rl1987

Status: needs_revisionneeds_review

I made code cleanups according to your feedback. Please check if everything is good now.

https://github.com/torproject/tor/pull/90

I did test it on macOS - we don't need the socket rebinding dance here. According to analysis above, we don't need it on BSD systems either.

Last edited 10 months ago by rl1987 (previous) (diff)

comment:27 Changed 9 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: 0.3.5.x-final
Status: needs_reviewneeds_revision

Hi! I like it! I've added a few comments to the patches on the review, and a couple of general requests.

The most difficult request will be that we get test coverage for this somehow -- either via unit tests, or some kind of lightweight integration test. I'd be glad to help brainstorming how we can do that.

comment:28 Changed 9 months ago by rl1987

Status: needs_revisionneeds_review

I addressed things you commented in pull request and wrote an integration test in Python. One remaining thing to do is to get make distcheck to pass (did I include my integration test into build incorrectly?; make check does pass with my new stuff). Once that's done, I will rebase/squash the changes into new branch.

comment:29 Changed 9 months ago by rl1987

Status: needs_reviewneeds_revision

Need to get this passing on CI.

comment:30 Changed 9 months ago by rl1987

Status: needs_revisionneeds_review

Build is green now: https://travis-ci.org/rl1987/tor/builds/381377375

There's also branch bug17873_take3_squashed2 with cleaner history. See: https://github.com/torproject/tor/pull/116

comment:31 Changed 9 months ago by ahf

Status: needs_reviewneeds_revision

Great with the test.

One minor issue: could you switch from using print "x" to print("x") in the Python code instead and add from __future__ import print_function?

comment:32 Changed 9 months ago by rl1987

Status: needs_revisionneeds_review

Pushed commit 4f9143d08c63389de7451b8ffdc4a34984904662 to bug17873_take3_squashed2.

comment:33 Changed 8 months ago by teor

ahf, are you waiting for me on this ticket?

comment:34 Changed 8 months ago by ahf

Status: needs_reviewneeds_information

There is two problems:

  • There is a merge conflict right now?
  • It looks like the changes to the Python style issue that was fixed isn't moved forward to the PR in #90, but are still in #116?

I'm unsure if the PR #116 was closed incorrectly?

comment:35 Changed 8 months ago by rl1987

My intention for opening !116 was to: 1) cleanup the git history by autosquashing fixup commits and 2) to fix merge conflicts by rebasing. We should have closed !90.

comment:36 Changed 8 months ago by ahf

Status: needs_informationmerge_ready

https://github.com/torproject/tor/pull/116 (even though it's marked as closed) looks good to me.

Marking this merge ready.

comment:37 Changed 8 months ago by nickm

I had a few questions; they might not be blockers, though. Please let me know what you think?

comment:38 Changed 8 months ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:39 Changed 8 months ago by nickm

Status: merge_readyneeds_revision

comment:40 Changed 7 months ago by rl1987

Owner: set to rl1987
Status: needs_revisionaccepted

Setting myself as owner.

comment:41 Changed 7 months ago by rl1987

Status: acceptedneeds_review

comment:42 Changed 7 months ago by rl1987

Not sure how to solve the problem with fixed port in integration test. How much it is a problem? Do people commonly run multiple tests in parallel?

comment:43 in reply to:  42 Changed 7 months ago by teor

Replying to rl1987:

Not sure how to solve the problem with fixed port in integration test.

Choose a random port between 10000-60000, and retry N times if it fails. (Where N is small.)

How much it is a problem? Do people commonly run multiple tests in parallel?

Yes, particularly on automated builders, or when branches are merged forwards.

comment:45 Changed 6 months ago by rl1987

New pull request with CI fixes:

https://github.com/torproject/tor/pull/278

comment:46 Changed 6 months ago by ahf

Status: needs_reviewneeds_revision

I've left some comments and a small nit on the PR. Generally I think this looks good.

comment:47 Changed 6 months ago by rl1987

Status: needs_revisionneeds_review

Pushed one more commit and responded to comments on pull request.

comment:48 Changed 6 months ago by ahf

Status: needs_reviewmerge_ready

I think this can go in now. I'm happy with the changes now.

comment:49 Changed 5 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

This looks plausible to me. Let's give it a try.

comment:50 Changed 5 months ago by nickm

Incidentally, I thought you might like to know: The tests on this branch increased our overall coveralls.io test coverage by 1.73%. (Nice!)

comment:51 Changed 5 months ago by teor

The test_rebind.py integration test in this ticket contain a race condition that causes bug #27968.

Note: See TracTickets for help on using tickets.