replacing 0.0.0.0 listeners at runtime fails

changed milestone to %Tor: 0.3.5.x-final

added 035-triaged-in-20180711 bind component::core tor/tor milestone::Tor: 0.3.5.x-final owner::rl1987 port priority::medium resolution::implemented reviewer::ahf severity::normal status::closed switching tor-client type::defect labels

We could take a patch that swaps the unbind/bind in 0.2.8. I suspect that even with the patch, tor might fall afoul of OS port close/open timeouts on some platforms.

Marking as low severity, because a restart is ok.

Trac:
Keywords: N/A deleted, easy added
Component: - Select a component to Tor
Priority: Medium to Low
Milestone: N/A to Tor: 0.2.8.x-final
Severity: Normal to Minor

Requiring a restart to change this setting might be OK, but exiting because of an "invalid config" doesn't seem OK to me. For remote machines that are only reachable via hidden service SSH, it can actually be extremely inconvenient/expensive.

Is there any ticket about making it safer to reload the config, eg, falling back to the previously-read config if the new one fails?

Regardless of that, loading carefully-tested known-to-be-valid configs should not cause tor to exit!

Replying to cypherpunks:

Requiring a restart to change this setting might be OK, but exiting because of an "invalid config" doesn't seem OK to me. For remote machines that are only reachable via hidden service SSH, it can actually be extremely inconvenient/expensive.

I see your point.

Is there any ticket about making it safer to reload the config, eg, falling back to the previously-read config if the new one fails?

I don't think so, please feel free to open one.

I'm not entirely sure how this would work, I wonder if it would be bad for security/privacy in some cases to not update to the latest config - like the one you reported in this ticket.

Regardless of that, loading carefully-tested known-to-be-valid configs should not cause tor to exit!

We'd appreciate a patch that swaps the close and bind steps in retry_all_listeners/retry_listener_ports.

The current code assumes that if the old and new addresses don't match, then binding to the new address before closing the old won't cause any conflicts. This assumption isn't true for:

0.0.0.0
[::]
Binding to both IPv4 and IPv6 (is this even possible?)

Trac:
Severity: Minor to Normal
Priority: Low to Medium

Actually, I don't think switching the order of the bind and the close is a good idea. The important observation about the current behavior is that (except in the cases noted on this ticket), it is reversible: we can't get stuck in a state where we closed the old ports but can't open the new ones.

One method of resolving this general issue is to use "SO_REUSEPORT" on recent Linux kernels.

However, this may not be a good idea:

there are security implications to port reuse, and
not all platforms have this feature.

See https://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t

One way to do this patch:

In retry_all_listeners/retry_listener_ports we match up old and new sockets that have the same address and port, and leave them alone.

Then we open any new sockets that weren't already open. Then we close any old sockets that no logner need to be open.

This doesn't work if one of the addresses is 0.0.0.0 or [::], and the other is not - those addresses don't match (there needs to be a close/open), but they do conflict (we can't do the open first).

So, if either of the addresses for a *Port is 0.0.0.0 or [::], and the port numbers match, we need to close the old port first, then open the new port.

Trac:
Cc: N/A to rl1987@sdf.lonestar.org

https://github.com/rl1987/tor/compare/bug17873

Some experimentation with changing DNSPort over control interface and launching DNS queries with dig(1) indicates that this branch is switching between listener addresses and ports properly.

Trac:
Status: new to needs_review

Replying to rl1987:

https://github.com/rl1987/tor/compare/bug17873

Some experimentation with changing DNSPort over control interface and launching DNS queries with dig(1) indicates that this branch is switching between listener addresses and ports properly.

This patch switches the order of close and launch for all connections, but we want to keep "launch then close" as much as possible:

Replying to nickm:

Actually, I don't think switching the order of the bind and the close is a good idea. The important observation about the current behavior is that (except in the cases noted on this ticket), it is reversible: we can't get stuck in a state where we closed the old ports but can't open the new ones.

Can we only switch the order to "close then launch" if either the old or new connection is 0.0.0.0 or [::], and the bind() in connection_listener_new() fails with ERRNO_IS_EADDRINUSE(s)?

(I apologise for tagging this task "easy".)

As far as I can tell, this should only happen on Linux and Windows0, and only for:

0.0.0.0, which conflicts with any IPv4 address
[::], which conflicts with any IPv6 address

Let's make this fix very specific:

only on Linux and Windows,
- (We use SO_REUSEADDR, which avoids this issue on every unix-based system except for Linux.)
only if the first bind() fails with ERRNO_IS_EADDRINUSE(s),
and only for:
- 0.0.0.0 replacing IPv4, or IPv4 replacing 0.0.0.0, or
- [::] replacing IPv6, or IPv6 replacing [::].

Trac:
Status: needs_review to needs_revision

It is impossible that we will fix all 261 currently open 028 tickets before 028 releases. Time to move some out. This is my first pass through the "needs_revision" and "needs_information" tickets, looking for things to move to ???.

Note that in most cases, if these tickets get the requested revisions done in time for the 0.2.8 merge window, they could get considered for review and merge in 0.2.8.

Trac:
Milestone: Tor: 0.2.8.x-final to Tor: 0.2.???

Trac:
Points: N/A to medium

It's not just DNSPort: Apr 11 20:37:37.000 [notice] Opening OR listener on xx.xx.xx.xx:443 Apr 11 20:37:37.000 [warn] Could not bind to xx.xx.xx.xx:443: Permission denied Apr 11 20:37:37.000 [notice] Closing no-longer-configured OR listener on 0.0.0.0:443

As a fix, what's wrong with just closing the old port before opening the new one?

Trac:
Username: AVee
Reviewer: N/A to N/A

Replying to AVee:

It's not just DNSPort: Apr 11 20:37:37.000 [notice] Opening OR listener on xx.xx.xx.xx:443 Apr 11 20:37:37.000 [warn] Could not bind to xx.xx.xx.xx:443: Permission denied Apr 11 20:37:37.000 [notice] Closing no-longer-configured OR listener on 0.0.0.0:443

As a fix, what's wrong with just closing the old port before opening the new one?

See https://trac.torproject.org/projects/tor/ticket/17873#comment:4

Milestone renamed

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.3.???

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

Trac:
Keywords: N/A deleted, tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? to Tor: unspecified

related: #21818 (moved): tor's handling of SIGHUP considered harmful

Remove an old triaging keyword.

Trac:
Keywords: tor-03-unspecified-201612 deleted, N/A added

Trac:
Keywords: easy deleted, tor-client port bind switching added

https://github.com/rl1987/tor/compare/bug17873_take3_squashed

Trac:
Status: needs_revision to needs_review

Trac:
Points: medium to N/A
Milestone: Tor: unspecified to Tor: 0.3.4.x-final

Trac:
Reviewer: N/A to ahf

Please create a PR since reviewing this seems impossible with GH's UI otherwise.

A few comments here:

I think the style with using if (x) foo = bar on the same line is uncommon on the Tor source code. Please move the assignment to its own line.
Shouldn't we lift replacement_s out of the .c file or at least document it since it's referred to in the doc strings? A typedef to get rid of the 'struct' might also be a good idea.
In replacement->new_port = (port_cfg_t *)wanted; where constness is removed? Do we actually need to remove the constness here, or? Maybe the loop could be made non-const if we need to?
Have this been tested on any other platforms than Windows and Linux? Maybe FreeBSD & OpenBSD could have ENABLE_LISTENER_REBIND too?

Trac:
Status: needs_review to needs_revision

We need to test this on macOS, too - it's one of our Tor Browser platforms.

I made code cleanups according to your feedback. Please check if everything is good now.

https://github.com/torproject/tor/pull/90

I did test it on macOS - we don't need the socket rebinding dance here. According to analysis above, we don't need it on BSD systems either.

Trac:
Status: needs_revision to needs_review

Hi! I like it! I've added a few comments to the patches on the review, and a couple of general requests.

The most difficult request will be that we get test coverage for this somehow -- either via unit tests, or some kind of lightweight integration test. I'd be glad to help brainstorming how we can do that.

Trac:
Milestone: Tor: 0.3.4.x-final to Tor: 0.3.5.x-final
Status: needs_review to needs_revision

I addressed things you commented in pull request and wrote an integration test in Python. One remaining thing to do is to get make distcheck to pass (did I include my integration test into build incorrectly?; make check does pass with my new stuff). Once that's done, I will rebase/squash the changes into new branch.

Trac:
Status: needs_revision to needs_review

Need to get this passing on CI.

Trac:
Status: needs_review to needs_revision

Build is green now: https://travis-ci.org/rl1987/tor/builds/381377375

There's also branch bug17873_take3_squashed2 with cleaner history. See: https://github.com/torproject/tor/pull/116

Trac:
Status: needs_revision to needs_review

Great with the test.

One minor issue: could you switch from using print "x" to print("x") in the Python code instead and add from __future__ import print_function?

Trac:
Status: needs_review to needs_revision

Pushed commit 4f9143d08c63389de7451b8ffdc4a34984904662 to bug17873_take3_squashed2.

Trac:
Status: needs_revision to needs_review

ahf, are you waiting for me on this ticket?

There is two problems:

There is a merge conflict right now?
It looks like the changes to the Python style issue that was fixed isn't moved forward to the PR in #90 (moved), but are still in #116 (moved)?

I'm unsure if the PR #116 (moved) was closed incorrectly?

Trac:
Status: needs_review to needs_information

My intention for opening !116 was to: 1) cleanup the git history by autosquashing fixup commits and 2) to fix merge conflicts by rebasing. We should have closed !90.

https://github.com/torproject/tor/pull/116 (even though it's marked as closed) looks good to me.

Marking this merge ready.

Trac:
Status: needs_information to merge_ready

I had a few questions; they might not be blockers, though. Please let me know what you think?

Trac:
Keywords: tor-client port bind switching deleted, bind, 035-triaged-in-20180711, tor-client, port, switching added

Trac:
Status: merge_ready to needs_revision

Setting myself as owner.

Trac:
Status: needs_revision to accepted
Owner: N/A to rl1987

Trac:
Status: accepted to needs_review

Not sure how to solve the problem with fixed port in integration test. How much it is a problem? Do people commonly run multiple tests in parallel?

Replying to rl1987:

Not sure how to solve the problem with fixed port in integration test.

Choose a random port between 10000-60000, and retry N times if it fails. (Where N is small.)

How much it is a problem? Do people commonly run multiple tests in parallel?

Yes, particularly on automated builders, or when branches are merged forwards.

https://github.com/torproject/tor/pull/116/commits/fd9b9c4cba9aa04decfedfe4d7e3318b1aafe46f

New pull request with CI fixes:

https://github.com/torproject/tor/pull/278

I've left some comments and a small nit on the PR. Generally I think this looks good.

Trac:
Status: needs_review to needs_revision

Pushed one more commit and responded to comments on pull request.

Trac:
Status: needs_revision to needs_review

I think this can go in now. I'm happy with the changes now.

Trac:
Status: needs_review to merge_ready

replacing 0.0.0.0 listeners at runtime fails

Child items 0

Activity