Opened 4 years ago

Closed 3 years ago

Last modified 13 months ago

#8746 closed defect (fixed)

Tor tries to kill nonexistent proxy PID on second SIGINT

Reported by: dcf Owned by: nickm
Priority: Medium Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-client, pt, 024-deferrable, 025-triaged, 2016-bug-retrospective
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

This is what 180-pluggable-transport.txt says about proxies and SIGINT:

Proxies should respond to a single INT signal by closing their listener ports and not accepting any new connections, but keeping all connections open, then terminating when connections are all closed. Proxies should respond to a second INT signal by shutting down cleanly.

I implemented the websocket-server transport to work as specified:
https://gitweb.torproject.org/flashproxy.git/blob/c23caf1f71f8281319cadf55002723dbcd333905:/websocket-transport/websocket-server.go#l238
I noticed unexpected behavior when the proxy receives a SIGINT, and doesn't have any open connection, and so exits immediately without waiting for a second SIGINT. The parent tor process tries to kill a nonexistent PID:

^CApr 19 17:58:59.000 [notice] Interrupt: we have stopped accepting new connections, and will shut down in 30 seconds. Interrupt again to exit now.
^CApr 19 17:59:05.000 [notice] SIGINT received a second time; exiting now.
Apr 19 17:59:05.000 [notice] Failed to terminate process with PID '18277' ('No such process').

The PID might have been reclaimed in the meantime, and tor could be killing an unrelated process.

(Originally from https://lists.torproject.org/pipermail/tor-dev/2013-April/004679.html.)

Child Tickets

Change History (21)

comment:1 Changed 4 years ago by dcf

I think that doing a non-blocking waitpid before kill would at least reduce the race window. In the mailing list thread I also suggested requiring proxies to stay running until the second SIGINT, but that's not so good because the proxy could have crashed or stopped running for a different reason. Also the SIGINT behavior is only a "should" in the spec.

comment:2 Changed 4 years ago by arma

  • Milestone set to Tor: 0.2.4.x-final

Putting in 0.2.4.x since it's a bug ("Tor could try to kill a random process"), but it's deferable to 0.2.5.

comment:3 Changed 4 years ago by nickm

  • Keywords tor-client pt 024-deferrable added

comment:4 Changed 4 years ago by arma

See also #4952 (which I think should maybe be closed as a duplicate of this one?)

comment:5 Changed 4 years ago by nickm

  • Milestone changed from Tor: 0.2.4.x-final to Tor: 0.2.5.x-final

comment:6 Changed 3 years ago by andrea

Triage: keep for 0.2.5.x-final; this seems important.

comment:7 Changed 3 years ago by nickm

Also our sigchld handler in main.c needs to turn into a function where we can get notification of dead PIDs back.

Where are the two SIGINTS happening?

comment:8 Changed 3 years ago by nickm

  • Keywords 025-triaged added

comment:9 Changed 3 years ago by nickm

I think there's a fix in my branch "bug8746", though I still need to write more tests for it.

On unix, it just does "use waitpid properly". It dumps the "waitpid and discard" loop in main.c, and instead does a proper callback-notification thing.

On windows, it terminates the process by handle, not by pid.

comment:10 Changed 3 years ago by nickm

  • Owner set to nickm
  • Status changed from new to assigned

marking as assigned since I've started coding on these.

comment:11 Changed 3 years ago by nickm

  • Status changed from assigned to needs_review

Okay, now it has unit tests. The only non-covered lines are ones that trigger on BUG conditions.

comment:12 Changed 3 years ago by nickm

(And the new branch is in bug8746_v2 )

comment:13 Changed 3 years ago by nickm

(I just looked over this branch and it still seems ok to me)

comment:14 Changed 3 years ago by yawning

Looks good to me.

comment:15 Changed 3 years ago by andrea

Review of bug8746_v2:

6b33a9b26a90b3ee684b78024dd436024dde9297:

  • Shouldn't set_waitpid_callback() tor_free(old_ent) if it's non-NULL?
  • Otherwise, I think this is all okay

38b9983d9bee2023533cb109adaec98148f48fb4:

  • I think this all looks fine

4ace54466b034db9cea29e909c5b2ee552053719:

  • Looks good to me

e93fd974e1f5593fbaa782b8f43ebe161cfec091:

  • Looks good to me

c76d615fcd0cbf9cee88ea687724b5deae68ff80:

  • Awwww, no functinos? Don't tell me you're against sfunctions too.

76da16a15f5d0cc0c1f0ccb8ec3a310d50b43fe2:

  • Looks good to me

da598b74701aba5efdc97029c05b17c850bb7d46:

  • Looks good to me

0017a8668785dfa3715748155b9fdb35bb671cf3:

  • This is just the changes file; looks fine to me

comment:16 Changed 3 years ago by asn

Also maybe we want to tor_free(ent); in the LD_BUG of clear_waitpid_callback().

comment:17 Changed 3 years ago by yawning

Assuming the branch in set_waitpid_callback() is taken and the tor_free(old_ent) call suggested by athena is there, it's probably fairly reasonable to assume that there could very well be a different process_handle_t that holds a reference to old_ent.

Some time later when tor_destroy_process_handle(other_handle, 1) is called (I don't know if it ever will, but we're already in "our invariants have been violated" territory), the code path where asn suggests a free will be invoked with ent being the dangling pointer because the hashtable entry for the pid at that point is either missing (the callback fired and the replacement entry got removed from the table) or is the new entry.

Either leave it as is (leaks a trivial amount memory when impossible things happen), add a tor_free() only in clear_waitpid_callback() (may leak a trivial amount of memory, if the previous process handle is never destroyed), or add asserts in both branches because both conditions are invariants.

comment:18 Changed 3 years ago by nickm

  • Resolution set to fixed
  • Status changed from needs_review to closed

I'm going with "leave it as it is". I'd rather risk a small memory leak on a possible bug here than risk a later double-free. (If we reach either of those cases, our representation is messed up somewhere.)

Thanks for the reviews, everybody! Squashing + merging.

comment:19 Changed 13 months ago by nickm

  • Keywords 2016-bug-retrospective added

Mark bugs for 2016 bug retrospective based on hand-examination of changelogs for 0.2.5 onwards.

comment:20 Changed 13 months ago by nickm

Mark bugs for 2016 bug retrospective based on hand-examination of changelogs for 0.2.5 onwards.

comment:21 Changed 13 months ago by nickm

Mark bugs for 2016 bug retrospective based on hand-examination of changelogs for 0.2.5 onwards.

Note: See TracTickets for help on using tickets.