Opened 6 years ago

Closed 5 years ago

#9330 closed defect (duplicate)

Pluggable Transports on windows are killed with TerminateProcess

Reported by: yawning Owned by: asn
Priority: Medium Milestone:
Component: Circumvention/Pluggable transport Version: Tor: unspecified
Severity: Keywords: pt, win32, pluggable transport
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The pluggable transport that I am writing needs to do cleanup on shutdown (It write files to the pt_state directory, and use a child process).

The way that tor terminates pluggable transports under Windows currently is via ProcessTerminate (common/util.c:tor_terminate_process), which is the rough windows equivalent of SIGKILL (Immediate termination, no child processes killed, no further code in the application gets executed).

In theory Tor is supposed to send SIGINT for a graceful shutdown but it doesn't appear to be doing this on Windows (Does it do this at all? obfsproxy doesn't appear to install a SIGINT handler to handle graceful teardown per the PT spec), and furthermore Microsoft's documentation hints at horrible evil happening if SIGINT is actually used (http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=vs.110%29.aspx).

Some way to properly handle graceful shutdown that works across all platforms would be nice. Apparently the way Real Windows Apps approach this problem is with GenerateConsoleCtrlEvent or with PostMessage, neither which are portable.

Note: I don't do Windows development as a general rule so I may be missing something obvious.

Child Tickets

Change History (15)

comment:1 Changed 6 years ago by infinity0

Summary of relevant info I found whilst doing research for #9376:

  • Windows has signals (including "SIGINT" and "SIGTERM") but these can only be sent to the same process[1]
  • However, it does have console events CTRL-C and CTRL-BREAK which can be sent to certain child processes [2]
  • By default, CTRL-C generates a SIGINT within the process, and CTRL-BREAK always generates a SIGBREAK (windows equivalent of SIGQUIT). I haven't found explicit statement of this, but I believe this is the correct way of interpreting [3]
  • It's theoretically possible to trap CTRL-C as SIGINT but it's broken in python 2.7 and they declined to fix it[4]
  • It is possible to send and trap CTRL-BREAK on python (os.kill / signal.signal) and C/C++, (GenerateConsoleCtrlEvent / SetConsoleCtrlHandler), so one option is to appropriate this for SIGTERM purposes, but we would need to update the PT spec to explicitly say this, since there is no standard behaviour for CTRL-BREAK and clients won't just assume this. (OTOH I am guessing Tor sends a SIGTERM on UNIX as that *is* the standard way to request a graceful shutdown.)

[1] http://blogs.msdn.com/b/shrib/archive/2008/05/23/signals-on-windows.aspx
[2] http://docs.python.org/2/library/os.html#os.kill
[3] http://msdn.microsoft.com/en-us/library/ms686016.aspx
[4] http://bugs.python.org/issue18040

comment:2 Changed 6 years ago by infinity0

Possibly explicit source consistent with my interpretation of source (3) above: [1]

The main problem is that a lot of online sources conflate signals with console events, so it's hard to tell whether by "signal" they mean an actual signal [2] (which is intra-process only on windows) or a console event [1] (inter-process) that then generates a signal

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms682541%28v=vs.85%29.aspx
[2] http://www.cs.uofs.edu/~beidler/Ada/win32/win32-crt-signal.html

comment:3 in reply to:  1 Changed 6 years ago by dcf

Replying to infinity0:

  • It is possible to send and trap CTRL-BREAK on python (os.kill / signal.signal) and C/C++, (GenerateConsoleCtrlEvent / SetConsoleCtrlHandler), so one option is to appropriate this for SIGTERM purposes, but we would need to update the PT spec to explicitly say this, since there is no standard behaviour for CTRL-BREAK and clients won't just assume this. (OTOH I am guessing Tor sends a SIGTERM on UNIX as that *is* the standard way to request a graceful shutdown.)

Cross-referencing #9732, about the (non-)documentation of what pluggable transports should do on SIGTERM.

comment:4 Changed 6 years ago by infinity0

To clarify, my suggested fix for this is to emulate a typical TERM-KILL loop on Unix.

  • send TERM to the process to initiate clean shutdown
  • wait a while, then see if the process has actually quit
  • send KILL to the process to force it to quit
  • wait a while, then reap zombies

This is implemented in pyptlib's subproc.py.

For windows, the counterpart (since signals are intra-process only) would be:

  • send CTRL-BREAK to the process to initiate clean shutdown
  • wait a while, then see if the process has actually quit
  • call ProcessTerminate to force the process to quit

Additionally, Tor must set the CREATE_NEW_PROCESS_GROUP flag when starting the PT in order to send console events to it correctly.

Last edited 6 years ago by infinity0 (previous) (diff)

comment:5 in reply to:  4 Changed 6 years ago by GITNE

Replying to infinity0:

Additionally, Tor must set the CREATE_NEW_PROCESS_GROUP flag when starting the PT in order to send console events to it correctly.

This is actually not entirely correct. CREATE_NEW_PROCESS_GROUP has nothing to do with the parent process being able to send CTRL+BREAK to child processes.

Last edited 6 years ago by GITNE (previous) (diff)

comment:6 Changed 6 years ago by yawning

Tor needs to set CREATE_BREAKAWAY_FROM_JOB, or things will break on certain versions of Windows (Basically anything between Vista and 7). Not an issue in 8 supposedly since processes can belong in multiple groups.

Don't need CREATE_NEW_PROCESS_GROUP since the PT can create the group.

comment:7 in reply to:  description Changed 6 years ago by GITNE

Replying to yawning:

The pluggable transport that I am writing needs to do cleanup on shutdown (It write files to the pt_state directory, and use a child process).

There multiple ways to accomplish this on Windows:

  • You can implement DllMain() and handle DLL_PROCESS_DETACH (don't forget to export it)
    Since your main process module is unloaded last, it is also the last function to be called before actually the process' kernel object gets destroyed. Don't get yourself confused, this applies to DLLs as well as EXEs.
  • You can listen for a WM_QUIT message with GetMessage() or PeakMessage().
    This is probably the best approach to your problem.
  • And, as always the ANSI C way by registering a function with atexit()

The way that tor terminates pluggable transports under Windows currently is via ProcessTerminate (common/util.c:tor_terminate_process), which is the rough windows equivalent of SIGKILL (Immediate termination, no child processes killed, no further code in the application gets executed).

And, this is not the preferred way to do it on Windows hence Tor's problems with sub-processes on Windows. TerminateProcess() does neither graceful process shutdown nor does it provide the intended semantics for Tor's purpose.

In theory Tor is supposed to send SIGINT for a graceful shutdown but it doesn't appear to be doing this on Windows (Does it do this at all? obfsproxy doesn't appear to install a SIGINT handler to handle graceful teardown per the PT spec), and furthermore Microsoft's documentation hints at horrible evil happening if SIGINT is actually used (http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=vs.110%29.aspx).

Some way to properly handle graceful shutdown that works across all platforms would be nice. Apparently the way Real Windows Apps approach this problem is with GenerateConsoleCtrlEvent or with PostMessage, neither which are portable.

There is no POSIX approach to this problem on Windows simply because Windows does not fully implement POSIX for some better or worse reasons. So do not waste time on searching for such a cross-platform approach because you won't find it unless you want to rely on a third party library that actually just mimicks POSIX behavoir at best. If you want reliable or good Windows support, do it the Windows way (as hard as it sounds).

Note: I don't do Windows development as a general rule so I may be missing something obvious.

If so, then my advice is to really listen to Windows developers even then when it should mean more work or be a pain in the bu*t.

Last edited 6 years ago by GITNE (previous) (diff)

comment:8 in reply to:  6 Changed 6 years ago by GITNE

Replying to yawning:

Don't need CREATE_NEW_PROCESS_GROUP since the PT can create the group.

No, Tor must do it. Processes cannot create process groups on arbitrary existing prcesses. This would be a major design flaw and security hazard.

comment:9 Changed 6 years ago by yawning

tor.exe spawns my child with CREATE_BREAKAWAY_FROM_JOB.

My child calls CreateJobObject.

My child spawns children of it's own, and assigns them to the Job Object.

tor.exe notifies it's child of a graceful shutdown somehow (DllMain probably not an option because a lot of PTs are Python scripts, might be a way to do it, don't care, it's orthogonal to this discussion).

My child does it's cleanup, and terminates Job Object (the better thing to do would be to propagate the graceful termination request).

Not seeing the "must". *shrugs* Maybe it would be better that tor.exe creates the group so that it has a way of reaping children via terminating the Job Object if it absolutely needs them to die. Don't know, that's a design decision that's not up to me.

What "must" be done is that pt-spec (that I didn't write mind you) needs to explicitly specify how this is to be done on Windows vs the rest of the supported platforms (When working on this, which was a while ago, I did look into using GetMessage, but it wasn't a workable solution for me at the time because it required modifying the PT spec, and tor.exe. It seems like the cleanest approach that's usable from the most languages.). For this I will defer to people that actually know the correct "windows" way of doing things is and implement what they came up with.

comment:10 Changed 6 years ago by infinity0

What is the problem with my suggested fix (4), and why are all these flags needed?

The documentation for CREATE_NEW_PROCESS_GROUP states:

Process groups are used by the GenerateConsoleCtrlEvent function to enable sending a CTRL+BREAK signal to a group of console processes.

What we really need is a good document that explains the process model of Windows, from which we can devise a coherent strategy, rather than trying to patch together a coherent strategy from disparate sources of API references.

Last edited 6 years ago by infinity0 (previous) (diff)

comment:11 Changed 6 years ago by infinity0

Another solution on the internet suggests using CTRL-BREAK to act as SIGTERM (clean shutdown):

http://stackoverflow.com/questions/1453520/run-arbitrary-subprocesses-on-windows-and-still-terminate-cleanly

Its recommendation is:

  • Tor must set CREATE_NEW_PROCESS_GROUP on the PT
  • The PT must set *empty* creation flags for the children.

I'm not certain about the latter point; it implies that the parent process must be in a new process group, which means (using this method) it would be problematic to run a standalone PT that terminates its children using CTRL-BREAK.

I'll try to get a basic test case at some point.

Last edited 6 years ago by infinity0 (previous) (diff)

comment:12 Changed 6 years ago by infinity0

Got a Python version working here:

https://github.com/infinity0/pyptlib/commit/157459e630a415756c7872399ed7f36172ed638d

In particular, test_killall_kill shows that CTRL-BREAK can be trapped in principle (I am setting CREATE_NEW_PROCESS_GROUP on both the parent and the child) and that the handler for it can do pretty much anything it wants and only dies to TerminateProcess.

comment:13 Changed 6 years ago by dcf

Summary: Pluggable Transports on windows are killed with ProcessTerminatePluggable Transports on windows are killed with TerminateProcess

comment:14 Changed 6 years ago by dcf

Once I was looking at this ticket but I was really looking for #10047 (pluggable transports could close when they detect EOF on their stdout or stdin).

comment:15 Changed 5 years ago by dcf

Resolution: duplicate
Status: newclosed

#15593 has a plan to fix this.

Note: See TracTickets for help on using tickets.