Opened 6 years ago

Closed 6 years ago

#10088 closed defect (fixed)

pyptlib - use JobObjects on windows to automatically kill PT children when PT itself dies

Reported by: asn Owned by:
Priority: Medium Milestone: Tor: 0.2.4.x-final
Component: Circumvention/Pluggable transport Version:
Severity: Keywords: tor-pt
Cc: dcf@… Actual Points:
Parent ID: #10006 Points:
Reviewer: Sponsor:

Description (last modified by infinity0)

This allows obfs-flash-client to work smoothly on windows. Otherwise, some orphan processes sometimes hang around, using up listen-ports, and prevent the next PT from starting *its* children.

Patch is here: https://github.com/infinity0/pyptlib/compare/w32-subproc

Note that this is just a temporary measure to get things working on windows, until we implement #10047.

original ticket contents, obsolete since #comment:7:

Allow tor helpers to use JobObjects by setting CREATE_BREAKAWAY_FROM_JOB (Windows-only)

While building Windows pbfs-flash PTTB bundles, we encountered an issue where the obfsproxy child of obfs-flash would not be terminated properly after closing the PTTBB. More information can be found in comment:5:ticket:10006 .

After lots of digging around, the problem was solved by toggling the CREATE_BREAKAWAY_FROM_JOB flag of CreateProcess() in tor_spawn_background(). More info in comment:26:ticket:10006.

We should look at the possible side-effects of adding the CREATE_BREAKAWAY_FROM_JOB flag there, and if it's innocuous then we should implement the change and get it merged.

Child Tickets

Change History (9)

comment:1 Changed 6 years ago by dcf

Cc: dcf@… added

comment:2 Changed 6 years ago by asn

More information on why the CREATE_BREAKAWAY_FROM_JOB fix actually worked:
http://stackoverflow.com/questions/8822152/how-to-create-a-child-process-depending-on-its-parent

comment:3 Changed 6 years ago by infinity0

This is probably a more direct explanation. We only need this fix for Windows Vista (and possibly above); we don't need it for Windows XP, where processes are not default-assigned to the UAC JobObject.

An optional addition to this issue, would be for Tor itself to use the kill-children-on-death thing, example C++, our own Python example part1 part2. (I had to tweak it from the C++ slightly, hopefully it's clear.)

We have 10047 open for a clean long-term cross-platform solution to have managed child PTs die when Tor crashes. This shifts the responsibility from Tor to the child PT (using JobObjects, the responsibility remains with the parent), but has the advantage that the child PT has a chance to do cleanup tasks.

comment:4 Changed 6 years ago by infinity0

Summary: Shutdown tor helpers properly with CREATE_BREAKAWAY_FROM_JOB (?)Allow tor helpers to use JobObjects by setting CREATE_BREAKAWAY_FROM_JOB (Windows-only)

comment:5 Changed 6 years ago by infinity0

Thinking about this some more, I'm not sure that this ticket is even necessary.

MSDN says:

CREATE_BREAKAWAY_FROM_JOB 0x01000000
The child processes[A] of a process[B] associated with a job are not associated with the job.
If the calling process[C] is not associated with a job, this constant has no effect. If the calling process is associated with a job, the job must set the JOB_OBJECT_LIMIT_BREAKAWAY_OK limit.

There are two ways of interpreting this:

  1. C is the process that calls CreateProcess (e.g. Tor), B is the child process (e.g. the PT), and A is the grandchild (e.g. the PT's helper)
  2. C is the process that calls CreateProcess (e.g. Tor), B is also this process, and A is the child (e.g. the PT)

As it turns out, (2) is the correct interpretation. From this 3rd-party book:

To make [a newly spawned process ... execute outside the job], you must call CreateProcess with the new CREATE_BREAKAWAY_FROM_JOB flag. If you call CreateProcess with the CREATE_BREAKAWAY_FROM_JOB flag but the job does not have the JOB_OBJECT_BREAKAWAY_OK limit flag turned on, CreateProcess fails. This mechanism is useful if the newly spawned process also controls jobs.

In other words, it doesn't matter if obfs-flash is part of a job (as this ticket tries to prevent); all that matters is that the children aren't part of the same job (which it accomplishes by setting the flag on its own children).

I believe that the reason we opened this ticket, was an incorrect mixed interpretation of both (1) and (2), and actually the real fix that enabled dcf's tests to work was in 10006#comment:23 where I removed the "overengineered workaround" - previously, I was only setting the flag conditionally in the subproc tests, but now I am doing it everywhere, including when obfs-flash spawns its children.

If my suspicion is correct, then dcf if you run your tests from 10006#comment:28 with a pristine unhacked tor.exe, it will still work. (I don't have access to a Vista/7 machine to do this test myself. And be sure to use a post-"removed-workaround" commit like 33f090961c1e5d0587938a169c9d8107133ff646.)

In other words, the original intent of this ticket (simply setting the flag) is pointless, because Tor is not using JobObjects. However, if this flag is set *together with* the "optional addition" I mentioned in the previous comment, then we do achieve something - namely the behaviour that, if Tor crashes, the child PTs die too. (At the moment this doesn't happen with our Python JobObjects, since that is done in the PT and so implements the behaviour "if the parent PT dies, the child sub-PTs die too".)

Apologies for the noise.

Last edited 6 years ago by infinity0 (previous) (diff)

comment:6 in reply to:  5 Changed 6 years ago by dcf

Replying to infinity0:

If my suspicion is correct, then dcf if you run your tests from 10006#comment:28 with a pristine unhacked tor.exe, it will still work. (I don't have access to a Vista/7 machine to do this test myself. And be sure to use a post-"removed-workaround" commit like 33f090961c1e5d0587938a169c9d8107133ff646.)

I tested with

(I changed PYPTLIB_TAG and OBFS_FLASH_TAG in bundle/Makefile before running make, leaving it as unstaged changes.)

Here is the compiled bundle:

This configuration works for me. All the subprocesses are destroyed. The first time I tried it, I got a "Vidalia has encountered and error and needs to close," but not the second time, so it was probably unrelated. To be clear, this is with an unmodified tor.exe, not having CREATE_BREAKAWAY_FROM_JOB set.

I hope this is what you wanted.

comment:7 Changed 6 years ago by infinity0

Yay! OK, so the takeaway from this, is that a process P should always set CREATE_BREAKAWAY_FROM_JOB when creating child processes that are later assigned to a new JobObject. (Also, to be nice to others, the JobObject itself should have JOB_OBJECT_LIMIT_BREAKAWAY_OK unless you have a specific reason not to.)

As a corollary, there is no point setting the flag, if P itself does not do anything with JobObjects, so the original point of this ticket is unnecessary.

We have two options going forwards:

  1. amend this ticket to set the flag and to implement the kill-children-on-parent-death behaviour, but in C in tor itself.
    1. this behaviour includes grandchildren.
    2. can be done together with (2) below - the combination of CREATE_BREAKAWAY_FROM_JOB and JOB_OBJECT_LIMIT_BREAKAWAY_OK makes things play nicely with each other.
  1. keep the kill-children-on-parent-death behaviour in pyptlib and push it to master, so that we can deploy it.

For purposes of #10006 we only need (2) and it's the easier option because the code is already written and tested and reviewed. For general tor purposes we could do (1), but it would be less effort to jump straight to #10047 since that is cross-platform.

comment:8 Changed 6 years ago by infinity0

Component: TorPluggable transport
Description: modified (diff)
Parent ID: #10006
Status: newneeds_review
Summary: Allow tor helpers to use JobObjects by setting CREATE_BREAKAWAY_FROM_JOB (Windows-only)pyptlib - use JobObjects on windows to automatically kill PT children when PT itself dies

As discussed, we are proceeding with (2) for now.

comment:9 Changed 6 years ago by asn

Resolution: fixed
Status: needs_reviewclosed

Merged! Thank you!

Will close this for now. Reopen if you think it's appropriate.

Note: See TracTickets for help on using tickets.