When testing meek-client-wrapper, I noticed two things:
it does not respond to SIGINT or SIGKILL. also, the signal handling code is different from meek-client. perhaps we should move it to goptlib?
it uses sigkill to kill its children, not giving them a chance to clean up. Yes, this is awkward on windows but we can at least do something nicer on posix systems.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
And I thought SIGKILL "cannot be caught, blocked, or ignored"?
also, the signal handling code is different from meek-client. perhaps we should move it to goptlib?
That actually is the reason it's not in goptlib—I have had to do the signal handling code in different ways in different programs. The program might want to close sockets, or log something, or whatever. I didn't find a good way to abstract it (that is significantly more expressive than just handling the signals explicitly).
it uses sigkill to kill its children, not giving them a chance to clean up. Yes, this is awkward on windows but we can at least do something nicer on posix systems.
That's a fair point. However keep in mind that the code already passes on any signal it receives; so if we got a SIGTERM, so will the child processes. I guess I might prefer to keep the destructive kill, but only do it if we haven't sent some other signal to a child process (rather than unconditionally as a defer as we do now).
I saw your ae0dc0ff ("give child processes a chance to clean up"). I am strongly disinclined to do anything involving a timer. Do signals really work like that? The parent process needs to still be running in order for the signal to be delivered?
I'm skeptical that closing stdin is sufficient to stop Firefox. Does that work on Windows?
Trac: Owner: asn to dcf Component: Pluggable transport to meek
Ah, I could have been wrong about not passing on the signals properly. At least, the commit I did that you mentioned, fixes things on my Debian system.
I think the real bug is that, whilst you do pass on SIGTERM, you immediately then return from the main thread, which triggers the defer behaviour which is to send SIGKILL. This does not give the child enough time to clean up - for example if it has its own children, which is the case if meek-browser-helper is using xvfb-run. So a timer between when we send SIGTERM and SIGKILL is necessary. This is a fairly standard pattern that you will see in well-written "graceful shutdown" scripts (TERM, sleep, KILL), it's not that unusual. Somewhere around 10-30 seconds is normal.
(Yes, signals are delivered immediately, so if you want a delay then the parent does need to stick around for this duration of delay, then send the second signal.)
Closing stdin is probably not sufficient to stop firefox, no. But meek-browser-helper is supposed to be a shell script anyways, so we might be able to work with that on windows. Something along the lines of start firefox; read x; kill firefox. Needs more testing, of course.
Unfortunately it must be run as cscript X.vbs rather than as X.vbs which on most Windows systems would run wscript not cscript, and the former does not have StdIn. There is no way to override this on a per-script basis; the closest thing I found online is to do "if we're not being run by cscript, run ourselves using cscript" but this doesn't work for our purposes because it doesn't provide a way to pass stdout back to the calling process.
It does not seem feasible to do the equivalent behaviour from a .bat file either. :(
So I tried another approach - turn meek-browser-helper-windows.bat into a wrapper for meek-browser-helper-windows.js (I rewrote the .vbs in .js which also works, and is probably better since VBS is dead):
meek-browser-helper-windows.bat
@echo offREM meek-browser-helper program for Tor Browser on Windowscscript //b TorBrowser\Tor\PluggableTransports\meek-browser-helper-windows.js
However, it does not work when double-clicked or when run from Tor Browser. Log output suggests that when a .bat file is executed directly (i.e. as "argv[0]"), Windows automatically closes its stdin. So our "terminate children on stdin close" behaviour doesn't work for this case... I'll try to think of other ways.
Ah! It turns out I was testing with your 4.5a4-12716-1 which doesn't contain my commit that fixes this bug. That's why meek-browser-helper was exiting early - meek-client-wrapper was not giving an open stdin for it to hold on to. (I.e. the last sentence in my previous comment was incorrect.)
Now, with the files in my previous comment, everything cleans up nicely after TBB closes (though meek-client-wrapper is currently waiting the full 16 seconds, because meek-client itself doesn't implement --exit-on-stdin-eof).
I will send in an amendment to the patch in #12716 (moved).
Okay, but don't spend too much time going down obscure paths. I'd rather have something that is 90% correct and simple than 100% correct and complex. Since (so far) we only need this change for Unix with xvfb, my current plan is just to remove the explicit calls to Kill on platforms other than Windows, and rely on the forwarded signals to cause termination. I'm fine with leaving unclean termination on Windows, as long as the xvfb use case works; we can always improve it further later on.
A less obscure option would be to replace the .bat and .js by a single go program. It would take me longer to figure out how to integrate that into the TBB build, though.
So what are your thoughts on this? I believe that this is needed for #12716 (moved) to work well on windows... Maybe not my exact patch, but something like it. Arguably we should reduce the timeout to just a few seconds.
Any particular reason you are "strongly disinclined to do anything involving a timer"? This is how all graceful strong shutdowns work.
Terminate firefox and meek-client simultaneously.
The main differences compared to what has already been discussed on this ticket are that there are separate terminatePTCmd and terminateCmd functions (terminatePTCmd closes stdin, terminateCmd does not), separate implementations for Windows and non-Windows (Windows doesn't send SIGTERM, and its terminateCmd kills the process immediately, without any timer), and the two terminations run in parallel, not in series.
Trac: Status: new to needs_review Sponsor: N/AtoN/A Summary: meek-client-wrapper does not use signals well to meek-client-torbrowser does not use signals well Reviewer: N/AtoN/A
Is there a reason to have four separate if statements here
if firefoxCmd != nil { wg.Add(1) } if meekClientCmd != nil { wg.Add(1) } if firefoxCmd != nil { go func() { err := terminateCmd(firefoxCmd) // We terminate Firefox with SIGTERM, so don't log an // error if the exit status is "terminated by SIGTERM." ... wg.Done() }() } if meekClientCmd != nil { go func() { ... } wg.Wait()
instead of doing
if firefoxCmd != nil { wg.Add(1) go func() { err := terminateCmd(firefoxCmd) // We terminate Firefox with SIGTERM, so don't log an // error if the exit status is "terminated by SIGTERM." ... wg.Done() }() } if meekClientCmd != nil { wg.Add(1) go func() { ... } wg.Wait()
Is there a reason to have four separate if statements here
Thanks for commenting on this. You're right. For some reason I was thinking that you had to do all the wg.Add before any wg.Done. But looking at the example for WaitGroup, there's actually no problem. I made your suggested change here, then I merged and tagged 0.32.