Currently, most Vidalia users allow Vidalia to start Tor with a new, randomly generated control port password, so if Vidalia crashes, Tor will keep running in the background, but nothing will be able to connect to its control port. This has (at least) two negative consequences: (a) when the user starts Vidalia again, Vidalia will be unable to start a new Tor instance because the old one is still listening on port 9051, and (b) on Windows, the only way to stop the orphaned Tor process without using the control port is equivalent to the Unix SIGKILL, which sucks if that Tor instance is running as a bridge or relay.
We should add a new OwningControllerPID option to Tor, to specify a process which Tor should try to not outlive, and a new TAKEOWNERSHIP control port command to specify that Tor should shut down if that control port connection closes.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
So, TAKEOWNERSHIP will be used when the process doesn't have OwningControllerPID option set? There may be some conflicts there, or will the TAKEOWNERSHIP set OwningControllerPID and fail if the latter is already set?
Anyway, I like this idea, it'll certainly simplify some of the intentions of trying to kill a zombie Tor process.
So, TAKEOWNERSHIP will be used when the process doesn't have OwningControllerPID option set? There may be some conflicts there, or will the TAKEOWNERSHIP set OwningControllerPID and fail if the latter is already set?
The TAKEOWNERSHIP command and OwningControllerPID options should do exactly what I specified that they should do in the ticket ‘description’. There is no conflict between their effects.
When Vidalia is configured to ‘own’ the Tor instance it controls, Vidalia should specify its process ID to Tor using the OwningControllerPID option, and then send the TAKEOWNERSHIP command when it connects to Tor's control port. After Vidalia has sent the TAKEOWNERSHIP command, it should unset the OwningControllerPID option so that Tor will stop polling for a process with that process ID.
The reason to have an OwningControllerPID option is to try to avoid leaving a zombie Tor process if Vidalia exits before connecting to Tor's control port. The reason to have a TAKEOWNERSHIP command is to reliably avoid leaving a zombie Tor process if Vidalia exits after connecting to Tor's control port, even if the OS gives a new process the same process ID that Vidalia had before Tor polls for it again.
BTW, you'll want to have the option named __OwningControllerPID so that SAVECONF never writes it to a file, and so that it's treated as for-use-by-controllers.
I'm not sure how you go about implementing that feature, fwiw. Is there a standard way to watch for a process with a particular pid to change its status? waitpid() and friends seem to only be for child processes. You can kill(0) the pid, but that doesn't tell you if that process is still there: it only tells you if some process with that pid is still there.
On BSDish systems, you can use kevent to wait for a process, but I don't believe libevent exposes that atm. On linuxy systems, you could pull the process's start time out of /proc/[pid]/stat and see whether it's changed or somethign. On Windows, you can grab a handle to the process somehow and poll it from time to time to see whether it's been signalled.... but all of these approaches seem pretty heavyweight atm.
I'm not sure how you go about implementing that feature, fwiw. Is there a standard way to watch for a process with a particular pid to change its status? waitpid() and friends seem to only be for child processes. You can kill(0) the pid, but that doesn't tell you if that process is still there: it only tells you if some process with that pid is still there.
Polling for whether some process has that PID is the best option available in POSIX. That's part of the reason I want a TAKEOWNERSHIP control-port command -- __OwningControllerPID is going to be racy on many systems.
On BSDish systems, you can use kevent to wait for a process, but I don't believe libevent exposes that atm. On linuxy systems, you could pull the process's start time out of /proc/[pid]/stat and see whether it's changed or somethign. On Windows, you can grab a handle to the process somehow and poll it from time to time to see whether it's been signalled.... but all of these approaches seem pretty heavyweight atm.
On Windows, it is possible to wait for the process handle (among others), too. I don't know whether libevent allows us to wait for an arbitrary handle (and I assume it doesn't).
I plan to poll for a PID's existence on all Unixoid systems, and poll a process handle on Windows, for now. This is the other part of the reason I want a TAKEOWNERSHIP control-port command -- the __OwningControllerPID option will involve polling, and we don't want to do that for any longer than necessary.
The "TAKEOWNERSHIP" part should be quite simple and, I agree, useful. It should get us most of what we want.
I wonder if we might want to leave the OwningControllerPID part to Tor 0.2.3 though, since it may introduce bugs (e.g. portability issues) that will take a while to settle.
On the Vidalia implementation side, I believe vidalia already starts out by running 'tor --version' before trying to run tor for reals. See TorProcess::version() in src/torcontrol/TorProcess.cpp. So it should be able to predict whether Tor will recognize the OwningControllerPID option.
Speaking of that, it looks like Vidalia counts on that line format and will fail to detect Tor's version if the format changes. We should add that to our (currently undocumented?) list of output formats that shouldn't change. I just opened #3163 (moved) to cover the topic.
Also, is some of this topic better solved by teaching Vidalia to launch processes and use the right incant to indicate that its children should die when it disappears, even if it disappears by crashing? I haven't looked at how good Vidalia is at that lately.
Also, is some of this topic better solved by teaching Vidalia to launch processes and use the right incant to indicate that its children should die when it disappears, even if it disappears by crashing? I haven't looked at how good Vidalia is at that lately.
Right now, there is no way for Vidalia to indicate that its child processes should not outlive it. That is why we need the changes described in this ticket.
The "TAKEOWNERSHIP" part should be quite simple and, I agree, useful. It should get us most of what we want.
I wonder if we might want to leave the OwningControllerPID part to Tor 0.2.3 though, since it may introduce bugs (e.g. portability issues) that will take a while to settle.
We can postpone the OS-specific extensions to __OwningControllerProcess until 0.2.3.x, or test them on 0.2.3.x and backport later. But we currently only support Unixoid systems and Windows NT systems, and the generic Unixoid implementation is both highly portable across Unixoids and Good Enough for a first implementation.
(I've renamed the option so that we can leave room for additional OS-specific arguments after the PID, because that will allow us to make that piece of process ownership less race-prone.)
On the Vidalia implementation side, I believe vidalia already starts out by running 'tor --version' before trying to run tor for reals. See TorProcess::version() in src/torcontrol/TorProcess.cpp. So it should be able to predict whether Tor will recognize the OwningControllerPID option.
Or it could just refuse to start a too-old version of Tor.
Also also, a while ago we had a bug where Tor would hang up on its control port because it was writing so much stuff, and Vidalia wasn't processing/reading quickly enough, that it hit the "there is too much stuff queued on this connection, kill it" trigger.
I think that trigger is now INT_MAX, so maybe that particular issue is moot now. But are there other issues, either on Tor's side or Vidalia's side, that could cause similar problems? I don't immediately see any.
This feature reminds me of a related feature mwenge advocated for a long time -- letting a controller "lock" the control port with its connection, so nobody else gets to connect so long as it remains connected. That way once your controller is connected, assuming you mean to have only one controller, some jerk can't use a flash applet to look up your control password and connect.
That idea may have a trac number or a proposal number, but I don't see it. If it doesn't, and it's still a good idea, we should record it somewhere.
This feature reminds me of a related feature mwenge advocated for a long time -- letting a controller "lock" the control port with its connection, so nobody else gets to connect so long as it remains connected. That way once your controller is connected, assuming you mean to have only one controller, some jerk can't use a flash applet to look up your control password and connect.
That's completely orthogonal to this feature, even if it would be used together with this one in almost all cases.
That idea may have a trac number or a proposal number, but I don't see it. If it doesn't, and it's still a good idea, we should record it somewhere.
I don't see a Trac ticket for it, and a duplicate ticket wouldn't hurt us much.
This is not yet complete; I plan to add the changes/ file when I implement the control port command, and these changes will also need to be documented in torspec.git/control-spec.txt.
Quick review of stuff through the current head ( f0c70fb52f44 ) :
Is there a system where we don't expect owning_controller_process_spec to be an integer of some type?
Instead of having the functions take an event_base, you can just have them call tor_libevent_get_base.
For the timer, I'd suggest looking at periodic_timer_new; it's a little more accurate, and IMO makes our intent clearer.
In monitor_owning_controller_process, if we're going to make a failure to allocate owning_controller_process_monitor be a recoverable error, we should make sure that it frees and clears owning_controller_process_spec so that the two are consistent again. On the other hand, if we don't plan to make it a recoverable error, then the log message should be log_err.
Quick review of stuff through the current head ( f0c70fb52f44 ) :
Is there a system where we don't expect owning_controller_process_spec to be an integer of some type?
I expect owning_controller_process_spec to begin with an integer on all systems. I expect that in the future, that integer will be followed by the process's start time (in some format) on Linux, Windows NT, FreeBSD, and probably other OSes.
Instead of having the functions take an event_base, you can just have them call tor_libevent_get_base.
For the timer, I'd suggest looking at periodic_timer_new; it's a little more accurate, and IMO makes our intent clearer.
I want procmon.h and procmon.c to be easy for people to port to plain libevent 2.x, so that other Tor-related programs (e.g. protocol obfuscators) can implement the same process-ownership features that Tor has. I'll leave in the event_base parameters in case someone else needs them, but using periodic_timer_new sounds good.
In monitor_owning_controller_process, if we're going to make a failure to allocate owning_controller_process_monitor be a recoverable error, we should make sure that it frees and clears owning_controller_process_spec so that the two are consistent again. On the other hand, if we don't plan to make it a recoverable error, then the log message should be log_err.
It should be an explicitly unrecoverable error, even though tor_process_monitor_new cannot currently return NULL.
Instead of having the functions take an event_base, you can just have them call tor_libevent_get_base.
For the timer, I'd suggest looking at periodic_timer_new; it's a little more accurate, and IMO makes our intent clearer.
I want procmon.h and procmon.c to be easy for people to port to plain libevent 2.x, so that other Tor-related programs (e.g. protocol obfuscators) can implement the same process-ownership features that Tor has. I'll leave in the event_base parameters in case someone else needs them, but using periodic_timer_new sounds good.
Great.
FWIW, if somebody's requiring Libevent 2, periodic_timer_new() isn't necessary: they can just add the EV_PERSIST flag to their timer events. The periodic_timer_new() function is there to make backward compatibility with libevent 1.x possible.
In monitor_owning_controller_process, if we're going to make a failure to allocate owning_controller_process_monitor be a recoverable error, we should make sure that it frees and clears owning_controller_process_spec so that the two are consistent again. On the other hand, if we don't plan to make it a recoverable error, then the log message should be log_err.
It should be an explicitly unrecoverable error, even though tor_process_monitor_new cannot currently return NULL.
Instead of having the functions take an event_base, you can just have them call tor_libevent_get_base.
For the timer, I'd suggest looking at periodic_timer_new; it's a little more accurate, and IMO makes our intent clearer.
I want procmon.h and procmon.c to be easy for people to port to plain libevent 2.x, so that other Tor-related programs (e.g. protocol obfuscators) can implement the same process-ownership features that Tor has. I'll leave in the event_base parameters in case someone else needs them, but using periodic_timer_new sounds good.
Great.
FWIW, if somebody's requiring Libevent 2, periodic_timer_new() isn't necessary: they can just add the EV_PERSIST flag to their timer events. The periodic_timer_new() function is there to make backward compatibility with libevent 1.x possible.
I decided to not use periodic_timer_new, because procmon.c really should use a struct event directly someday when Libevent 2.17 learns to detect process termination for us. I did use the EV_PERSIST flag on Libevent 2, though.
Looking good; I'd like to take another look at the shutdown code when I'm a little more awake. (Also, somebody should at least try building and running this on Windows before we merge.)
I still like it. One thing -- is it right for there to be more than one "owning" controller at a time? The "ownership" metaphor implies exclusivity, but the TAKEOWNERSHIP command doesn't seem to prevent an arbitrary number of controllers from having "ownership" at the same time. I think that's not an unreasonable thing to do, but we sure want to document it, and document how TAKEOWNERSHIP interacts with __OwningControllerProcess: Otherwise somebody will get surprised.
Also, I think we ought to have a way to relinquish ownership, though that can be another patch.
I still like it. One thing -- is it right for there to be more than one "owning" controller at a time? The "ownership" metaphor implies exclusivity, but the TAKEOWNERSHIP command doesn't seem to prevent an arbitrary number of controllers from having "ownership" at the same time. I think that's not an unreasonable thing to do, but we sure want to document it, and document how TAKEOWNERSHIP interacts with __OwningControllerProcess: Otherwise somebody will get surprised.
I've pushed further explanation of the behaviour of TAKEOWNERSHIP on multiple control connections and the intended use of these features to my torspec feature3049 branch, along with two fixup! commits to document these features as ‘added in 0.2.2.28-beta’ (since it seems they will be added in the next release).
Also, I think we ought to have a way to relinquish ownership, though that can be another patch.
That does seem proper, even though I have no good use for that in mind. (I wouldn't want a controller to set TAKEOWNERSHIP if it will want a Tor process to outlive it.)
This feature reminds me of a related feature mwenge advocated for a long time -- letting a controller "lock" the control port with its connection, so nobody else gets to connect so long as it remains connected. That way once your controller is connected, assuming you mean to have only one controller, some jerk can't use a flash applet to look up your control password and connect.
If a controller ‘owns’ Tor in the sense used here (i.e. Tor dies when its controller does), that controller can keep other processes from connecting to its Tor instance by sending ‘RESETCONF ControlPort’ to turn off the control-port listener, without breaking existing control connections.
That isn't the Right Thing if Tor needs to outlive a single controller process, as the control listener would not reappear if the control connection closes -- but if your Tor process outlives its controller, ensuring that only one controller is connected at a time is likely to be either unnecessary or insufficient.