Opened 6 years ago

Closed 5 years ago

Last modified 7 months ago

#11016 closed enhancement (implemented)

Add support for systemd watchdog protocol

Reported by: misc Owned by:
Priority: Low Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay systemd
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Systemd provides a watchdog protocol, as described on http://0pointer.de/blog/projects/watchdog.html . It basically requires the service to write a message to a socket every X second, if some environment variables are set. While present since a long time, the newer v209 version provides a helper function to implement it, by moving the parsing logic in a library.

So here is 2 patches for tor :

The first one implement the Notify protocol for systemd, thus permitting people to use Type=notify in the systemd unit ( see http://www.freedesktop.org/software/systemd/man/systemd.service.html for the detail, as well as the rather lengthy debate on Debian init system for Jessie ). The patch is not doing much or adding much features, but I guess some people would prefer to have this Type of systemd unit rather than simple as it could prevent some race condition if a service requires tor to be really started and working, before being started himself. It also add status line to systemd, but that's not a feature that is currently used much ( I do hope maybe some higher level tool like cockpit would use it later ).

The 2nd one is adding a event to the main loop to ping the watchdog on a regular basis, using the new function in libsystemd-daemon. So this way, if tor is stuck, it will be restarted. I guess I do not have to explain how and why this improve tor :)

So far, I only have been able to test the first one on my Fedora 20 and I am waiting for package to test the 2nd one in real life. However, in order to release early, release often, I upload them here for review.

Child Tickets

Attachments (2)

0002-Add-support-for-systemd-watchdog-protocol.patch (2.5 KB) - added by misc 6 years ago.
patch to add systemd watchdog protocol support
0001-add-support-for-systemd-notification-protocol.patch (4.8 KB) - added by misc 5 years ago.
New version ( 2.0 ) of the patch

Download all attachments as: .zip

Change History (16)

Changed 6 years ago by misc

patch to add systemd watchdog protocol support

comment:1 Changed 6 years ago by misc

So the patch i sent first was wrong, I have added a fixed one ( the one for watchdog ).

I am also looking at directly pushing the support in libevent, as this would benefit to more software and avoid code duplication. If that's accepted there, then I guess this bug report would be moot ( at least for the watchdog part )

comment:2 Changed 5 years ago by nickm

Keywords: tor-relay systemd added
Milestone: Tor: 0.2.5.x-final

comment:3 Changed 5 years ago by nickm

Status: newneeds_review

comment:4 Changed 5 years ago by nickm

Status: needs_reviewneeds_revision

note 1: does PKG_PROG_PKG_CONFIG exist everywhere? If not, do we need to add it to our m4/ directory?

note 2: Maybe the configure.in logic should check for the presence of the systemd headers. On some systems, the headers get installed separately from the libraries, and that confuses some users. They want to build Tor but wonder "why can't I build with systemd? I installed systemd!"

note 3: The systemd libs should really only get linked with the program (tor) that needs them. The stuff in tools doesn't need to link against libsystemd.

note 4: The control.c module should not be interacting with anything other than the controller. If we want to do something else once we are bootstrapped, we should have a new intermediate function (in status.c maybe?) that calls both control_event_bootstrap and systemd_event_bootstrap. I'm happy to make this change myself if you don't want to.

note 5: There should be a comment on why we divide watchdog_delay by 2.

note 6: Can watchdog_delay be 2 seconds or greater? If so, you can't divide by 2 and put it in tv_usec; libevent requires that timeout values be well-formed (with tv_usec between 0 and 999999).

comment:5 Changed 5 years ago by misc

So, to answer questions :

1) PKG_PROG_PKG_CONFIG come from pkgconfig. it is widly used and portable, but indeed, that's external to autoconf, so you could need to add it to m4/.

2) pkgconfig use a file .pc, and this one should be installed with the headers. I can add AC_CHECK_HEADER if you think someone would install systemd without the .pc but with the header, but I think that's unlikely. So if HAVE_SYSTEMD is set, that mean the .pc is here and we can reasonably think the headers are here as well.

3) indeed, will fix

4) the part about control.c can be dropped. It is just to give a status to systemd, but no UI exploit it so far, and it doesn't bring much since it will just say "done." most of the time and to me the log is enough. I will remove this part and resubmit if I find something that really bring something.

5) yep, and likely also one on the various unit conversion, it did surprised me first and so is "non obvious".

6) watchdog can be more than 2 seconds, yes, and this code is quite buggy. I only tested with a watchdog timeout of 1 second, since it was faster to see what failed :)
But I think I will propose to push the watchdog code in libevent as it would directly benefit more software than tor ( and the patch I wrote but didn't send yet in libevent is a bit more correct:

    watchdog_delay /= 2;
    watchdog.tv_sec = watchdog_delay / 1000000;
    watchdog.tv_usec = watchdog_delay % 1000000;

( but I still need to do a better job on configure.ac before submitting )

I will rediff later the patch and submit a new version of patch 1. For patch 2, depend if you can accept the same kid of feature in libevent ( ie, send a regular signal to say "still alive" by default in libevent )

comment:6 Changed 5 years ago by nickm

Thanks! I look forward to the next version of the patch.

For what it's worth, even if the code _does_ go into Libevent, we can't rely on it being there: Tor needs to work even with older versions of Libevent, since it takes distributions a while to upgrade.

Changed 5 years ago by misc

New version ( 2.0 ) of the patch

comment:7 Changed 5 years ago by misc

For the 2nd patch, I do not know how to remove it from the ticket.

Since both are independent, I shouldn't have bundled them in the same ticket. So we can have the first one (notify) without the 2nd (watchdog) at run time. And so we can have 1 in tor as notify need to know when the software is started) and the 2nd in libevent ( since that's just a simple regular message during the main loop ), and being compiled independently, especially since the watchdog stuff requires a systemd version we cannot find in any stable release of a distribution for now.

comment:8 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final

Moving nearly all needs_revision tickets into 0.2.6, as now untimely for 0.2.5.

comment:9 Changed 5 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.???

I think we could take this if the revisions get made, but we're shouldn't block a release on the revisions.

comment:10 Changed 5 years ago by candrews

*bump*

Any updates on this issue? It certainly seems like it would be a valuable addition to tor.

comment:11 Changed 5 years ago by nickm

Status: needs_revisionneeds_review

Oh hey, I think maybe this should be needs_review now.

comment:12 Changed 5 years ago by nickm

Milestone: Tor: 0.2.???Tor: 0.2.6.x-final

comment:13 Changed 5 years ago by nickm

Okay; I did the rest of the requested changes, and merged it to master. Let's see how it works.

comment:14 Changed 5 years ago by nickm

Resolution: implemented
Status: needs_reviewclosed
Note: See TracTickets for help on using tickets.