Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#19016 closed enhancement (fixed)

add shutdown hook

Reported by: iwakeh Owned by: iwakeh
Priority: Low Milestone: CollecTor 1.0.2
Component: Metrics/CollecTor Version:
Severity: Normal Keywords: ctip
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Add shutdown hook to provide a controlled way of stopping.
This should be provided in class Main.

What exactly should be done when the shutdown-hook is called?
Especially looking at the different reasons for shutdown (system maintenance, corrupt disk, full disk, ...).

Child Tickets

Change History (16)

comment:1 Changed 3 years ago by iwakeh

Status: newneeds_information

comment:2 Changed 3 years ago by karsten

Oh wow, that's actually a tough question. And I think the answer depends on how modules are started (cron vs. internal scheduler), how that scheduler is started (cron vs. service script) and stopped (kill vs. service script), and so on. For example, I could imagine using a shutdown hook to release a lock file, but it might be that we don't need a lock file anymore once we switch to a service script.

One thing about reasons for shutdown: I don't think we can do anything useful in case of a corrupt or full disk. We should probably focus on system maintenance here. And doing that would be great, because then our sysadmins could simply install updates and reboot hosts whenever they need to.

How about we postpone this discussion until those other changes are implemented? I think we can do some good things with shutdown hooks, so I don't suggest closing this issue. But I think I can contribute more to this discussion in a month or two.

comment:3 Changed 3 years ago by iwakeh

Priority: MediumLow
Status: needs_informationnew

Yes, let's postpone this ticket.

I lower the priority and set it to new. We can increase it again when the other changes are implemented.

comment:4 Changed 3 years ago by iwakeh

Milestone: CollecTor 1.1.0

Added to second release milestone.

comment:5 Changed 3 years ago by iwakeh

Status: newneeds_information

With the scheduler functionality it makes sense to revisit this ticket.

Tasks for the shutdown-hook:

  • start a regular scheduler shutdown, which lets all running tasks finish and exits.
  • this is written to the logs in order to make the operator aware that a shutdown is running.

What else?

comment:6 Changed 3 years ago by iwakeh

Status: needs_informationneeds_review

The shutdown hook that allows currently running modules to finish properly.
Shutdown hook log statements are also printed to the console (by logback configuration) in order to indicate the shutdown process.

Please review this branch.

(The coverage-check is failing with the current master; this is also corrected in the above branch.)

comment:7 Changed 3 years ago by karsten

Status: needs_reviewneeds_revision

Looks great! Here are a few suggestions to make it even better:

  • Would it be possible to catch a second kill signal that terminates the process immediately? The tor daemon does this by first giving 30 seconds to shut down gracefully and shutting down ungracefully upon receiving the second kill signal. Of course, there's always the option to kill -9, but that might seem like, erm, overkill.
  • Should the log message mention the 20 minutes grace period, like: "Waiting up to 20 minutes for termination of running tasks ..."? If we implement the previous suggestion, how about: "Waiting up to 20 minutes for termination of running tasks; kill again to terminate immediately ..."?
  • How's this change log entry?
diff --git a/CHANGELOG.md b/CHANGELOG.md
index cf1ea12..ad9da3f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,9 @@
-# Changes in version 1.1.0 - 2016-08-31
+# Changes in version 1.1.0 - 2016-09-xx
 
  * Medium changes
    - Add support for Bifroest's bridge descriptor tarballs.
+   - Use a shutdown hook that gives currently running modules up to 20
+     minutes to finish properly, rather than killing them immediately.
 
  * Minor changes
    - Remove quotes around base URL in index.json.

comment:8 Changed 3 years ago by karsten

Addendum: It looks like ShutdownHook.java contains quite a few unused imports. I'll let Eclipse organize imports of that class and squash those changes into your commit before merging to master.

comment:9 Changed 3 years ago by iwakeh

Thanks, for the suggestions!

I think there is no way to catch a second kill signal; the shutdown thread is only called once by the jvm.

The 20 min waiting time is pretty arbitrary and could only be fully used by relaydesc module anyway, afaict.
I thought that it should accommodate a just started relaydesc download to finish properly to make sense.

Maybe this grace period should be made configurable to give operators better control?

Please review my branch with the additional log-stmt and w/o the redundant imports (Thanks for spotting these!);
and another commit with a configurable grace period.

With the configurable grace period the change log needs to be adapted otherwise it's fine.

comment:10 Changed 3 years ago by iwakeh

Status: needs_revisionneeds_review

comment:11 Changed 3 years ago by karsten

Looks good! Last question before merging: Should collector.properties contain 20 as default value rather than 10? (Happy to change that, but figured I should ask first, because there might be a reason for this deviation.)

comment:12 Changed 3 years ago by iwakeh

I changed the default, b/c I observed that a relaydesc module run doesn't take more than 10 min to finish.

Feel free to adapt the minutes to the number of minutes the longest running module (on the main CollecTor instance) would take to finish properly.

comment:13 Changed 3 years ago by karsten

Ah, 10 is fine. Do you mind if I change the remaining 20s in Scheduler.java (variable gracePeriodMinutes) to 10, too?

comment:14 in reply to:  13 Changed 3 years ago by iwakeh

Replying to karsten:

Ah, 10 is fine. Do you mind if I change the remaining 20s in Scheduler.java (variable gracePeriodMinutes) to 10, too?

That's fine. Better be consistent when setting the defaults.

comment:15 Changed 3 years ago by karsten

Resolution: fixed
Status: needs_reviewclosed

Merged! Thanks for building this, this is really a step towards making CollecTor more reliable! Closing.

comment:16 Changed 3 years ago by iwakeh

Milestone: CollecTor 1.1.0CollecTor 1.0.2

Added to appropriate milestone.

Note: See TracTickets for help on using tickets.