Opened 5 years ago

Closed 4 years ago

#13003 closed enhancement (fixed)

Figure out a better strategy to avoid concurrent Onionoo executions

Reported by: karsten
Priority: Medium
Component: Metrics/Onionoo

Description

The hourly cronjob currently avoids concurrent executions by writing a lock file at startup and deleting it upon termination. If such a file already exists at startup, the cronjob doesn't start.

This strategy works fine if a live process fails to finish on time. It fails pretty badly if a process died, because the stale lock file stays behind and subsequent runs won't start without human intervention.

Attachments (1)

0001-SMTPAppender-configuration-added.patch (1.9 KB) - added by iwakeh 5 years ago.
additional SMTPAppender

Change History (24)

comment:1 Changed 5 years ago by Sebastian

You could write the pid of the process into the lockfile, and if there's a lockfile, check that a process with that pid is still running. If it isn't, remove the lockfile and start over. This is not 100% reliable, because the system might have started another process with the same pid, but it should be a lot better. You can make it more complex too, but that's probably not warranted.
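
The pid check described above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name `PidLock` and the file name `onionoo.lock` are hypothetical, and `ProcessHandle` requires Java 9 or later, so it was not an option on the Java versions in use when this ticket was discussed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of Onionoo. Needs Java 9+ for ProcessHandle. */
final class PidLock {

  /** Returns true if the lock was taken, false if a live process still holds it. */
  static boolean tryAcquire(Path lockFile) throws IOException {
    if (Files.exists(lockFile)) {
      String owner = new String(Files.readAllBytes(lockFile),
          StandardCharsets.UTF_8).trim();
      if (isAlive(owner)) {
        return false;
      }
      // The recorded process is gone (or the content is unreadable): stale lock.
      Files.deleteIfExists(lockFile);
    }
    byte[] pid = Long.toString(ProcessHandle.current().pid())
        .getBytes(StandardCharsets.UTF_8);
    try {
      // CREATE_NEW fails if another process created the file in the meantime.
      Files.write(lockFile, pid, StandardOpenOption.CREATE_NEW,
          StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  /** True if pidText names a currently running process. */
  static boolean isAlive(String pidText) {
    try {
      return ProcessHandle.of(Long.parseLong(pidText))
          .map(ProcessHandle::isAlive).orElse(false);
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /** Removes the lock file at the end of a run. */
  static void release(Path lockFile) throws IOException {
    Files.deleteIfExists(lockFile);
  }

  public static void main(String[] args) throws IOException {
    Path lock = Paths.get("onionoo.lock"); // hypothetical file name
    if (!tryAcquire(lock)) {
      System.err.println("Another run is still active, exiting.");
      return;
    }
    try {
      System.out.println("Running hourly update ...");
    } finally {
      release(lock);
    }
  }
}
```

As noted in the comment, a reused pid can still make a stale lock look live, and there remains a small race window between removing a stale file and creating a new one.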

Last edited 5 years ago by Sebastian

comment:2 Changed 5 years ago by karsten

That could work. But I'm not sure if there's a platform-independent way to build it. (One might argue that the current cron strategy is not platform-independent anyway, but hey.)

Or maybe it's time to move away from having cron execute a new hourly Java process towards a single Java process that runs in a loop and sleeps after an execution until the next. I just wonder how the service operator would learn when an execution fails, ideally via email. Hmm.

comment:3 Changed 5 years ago by iwakeh

I second giving up cron:
cron is designed for entirely independent executions (ideally).
Having a long-running task and not wanting the next task to run unless the
previous one is finished introduces a dependency between the two.
Once there is such a dependency, it should rather be addressed by design.

Some ideas for a different approach:

  • Using the ScheduledExecutorService with fixed rate or delay gives quite some control over the single executions and could mail in case of failure.
  • If the future version of onionoo uses logging (e.g. logback), the failure mails could be sent via an SMTPAppender.
  • Actually, a combination of these two might be best: java controlled timed execution and mailing via SMTPAppender.
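
A minimal sketch of the first idea, assuming a single-threaded scheduler with a fixed delay. The class name `UpdateScheduler` and the failure callback are hypothetical; in the combined approach the callback would log at ERROR level so that an SMTPAppender can pick it up. The wrapper matters because `ScheduledExecutorService` suppresses all further executions of a periodic task once one execution throws.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch, not Onionoo code. */
final class UpdateScheduler {

  /** Wraps a task so that a failure is reported instead of ending the schedule. */
  static Runnable guarded(Runnable task, Consumer<RuntimeException> onFailure) {
    return () -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        onFailure.accept(e);
      }
    };
  }

  /** Runs the task repeatedly; the delay is counted from the end of each run. */
  static ScheduledExecutorService start(Runnable task,
      Consumer<RuntimeException> onFailure, long delay, TimeUnit unit) {
    // One thread only, so two executions can never overlap.
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(guarded(task, onFailure), 0, delay, unit);
    return scheduler;
  }

  public static void main(String[] args) throws InterruptedException {
    // Short delay for demonstration; a real deployment would use hours.
    ScheduledExecutorService scheduler = start(
        () -> System.out.println("updating ..."),
        e -> System.err.println("update failed: " + e),
        100, TimeUnit.MILLISECONDS);
    Thread.sleep(350);
    scheduler.shutdown();
  }
}
```

With a fixed delay, a slow run simply pushes the next one back, which removes the need for a lock file within a single process.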

comment:4 Changed 5 years ago by Sebastian

Many regularly running cronjobs employ locking if parallel execution is a concern. Your machine might always have a clock jump, a high load or some other condition preventing orderly execution otherwise. If onionoo is prone to dying (which apparently happened here), you'd want a watchdog to do the email sending bits in a different process anyway, rather than increase the lifetime of the onionoo process?

comment:5 Changed 5 years ago by Sebastian

(The watchdog would be platform-specific again, in any case. What are the target platforms? Do they really include Windows? Because cron is easily portable to BSD.)

comment:6 Changed 5 years ago by karsten

I like the idea of using ScheduledExecutorService and SMTPAppender for logging warnings and non-fatal errors. I'd like to move away from cron for that. This is currently blocking on better logging infrastructure, but once we have that, I'd say let's switch.

Regarding the watchdog, I wonder if we can add a Nagios warning for this. In theory, it's sufficient to download https://onionoo.torproject.org/summary?limit=0 and make sure the included timestamps are not older than, say, three hours. Sebastian, do you know how to write such a warning and make sure that I learn about problems via email?
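
The freshness test at the core of such a check might look like the sketch below. It only covers the comparison, not the download or the Nagios integration. The field names `relays_published` and `bridges_published`, the timestamp format, and the class name `FreshnessCheck` are assumptions about the summary document rather than something stated in this ticket, and the regular expression is a shortcut in place of a real JSON parser.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the staleness test; field names are assumptions. */
final class FreshnessCheck {

  private static final Pattern PUBLISHED = Pattern.compile(
      "\"(?:relays|bridges)_published\"\\s*:\\s*\"([^\"]+)\"");

  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  /**
   * Returns true if at least one timestamp was found and none of them is
   * older than maxAge relative to now. Timestamps are read as UTC.
   */
  static boolean isFresh(String summaryJson, Instant now, Duration maxAge) {
    Matcher matcher = PUBLISHED.matcher(summaryJson);
    boolean found = false;
    while (matcher.find()) {
      found = true;
      Instant published = LocalDateTime.parse(matcher.group(1), FORMAT)
          .toInstant(ZoneOffset.UTC);
      if (Duration.between(published, now).compareTo(maxAge) > 0) {
        return false;
      }
    }
    return found;
  }

  public static void main(String[] args) {
    String sample = "{\"relays_published\":\"2014-09-17 12:00:00\","
        + "\"bridges_published\":\"2014-09-17 11:37:05\"}";
    Instant now = Instant.parse("2014-09-17T13:00:00Z");
    System.out.println(isFresh(sample, now, Duration.ofHours(3)));
  }
}
```

Treating a document without any timestamp as not fresh keeps a broken or empty response from passing the check.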

comment:7 Changed 5 years ago by Sebastian

I don't have access to Tor's nagios infrastructure, sorry

comment:8 in reply to:  7 Changed 5 years ago by karsten

Replying to Sebastian:

> I don't have access to Tor's nagios infrastructure, sorry

Okay, created #13008 for this.

comment:9 in reply to:  6 ; Changed 5 years ago by karsten

Replying to karsten:

> I like the idea of using ScheduledExecutorService and SMTPAppender for logging warnings and non-fatal errors. I'd like to move away from cron for that. This is currently blocking on better logging infrastructure, but once we have that, I'd say let's switch.

The Nagios warning is implemented, and the better logging infrastructure is in place. That means this ticket isn't blocking on anything anymore. Should we try the SMTPAppender first and then switch to using ScheduledExecutorService? Would you be able to submit a patch for the former?

comment:10 in reply to:  9 Changed 5 years ago by iwakeh

Not so much a patch b/c it is mostly configuration and I don't know
the Onionoo server parameters. Hence, some changes will
be necessary after applying the patch. I tried to indicate the changes
by XML comments.

The attached patch adds lines to build.xml and logback.xml.

In addition, a java mail implementation has to be provided by installing
gnumail-1.1.2.jar and gnumail-providers-1.1.2.jar, which can be found in
wheezy package libgnumail-java.

Well, now mailing should work. That is, once an ERROR is logged the e-mails
are triggered and will contain a few log messages up to this error.

If there are any problems/questions, please ask.

Changed 5 years ago by iwakeh

additional SMTPAppender

comment:11 Changed 5 years ago by karsten

Hmm, this doesn't work as expected. Here's what I changed, compared to commit 0a2f917:

metrics@sewerzowi:/srv/onionoo.torproject.org/onionoo$ git diff
diff --git a/bin/update.sh b/bin/update.sh
index 9294f41..52b350f 100755
--- a/bin/update.sh
+++ b/bin/update.sh
@@ -1,3 +1,3 @@
 #!/bin/bash
-ant run >> log && cat errors
+ant run >> log && cat onionoo-err.log
 
diff --git a/build.xml b/build.xml
index b4a9031..e37cf6d 100644
--- a/build.xml
+++ b/build.xml
@@ -26,6 +26,10 @@
       <include name="logback-core.jar"/>
       <include name="slf4j-api.jar"/>
     </fileset>
+    <fileset dir="gnumail">
+      <include name="gnumail-1.1.2.jar"/>
+      <include name="gnumail-providers-1.1.2.jar"/>
+    </fileset>
     <fileset dir="deps/metrics-lib">
       <include name="descriptor.jar"/>
     </fileset>
diff --git a/logback.xml b/logback.xml
index f6d16d0..b801045 100644
--- a/logback.xml
+++ b/logback.xml
@@ -47,8 +47,19 @@
     </filter>
   </appender>
 
+  <appender name="EMAIL" class="ch.qos.logback.classic.net.SMTPAppender">
+    <smtpHost>localhost</smtpHost>
+    <to>karsten@torproject.org</to>
+    <from>metrics@sewerzowi.torproject.org</from>
+    <subject>ONIONOO: %level %logger{20} - %m</subject>
+    <layout class="ch.qos.logback.classic.PatternLayout">
+      <pattern>${utc-date-pattern} [runtime: %r] %msg%n</pattern>
+    </layout>
+  </appender>
+
   <logger name="org.torproject" >
     <appender-ref ref="FILEERR" />
+    <appender-ref ref="EMAIL" />
   </logger>
   <logger name="org.torproject.onionoo.cron.Main" >
     <appender-ref ref="FILESTATISTICS" />
@@ -57,6 +68,7 @@
   <!-- a named logger -->
   <logger name="statistics" >
     <appender-ref ref="FILESTATISTICS" />
+    <appender-ref ref="EMAIL" />
   </logger>
 
   <root level="ALL">

Note that I added the appender to two loggers for testing purposes. I would expect it to send me mail once per hour with the statistics. Still, I don't receive anything.

I also tried sending mail from the command line. The following command succeeds and results in an actual email in my inbox:

metrics@sewerzowi:/srv/onionoo.torproject.org/onionoo$ echo "Test body." | mail -s "Test subject" karsten@torproject.org

Not sure if this is an issue that sysadmins could fix, because sending mail apparently works. Is there an easy way to debug this on the Java side?

comment:12 in reply to:  11 ; Changed 5 years ago by iwakeh

Unless an ERROR was logged, I think this is normal behavior.
Quote from my comment above:
> Well, now mailing should work. That is, once an ERROR is logged the e-mails
> are triggered and will contain a few log messages up to this error.

Currently statistics are logged as INFO. Hence, no mail.

The mail appender functionality is intended as follows:
The log buffer of the appender is filled up to a certain number of lines; when more
logging statements arrive, the oldest lines are dropped. The first ERROR logged will cause
the mailing of all the lines in the buffer up to the ERROR in a single mail. This makes
it possible to tell what happened just by reading that one mail.
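
The buffering behaviour described above can be illustrated with a toy model. This is not logback code; the class `ErrorTriggeredBuffer` and its capacity parameter are hypothetical and only mirror the description: keep the most recent lines, and hand them all over in one batch when an ERROR arrives.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Toy model of an error-triggered log buffer; not logback code. */
final class ErrorTriggeredBuffer {

  private final int capacity;
  private final Deque<String> lines = new ArrayDeque<>();

  ErrorTriggeredBuffer(int capacity) {
    this.capacity = capacity;
  }

  /**
   * Stores a log line. Returns the lines to mail if the level is ERROR,
   * otherwise an empty list.
   */
  List<String> append(String level, String message) {
    if (lines.size() == capacity) {
      lines.removeFirst(); // drop the oldest line to make room
    }
    lines.addLast(level + " " + message);
    if (!"ERROR".equals(level)) {
      return Collections.emptyList();
    }
    // The ERROR triggers one mail containing everything buffered so far.
    List<String> mail = new ArrayList<>(lines);
    lines.clear();
    return mail;
  }

  public static void main(String[] args) {
    ErrorTriggeredBuffer buffer = new ErrorTriggeredBuffer(2);
    buffer.append("INFO", "statistics line 1");
    buffer.append("INFO", "statistics line 2");
    System.out.println(buffer.append("ERROR", "something failed"));
  }
}
```

In this model, INFO-only logging such as the statistics output never produces a mail, which matches the behaviour observed in comment 11.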

If the stat-log should be mailed, too, this requires either changing its log-level
(actually one dummy ERROR written to "statistics" after the statistic logging lines
should suffice to trigger the mailing, anything else could stay on INFO), or some additional
coding (more than editing logback.xml).

PS:
I usually have an error immediately after starting ant run, b/c I still do not have the
correct geoip setup. Is the documentation already updated or should I open a ticket?
In addition, I usually have time-out errors during the run.

Last edited 5 years ago by iwakeh

comment:13 in reply to:  12 Changed 5 years ago by karsten

Replying to iwakeh:

> Unless an ERROR was logged, I think this is normal behavior.

Are you sure? I explicitly added the EMAIL appender to the statistics logger. I don't see where messages would be filtered based on log level.

I also added an error log statement to the end of Main, but still don't receive emails.

Here's the current diff:

diff --git a/bin/update.sh b/bin/update.sh
index 9294f41..52b350f 100755
--- a/bin/update.sh
+++ b/bin/update.sh
@@ -1,3 +1,3 @@
 #!/bin/bash
-ant run >> log && cat errors
+ant run >> log && cat onionoo-err.log
 
diff --git a/build.xml b/build.xml
index b4a9031..e37cf6d 100644
--- a/build.xml
+++ b/build.xml
@@ -26,6 +26,10 @@
       <include name="logback-core.jar"/>
       <include name="slf4j-api.jar"/>
     </fileset>
+    <fileset dir="gnumail">
+      <include name="gnumail-1.1.2.jar"/>
+      <include name="gnumail-providers-1.1.2.jar"/>
+    </fileset>
     <fileset dir="deps/metrics-lib">
       <include name="descriptor.jar"/>
     </fileset>
diff --git a/logback.xml b/logback.xml
index f6d16d0..b801045 100644
--- a/logback.xml
+++ b/logback.xml
@@ -47,8 +47,19 @@
     </filter>
   </appender>
 
+  <appender name="EMAIL" class="ch.qos.logback.classic.net.SMTPAppender">
+    <smtpHost>localhost</smtpHost>
+    <to>karsten@torproject.org</to>
+    <from>metrics@sewerzowi.torproject.org</from>
+    <subject>ONIONOO: %level %logger{20} - %m</subject>
+    <layout class="ch.qos.logback.classic.PatternLayout">
+      <pattern>${utc-date-pattern} [runtime: %r] %msg%n</pattern>
+    </layout>
+  </appender>
+
   <logger name="org.torproject" >
     <appender-ref ref="FILEERR" />
+    <appender-ref ref="EMAIL" />
   </logger>
   <logger name="org.torproject.onionoo.cron.Main" >
     <appender-ref ref="FILESTATISTICS" />
@@ -57,6 +68,7 @@
   <!-- a named logger -->
   <logger name="statistics" >
     <appender-ref ref="FILESTATISTICS" />
+    <appender-ref ref="EMAIL" />
   </logger>
 
   <root level="ALL">
diff --git a/src/main/java/org/torproject/onionoo/cron/Main.java b/src/main/java/org/torproject/onionoo/cron/Main.java
index d9cb1b1..16b7cc1 100644
--- a/src/main/java/org/torproject/onionoo/cron/Main.java
+++ b/src/main/java/org/torproject/onionoo/cron/Main.java
@@ -74,7 +74,7 @@ public class Main {
           + "execution may not start as expected");
     }
 
-    log.info("Terminating.");
+    log.error("Terminating.");
   }
 }

> If the stat-log should be mailed, too, [...]

No, this is just for testing.

> PS:
> I usually have an error immediately after starting ant run, b/c I still do not have the
> correct geoip setup. Is the documentation already updated or should I open a ticket?

Oops. Please find the updated INSTALL file in master.

> In addition, I usually have time-out errors during the run.

Can you be more specific, ideally in a new ticket?

comment:14 Changed 5 years ago by iwakeh

Why are your gnumail jars not in /usr/share/java as all the others?
I assume javax.mail.* from the gnumail jars is not found.
The setup looks ok otherwise.
For debugging, set the following in logback.xml:

- <configuration debug="false">
+ <configuration debug="true">

This will print a detailed logging setup at the beginning.
I hope you'll find a ClassNotFound somewhere.

(There should be a ticket for logging documentation, where these things
can be kept for future reference. Contributor or Deployer?)

> Are you sure? I explicitly added the EMAIL appender to the statistics logger. I don't see where messages would be filtered based on log level.

The SMTPAppender is triggered by ERROR; it's part of its functionality.
(The default setting, which is enabled in our case, uses an OnErrorEvaluator; see the tar-ball with sources.)

> I also added an error log statement to the end of Main, but still don't receive emails.

That should trigger mailing.

> Oops. Please find the updated INSTALL file in master.

Thanks!

>> In addition, I usually have time-out errors during the run.

> Can you be more specific, ideally in a new ticket?

Download time-out errors due to my internet connection, I guess.
No need for a new ticket, I think.

 Could not fetch or store 
	https://collector.torproject.org/recent/bridge-descriptors/statuses/20140917-093705-4A0CCD2DDC7995083D73F5D667100C8A5831F16D.  Skipping.
	Reason: Connection timed out
Last edited 5 years ago by iwakeh

comment:15 in reply to:  14 ; Changed 5 years ago by karsten

Replying to iwakeh:

> Why are your gnumail jars not in /usr/share/java as all the others?
> I assume javax.mail.* from the gnumail jars is not found.

The jars are not installed to /usr/share/java/ yet, but I copied them over for testing. Once everything works I'm planning to ask our sysadmin to install them. But the jars should be present:

metrics@sewerzowi:/srv/onionoo.torproject.org/onionoo$ ls -l gnumail/
total 288
-rw-r--r-- 1 metrics metrics 180561 Sep 15 14:30 gnumail-1.1.2.jar
-rw-r--r-- 1 metrics metrics 110399 Sep 15 14:30 gnumail-providers-1.1.2.jar

> The setup looks ok otherwise.
> For debugging, set the following in logback.xml:
>
> - <configuration debug="false">
> + <configuration debug="true">
>
> This will print a detailed logging setup at the beginning.
> I hope you'll find a ClassNotFound somewhere.

Hmm, nothing in onionoo-all.log. Weird. Do you have an example of what should be logged?

> (There should be a ticket for logging documentation, where these things
> can be kept for future reference. Contributor or Deployer?)

Fine question. (I don't have a good answer yet.)

>> Are you sure? I explicitly added the EMAIL appender to the statistics logger. I don't see where messages would be filtered based on log level.

> The SMTPAppender is triggered by ERROR; it's part of its functionality.
> (The default setting, which is enabled in our case, uses an OnErrorEvaluator, tar-ball with sources).

>> I also added an error log statement to the end of Main, but still don't receive emails.

> That should trigger mailing.

Understood. Makes sense!

>> Oops. Please find the updated INSTALL file in master.

> Thanks!

>>> In addition, I usually have time-out errors during the run.

>> Can you be more specific, ideally in a new ticket?

> Download time-out errors due to my internet connection, I guess.
> No need for a new ticket, I think.
>
>  Could not fetch or store
> 	https://collector.torproject.org/recent/bridge-descriptors/statuses/20140917-093705-4A0CCD2DDC7995083D73F5D667100C8A5831F16D.  Skipping.
> 	Reason: Connection timed out

Okay.

comment:16 in reply to:  15 Changed 5 years ago by iwakeh

>> For debugging, set the following in logback.xml:
>>
>> - <configuration debug="false">
>> + <configuration debug="true">
>>
>> This will print a detailed logging setup at the beginning.
>> I hope you'll find a ClassNotFound somewhere.

> Hmm, nothing in onionoo-all.log. Weird. Do you have an example of what should be logged?

Well, at the beginning the logging setup is written to stdout if debug is enabled as above.
This makes sense, b/c the logging is still in the process of being configured.

vagrant@vagrant:/srv/onionoo.torproject.org/onionoo$ ant run
     [java] 11:22:25,917 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
     [java] 11:22:25,917 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
     [java] 11:22:25,917 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/srv/onionoo.torproject.org/onionoo/classes/logback.xml]
     [java] 11:22:26,013 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
     [java] 11:22:26,021 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
     [java] 11:22:26,032 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILEALL]
     [java] 11:22:26,076 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
     [java] 11:22:26,195 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - No compression will be used
     [java] 11:22:26,197 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - Will use the pattern ./onionoo-all.%d{yyyy-MM-dd}.%i.log for the active file
....

and a lot more.

If an exception is thrown, there are even more lines.
This is the exception I hope to see:

...
     [java] 11:28:17,750 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.classic.net.SMTPAppender]
     [java] 11:28:17,752 |-ERROR in ch.qos.logback.core.joran.action.AppenderAction - Could not create an Appender of type [ch.qos.logback.classic.net.SMTPAppender]. ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type ch.qos.logback.classic.net.SMTPAppender
     [java] 	at ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type ch.qos.logback.classic.net.SMTPAppender
     [java] 	at 	at ch.qos.logback.core.util.OptionHelper.instantiateByClassName(OptionHelper.java:54)
     [java] 	at 	at ch.qos.logback.core.util.OptionHelper.instantiateByClassName(OptionHelper.java:32)
     [java] 	at 	at ch.qos.logback.core.joran.action.AppenderAction.begin(AppenderAction.java:54)
     [java] 	at 	at ch.qos.logback.core.joran.spi.Interpreter.callBeginAction(Interpreter.java:276)
     [java] 	at 	at ch.qos.logback.core.joran.spi.Interpreter.startElement(Interpreter.java:148)
     [java] 	at 	at ch.qos.logback.core.joran.spi.Interpreter.startElement(Interpreter.java:130)
     [java] 	at 	at ch.qos.logback.core.joran.spi.EventPlayer.play(EventPlayer.java:50)
     [java] 	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:147)
     [java] 	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:133)
     [java] 	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:96)
     [java] 	at 	at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:55)
     [java] 	at 	at ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:75)
     [java] 	at 	at ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:148)
     [java] 	at 	at org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:84)
     [java] 	at 	at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:54)
     [java] 	at 	at org.slf4j.LoggerFactory.bind(LoggerFactory.java:128)
     [java] 	at 	at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:108)
     [java] 	at 	at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:279)
     [java] 	at 	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:252)
     [java] 	at 	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:265)
     [java] 	at 	at org.torproject.onionoo.server.ServerMain.<clinit>(ServerMain.java:12)
     [java] Caused by: java.lang.NoClassDefFoundError: javax/mail/Message
     [java] 	at 	at java.lang.Class.getDeclaredConstructors0(Native Method)
     [java] 	at 	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
     [java] 	at 	at java.lang.Class.getConstructor0(Class.java:2842)
     [java] 	at 	at java.lang.Class.newInstance(Class.java:345)
     [java] 	at 	at ch.qos.logback.core.util.OptionHelper.instantiateByClassName(OptionHelper.java:50)
     [java] 	at 	... 20 common frames omitted
     [java] Caused by: java.lang.ClassNotFoundException: javax.mail.Message
     [java] 	at 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     [java] 	at 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     [java] 	at 	at java.security.AccessController.doPrivileged(Native Method)
     [java] 	at 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     [java] 	at 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
     [java] 	at 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     [java] 	at 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
     [java] 	at 	... 25 common frames omitted
     [java] 11:28:17,752 |-ERROR in ch.qos.logback.core.joran.spi.Interpreter@23:73 - ActionException in Action for tag [appender] ch.qos.logback.core.joran.spi.ActionException: ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type ch.qos.logback.classic.net.SMTPAppender
 ...

PS:

If there are no exceptions for the SMTPAppender, the problem could be the e-mail address in from: metrics@sewerzowi.torproject.org.
Your mail test above (at the end of comment 11) might send from metrics@localhost.
If that is the reason for the missing mails, the inbox of 'metrics' could contain the rejected mails.
And the mailing will work with metrics@localhost.

Last edited 5 years ago by iwakeh

comment:17 Changed 5 years ago by karsten

No exception, it seems:

     [java] 18:15:02,806 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
     [java] 18:15:02,807 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
     [java] 18:15:02,807 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/srv/onionoo.torproject.org/onionoo/classes/logback.xml]
     [java] 18:15:02,965 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
     [java] 18:15:02,970 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILEALL]
     [java] 18:15:03,013 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
     [java] 18:15:03,113 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - No compression will be used
     [java] 18:15:03,115 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - Will use the pattern /srv/onionoo.torproject.org/onionoo/onionoo-all.%d{yyyy-MM-dd}.%i.log for the active file
     [java] 18:15:03,118 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@2e6ed964 - The date pattern is 'yyyy-MM-dd' from file name pattern '/srv/onionoo.torproject.org/onionoo/onionoo-all.%d{yyyy-MM-dd}.%i.log'.
     [java] 18:15:03,119 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@2e6ed964 - Roll-over at midnight.
     [java] 18:15:03,123 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@2e6ed964 - Setting initial period to Wed Sep 17 17:32:19 UTC 2014
     [java] 18:15:03,129 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILEALL] - Active log file name: /srv/onionoo.torproject.org/onionoo/onionoo-all.log
     [java] 18:15:03,130 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILEALL] - File property is set to [/srv/onionoo.torproject.org/onionoo/onionoo-all.log]
     [java] 18:15:03,131 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender]
     [java] 18:15:03,131 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILEERR]
     [java] 18:15:03,132 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
     [java] 18:15:03,141 |-INFO in ch.qos.logback.core.FileAppender[FILEERR] - File property is set to [/srv/onionoo.torproject.org/onionoo/onionoo-err.log]
     [java] 18:15:03,141 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.FileAppender]
     [java] 18:15:03,141 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILESTATISTICS]
     [java] 18:15:03,142 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
     [java] 18:15:03,151 |-INFO in ch.qos.logback.core.FileAppender[FILESTATISTICS] - File property is set to [/srv/onionoo.torproject.org/onionoo/onionoo-statistics.log]
     [java] 18:15:03,151 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.classic.net.SMTPAppender]
     [java] 18:15:03,165 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [EMAIL]
     [java] 18:15:03,230 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [FILEERR] to Logger[org.torproject]
     [java] 18:15:03,231 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [EMAIL] to Logger[org.torproject]
     [java] 18:15:03,232 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [FILESTATISTICS] to Logger[org.torproject.onionoo.cron.Main]
     [java] 18:15:03,232 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [FILESTATISTICS] to Logger[statistics]
     [java] 18:15:03,232 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [EMAIL] to Logger[statistics]
     [java] 18:15:03,232 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to ALL
     [java] 18:15:03,232 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [FILEALL] to Logger[ROOT]
     [java] 18:15:03,232 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
     [java] 18:15:03,234 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@629bee3a - Registering current configuration as safe fallback point
     [java] 18:15:03,241 |-INFO in ch.qos.logback.classic.net.SMTPAppender[EMAIL] - SMTPAppender [EMAIL] is tracking [1] buffers

comment:18 Changed 5 years ago by iwakeh

I added a PS above while you tried this.

comment:19 Changed 5 years ago by karsten

Still no luck. But I wonder if we can do better than using SMTPAppender. The Nagios plugin that checks that Onionoo is running and returns recent data works just fine. Maybe we should write a second plugin that checks that the back-end cronjob works without problems. And we could write a third plugin that checks that the front-end part works without issues. The two new plugins would probably simply look for a file with error logs on disk. I can write those scripts using Python. Also, not requiring a working mail setup could make deployment easier, too. Whoever wants to deploy Onionoo can use whatever they like to watch the service, which could be our Nagios scripts or something else. What do you think?
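
The log-file check proposed above could be as small as the following sketch, written here in Java. The mapping of exit codes follows the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN). The class name `ErrorLogCheck`, the default path, and the decision to treat a missing file as UNKNOWN are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical Nagios-style check: is the error log empty? */
final class ErrorLogCheck {

  static final int OK = 0;
  static final int CRITICAL = 2;
  static final int UNKNOWN = 3;

  /** Maps the state of the error log to a Nagios exit code. */
  static int check(Path errorLog) {
    try {
      if (!Files.exists(errorLog)) {
        return UNKNOWN; // cannot tell whether the back-end ran at all
      }
      return Files.size(errorLog) == 0 ? OK : CRITICAL;
    } catch (IOException e) {
      return UNKNOWN;
    }
  }

  public static void main(String[] args) {
    // Default file name is an assumption; pass the real path as an argument.
    Path log = Paths.get(args.length > 0 ? args[0] : "onionoo-err.log");
    int status = check(log);
    System.out.println("ONIONOO ERRORLOG status " + status + " for " + log);
    System.exit(status);
  }
}
```

Such a check needs read access to the back-end's log directory but no mail setup on the Onionoo host.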

comment:20 Changed 5 years ago by iwakeh

Yes, you're right. Monitoring should be separated from the application.

For the front-end I'd suggest jmx/MBeans (see the comment in #11573). These can very easily be verified with a nagios plugin.
We even could re-use the jmx-example that comes with the tomcat-extra (i.e. jmx) package.
It lists sessions, memory and the like as plain text.
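
As a rough idea of the kind of plain-text output meant here, the sketch below reads heap usage from the platform MemoryMXBean of the JVM it runs in. It is not the Tomcat example mentioned above; the class name `JmxMemoryReport` and the output format are hypothetical, and a real plugin would instead connect to the servlet container's JVM remotely, for example via `JMXConnectorFactory`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch: report heap usage of the local JVM as plain text. */
final class JmxMemoryReport {

  /** Returns one line with used and committed heap in bytes. */
  static String heapReport() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return "heap_used=" + heap.getUsed()
        + " heap_committed=" + heap.getCommitted();
  }

  public static void main(String[] args) {
    System.out.println(heapReport());
  }
}
```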

Using nagios plugins right now is useful and won't prevent any other/additional monitoring later on.
And, python plugins are way quicker to write and deploy.

Depending on the server setup, the nagios plugin might get read access to the logs of onionoo's backend?

comment:21 Changed 4 years ago by karsten

Type: defect → enhancement

Without having re-read all text above, I think this is an enhancement, not a defect.

comment:22 Changed 4 years ago by iwakeh

Isn't this ticket solved with #14826 ?

comment:23 in reply to:  22 Changed 4 years ago by karsten

Resolution: fixed
Status: new → closed

Replying to iwakeh:

Isn't this ticket solved with #14826 ?

Looks like it. Great! Resolving.
