Fail-fast if file access permissions for out and status prevent an update

added component::metrics/onionoo priority::medium reporter::leeroy resolution::invalid severity::normal status::closed type::enhancement labels

Yes, this sounds useful. I could imagine checking whether the out/ directory exists, and if not, attempt to create it before this line. Doing the same for the status/ directory would be slightly more complicated, but having such a check for out/ should already catch most cases. Want to submit a patch?

By the way, I'm curious how you create regular backups. What I do is stop the cronjob, copy directories, and re-enable the cronjob. Maybe you're doing something smarter there?

Sure I'll look into it. This happened when I tried to transfer a backup to a new instance of Onionoo running branch-task-13600. I just forgot to recursively change owner on out and status and found the server flooding the logs and rotating over dozens of log files.

I pretty much do the same thing for backups. Because I've been testing multiple server versions I usually try to delay importing the latest recent data. Wait until I'm between updates, tar.bz2 the folders, then transfer them elsewhere. I could probably make my life easier by doing all this as the onionoo user and using a location which they can access.

On a related issue: By my estimation, each month adds ~460MB and ~64K uncompressed files (in the past 1.5 years). So a fully deployed server can be expected to have to backup ~40GB. A lot of small files too, so lots of IO. Should Onionoo include functionality for backup/restore? This might work better with an hourly updater + archives, and could be done using java.util.zip or apache.commons.compress.

On another related issue: Should I make a ticket for having the server stop trying to process the update if a certain number of errors occur? If the server keeps trying to update this might make a (hardware) failure worse. Maybe the server should stop trying to update if encountering many errors. The server admin will get a notice somehow (if they configured it right) so they probably have other things to worry about at this point.

Trac:
Username: leeroy

Replying to leeroy:

Sure I'll look into it. This happened when I tried to transfer a backup to a new instance of Onionoo running branch-task-13600. I just forgot to recursively change owner on out and status and found the server flooding the logs and rotating over dozens of log files.

I pretty much do the same thing for backups. Because I've been testing multiple server versions I usually try to delay importing the latest recent data. Wait until I'm between updates, tar.bz2 the folders, then transfer them elsewhere. I could probably make my life easier by doing all this as the onionoo user and using a location which they can access.

Sounds like a special case, but if it's easy to avoid flooding the logs by making sure that out/ either does not exist or is a directory, let's do it.

On a related issue: By my estimation, each month adds ~460MB and ~64K uncompressed files (in the past 1.5 years). So a fully deployed server can be expected to have to backup ~40GB. A lot of small files too, so lots of IO. Should Onionoo include functionality for backup/restore? This might work better with an hourly updater + archives, and could be done using java.util.zip or apache.commons.compress.

I'm yet unsure how Onionoo would include functionality for backup/restore. Want to create a new ticket for that and explain in more detail?

On another related issue: Should I make a ticket for having the server stop trying to process the update if a certain number of errors occur? If the server keeps trying to update this might make a (hardware) failure worse. Maybe the server should stop trying to update if encountering many errors. The server admin will get a notice somehow (if they configured it right) so they probably have other things to worry about at this point.

Maybe there's an easy way to rate-limit log statements? But yes, creating a new ticket for that might be good.

Okay, while working on this I read the logback documentation. In particular to see if there is support for rate-limiting-like feature. I found the following options for candidate tickets (improvements to logging).

Use a DuplicateMessageFilter. Not really rate-limiting, but an alternative. It drops duplicate messages after a threshold.
Implement an Appender. This provides control over the events which make it through to the Encoder. By going with this option it would be possible to replace a flood with a message about n-similar messages in past m-seconds.

Neither option would stop the updater from running despite permission problems. So I checked the source and it seems like it would be cumbersome to implement this in the classes that depend on out, status, in. The easiest way would be to just check as needed prior to instantiating them in the updater's Main class. I created some static members which I've placed in LockFile.checkAccessWithPath(String). Maybe not the best place, but the updater loads that class anyway. Let me know if you prefer something else. Otherwise I'll test a patch for the updater and lockfile with better permission handling.

I'll wait for the updater to go through some normalization, and for more improvement to bulk imports before the backup/restore.

Trac:
Username: leeroy

Trac:
Username: leeroy
Status: new to needs_review

Trac:
Username: leeroy

0001-Fail-fast-if-permissions-prevent-an-update.patch

Proposed minor changes to updater.

Trac:
Username: leeroy

This updated patch also enforces directory checks when checking in, status, and out.

Trac:
Username: leeroy

The lock-file is not used anymore.

Closing as obsolete.

Trac:
Reviewer: N/A to N/A
Resolution: N/A to invalid
Status: needs_review to closed
Sponsor: N/A to N/A
Severity: N/A to Normal

closed

Fail-fast if file access permissions for out and status prevent an update

Child items 0

Activity