Opened 4 years ago

Closed 3 years ago

#16401 closed enhancement (invalid)

Fail-fast if file access permissions for out and status prevent an update

Reported by: leeroy Owned by:
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Regular backups of the out and status folders for an Onionoo server is encouraged. When running as a separate user it wouldn't be uncommon to have the archives owned by another user. Onionoo's updater doesn't check the access permissions for out and status until after rebuilding the folder in, and it basically fills the logs with entries for every file it failed to access. So if you didn't recursively set access permissions that's a lot of log entries before the updater finally stops.

If the Onionoo updater were to check the permissions early and fail-fast it would prevent this behavior.

Child Tickets

TicketTypeStatusOwnerSummary
#16296defectclosedImplement lock file in a more robust way

Attachments (1)

0001-Fail-fast-if-permissions-prevent-an-update.patch (3.6 KB) - added by leeroy 4 years ago.
Proposed minor changes to updater.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 4 years ago by karsten

Yes, this sounds useful. I could imagine checking whether the out/ directory exists, and if not, attempt to create it before this line. Doing the same for the status/ directory would be slightly more complicated, but having such a check for out/ should already catch most cases. Want to submit a patch?

By the way, I'm curious how you create regular backups. What I do is stop the cronjob, copy directories, and re-enable the cronjob. Maybe you're doing something smarter there?

comment:2 Changed 4 years ago by leeroy

Sure I'll look into it. This happened when I tried to transfer a backup to a new instance of Onionoo running branch-task-13600. I just forgot to recursively change owner on out and status and found the server flooding the logs and rotating over dozens of log files.

I pretty much do the same thing for backups. Because I've been testing multiple server versions I usually try to delay importing the latest recent data. Wait until I'm between updates, tar.bz2 the folders, then transfer them elsewhere. I could probably make my life easier by doing all this as the onionoo user and using a location which they can access.

On a related issue: By my estimation, each month adds ~460MB and ~64K uncompressed files (in the past 1.5 years). So a fully deployed server can be expected to have to backup ~40GB. A lot of small files too, so lots of IO. Should Onionoo include functionality for backup/restore? This might work better with an hourly updater + archives, and could be done using java.util.zip or apache.commons.compress.

On another related issue: Should I make a ticket for having the server stop trying to process the update if a certain number of errors occur? If the server keeps trying to update this might make a (hardware) failure worse. Maybe the server should stop trying to update if encountering many errors. The server admin will get a notice somehow (if they configured it right) so they probably have other things to worry about at this point.

comment:3 in reply to:  2 Changed 4 years ago by karsten

Replying to leeroy:

Sure I'll look into it. This happened when I tried to transfer a backup to a new instance of Onionoo running branch-task-13600. I just forgot to recursively change owner on out and status and found the server flooding the logs and rotating over dozens of log files.

I pretty much do the same thing for backups. Because I've been testing multiple server versions I usually try to delay importing the latest recent data. Wait until I'm between updates, tar.bz2 the folders, then transfer them elsewhere. I could probably make my life easier by doing all this as the onionoo user and using a location which they can access.

Sounds like a special case, but if it's easy to avoid flooding the logs by making sure that out/ either does not exist or is a directory, let's do it.

On a related issue: By my estimation, each month adds ~460MB and ~64K uncompressed files (in the past 1.5 years). So a fully deployed server can be expected to have to backup ~40GB. A lot of small files too, so lots of IO. Should Onionoo include functionality for backup/restore? This might work better with an hourly updater + archives, and could be done using java.util.zip or apache.commons.compress.

I'm yet unsure how Onionoo would include functionality for backup/restore. Want to create a new ticket for that and explain in more detail?

On another related issue: Should I make a ticket for having the server stop trying to process the update if a certain number of errors occur? If the server keeps trying to update this might make a (hardware) failure worse. Maybe the server should stop trying to update if encountering many errors. The server admin will get a notice somehow (if they configured it right) so they probably have other things to worry about at this point.

Maybe there's an easy way to rate-limit log statements? But yes, creating a new ticket for that might be good.

comment:4 Changed 4 years ago by leeroy

Okay, while working on this I read the logback documentation. In particular to see if there is support for rate-limiting-like feature. I found the following options for candidate tickets (improvements to logging).

  1. Use a DuplicateMessageFilter. Not really rate-limiting, but an alternative. It drops duplicate messages after a threshold.
  1. Implement an Appender. This provides control over the events which make it through to the Encoder. By going with this option it would be possible to replace a flood with a message about n-similar messages in past m-seconds.

Neither option would stop the updater from running despite permission problems. So I checked the source and it seems like it would be cumbersome to implement this in the classes that depend on out, status, in. The easiest way would be to just check as needed prior to instantiating them in the updater's Main class. I created some static members which I've placed in LockFile.checkAccessWithPath(String). Maybe not the best place, but the updater loads that class anyway. Let me know if you prefer something else. Otherwise I'll test a patch for the updater and lockfile with better permission handling.

I'll wait for the updater to go through some normalization, and for more improvement to bulk imports before the backup/restore.

comment:5 Changed 4 years ago by leeroy

Status: newneeds_review

Changed 4 years ago by leeroy

Proposed minor changes to updater.

comment:6 Changed 4 years ago by leeroy

This updated patch also enforces directory checks when checking in, status, and out.

comment:7 Changed 3 years ago by iwakeh

Resolution: invalid
Severity: Normal
Status: needs_reviewclosed

The lock-file is not used anymore.

Closing as obsolete.

Note: See TracTickets for help on using tickets.