Opened 2 years ago

Closed 8 months ago

#27181 closed enhancement (wontfix)

Avoid unnecessary disk writes

Reported by: irl
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Onionoo
Version:
Severity: Normal
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Right now, if you run the updater, Onionoo writes every status document regardless of whether its content actually changed. Every file in the out directory is written as well, although I'm less concerned about this because I've seen that the out directory is reproducibly built from the status directory when the status directory has not changed. My primary motivation for this ticket is to enable consistent backups and the creation of test instances, where we'd like to copy the status directory between updater runs and have only a limited time window to do so.

I propose that we add a field to the Document class that marks a document as having changed since it was loaded from disk. Initially this flag would be clear, but changing any field in the document would set it, and a set flag would trigger a write. There may be cases where a field is set to the value it already has; we should detect these cases and leave the flag clear so that no write is performed.
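
To make the proposal concrete, here is a minimal sketch of such a flag. It is not Onionoo code: the class `TrackedDocument`, its two fields, and the methods `isDirty()` and `markClean()` are hypothetical names chosen for illustration, and the real Document class may be structured differently.

```java
import java.util.Objects;

// Hypothetical stand-in for a status document; not the real Onionoo Document class.
class TrackedDocument {
    private String nickname;
    private long lastSeenMillis;
    // Clear right after loading; set only when a setter actually changes a value.
    private boolean dirty = false;

    TrackedDocument(String nickname, long lastSeenMillis) {
        this.nickname = nickname;
        this.lastSeenMillis = lastSeenMillis;
    }

    void setNickname(String nickname) {
        // Setting a field to the value it already has must not mark the document.
        if (!Objects.equals(this.nickname, nickname)) {
            this.nickname = nickname;
            this.dirty = true;
        }
    }

    void setLastSeenMillis(long lastSeenMillis) {
        if (this.lastSeenMillis != lastSeenMillis) {
            this.lastSeenMillis = lastSeenMillis;
            this.dirty = true;
        }
    }

    // The writer would skip documents for which this returns false.
    boolean isDirty() {
        return dirty;
    }

    // Called after a successful write so the next run starts with a clear flag.
    void markClean() {
        dirty = false;
    }
}
```

The weak point of this design is that every setter has to remember the check, which is why the test coverage mentioned further down matters.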

Alternatively, we can store the original serialization of the document along with the parsed fields. When serializing the document we can compare the new serialization with the original one to determine whether it needs to be written to disk.
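
A rough sketch of this alternative, again with made-up names: `SnapshotDocument`, `parse()`, `serialize()` and `needsWrite()` are hypothetical, and the comma-separated format is only a placeholder for whatever format the status files really use.

```java
// Hypothetical document that remembers the serialized form it was loaded from.
class SnapshotDocument {
    private String nickname;
    private long lastSeenMillis;
    // Serialization exactly as read from disk, kept for later comparison.
    private final String loadedForm;

    private SnapshotDocument(String nickname, long lastSeenMillis, String loadedForm) {
        this.nickname = nickname;
        this.lastSeenMillis = lastSeenMillis;
        this.loadedForm = loadedForm;
    }

    // Placeholder format "nickname,lastSeenMillis"; an assumption for this sketch.
    static SnapshotDocument parse(String serialized) {
        String[] parts = serialized.split(",", 2);
        return new SnapshotDocument(parts[0], Long.parseLong(parts[1]), serialized);
    }

    void setLastSeenMillis(long lastSeenMillis) {
        this.lastSeenMillis = lastSeenMillis;
    }

    String serialize() {
        return nickname + "," + lastSeenMillis;
    }

    // A write is needed only if the current serialization differs from the loaded one.
    boolean needsWrite() {
        return !serialize().equals(loadedForm);
    }
}
```

Unlike a per-field flag, this variant needs no bookkeeping in the setters, and a value that is changed and later changed back does not cause a write. The cost is holding the serialized string in memory and serializing every document on every run.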

We will need good test coverage for this: if we ever fail to set the flag when data has changed, the data will not be written and we may end up with gaps in the data on the Onionoo hosts.


Change History (1)

comment:1 Changed 8 months ago by karsten

Resolution: wontfix
Status: new → closed

This ticket is mostly obsolete with #32660, where we compare the digest of an existing file with the digest of the serialized string that we are about to write. We could indeed do more by remembering whether a status object has changed before serializing it and comparing checksums. But it's unclear whether that development and testing effort would be well spent. I'd say that as long as I/O remains the critical resource, we should keep what we have and just spend the extra CPU time on serializing and computing digests. I'll close this ticket as something we're very unlikely to do, but feel free to reopen it if you object.
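
For readers who have not looked at #32660, the digest comparison described above could look roughly like the following. This is only a sketch of the idea, not the code from that ticket: the class `DigestGuardedWriter`, the method `writeIfChanged()` and the choice of SHA-256 are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; names and digest algorithm are assumptions, not Onionoo's.
class DigestGuardedWriter {

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Writes the serialized string unless the file already holds the same content.
    // Returns true if the file was written, false if the write was skipped.
    static boolean writeIfChanged(Path file, String serialized) throws IOException {
        byte[] newBytes = serialized.getBytes(StandardCharsets.UTF_8);
        if (Files.exists(file)) {
            byte[] oldDigest = sha256(Files.readAllBytes(file));
            if (MessageDigest.isEqual(oldDigest, sha256(newBytes))) {
                return false; // same digest: leave the file and its mtime alone
            }
        }
        Files.write(file, newBytes);
        return true;
    }
}
```

This trades a read of the existing file plus some CPU time for a skipped write, so the file's modification time only changes when its content does.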
