Opened 2 years ago

Last modified 15 months ago

#21219 assigned enhancement

Remove old descriptor files from out/ after archiving

Reported by: tom
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords: metrics-2018
Cc:
Actual Points:
Parent ID: #20518
Points:
Reviewer:
Sponsor:

Description

Unless I'm mistaken (or misconfigured) -- which is entirely possible -- collector will accumulate uncompressed data in out/ indefinitely, long after it's been archived in archive/ and will no longer be modified.

This takes up a lot of disk space and it'd be nice to

a) get confirmation I can remove data from out/ that is older than N months (2? 3?)
b) have it deleted automagically (or at least with a config setting)

Change History (6)

comment:1 Changed 2 years ago by karsten

You're right. It's safe to delete descriptors that are at least N = 2 months old. In fact, on the main CollecTor instance I usually delete descriptors that are at least N = 1.5 months old. But I do this manually, because of https://xkcd.com/1205/ and https://xkcd.com/1319/, and more importantly because I'm afraid of messing up with scripts and accidentally deleting all descriptors.

I admit that these reasons don't fully apply anymore with three CollecTor instances running. Would you submit a patch to delete descriptors older than N = 2 months, possibly even with a configurable N?

https://gitweb.torproject.org/collector.git/tree/src/main/resources/create-tarballs.sh

And if you have ideas for making that script even less copy-and-pasty, please feel free to tweak it!
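
For illustration only, the cleanup discussed above could be sketched as follows. This is not part of CollecTor or of create-tarballs.sh. The function names, the use of file modification time as the age criterion, and the approximation of one month as 30 days are all assumptions made for this sketch. Because the main worry is accidentally deleting all descriptors, the sketch defaults to a dry run and rejects a non-positive N.

```python
import os
import time
from pathlib import Path

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: one month is 30 days


def find_old_files(out_dir, months=2.0, now=None):
    """Return files under out_dir last modified more than `months` months ago."""
    if months <= 0:
        # Guard against a misconfigured N that would match every file.
        raise ValueError("months must be positive")
    now = time.time() if now is None else now
    cutoff = now - months * SECONDS_PER_MONTH
    return sorted(
        p for p in Path(out_dir).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def prune(out_dir, months=2.0, dry_run=True, now=None):
    """List old files; delete them only when dry_run is False."""
    old_files = find_old_files(out_dir, months, now)
    if not dry_run:
        for path in old_files:
            path.unlink()
    return old_files
```

Calling `prune("out", months=2)` only reports what would be removed; passing `dry_run=False` performs the deletion. A real patch would presumably also confirm that the files are already present in archive/ before removing them, which this sketch does not check.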

comment:2 in reply to: 1 Changed 2 years ago by iwakeh

Owner: changed from metrics-team to iwakeh
Parent ID: #20546
Status: new → accepted

Replying to karsten:

> You're right. It's safe to delete descriptors that are at least N = 2 months old. In fact, on the main CollecTor instance I usually delete descriptors that are at least N = 1.5 months old. But I do this manually, because of https://xkcd.com/1205/ and https://xkcd.com/1319/, and more importantly because I'm afraid of messing up with scripts and accidentally deleting all descriptors.

I think you accidentally pasted the wrong link :-)
The description is here.

> I admit that these reasons don't fully apply anymore with three CollecTor instances running. Would you submit a patch to delete descriptors older than N = 2 months, possibly even with a configurable N?
>
> https://gitweb.torproject.org/collector.git/tree/src/main/resources/create-tarballs.sh
>
> And if you have ideas for making that script even less copy-and-pasty, please feel free to tweak it!

One of the planned improvements is to integrate all scripted maintenance into Java and get rid of scripting.
We're already planning and working on these steps, see #20518 and #20546. So please check back first to avoid duplicate work. Setting parent to #20546.

comment:3 Changed 2 years ago by iwakeh

Parent ID: #20546 → #20518

#20518 is the correct parent number. (copy and paste is hard sometimes ;-)

comment:4 Changed 19 months ago by karsten

Summary: collector should rm data from out/ after archiving → Remove old descriptor files from out/ after archiving

Tweak summary.

comment:5 Changed 19 months ago by karsten

Keywords: metrics-2018 added

comment:6 Changed 15 months ago by iwakeh

Owner: changed from iwakeh to metrics-team
Status: accepted → assigned

Moving to metrics-team, as I will not be working on these during the next week.
