Opened 8 years ago

Closed 7 years ago

#4687 closed enhancement (implemented)

Provide metrics data from the last 3--7 days via rsync

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity:
Keywords:
Cc: phobos
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

metrics-db provides compressed tarballs of its collected and sanitized files via rsync, so that others can copy them for their analyses or services. But these tarballs are only updated once per day, and updating them more often has performance implications on the metrics-db host. Also, having others update their services from potentially very large monthly tarballs has performance issues, too.

Can we make the collected and sanitized descriptors from the last, say, 3 to 7 days available via rsync? For a point of reference, in the first 8.5 days of December, metrics-db collected and sanitized 178,000 files totaling 2.4G. Are those numbers totally crazy for rsync?

We could implement this by having a script copy new descriptors to a directory that is then made available via rsync. The same script would also delete files that are older than 3--7 days.
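A minimal sketch of what such a script could look like, in Python. This is only an illustration, not the script that was eventually deployed: the function name `sync_recent`, its parameters, and the use of file modification times to decide what counts as "new" are all assumptions made for this example.

```python
import os
import shutil
import time
from pathlib import Path


def sync_recent(src, dst, max_age_days=3, now=None):
    """Copy recently modified files from src to dst, then prune old files from dst.

    Hypothetical helper: "recent" is judged by file mtime, which is an
    assumption; the real descriptors may carry their own timestamps.
    Returns a (copied, deleted) tuple of counts.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    copied = 0
    for path in src.rglob("*"):
        if not path.is_file() or path.stat().st_mtime < cutoff:
            continue  # skip directories and files older than the window
        target = dst / path.relative_to(src)
        if target.exists() and target.stat().st_mtime >= path.stat().st_mtime:
            continue  # already exported and up to date
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 preserves the mtime
        copied += 1

    deleted = 0
    for path in list(dst.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()  # file has aged out of the window
            deleted += 1

    return copied, deleted
```

Run periodically (for example from cron), something like `sync_recent("/path/to/descriptors", "/path/to/rsync-export", max_age_days=3)` would keep the exported directory limited to the last three days; both paths here are placeholders. Because `shutil.copy2` preserves modification times, the same cutoff can be applied to the export directory when pruning.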

This ticket is part of decoupling metrics-db from metrics-web and other metrics services. Right now, metrics-web reads the metrics-db output via a symbolic link in the file system. That means metrics-db and metrics-web rely on running on the same host. We should change that to allow others to run their own metrics-web or similar service.

Change History (3)

comment:1 Changed 8 years ago by weasel

I'm not really sure what you want me to do or say.

"Yes, we could".

If you make a script to create and maintain that directory and tell us which dir it is, we can easily export it with rsync.

comment:2 Changed 7 years ago by karsten

Owner: changed from weasel to karsten
Status: new → assigned

Yes, that's the kind of answer I was hoping for. I was just not sure if providing 178,000 files via rsync is so crazy that I shouldn't spend a minute on writing such a script. Will write the script for rsync'ing 3 days of data and let you know. Thanks!

Reassigning the ticket to myself.

comment:3 Changed 7 years ago by karsten

Resolution: implemented
Status: assigned → closed

The last three days of data are now available via rsync. See "rsync metrics.torproject.org::". Thanks, weasel!
