Opened 7 years ago

Closed 7 years ago

#5805 closed task (fixed)

Compare anguilla's tarballs to yatei's and maybe merge them

Reported by: karsten
Owned by: karsten
Priority: Low
Milestone:
Component: Metrics/CollecTor
Version:
Severity:
Keywords:
Cc:
Actual Points: 10
Parent ID:
Points: 12
Reviewer:
Sponsor:

Description

weasel was running his directory-archive script until a week or two ago. I want to compare anguilla's tarballs to yatei's to figure out if yatei is missing some descriptors and why, and to merge missing descriptors into yatei's tarballs.
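As a rough illustration of the kind of comparison described above, here is a minimal Python sketch that lists the files present in one tarball but not in the other. It is not the script used for this ticket. The function names are made up for the example, and it assumes both tarballs use the same internal path layout, which may not hold for anguilla's and yatei's tarballs.

```python
import tarfile


def member_names(tarball_path):
    """Return the set of regular-file names stored in a tarball."""
    names = set()
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, none).
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                names.add(member.name)
    return names


def compare_tarballs(path_a, path_b):
    """Return (only_in_a, only_in_b) as sorted lists of member names.

    Assumption: both tarballs use the same directory layout, so equal
    descriptors have equal member names.
    """
    names_a = member_names(path_a)
    names_b = member_names(path_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)
```

Comparing names only finds missing files. Files that exist in both tarballs but differ in content (for example, consensuses with different signature sets) would need a separate content comparison.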

This ticket is mostly here to record the many hours I have already spent on this task, mostly because I need to write new comparison scripts and investigate differences between single descriptors manually before identifying a pattern. So far I have spent 9 points on this task, and I'm not done; I think another 3 points remain. The task looked tiny when I decided to do it, but it's important enough to spend the remaining points.

Current insights from the comparison, which might turn into new tasks, are:

  • Quite a few of the consensuses collected by yatei have missing or extraneous signatures as compared to anguilla's. This has to do with authorities serving consensuses that don't have all signatures. I don't really care, so I'm probably leaving this alone.
  • Quite often, missing a consensus automatically means missing all votes. We might switch to downloading votes by all known authorities, not only by the ones contained in a consensus (which we're missing in these cases). Not super important, but probably worth doing.
  • We have quite a few files in yatei's tarballs that are empty or truncated. We need to try parsing descriptors with metrics-lib (which is not yet used by metrics-db) and only store valid descriptors to disk.
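For the last point, a cheap pre-check could flag obviously broken files before any real parsing. The sketch below is only a heuristic and not a replacement for parsing with metrics-lib. It assumes that every descriptor type stored in the tarball ends with a `-----END SIGNATURE-----` line, which should hold for signed server descriptors, extra-info descriptors, votes, and consensuses but is an assumption for anything else. The function name is hypothetical.

```python
import tarfile

# Assumed final line of a complete, signed descriptor.
END_MARKER = b"-----END SIGNATURE-----"


def find_suspect_members(tarball_path):
    """Return (empty, truncated) lists of member names in a tarball.

    "empty" holds zero-byte files; "truncated" holds non-empty files
    whose content does not end with END_MARKER (ignoring trailing
    whitespace).
    """
    empty, truncated = [], []
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if member.size == 0:
                empty.append(member.name)
                continue
            data = tar.extractfile(member).read()
            if not data.rstrip().endswith(END_MARKER):
                truncated.append(member.name)
    return empty, truncated
```

A file can pass this check and still be invalid, so it only narrows down which files to inspect.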

Change History (2)

comment:1 Changed 7 years ago by karsten

Here are two more things I found:

  • anguilla knew about some server descriptors and extra-info descriptors that yatei didn't know about, but that rate was below 0.6% of all descriptors in a month, which is acceptable.
  • There was one exception to the previous finding, in March 2011: yatei's tarball was missing descriptors from the last 2.5 days of that month. The reason was probably that the tarball was created automatically and not updated correctly afterwards. The files were probably there in extracted form, so this isn't something metrics-db could check. There is nothing we can do about it.
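The monthly rate in the first finding can be computed from two sets of descriptor identifiers. The following sketch shows one way to do that. The ticket does not say how the 0.6% figure was derived, so measuring the rate relative to the reference host's set (here anguilla's) is an assumption, as is the function name.

```python
def missing_rate(reference_ids, candidate_ids):
    """Percentage of reference descriptors absent from the candidate set.

    Assumption: the rate is measured against the reference set, not
    against the union of both sets.
    """
    reference = set(reference_ids)
    if not reference:
        return 0.0
    missing = reference - set(candidate_ids)
    return 100.0 * len(missing) / len(reference)
```

Running this per month, with identifiers such as file names or descriptor digests, would make a month like March 2011 stand out against the usual low rate.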

This ends the comparison. I made tickets #5812 and #5813 for the two issues I'd like to fix in metrics-db.

Tarballs are recompressing tonight and will be available tomorrow. Will close this ticket once they're available.

comment:2 Changed 7 years ago by karsten

Actual Points: 10
Resolution: fixed
Status: new → closed

Updated tarballs are in place. The remaining 3 points turned out to be just 1 point. Closing.