Opened 4 years ago

Closed 4 years ago

#19433 closed defect (fixed)

Consider repackaging tarballs generated by old directory archive script

Reported by: karsten
Owned by:
Priority: Low
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords:
Cc: tomlurge
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

tomlurge/thms/oma recently noticed that server-descriptors-2007-09.tar.xz also contains a few server descriptors published in August and in October.

My current guess is that these descriptors made it into this tarball because the older tarballs were generated by the old directory archive script, not by CollecTor. I haven't read the script code yet, but I could imagine that it sorted descriptors into month folders by download time rather than by the publication timestamp contained in each descriptor.
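To illustrate the difference between the two sorting criteria, here is a minimal sketch in Python. It is not the old archive script, whose behavior is only guessed at above. The function names `month_by_publication` and `month_by_download` are hypothetical, and the sketch assumes each descriptor carries a line of the form `published YYYY-MM-DD HH:MM:SS`.

```python
import re
from datetime import datetime

# Assumed format of the publication line, e.g. "published 2007-08-31 23:58:12".
PUBLISHED_RE = re.compile(
    r"^published (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$", re.MULTILINE
)


def month_by_publication(descriptor_text):
    """Return "YYYY-MM" from the descriptor's own published line, or None."""
    match = PUBLISHED_RE.search(descriptor_text)
    if match is None:
        return None
    published = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return published.strftime("%Y-%m")


def month_by_download(download_time):
    """Return "YYYY-MM" from the time the descriptor was fetched."""
    return download_time.strftime("%Y-%m")


if __name__ == "__main__":
    # Made-up descriptor published just before midnight on August 31 ...
    example = "router example 10.0.0.1 9001 0 0\npublished 2007-08-31 23:58:12\n"
    # ... and fetched a few minutes later, on September 1.
    fetched = datetime(2007, 9, 1, 0, 5, 0)
    print(month_by_publication(example), month_by_download(fetched))
```

A descriptor published shortly before a month boundary and downloaded shortly after it lands in different folders under the two criteria, which would be consistent with August and October descriptors showing up in a September tarball.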

This would mean that there's no bug in CollecTor, but that we should consider repackaging those older tarballs to make them less confusing. This is neither urgent nor very important, but it is something to consider.

I'll post an update once I have processed the remaining tarballs, or a large enough sample.

Change History (2)

comment:1 Changed 4 years ago by karsten

There, I processed most tarballs and found that the following tarballs had descriptors published in different months than what the tarball name implies:

  • extra-infos-2010-04.tar.xz and earlier months,
  • server-descriptors-2010-04.tar.xz and earlier months, and
  • statuses-2009-02.tar.xz and some but not all earlier months.

Note that I excluded votes-2012-02.tar.xz and later months, because those tarballs are huge and all previous months were okay, so processing them all seemed like a waste of CPU time.
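A check along these lines could be sketched as follows. This is not the code that was actually used for the processing described above. The function name `count_published_months` is hypothetical, and the sketch assumes that the tarball name ends in `YYYY-MM.tar.xz` and that each file in the tarball holds a single descriptor with a `published YYYY-MM-DD HH:MM:SS` line.

```python
import re
import tarfile
from collections import Counter

# Month implied by the tarball name, e.g. "server-descriptors-2007-09.tar.xz".
TARBALL_MONTH_RE = re.compile(r"(\d{4}-\d{2})\.tar\.xz$")
# Assumed publication line; only the "YYYY-MM" part is captured.
PUBLISHED_RE = re.compile(
    rb"^published (\d{4}-\d{2})-\d{2} \d{2}:\d{2}:\d{2}", re.MULTILINE
)


def count_published_months(tarball_path):
    """Return (month implied by the name, Counter of published months)."""
    name_match = TARBALL_MONTH_RE.search(tarball_path)
    if name_match is None:
        raise ValueError("no YYYY-MM month in tarball name: " + tarball_path)
    expected = name_match.group(1)
    counts = Counter()
    with tarfile.open(tarball_path, "r:xz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            data = archive.extractfile(member).read()
            match = PUBLISHED_RE.search(data)
            if match:
                counts[match.group(1).decode("ascii")] += 1
    return expected, counts
```

Any month in the returned counter other than the expected one marks the tarball as a candidate for repackaging. Files that contain several descriptors would need every `published` line counted, not just the first.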

comment:2 in reply to:  1 Changed 4 years ago by karsten

Resolution: fixed
Status: new → closed

Replying to karsten:

> There, I processed most tarballs and found that the following tarballs had descriptors published in different months than what the tarball name implies:
>
>   • extra-infos-2010-04.tar.xz and earlier months,

These are now repackaged and available here.

>   • server-descriptors-2010-04.tar.xz and earlier months, and

Repackaged tarballs are available here.

>   • statuses-2009-02.tar.xz and some but not all earlier months.

All statuses-*.tar.xz tarballs, including later months, are repackaged and available here.

> Note that I excluded votes-2012-02.tar.xz and later months, because those tarballs are huge and all previous months were okay, so processing them all seemed like a waste of CPU time.

Let me know if there are any remaining (or new) issues with these tarballs or with others. Closing.
