Opened 3 years ago

Closed 3 years ago

#25525 closed defect (fixed)

Fix either spec or code regarding full path of sanitized webstats files

Reported by: karsten
Owned by: metrics-team
Priority: High
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


This issue came up when discussing the webstats tarballs that I created the other day: what file structure should these tarballs have internally?

Turns out we already specified this file structure in Section 5.4 of the Protocol of CollecTor's File Structure:

"'webstats' contains compressed log files structured and named according to the 'Tor web server logs' specification, section 4.3 [0]."

And Section 4.3 of the referenced specification says:

"Sanitized log files may additionally be sorted into directories by virtual host and date as in:

So, I'd say this is sufficiently specified.

However, the current structure of CollecTor's out/ directory is different, as implemented here:

    this.storagePath = Paths.get(
        this.desc.getLogDate().format(yearPattern), // year
        this.desc.getLogDate().format(monthPattern), // month
        this.desc.getLogDate().format(dayPattern), // day

Note the day part, which does not exist in the specification.
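To make the difference concrete, here is a minimal, self-contained sketch that builds both variants of the date part of the path. Only the three date components and the names yearPattern, monthPattern, and dayPattern come from the snippet above. The pattern strings, the class and method names, the use of LocalDate, and the example file name are assumptions for illustration; any further path components in the real code are left out because the snippet does not show them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch, not CollecTor's actual class. */
public class WebstatsPathSketch {

  // Assumed pattern strings; the ticket only shows the variable names.
  private static final DateTimeFormatter yearPattern
      = DateTimeFormatter.ofPattern("yyyy");
  private static final DateTimeFormatter monthPattern
      = DateTimeFormatter.ofPattern("MM");
  private static final DateTimeFormatter dayPattern
      = DateTimeFormatter.ofPattern("dd");

  /** Date directories with a day level, as in the code quoted above. */
  static Path withDay(LocalDate logDate, String fileName) {
    return Paths.get(
        logDate.format(yearPattern),  // year
        logDate.format(monthPattern), // month
        logDate.format(dayPattern),   // day
        fileName);
  }

  /** Date directories without a day level, for comparison. */
  static Path withoutDay(LocalDate logDate, String fileName) {
    return Paths.get(
        logDate.format(yearPattern),  // year
        logDate.format(monthPattern), // month
        fileName);
  }

  public static void main(String[] args) {
    LocalDate logDate = LocalDate.of(2018, 3, 16);
    String fileName = "example_access.log.xz"; // hypothetical file name
    System.out.println(withDay(logDate, fileName));
    System.out.println(withoutDay(logDate, fileName));
  }
}
```

The two methods differ only in the extra day component, which is exactly the part that either the specification or the code needs to change.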

So, we'll have to fix either the specification or the code. I don't feel strongly about which one we change. But let's make a decision really soon, before I start reprocessing archives due to #25522. Therefore setting priority to High.

Change History (2)

comment:1 Changed 3 years ago by iwakeh

I'd vote for adapting the PROTOCOL spec. There is usually only one log per day, but this is similar to other data types offered.

Last edited 3 years ago by iwakeh

comment:2 Changed 3 years ago by karsten

Resolution: fixed
Status: new → closed

Sounds good. Fixed the spec. Closing. Thanks!
