Write a specification for Tor web server logs
This document should answer the following questions:
- What will the raw input data look like?
  - compressed logs
  - log lines with varying dates, even though each file is tagged with a single date
  - should we expect only log lines for GET requests with 200 responses?
  - size could become huge in the future
  - exact input format (if it can be defined)
  - meta-data is provided in paths and filenames
  - ...
- What will sanitized logs stored on disk look like?
  - cleaned log lines: define the exact format and give examples (as this might deviate from the current Python sanitization)
  - meta-data is provided in paths and filenames
  - should files be reassembled, i.e., should a descriptor for a given log date contain only log lines of that date?
  - should logs be stored on disk in compressed files (as opposed to other descriptors, which are stored uncompressed)?
  - should such logs be stored on disk in reasonably sized chunks (e.g., split once a size of a GB is reached)?
  - ...
Please add more.
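
To make the reassembly question above more concrete, here is a minimal Python sketch of grouping log lines by the date found in each line rather than by the date the file is tagged with. It rests on assumptions the specification still has to confirm: that the input is gzip-compressed, and that lines carry an Apache Common/Combined Log Format timestamp in square brackets. The function names (`line_date`, `group_lines_by_date`, `read_compressed_log`) are hypothetical and not taken from the existing sanitizing code.

```python
import gzip
import re
from collections import defaultdict
from datetime import datetime

# Assumption: the timestamp appears in square brackets as in the Apache
# Common/Combined Log Format, e.g. [22/Jan/2016:00:00:00 +0000].
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def line_date(line):
    """Return the date found in a log line as YYYY-MM-DD, or None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%d/%b/%Y")
    except ValueError:
        # Looks like a timestamp but is not a valid date.
        return None
    return parsed.strftime("%Y-%m-%d")


def group_lines_by_date(lines):
    """Group log lines by the date in each line, ignoring the file's date tag.

    Lines without a recognizable timestamp are skipped here; the
    specification would need to say what should happen to them.
    """
    groups = defaultdict(list)
    for line in lines:
        date = line_date(line)
        if date is not None:
            groups[date].append(line)
    return dict(groups)


def read_compressed_log(path):
    """Yield lines from a gzip-compressed log file (compression type assumed)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Calling `group_lines_by_date(read_compressed_log(path))` on each input file and merging the results per date would give one set of log lines per log date. Whether that is the desired on-disk layout is exactly the open question listed above.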