Please find the attached PDF, or alternatively commit 512c5f0 in my task-25625 branch, with a possible start. The idea is that the most important parts of this protocol are the names of files in recent/ and archive/ as well as paths contained in file names.
If this looks reasonable, I'll include similar paragraphs for all descriptor types, and then we can see if something else remains from the protocol that we need to include, too.
with YYYY-MM again being year and month of the descriptor publication time, X and Y being the first and second character of the hex-encoded, lower-case SHA-1 descriptor digest, and DIGEST being that descriptor digest in full.
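To make the layout concrete, here is a minimal Python sketch of how such a path could be assembled. The function name `archive_subpath` is hypothetical, the publication time is assumed to be in UTC, and the bytes fed to SHA-1 in the example are placeholders only, since which bytes are digested depends on the descriptor type.

```python
import hashlib
from datetime import datetime, timezone


def archive_subpath(published: datetime, digest_hex: str) -> str:
    """Build YYYY-MM/X/Y/DIGEST from a publication time and a hex SHA-1 digest.

    Hypothetical helper for illustration; not taken from CollecTor itself.
    """
    # The layout uses the lower-case hex encoding of the digest.
    digest = digest_hex.lower()
    if len(digest) != 40 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 40-character hex-encoded SHA-1 digest")
    # X and Y are the first and second characters of the digest.
    return f"{published:%Y-%m}/{digest[0]}/{digest[1]}/{digest}"


# Placeholder input: real descriptors define which bytes are digested.
example_digest = hashlib.sha1(b"example descriptor bytes").hexdigest()
example_path = archive_subpath(
    datetime(2018, 3, 26, tzinfo=timezone.utc), example_digest
)
```

The two single-character directories are simply the first two characters of the digest, so the full digest still appears unchanged as the final path component.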
Y already stands for the year in YYYY, but here it also denotes the second character of the digest. Can we use YYYY-MM/X/Z/DIGEST (or some other letters) instead?
I feel this chunk may get repeated a lot. Let's pick some letters we like and add this as a paragraph under the top "Data Formats" heading. We also need to explain what HH, MM (which means both month and minutes at the moment) and SS mean. This could even be a table, e.g.:
Hmm, is this something that you can write a patch for, and where I can then go ahead and fill in all the details? It's something that I keep postponing, because I'm not yet sure how best to present things so that others can make sense of them. Of course, if you're already too busy with other things, let me know, and I'll give it another try.
I'd forgotten we were doing this and I've been referencing section numbers for the original specification in both the modern CollecTor technical report and in the documentation for the prototype. I had also added the protocol to https://spec.torproject.org/collector-protocol.
I think for now I'd prefer to set this to wontfix, and instead start thinking about what version 2 of this will look like. There are some backwards-incompatible changes that I think we need to make. If we're doing that anyway, there are more backwards-incompatible changes that I think we really should make.