To allow CollecTor to know if it's downloaded the file successfully, add a filesize in bytes and a base64-encoded SHA-256 digest to index.xml. For some level of future proofing, we could also add a base64-encoded SHA3-256 digest.
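For illustration, here is a minimal sketch of how a file size and a base64-encoded digest could be computed. The function name `size_and_digest` and its parameters are hypothetical and not taken from OnionPerf or CollecTor. Passing `"sha3_256"` as the algorithm would give the SHA3-256 variant mentioned above.

```python
import base64
import hashlib
import os


def size_and_digest(path, algorithm="sha256", chunk_size=65536):
    """Return (size in bytes, base64-encoded digest) for the file at path.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch only.
    """
    digest = hashlib.new(algorithm)
    # Read in chunks so that large files are never held fully in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    encoded = base64.b64encode(digest.digest()).decode("ascii")
    return os.path.getsize(path), encoded
```

A consumer could compare both values against the downloaded file and retry the download on a mismatch.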
Even though CollecTor is currently not using this file, we should add more information about files being served than just the filename.
However, I think that computing file digests is too much, because we'd either have to recompute these digests for all files once per day or store them somewhere. If we really need these digests on the consuming side, then let's consider adding them. But until that's the case, let's try something simpler.
I wrote a patch that includes sizes and last-modified times along with filenames. These can be looked up really quickly while writing the XML file. While touching this code I also made sure that the index.xml file itself is not listed in the index, and I fixed the encoding issue introduced by the Python 3 upgrade.
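As a rough sketch of the idea (not the actual patch), sizes and last-modified times can come from a single `os.stat` call per file. The function name `build_index`, the element name `file`, and the attribute names below are assumptions made for illustration; the real index.xml layout may differ.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_index(directory, index_name="index.xml"):
    """Return an XML string listing the files in directory.

    Hypothetical layout: one <file> element per regular file, with
    name, size, and last_modified attributes.
    """
    root = ET.Element("files")
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip the index file itself and anything that is not a regular file.
        if name == index_name or not os.path.isfile(path):
            continue
        st = os.stat(path)
        modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
        ET.SubElement(
            root,
            "file",
            name=name,
            size=str(st.st_size),
            last_modified=modified.strftime("%Y-%m-%d %H:%M:%S"),
        )
    return ET.tostring(root, encoding="unicode")
```

No file contents are read here, which is why this lookup stays cheap even for many files.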
Thank you for the patch! I have read and tested it; I found no issues, and it can be merged.
Also, I did a quick check: it takes ~1 second to compute all SHA-256 digests for one year's worth of OnionPerf data downloaded from CollecTor on my old Lenovo x220. I don't think recomputing these is an issue on today's processors.
Adding SHA-256 hashes would be a small change (I have attached a patch). It's up to you whether you want to go ahead with it :)
Okay, you're right that the overhead to compute these digests is really small. I had the CollecTor use case in mind where we include SHA-256 digests of large descriptor tarballs. But it's true, the files produced by OnionPerf are tiny in comparison.
I tweaked your patch a bit by making the field more similar to the one produced by CollecTor, that is, by calling it sha256 and encoding the digest using base64. Example of the produced file: