Opened 17 months ago

Closed 5 weeks ago

#29365 closed enhancement (implemented)

Add digests and sizes to index.xml

Reported by: irl Owned by: karsten
Priority: Medium Milestone:
Component: Metrics/Onionperf Version:
Severity: Normal Keywords: metrics-team-roadmap-2020, metrics-team-roadmap-2020-june
Cc: karsten, acute Actual Points: 0.4
Parent ID: #33321 Points: 0.5
Reviewer: acute Sponsor: Sponsor59-must

Description

To allow CollecTor to know if it's downloaded the file successfully, add a filesize in bytes and a base64-encoded SHA-256 digest to index.xml. For some level of future proofing, we could also add a base64-encoded SHA3-256 digest.

(Copied from https://github.com/robgjansen/onionperf/issues/43)

Child Tickets

Attachments (1)

0001-Adds-file-SHA-256-digests-to-index.xml.patch (1.3 KB) - added by acute 6 weeks ago.

Download all attachments as: .zip

Change History (16)

comment:1 Changed 17 months ago by irl

Cc: acute added
Owner: changed from hiro to metrics-team
Status: newassigned

Updating owner and CC

comment:2 Changed 14 months ago by gaba

Sponsor: Sponsor13

comment:3 Changed 11 months ago by irl

Resolution: invalid
Status: assignedclosed

comment:4 Changed 5 months ago by acute

Resolution: invalid
Status: closedreopened

Moving all gitlab OP tickets back to Trac.

comment:5 Changed 2 months ago by gaba

Keywords: metrics-team-roadmap-2020 added
Points: 0.5
Sponsor: Sponsor59

comment:6 Changed 2 months ago by karsten

Sponsor: Sponsor59Sponsor59-must

Moving to Sponsor59-must, because we should really do these in order to call Sponsor59 done.

comment:7 Changed 8 weeks ago by karsten

Parent ID: #33321

comment:8 Changed 6 weeks ago by karsten

Owner: changed from metrics-team to karsten
Status: reopenedaccepted

I'll give this a try.

comment:9 Changed 6 weeks ago by karsten

Status: acceptedneeds_review

Even though CollecTor is currently not using this file, we should add more information about files being served than just the filename.

However, I think that computing file digests is too much, because we'd either have to recompute these digests for all files once per day or store them somewhere. If we really need these digests on the consuming side, then let's consider adding them. But until that's the case, let's try something simpler.

I wrote a patch that includes sizes and last-modified times along with filenames. These can be looked up really quickly while writing the XML file. While touching this code I also made sure that the index.xml file itself is not contained and I fixed the encoding issue introduced by the Python 3 upgrade.

Please review commit 011359d in my task-29365 branch.

comment:10 Changed 6 weeks ago by karsten

Actual Points: 0.2

comment:11 Changed 6 weeks ago by gaba

Keywords: metrics-team-roadmap-2020-june added

Adding all this tickets to the OnionPerf roadmap for June.

comment:12 Changed 6 weeks ago by karsten

Reviewer: acute

acute kindly said that she'd be happy to review this ticket.

comment:13 in reply to:  9 Changed 6 weeks ago by acute

Replying to karsten:

Even though CollecTor is currently not using this file, we should add more information about files being served than just the filename.

However, I think that computing file digests is too much, because we'd either have to recompute these digests for all files once per day or store them somewhere. If we really need these digests on the consuming side, then let's consider adding them. But until that's the case, let's try something simpler.

I wrote a patch that includes sizes and last-modified times along with filenames. These can be looked up really quickly while writing the XML file. While touching this code I also made sure that the index.xml file itself is not contained and I fixed the encoding issue introduced by the Python 3 upgrade.

Please review commit 011359d in my task-29365 branch.

Thank you for the patch! I have read and tested it - no issues there and it can be merged.

Also, I did a quick check - it takes ~1 second to compute all sha256 digests for one years' worth of OP data downloaded from Collector on my old Lenovo x220 - I don't think recomputing these is an issue on processors nowadays.

Adding sha256 hashes would be a small change (have attached a patch), up to you if you want to go ahead with it :)

comment:14 Changed 6 weeks ago by acute

Actual Points: 0.20.3
Status: needs_reviewneeds_revision

comment:15 Changed 5 weeks ago by karsten

Actual Points: 0.30.4
Resolution: implemented
Status: needs_revisionclosed

Okay, you're right that the overhead to compute these digests is really small. I had the CollecTor use case in mind where we include SHA-256 digests of large descriptor tarballs. But it's true, the files produced by OnionPerf are tiny in comparison.

I tweaked your patch a bit by making the field more similar to the one produced by CollecTor, that is, by calling it sha256 and encoding the digest using base64. Example of the produced file:

<?xml version='1.0' encoding='ASCII'?>
<files>
  <file name="2020-04-28.onionperf.analysis.json.xz" size="398192" last_modified="2020-04-29 00:00:42" sha256="O6vcH/1wkAdRYZmYTFK/XUaPgwZtkPuQL+zKBkf+7Xk="/>
  <file name="2020-04-29.onionperf.analysis.json.xz" size="559528" last_modified="2020-04-30 00:00:55" sha256="nDuLiOCHcZ6S5YkUI5gw5xydavptcfV5nQXBbabGrgM="/>
...

Rebased and pushed to master, and closing as implemented. Thanks!

Note: See TracTickets for help on using tickets.