Opened 13 months ago

Last modified 6 weeks ago

#26673 assigned enhancement

Record download times of smaller file sizes from partial completion times

Reported by: karsten
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Onionperf
Version:
Severity: Normal
Keywords: user-experience, acute-2019-q1-planned, ex-sponsor-19, ex-sponsor19
Cc: metrics-team, robgjansen, karsten, acute
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

In #25774 (and possibly elsewhere) we discussed options to derive download times of different file sizes than 50 KiB, 1 MiB, or 5 MiB from partial completion times.

As of now, OnionPerf (like Torperf) records a timestamp whenever it completes another decile of the requested file size. Comment 13 on #25774 contains a sample graph of partial completion times derived from 1 MiB downloads.

If these partial completion timestamps are as accurate as timestamps for completing entire requests, which is something we should find out, let's consider switching OnionPerf's download model: it could download a single file size, 1 MiB or 5 MiB depending on available bandwidth, and record timestamps for 50 KiB, 100 KiB, 200 KiB, 1 MiB, 2 MiB, 5 MiB, and so on.
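As a rough illustration of what such a derivation could look like, here is a minimal Python sketch that estimates the time to reach a smaller byte count by interpolating linearly between the decile timestamps (DATAPERC10 through DATAPERC100) of a larger download. The function name and the use of DATARESPONSE as the start of the transfer are assumptions made for the example; whether interpolated values are accurate enough is exactly the open question above.

    def derive_partial_time(tpf_fields, target_bytes, total_bytes):
        """Estimate seconds until target_bytes were received, interpolating
        linearly between the decile timestamps of a total_bytes download."""
        start = float(tpf_fields["DATARESPONSE"])  # first response byte received
        prev_bytes, prev_ts = 0, start
        for percent in range(10, 101, 10):
            dec_bytes = total_bytes * percent // 100
            dec_ts = float(tpf_fields["DATAPERC%d" % percent])
            if target_bytes <= dec_bytes:
                frac = (target_bytes - prev_bytes) / (dec_bytes - prev_bytes)
                return prev_ts + frac * (dec_ts - prev_ts) - start
            prev_bytes, prev_ts = dec_bytes, dec_ts
        raise ValueError("target_bytes exceeds total_bytes")

    # e.g. derive_partial_time(fields, 51200, 1048576) would estimate the
    # 50 KiB completion time from a 1 MiB download's decile timestamps.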

This isn't urgent. We decided on #25774 that development and deployment of this new feature are out of scope for the current roadmap until Mexico. Doesn't hurt to have this ticket, though, and maybe start discussing whether this even makes sense.

Child Tickets

Change History (9)

comment:1 Changed 7 months ago by gaba

Keywords: user-experience added
Sponsor: Sponsor19

comment:2 Changed 5 months ago by irl

Cc: acute added

Updating owner and CC

comment:3 Changed 5 months ago by irl

Keywords: acute-2019-q1-planned added

comment:4 Changed 3 months ago by acute

Some feedback required:
I've made a pull request that adds support for recording partial completion timestamps, in both the JSON and Torperf-produced analysis files. The data is recorded the same way as the percentile data, and only in cases where there are no errors. An example of the resulting fields in the Torperf file (in the case of a 5m download): PARTIAL1048576=1555678482.70 PARTIAL51200=1555678479.75 PARTIAL5242880=1555678495.97

For a 1m download, only PARTIAL1048576 and PARTIAL51200 are recorded; similarly, for a 50k download, only PARTIAL51200 is recorded.
We could record only the PARTIAL values smaller than the total size of the download (we already record DATAPERC100, which is the same timestamp, equivalent to 100% of the download), but having the field might help process things in an automated fashion further down the pipeline. I'm happy to make changes if that's not the case.

Finally, the names of the fields could be changed to PARTIAL{50k,1m,5m}, although I've noticed that all the other fields in the .tpf file use the value in bytes rather than the human-readable version. Suggestions welcome.
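For illustration only (not part of the pull request), here is a minimal Python sketch of pulling the PARTIAL<bytes>=<timestamp> fields out of one space-separated key=value .tpf line as described above; the function name is made up and error handling is omitted.

    import re

    def parse_partials(tpf_line):
        """Return {size_in_bytes: unix_timestamp} for all PARTIAL fields."""
        partials = {}
        for field in tpf_line.split():
            if "=" not in field:
                continue
            key, value = field.split("=", 1)
            match = re.fullmatch(r"PARTIAL(\d+)", key)
            if match:
                partials[int(match.group(1))] = float(value)
        return partials

    # parse_partials("PARTIAL51200=1555678479.75 PARTIAL1048576=1555678482.70")
    # -> {51200: 1555678479.75, 1048576: 1555678482.70}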

comment:5 Changed 3 months ago by karsten

Sounds like a fine plan! I didn't read the code, but the design looks sane to me.

I have two questions related to deployment:

  1. Can we reprocess existing logs to include these partial completion timestamps for past measurements, maybe when we reprocess logs for #29787? (I'm happy to do the actual reprocessing, I'm just asking if you see any problems with that.)
  2. I think the goal is to change the weights for making 50k/1m/5m downloads towards making more 5m downloads while still obtaining 50k/1m results from these new timestamps. Should we change the weights in several steps to ensure we're not overloading anything? Like, from 12/2/1 over 6/2/1 and 2/1/1 to 0/0/1?

Thanks!
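To make the staged transition in question 2 concrete, here is a small Python sketch of what such a weight schedule could look like; the stage names, the size-to-weight mapping, and the use of random.choices are purely illustrative and not how OnionPerf actually schedules its downloads.

    import random

    # Relative weights for 50 KiB / 1 MiB / 5 MiB downloads at each stage.
    STAGES = {
        "current": {51200: 12, 1048576: 2, 5242880: 1},
        "step1":   {51200: 6,  1048576: 2, 5242880: 1},
        "step2":   {51200: 2,  1048576: 1, 5242880: 1},
        "final":   {51200: 0,  1048576: 0, 5242880: 1},
    }

    def pick_download_size(stage):
        """Pick the next download size according to the stage's weights."""
        weights = STAGES[stage]
        sizes = list(weights)
        return random.choices(sizes, weights=[weights[s] for s in sizes])[0]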

comment:6 in reply to:  5 Changed 3 months ago by acute

Replying to karsten:

  1. Can we reprocess existing logs to include these partial completion timestamps for past measurements, maybe when we reprocess logs for #29787? (I'm happy to do the actual reprocessing, I'm just asking if you see any problems with that.)

Yes, if we have all the logs, reprocessing should be easy. I already have some code to do this from testing the analysis changes, which can be bundled into a script!

  2. I think the goal is to change the weights for making 50k/1m/5m downloads towards making more 5m downloads while still obtaining 50k/1m results from these new timestamps. Should we change the weights in several steps to ensure we're not overloading anything? Like, from 12/2/1 over 6/2/1 and 2/1/1 to 0/0/1?

This is the next step. Following this week's metrics team meeting, we can do this whenever we're ready with the changes downstream.

comment:7 Changed 2 months ago by acute

I've written a reprocessing module for OnionPerf and have tested it on 3 months' worth of files from op-ab. These files have the PARTIAL fields we discuss here, as well as an optional ERRORCODE field where there were errors (as per #29787).

The reprocessed files are here if you want to have a play:

https://erg.abdn.ac.uk/~ana/reprocessing_test/
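Purely as a sketch of how one might sanity-check files like these, the following Python snippet counts how many records in a set of reprocessed .tpf files carry PARTIAL fields and how many carry the optional ERRORCODE field; the glob pattern, the "@"-prefixed header convention, and the one-record-per-line layout are assumptions, not a description of the reprocessing module itself.

    import glob

    def summarize(pattern="*.tpf"):
        """Count records with PARTIAL fields and with an ERRORCODE field."""
        total = with_partial = with_error = 0
        for path in glob.glob(pattern):
            with open(path) as fh:
                for line in fh:
                    if not line.strip() or line.startswith("@"):
                        continue  # skip blank lines and descriptor headers
                    total += 1
                    keys = {f.split("=", 1)[0] for f in line.split() if "=" in f}
                    if any(k.startswith("PARTIAL") for k in keys):
                        with_partial += 1
                    if "ERRORCODE" in keys:
                        with_error += 1
        return total, with_partial, with_error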

comment:8 Changed 6 weeks ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:9 Changed 6 weeks ago by gaba

Keywords: ex-sponsor19 added
Sponsor: Sponsor19 removed

Removing Sponsor 19 and adding the keyword ex-sponsor19 to mark all the tickets that could have been in scope for the sponsor.
