Opened 17 months ago

Closed 4 weeks ago

#29369 closed defect (fixed)

Fix message logging and filtering

Reported by: irl Owned by: phw
Priority: Medium Milestone:
Component: Metrics/Onionperf Version:
Severity: Normal Keywords: metrics-team-roadmap-2020, metrics-team-roadmap-2020-june
Cc: karsten, acute, metrics-team Actual Points: 1.0
Parent ID: #33321 Points: 1.0
Reviewer: phw Sponsor: Sponsor59-must

Description

When running in measurement mode, new analysis results are produced every night at midnight utc. However, in cases when OnionPerf was not running at midnight and restarts on a later day, log messages from both partial days appear in the same log file. When the analysis is run at the midnight following the day it was restarted, downloads from the day it was first stopped may appear in the analysis results (the .tpf files).

We should fix this.

allow onionperf analysis mode to filter out log messages that do not match a certain date; onionperf measurement mode should take advantage of that feature
When exporting Torperf .tpf files, and a date filter is applied, we need to ensure that we do not include downloads that started on a different day (to handle the case where it ended on the filtered date but started on the previous day)
onionperf should name the log files according to the date of the log message timestamps, and each log message should only be appended to the file corresponding to that date

(Copied from https://github.com/robgjansen/onionperf/issues/31)

Child Tickets

Change History (18)

comment:1 Changed 17 months ago by irl

Cc: acute added
Owner: changed from hiro to metrics-team
Status: newassigned

Updating owner and CC

comment:2 Changed 11 months ago by irl

Resolution: invalid
Status: assignedclosed

comment:3 Changed 5 months ago by acute

Resolution: invalid
Status: closedreopened

Moving all gitlab OP tickets back to Trac.

comment:4 Changed 2 months ago by gaba

Keywords: metrics-team-roadmap-2020 added
Points: 1
Sponsor: Sponsor59

comment:5 Changed 2 months ago by karsten

See #30238 for a (near) duplicate. What happened there is that op-ab did not rotate its log files on 2019-01-13 00:00:00 and also did not produce an analysis file for 2019-01-12. This is a bit unfortunate, because there have been quite a few measurements on that day. We'll have to think about that case when we work on this ticket to avoid it in the future. (No need to recreate the analysis file for 2019-01-12, that's long ago.)

comment:6 Changed 2 months ago by karsten

Sponsor: Sponsor59Sponsor59-must

Moving to Sponsor59-must, because we should really do these in order to call Sponsor59 done.

comment:7 Changed 7 weeks ago by karsten

Owner: changed from metrics-team to karsten
Status: reopenedaccepted

I started looking into this.

comment:8 Changed 7 weeks ago by karsten

Actual Points: 0.4
Owner: changed from karsten to metrics-team
Points: 11.0
Status: acceptedassigned

After having looked at past logs and thinking about possible fixes I came up with a rather trivial fix: let's give up on using a strict date filter when analyzing logs and simply include all transfers and control events found in the parsed log files.

Example: op-ab's log files from 2019-01-13 contain measurements from 2019-01-12 and -13, because log rotation didn't happen at 2019-01-12 23:59:59. With the current code, the analysis file produced at 2019-01-13 23:59:59 is called 2019-01-13.onionperf.analysis.json.xz and contains measurements from 2019-01-13 only. After the suggested change the file would still have that file name but contain all measurements from 2019-01-12 and -13 as found in the log files.

(A possible alternative that I had been thinking about before was that the analysis of a single pair of log files could produce more than one analysis files depending on how many dates are covered in the log files. This can get nasty, though. We'd have to consider all sorts of edge cases where two log files contain measurements for the same date. But really, we don't have to make sure that measurements go into analysis files with a specific date. Let's just not do this anymore.)

I didn't work on the implementation yet and would like to leave this to others, also as a way to sanity check the concept. If I were to implement this I'd probably avoid using the date_filter parameter when doing the nightly analysis and introduce some sort of date_prefix parameter. We should just use that date_filter by default for all analysis files, and we might consider deprecating the date_prefix parameter in the user interface. When testing this, we'll have to ensure that it works for the nightly case as well as the reprocessing case.

Returning to metrics-team.

comment:9 Changed 7 weeks ago by karsten

Parent ID: #33321

comment:10 Changed 6 weeks ago by phw

Cc: metrics-team added
Owner: changed from metrics-team to phw

comment:11 in reply to:  8 Changed 6 weeks ago by phw

Replying to karsten:

I didn't work on the implementation yet and would like to leave this to others, also as a way to sanity check the concept. If I were to implement this I'd probably avoid using the date_filter parameter when doing the nightly analysis and introduce some sort of date_prefix parameter. We should just use that date_filter by default for all analysis files, and we might consider deprecating the date_prefix parameter in the user interface. When testing this, we'll have to ensure that it works for the nightly case as well as the reprocessing case.


I'm afraid I don't follow. Don't we accomplish our goal by simply omitting the date filter when analysing the results? I don't understand what the date prefix should do here.

comment:12 Changed 6 weeks ago by karsten

Without the date prefix the output file written at midnight will overwrite the output file from one day before. Do you prefer if I write this patch early next week and you review it?

comment:13 in reply to:  12 Changed 6 weeks ago by phw

Replying to karsten:

Without the date prefix the output file written at midnight will overwrite the output file from one day before. Do you prefer if I write this patch early next week and you review it?


I think I now understand. Is this (untested) code what you had in mind?
https://github.com/NullHypothesis/onionperf/compare/defect/29369

comment:14 Changed 6 weeks ago by karsten

Reviewer: phw
Status: assignedneeds_review

Not quite, but it's going in the direction I had in mind. Please take a look at squash commit 251f457 in my task-29369 branch.

comment:15 Changed 5 weeks ago by karsten

Actual Points: 0.40.5

comment:16 Changed 5 weeks ago by gaba

Keywords: metrics-team-roadmap-2020-june added

Adding all this tickets to the OnionPerf roadmap for June.

comment:17 Changed 5 weeks ago by phw

Actual Points: 0.51
Status: needs_reviewmerge_ready

Two comments:

  • YYYY-MM-YY should probably be YYYY-MM-DD.
  • The string "onionperf.analysis.json.xz" shows up thrice in the save function. We could use a variable instead, e.g., base_filename = "onionperf.analysis.json.xz".

Feel free to address these comments as you see fit. I don't think this patch requires another review round, so I'm setting it to merge ready and adding my 0.5 points.

comment:18 Changed 4 weeks ago by karsten

Actual Points: 11.0
Resolution: fixed
Status: merge_readyclosed

Thanks for those suggestions, I included both of them. Rebased, squashed, pushed to master, and now closing as fixed. Thanks again!

Note: See TracTickets for help on using tickets.