Opened 10 months ago

Closed 7 months ago

#29374 closed defect (fixed)

Analysis files sometimes present negative numbers in the payload_progress field

Reported by: irl Owned by: metrics-team
Priority: Medium Milestone:
Component: Archived/Onionperf Version:
Severity: Normal Keywords: acute-2019-q1-planned
Cc: acute Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description


Child Tickets

Change History (15)

comment:1 Changed 10 months ago by irl

Keywords: acute-2019-q1-planned added

comment:2 Changed 8 months ago by acute

Numbers in the payload progress field are calculated based on a unix_ts_start property of a transfer, which is in turned calculated by subtracting the time elapsed recorded by tgen from unix_ts_end (which holds the timestamp of the latest line in the transfer log). In case of completed transfers this works ok because tgen reports elapsed seconds to completion. In case of errors tgen does not report this, and so the unix_ts_start property is incorrectly calculated (hence the negative numbers).

I propose a patch to record unix_ts_start timestamp directly from the log line corresponding to the tgen socket creation for that transfer. The line looks like this:

2019-04-01 03:15:10 1554084910.250433 [info] [shd-tgen-transport.c:234] [_tgentransport_newHelper] Initiated transport to socks proxy at localhost:127.0.0.1:30231

For completed transfers, this would be overwritten with the calculated value based on the tgen report, which is more acurate (usec). For transfers which do not complete, this would be used to calculate the payload progress field instead.

I have a patch ready on my development machine, but need more tgen logs to test it with.

Any suggestions for improvement to the above mechanism would be appreciated.

Last edited 8 months ago by acute (previous) (diff)

comment:3 Changed 8 months ago by karsten

Some thoughts on this, since I have been looking at tgen logs for a while now:

  • The tgen log line you mention does not contain anything to identify the transfer. Would it be easier to use the next tgen log line which at least contains the source port? 2019-02-11 00:13:00 1549843980.555482 [info] [shd-tgen-transport.c:203] [_tgentransport_changeState] transport TCP,16,localhost:127.0.0.1:43810,localhost:127.0.0.1:41204,op-ab.onionperf.torproject.net:137.50.19.2:80,state=CONNECT,error=NONE moving from state CONNECT to state INIT; and would it still be accurate enough?
  • What's the reason for overwriting this timestamp with the calculated value for complete transfers? This log line has usec granularity, too (1554084910.250433). I think it would be easier to compare partial progress of complete and incomplete transfers if we'd use the same timestamps for calculating them.
  • Do we know of any tools relying on payload_progress numbers being negative for incomplete downloads? I admit that this would be a weird use of the data. Still, maybe worth checking if we're using them anywhere ourselves and probably worth writing something in the next release notes.
  • You'll find more tgen logs for testing at https://op-ab.onionperf.torproject.net/ .

comment:4 in reply to:  3 Changed 8 months ago by acute

Replying to karsten:

Some thoughts on this, since I have been looking at tgen logs for a while now:

  • The tgen log line you mention does not contain anything to identify the transfer. Would it be easier to use the next tgen log line which at least contains the source port? 2019-02-11 00:13:00 1549843980.555482 [info] [shd-tgen-transport.c:203] [_tgentransport_changeState] transport TCP,16,localhost:127.0.0.1:43810,localhost:127.0.0.1:41204,op-ab.onionperf.torproject.net:137.50.19.2:80,state=CONNECT,error=NONE moving from state CONNECT to state INIT; and would it still be accurate enough?

Have had another look at it, the OP parser only looks at lines containing [transfer-status], [transfer-complete] and [transfer-error]. It currently identifies transfers by the time and transfer number only, without including any ports.
The problem is that the transfer number only starts appearing in lines onwards after 'moving from state COMMAND to state RESPONSE', and the parser gets all the other information for a transfer from the tgen process summary report...

Not sure where to go from here...we could change the way we identify reports (I agree with you that we should use both ports + time) to get timestamp information from the previous lines, and we add rules for parsing these? This would mean a major change to the OP parser.

  • What's the reason for overwriting this timestamp with the calculated value for complete transfers? This log line has usec granularity, too (1554084910.250433). I think it would be easier to compare partial progress of complete and incomplete transfers if we'd use the same timestamps for calculating them.

My fear was that if the values reported by the tgen process are the absoulute source of truth, by changing them we'd report inaccurate information. I agree that calculating everything the same way is more consistent - we should go with this after we figure out what to do for the previous step.

  • Do we know of any tools relying on payload_progress numbers being negative for incomplete downloads? I admit that this would be a weird use of the data. Still, maybe worth checking if we're using them anywhere ourselves and probably worth writing something in the next release notes.

Excellent point, will have a look at the visualization module.

Thanks for this!

comment:5 Changed 8 months ago by karsten

Interesting! I didn't know that OnionPerf only looks at [transfer-*] log lines. This is also good to know for #29787, just in case we need information from other log lines to further identify the failure case. And I'd be curious how we match these log lines to Tor controller event logs if we don't know the source port. But this can all be part of #29787.

For this ticket, how about we leave out payload_progress for measurements where are unable to extract that information from log lines we parse. I could see how we would be able to learn that by making major changes the OnionPerf parser. But maybe we can do this in two steps: 1. fix information that is wrong and 2. put in information that is right. And step 2 could easily happen as a separate enhancement ticket to be done in the future. What do you think?

comment:6 in reply to:  5 ; Changed 8 months ago by acute

Replying to karsten:

Interesting! I didn't know that OnionPerf only looks at [transfer-*] log lines. This is also good to know for #29787, just in case we need information from other log lines to further identify the failure case. And I'd be curious how we match these log lines to Tor controller event logs if we don't know the source port. But this can all be part of #29787.

Agreed!

For this ticket, how about we leave out payload_progress for measurements where are unable to extract that information from log lines we parse.
And step 2 could easily happen as a separate enhancement ticket to be done in the future. What do you think?

Do you mean that for now we should change the parser to leave the field out entirely in cases where the transfer does not complete? Also, +1 to discussing the parser refactoring in a new ticket!

comment:7 in reply to:  6 Changed 8 months ago by karsten

Replying to acute:

Do you mean that for now we should change the parser to leave the field out entirely in cases where the transfer does not complete?

If the two alternatives at the moment are to either include a broken field or remove it, I'd suggest to remove it.

comment:8 Changed 8 months ago by acute

Thanks for clarifying, pull request for this change submitted: https://github.com/torproject/onionperf/pull/14
The request also adds tests for the parser, which will hopefully help with future changes.

comment:9 Changed 8 months ago by acute

Note this also means that all the produced torperf (*tpf) files will have all 'DATAPERC{x}' fields equal to 0.0 for transfers that have errored. Any thoughts on this/is this likely to affect things futher down the data processing pipeline?

comment:10 in reply to:  9 ; Changed 8 months ago by karsten

Replying to acute:

Note this also means that all the produced torperf (*tpf) files will have all 'DATAPERC{x}' fields equal to 0.0 for transfers that have errored. Any thoughts on this/is this likely to affect things futher down the data processing pipeline?

Wait, these fields are defined as optional. Setting them to 0.0 would mean setting them to 1970-01-01 00:00:00.000. Can't we just leave these fields out if the corresponding events never happened? Any parser of these files shouldn't rely on these fields to be present. After all, these fields haven't been around right from the start, and parsers were able to handle those files, too. Does this make sense?

comment:11 in reply to:  10 Changed 8 months ago by acute

Replying to karsten:

Replying to acute:

Note this also means that all the produced torperf (*tpf) files will have all 'DATAPERC{x}' fields equal to 0.0 for transfers that have errored. Any thoughts on this/is this likely to affect things futher down the data processing pipeline?

Wait, these fields are defined as optional. Setting them to 0.0 would mean setting them to 1970-01-01 00:00:00.000. Can't we just leave these fields out if the corresponding events never happened?

Ah, I see! Very useful link, thank you. This simplifies things, it's very easy to exclude them, will update the corresponding PR.

comment:12 in reply to:  5 Changed 7 months ago by acute

For this ticket, how about we leave out payload_progress for measurements where are unable to extract that information from log lines we parse. I could see how we would be able to learn that by making major changes the OnionPerf parser. But maybe we can do this in two steps: 1. fix information that is wrong and 2. put in information that is right. And step 2 could easily happen as a separate enhancement ticket to be done in the future.

PR was merged, this ticket can now be closed. I will open an enhancement ticket as described above.

comment:13 Changed 7 months ago by karsten

Great!

Let's leave this ticket open until that other ticket exists, and let's include a reference here when closing this ticket. Thanks!

comment:14 in reply to:  13 Changed 7 months ago by acute

Replying to karsten:

Great!

Let's leave this ticket open until that other ticket exists, and let's include a reference here when closing this ticket. Thanks!

Created ticket #30362.

comment:15 Changed 7 months ago by karsten

Resolution: fixed
Status: newclosed

Thanks for creating that ticket. Closing this one!

Note: See TracTickets for help on using tickets.