Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#2672 closed defect (fixed)

Fix bugs/issues with consolidate_stats

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone:
Component: Archived/Torperf Version:
Severity: Keywords: MikePerryIteration20110320
Cc: karsten, tomb@… Actual Points: 5
Parent ID: Points: 2
Reviewer: Sponsor:

Description

Quoting karsten:

I think there's a bug in this script: Whenever we skip a line in a .data file, because that line represents a failure, we might get out of sync with the .extradata file and stop writing any data to .mergedata. You should be able to reproduce this bug with the Torperf 50KB run (https://metrics.torproject.org/data/torperf-50kb.data https://metrics.torproject.org/data/torperf-50kb.extradata). The last line in the result has CIRC_ID=4384. If I delete the line in .extradata starting with CIRC_ID=4397, the result has more entries than before. I think the fix is to distinguish between absolute slack of up to 1 second and a time difference of, say, more than 1 minute.

Also, is there a way to include timed out runs in the .mergedata, too? We do include failures, so by including timeouts, we wouldn't have to parse the original files for timeouts/failures anymore. This could be a new ticket, I'd just like to know whether it's possible.

Child Tickets

Attachments (2)

50kb.mergedata (18.3 KB) - added by karsten 9 years ago.
Sample .mergedata
task-2672-2011-03-15-data.tar.bz2 (587.0 KB) - added by karsten 9 years ago.
Sample data from March 15 mentioned in comment 11

Download all attachments as: .zip

Change History (16)

comment:1 Changed 9 years ago by mikeperry

Cc: karsten added

I'll look into the line-skipping bug. I thought the code should have handled missing lines in either .data or .extradata, but perhaps there is a bug with it.

As for the timeout logging, we need to record the time that this happens in both data files to properly match it. Right now we don't do this at all in .extradata, but I can add that. Is the timestamp of the timeout in the .data files currently?

comment:2 Changed 9 years ago by mikeperry

After some conversation with karsten on IRC, we've decided to try to use the STREAM launch times to correlate streams to circuits to capture torperf timeouts properly, as well as include all lines from .extradata, not just the matched ones.

comment:3 Changed 9 years ago by karsten

Component: MetricsTorperf
Priority: normalmajor
Status: newassigned

Setting priority to major, because other tasks depend on a good Torperf data format that includes CBT details, circuits that didn't have streams, and Torperf attempts that were aborted by the timeout script.

Mike, FYI, my estimation was that this takes 4 points rather than 2.

comment:4 Changed 9 years ago by mikeperry

Ok, I will do some work on it tonight, starting with the extra_stats.py piece where we will actually need to do the STREAM identifier tracking to catch the open times.

I think you're right on the increased points, because now we have to modify extra_stats.py first, and then test consolidate stats on that. But since we started the iteration already, I think we'll just wait and update the Actual Points field.

If you think waiting for a new extra_stats.py to run to gather more data to merge is going to be too long of a delay for your other tasks, we may need to convene on IRC when you wake up to discuss an alternate strategy to the one from my comment #2 up there.

comment:5 Changed 9 years ago by karsten

Owner: changed from mikeperry to karsten
Status: assignedaccepted

Mike, the trivsocks-client.c patch in your ticket2672 branch looks good. Here's an output line of a Torperf run that succeeded:

1299755521 867154
1299755521 867170
1299755521 868593
1299755521 868667
1299755521 869520
1299755523 416614
1299755523 416751
1299755525 161757
1299755525 686460
75 51442 0

And here's the output of a Torperf run that was stopped by the timeout tool 5 seconds after the start:

1299755581 714996
1299755581 715012
1299755581 718483
1299755581 719167
1299755581 721378
0 0
0 0
0 0
1299755586 721767
0 0 1

Can you confirm that this new .data format will be useful for you? If so, I'll update the tools that process .data files and merge your patch.

comment:6 Changed 9 years ago by karsten

Owner: changed from karsten to mikeperry
Status: acceptedassigned

No, no, Trac, there's no reason to assign this ticket to me. But thanks.

comment:7 Changed 9 years ago by mikeperry

Well I need to update consolidate_stats.py using that .data file format and the .extradata file that goes along with to be 100% sure. Do you have these yet?

comment:8 Changed 9 years ago by karsten

Not yet. I was just testing whether trivsocks-client.c breaks apart.

I now configured Torperf to write an .extradata file and restarted it. Will have some data in an hour or two, depending on how much data you want. This Torperf is starting a new run every minute and aborting them after 5 seconds if they didn't succeed.

comment:9 Changed 9 years ago by mikeperry

Ok, I think I fixed this. Now we store unmatched lines from either .data or .extradata, and should also be better about skipping missed lines from either. The result is a file that contains everything from .data and .extradata, even if they do not match.

I also added the didtimeout field to the mergedata.

All of this + the earlier changes should be in mikeperry/ticket2672.

Changed 9 years ago by karsten

Attachment: 50kb.mergedata added

Sample .mergedata

comment:10 Changed 9 years ago by tomb

Cc: tomb@… added

tomb has now started making happy R parsing of .mergedata, I will be iterating on this, and don't mind if the input format changes as long as it is documented.

comment:11 in reply to:  9 ; Changed 9 years ago by karsten

Replying to mikeperry:

Ok, I think I fixed this. Now we store unmatched lines from either .data or .extradata, and should also be better about skipping missed lines from either. The result is a file that contains everything from .data and .extradata, even if they do not match.

Yes, I tried the new consolidate_stats.py script, and it seems to work fine.

I also added the didtimeout field to the mergedata.

Looks good.

All of this + the earlier changes should be in mikeperry/ticket2672.

Some comments on that branch:

  • Can the note "This script will strip all failure information from the output. [...]" in consolidate_stats.py go away now?
  • I noticed that 158 of the 2428 lines in my sample .data file don't have a matching .extradata line with a USED_AT entry, but only 22 of these 158 runs actually timed out. What could be the reason for extra_stats.py not setting USED_AT even though the circuit was actually used for a successful Torperf run? I attached my data to this ticket; see 50kb.data-not-used for the runs for which extra_stats.py finds no matching circuit.
  • You have a comment "Add purpose to the .extradata?" in your code. Should we just add it? Or add a ticket for this?
  • You have another comment "Torperf only uses 3 hop paths.". This seems to be correct, or at least I didn't find a case when we used a circuit with fewer or more than 3 hops in my sample data. But why is your comment prefixed with XXX?

Changed 9 years ago by karsten

Sample data from March 15 mentioned in comment 11

comment:12 in reply to:  11 Changed 9 years ago by mikeperry

Replying to karsten:

Replying to mikeperry:

All of this + the earlier changes should be in mikeperry/ticket2672.

Some comments on that branch:

  • Can the note "This script will strip all failure information from the output. [...]" in consolidate_stats.py go away now?

Yes. I will fix it.

  • I noticed that 158 of the 2428 lines in my sample .data file don't have a matching .extradata line with a USED_AT entry, but only 22 of these 158 runs actually timed out. What could be the reason for extra_stats.py not setting USED_AT even though the circuit was actually used for a successful Torperf run? I attached my data to this ticket; see 50kb.data-not-used for the runs for which extra_stats.py finds no matching circuit.

This is very strange. I will have to dig through the tarball and find out what is going on..

  • You have a comment "Add purpose to the .extradata?" in your code. Should we just add it? Or add a ticket for this?

Actually, I just checked extra_data.py, and it doesn't even bother to record streams whose PURPOSE is not USER. So I don't think this is needed.

  • You have another comment "Torperf only uses 3 hop paths.". This seems to be correct, or at least I didn't find a case when we used a circuit with fewer or more than 3 hops in my sample data. But why is your comment prefixed with XXX?

Hrmm. It is my estimation that we should now have no USED_AT lines with path lengths other than 3, because of the PURPOSE filter in extra_data.py, so I think both of these XXX checks and comments can go away.

comment:13 Changed 9 years ago by karsten

Resolution: fixed
Status: assignedclosed

So, we figured out that the strange cases without a USED_AT entry come from starting a new Torperf run every minute, but not setting MaxCircuitDirtiness to something smaller than 1 minute.

The other changes look good. Merged. Closing.

comment:14 Changed 9 years ago by mikeperry

Actual Points: 5
Note: See TracTickets for help on using tickets.