Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#2618 closed defect (fixed)

Consolidate-stats needs update

Reported by: mikeperry
Owned by: mikeperry
Priority: High
Milestone:
Component: Metrics/CollecTor
Version:
Severity:
Keywords: TorPerfIterationFires20110305 TorPerfIteration20110305 MikePerryIterationFires20110305
Cc: karsten
Actual Points: 3
Parent ID:
Points: 3
Reviewer:
Sponsor:

Description

With the fixes to #2551 and #2590, we need to update consolidate-stats. We should get the code for #2551 and #2590 running for a while first though, so we can be sure we've built a robust enough consolidate-stats.

Child Tickets

Change History (16)

comment:1 Changed 8 years ago by mikeperry

Keywords: TorPerfIterationFires20110305 added

comment:2 Changed 8 years ago by karsten

Keywords: TorPerfIteration20110305 added

Adding TorPerfIteration20110305 to the keywords, so that this ticket shows up in the query. Please undo if that's stupid.

comment:3 Changed 8 years ago by mikeperry

Keywords: MikePerryIterationFires20110305 added

comment:4 Changed 8 years ago by mikeperry

We should also update the awk scriptlet at the top of analyze_guards.py to handle the new format.

comment:5 Changed 8 years ago by mikeperry

Points: 2 → 3

comment:6 Changed 8 years ago by mikeperry

Cc: karsten added

Karsten: What sort of data format do we want here? I kind of like how extradata is now extensible because it has keyword=value pairs. Should I just merge the .data file in with its own keywords, corresponding to the column names from measurements-HOWTO?

comment:7 Changed 8 years ago by mikeperry

Also, I just noticed a typo bug in extra_data.py. The one-line diff is in mikeperry/fixfields.

comment:8 in reply to:  6 Changed 8 years ago by karsten

Replying to mikeperry:

Karsten: What sort of data format do we want here? I kind of like how extradata is now extensible because it has keyword=value pairs. Should I just merge the .data file in with its own keywords, corresponding to the column names from measurements-HOWTO?

Yes, that's a good idea. What would the example line in the HOWTO look like in that case?

Reading these files in R will be slightly more difficult, because we'll have to write our own parser function (instead of using read.csv()). But I think readLines() and some string parsing functions should work just fine.
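For illustration, here is a minimal sketch of how such keyword=value lines could be parsed line by line (the same idea as readLines() plus string splitting in R), written in Python since consolidate-stats is a Python script. The field names in the usage example are hypothetical placeholders, not the actual HOWTO columns:

{{{
# Sketch only: parses space-separated KEY=value lines into dicts.
# The field names in the example below (SOURCE, FILESIZE, START, CIRC_ID)
# are hypothetical placeholders, not the actual merged format.
def parse_mergedata(path):
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            records.append(dict(kv.split("=", 1)
                                for kv in line.split() if "=" in kv))
    return records

# Example (hypothetical line):
#   SOURCE=torperf FILESIZE=51200 START=1299340801.23 CIRC_ID=4384
# records = parse_mergedata("torperf-50kb.mergedata")
}}}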

comment:9 Changed 8 years ago by tomb

I am happy to work on the format and parsing issues. It makes sense for me to do this because it overlaps heavily with what I was just working on in #2563.

I have limited availability this weekend since Benessa is coming to town, but I will be able to at least get started before Monday. If anybody objects to me grabbing the R side of this, please let me know.

comment:10 Changed 8 years ago by mikeperry

tomb: Ok, I will try to get consolidate-stats updated for you today, then. The R stuff should probably be a new ticket.

comment:11 Changed 8 years ago by mikeperry

Actual Points: 3
Status: new → needs_review

karsten and tomb: Ok, this is all done and tested in mikeperry/ticket2618. That branch also contains my extra_stats.py fix from mikeperry/fixfields, so you only need to do one merge now.

Note that the script behavior has changed. It no longer requires a manual sort step, and it also only outputs matched successful fetches. The script header informs people to keep the original .data and .extradata for failure statistics analysis.
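As a rough illustration of the "matched successful fetches" behavior (not the actual consolidate-stats code; the field names DATACOMPLETE, START, and LAUNCH and the success test are assumptions for the sketch):

{{{
# Sketch only, not the real script. Assumes both input streams are already
# sorted by time, that a .data record with DATACOMPLETE=0 is a failure, and
# that records from the two files can be paired by nearby timestamps.
def merge(data_records, extra_records, slack=1.0):
    merged = []
    extras = iter(extra_records)
    extra = next(extras, None)
    for rec in data_records:
        if float(rec.get("DATACOMPLETE", "0")) == 0.0:
            continue  # failed fetch: keep it in the original files only
        # advance the .extradata stream until it catches up with this fetch
        while extra is not None and float(extra["LAUNCH"]) < float(rec["START"]) - slack:
            extra = next(extras, None)
        if extra is not None and abs(float(extra["LAUNCH"]) - float(rec["START"])) <= slack:
            merged.append({**rec, **extra})
    return merged
}}}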

comment:12 in reply to:  11 Changed 8 years ago by tomb

Replying to mikeperry:

karsten and tomb: Ok, this is all done and tested in mikeperry/ticket2618. That branch also contains my extra_stats.py fix from mikeperry/fixfields, so you only need to do one merge now.

Excellent. Should I open a new ticket for the R/parsing part of this task, or should I work on it here in this ticket?

comment:13 in reply to:  11 Changed 8 years ago by karsten

Replying to mikeperry:

karsten and tomb: Ok, this is all done and tested in mikeperry/ticket2618. That branch also contains my extra_stats.py fix from mikeperry/fixfields, so you only need to do one merge now.

Note that the script behavior has changed. It no longer requires a manual sort step, and it also only outputs matched successful fetches. The script header informs people to keep the original .data and .extradata for failure statistics analysis.

When you say sorting isn't necessary, what do you mean by that? I tried the script with modified .data and .extradata files that had non-sorted entries. The script failed for me. Or do you mean that the files are already sorted by Torperf?

Also, I think there's a bug in this script: whenever we skip a line in a .data file because that line represents a failure, we might get out of sync with the .extradata file and stop writing any data to .mergedata. You should be able to reproduce this bug with the Torperf 50KB run (.data, .extradata). The last line in the result has CIRC_ID=4384. If I delete the line in .extradata starting with CIRC_ID=4397, the result has more entries than before. I think the fix is to distinguish between an absolute slack of up to 1 second and a time difference of, say, more than 1 minute.
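One way to read that proposed fix, as a hedged sketch (the 1-second and 1-minute thresholds come from the paragraph above; the rule itself is an assumption, not the eventual patch):

{{{
# Sketch of the proposed resynchronization rule, not the actual fix.
SLACK = 1.0        # same fetch: timestamps differ only by small jitter
RESYNC_GAP = 60.0  # clearly different fetches: drop the older record

def classify(data_ts, extra_ts):
    """Decide how two candidate records relate, given their timestamps."""
    diff = extra_ts - data_ts
    if abs(diff) <= SLACK:
        return "match"       # merge the two records
    if diff > RESYNC_GAP:
        return "skip_data"   # .extradata is far ahead: drop this .data line
    if diff < -RESYNC_GAP:
        return "skip_extra"  # .data is far ahead: drop this .extradata line
    return "ambiguous"       # in between: leave unmatched rather than guess
}}}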

comment:14 Changed 8 years ago by karsten

See ticket2618 in my public repository for an update of the measurements-HOWTO.

Also, is there a way to include timed-out runs in the .mergedata, too? We do include failures, so by including timeouts we wouldn't have to parse the original files for timeouts/failures anymore. This could be a new ticket; I'd just like to know whether it's possible.

comment:15 Changed 8 years ago by mikeperry

Resolution: fixed
Status: needs_review → closed

Karsten: Yes, I meant that no additional sorting step should be required. Under what conditions might the torperf output data not be sorted? If we're talking about combining multiple data files, shouldn't consolidating them first be a workable option? Do they need to be sorted in the output format too?

I replied to the other issues in #2672.

comment:16 in reply to:  15 Changed 8 years ago by karsten

Replying to mikeperry:

Karsten: Yes, I meant that no additional sorting step should be required. Under what conditions might the torperf output data not be sorted? If we're talking about combining multiple data files, shouldn't consolidating them first be a workable option? Do they need to be sorted in the output format too?

Sorry for the confusion. I thought your comment meant that your script does the sorting now, and I wasn't sure whether you were aware that it does not. Ignore my question; everything works fine!

I replied to the other issues in #2672.

OK.
