Opened 8 years ago

Closed 8 years ago

#2563 closed enhancement (implemented)

Add R code for processing Torperf data to the Torperf repository

Reported by: karsten Owned by: tomb
Priority: Medium Milestone:
Component: Metrics/CollecTor Version:
Severity: Keywords: TorPerfIteration20110305
Cc: Actual Points: 3
Parent ID: Points: 3
Reviewer: Sponsor:


Torperf writes its output to text files that are quite hard to comprehend and visualize. The completion time needs to be calculated from columns 1, 2, 17, and 18. Also, we only recognize a timeout from the 17th column being 0 and a failure from the 20th column being smaller than the expected data size.

Yes, this is documented somewhere. The measurement HOWTO has some bits of information, and the Overview of Statistical Data in the Tor Network explains it some more.

Still, it's not really intuitive to process Torperf data.

We should take the R code from #2543 (and possibly other code?), generalize it a bit, document it for R-noobs, and put it in the Torperf repository.

Child Tickets

Change History (11)

comment:1 Changed 8 years ago by karsten

This tasks contains three subtasks:

  1. Generalize filter.R from the #2543 code, so that it can be run without modifying the source. This includes:
    • Pass the .data files to process as command-line arguments to the R script. This script would be called by Rscript filter.R *.data or Rscript filter.R, and the arguments would be extracted using args <- commandArgs(trailingOnly = TRUE). If the script doesn't have any command-line arguments, it could simply attempt to read all files in the current directory ending with .data.
    • Change the read function to extract the guards, filesize, and bytes parameters from the filename. For example, if we parse, guards could be torperf, and filesize would be 50kb; bytes is a bit more complicated, because we need to parse the int value out of 50kb and need to know that kb means times 1024.
    • Remove the part that cuts off observations before and after hard-coded dates and have the user do this particular filter step manually.
  2. Generalize timematrix.R from the #2543 code. I think we can use it unchanged, but someone should confirm that. I don't think we can make this file configurable from the command line easily. Not sure if we should try.
  3. Modify the #2543 HOWTO to remove the #1919 specifics.

Tom, after giving you the quick R overview on this code last Saturday, would you like to tackle this?

comment:2 Changed 8 years ago by karsten

Owner: karsten deleted
Status: newassigned

comment:3 Changed 8 years ago by karsten

Keywords: TorPerfIteration20110305 added

comment:4 Changed 8 years ago by karsten

We might want look at plot_results.R and maybe reuse parts of it when implementing this ticket. Once we're sure we have everything, let's delete plot_results.R in the repository.

comment:5 Changed 8 years ago by tomb

Owner: set to tomb

comment:6 Changed 8 years ago by dchasteen

Points: 3

Added points from sprint planning meeting.

comment:7 Changed 8 years ago by tomb

Status: assignedneeds_review

I have done a git push public (tomb/torperf.git) of what I hope is the desired output of this task. If someone would be so kind as to look at it, and give me a pointer to something that will tell me how I should merge this into something so that it can be useful to other people I would appreciate it greatly.

comment:8 Changed 8 years ago by karsten

Status: needs_reviewassigned

Your branch looks great and works for me! Here are a few minor comments:

  • Can you update README to list the three new files and describe what they do?
  • The usage example in your HOWTO (line 81) should have the --slave flag, too.
  • You have a FIXME comment in your filter.R, but I couldn't reproduce this problem. Can you try to reproduce it? If not, we should take out the FIXME comment.
  • I'd like to squash all your commits into one, because the development history would be very confusing for people. Do you mind? And can you write a commit message in your last commit that you'd want to see for the single squashed commit?

Please set the needs_review bit again when you're happy with your branch, and I'll merge it into the official Torperf branch. Thanks!

comment:9 Changed 8 years ago by tomb

Actual Points: 3
Status: assignedneeds_review

README and HOWTO updated
FIXME in filter.R left in place: it reproduces on the snapshot of the data that I have. I added a comment in the FIXME that this bug is probably rare and therefore low priority. If other people run into it then it will be worth my time to chase down and squash. Please let me know if you think it is higher priority.

I have committed and pushed. Feel free to squash commits, I hope that the last commit message is explanatory of the whole ticket.

I have left the Actual Points set to 3 because I do not think it makes sense to include the overhead of me learning R. I anticipate future R related tickets taking me closer to the expected amount of time.

comment:10 Changed 8 years ago by tomb

I have placed a copy of data that stimulates the bug in

With this data I can replicate the bug as follows:

03/08/11 !09:35:05 tomb@crypt:~/torperf/metrics$ R --slave -f filter.R --args -start=2011-02-01 -end=2011-02-03 ../data/*.data
Error in data.frame(started = as.POSIXct(x$V1, origin = "1970-01-01"),  :

arguments imply differing number of rows: 0, 1

Calls: FilterMain -> rbind -> read -> data.frame
Execution halted

comment:11 Changed 8 years ago by karsten

Resolution: implemented
Status: needs_reviewclosed

The bug was that by picking specific dates we could end up with partially empty data frames. Specifically, some of the vectors of a data frame had length 0 (the filtered data), but others had length 1 (the constants we were passing). We need to check whether something's left after filtering out the dates we want and return NULL, if not.

Fixed, squashed, rebased, and pushed to master. Yay! Thanks. :)

Note: See TracTickets for help on using tickets.