Opened 9 years ago
Closed 9 years ago
#2563 closed enhancement (implemented)
Add R code for processing Torperf data to the Torperf repository
Reported by: | karsten | Owned by: | tomb |
---|---|---|---|
Priority: | Medium | Milestone: | |
Component: | Metrics/CollecTor | Version: | |
Severity: | Keywords: | TorPerfIteration20110305 | |
Cc: | Actual Points: | 3 | |
Parent ID: | Points: | 3 | |
Reviewer: | Sponsor: |
Description
Torperf writes its output to text files that are quite hard to comprehend and visualize. The completion time needs to be calculated from columns 1, 2, 17, and 18. Also, we only recognize a timeout from the 17th column being 0 and a failure from the 20th column being smaller than the expected data size.
Yes, this is documented somewhere. The measurement HOWTO has some bits of information, and the Overview of Statistical Data in the Tor Network explains it some more.
Still, it's not really intuitive to process Torperf data.
We should take the R code from #2543 (and possibly other code?), generalize it a bit, document it for R-noobs, and put it in the Torperf repository.
Child Tickets
Change History (11)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
Owner: | karsten deleted |
---|---|
Status: | new → assigned |
comment:3 Changed 9 years ago by
Keywords: | TorPerfIteration20110305 added |
---|
comment:4 Changed 9 years ago by
We might want look at plot_results.R
and maybe reuse parts of it when implementing this ticket. Once we're sure we have everything, let's delete plot_results.R
in the repository.
comment:5 Changed 9 years ago by
Owner: | set to tomb |
---|
comment:7 Changed 9 years ago by
Status: | assigned → needs_review |
---|
I have done a git push public (tomb/torperf.git) of what I hope is the desired output of this task. If someone would be so kind as to look at it, and give me a pointer to something that will tell me how I should merge this into something so that it can be useful to other people I would appreciate it greatly.
comment:8 Changed 9 years ago by
Status: | needs_review → assigned |
---|
Your branch looks great and works for me! Here are a few minor comments:
- Can you update
README
to list the three new files and describe what they do? - The usage example in your
HOWTO
(line 81) should have the--slave
flag, too. - You have a FIXME comment in your
filter.R
, but I couldn't reproduce this problem. Can you try to reproduce it? If not, we should take out the FIXME comment. - I'd like to squash all your commits into one, because the development history would be very confusing for people. Do you mind? And can you write a commit message in your last commit that you'd want to see for the single squashed commit?
Please set the needs_review bit again when you're happy with your branch, and I'll merge it into the official Torperf branch. Thanks!
comment:9 Changed 9 years ago by
Actual Points: | → 3 |
---|---|
Status: | assigned → needs_review |
README and HOWTO updated
FIXME in filter.R left in place: it reproduces on the snapshot of the data that I have. I added a comment in the FIXME that this bug is probably rare and therefore low priority. If other people run into it then it will be worth my time to chase down and squash. Please let me know if you think it is higher priority.
I have committed and pushed. Feel free to squash commits, I hope that the last commit message is explanatory of the whole ticket.
I have left the Actual Points set to 3 because I do not think it makes sense to include the overhead of me learning R. I anticipate future R related tickets taking me closer to the expected amount of time.
comment:10 Changed 9 years ago by
I have placed a copy of data that stimulates the bug in
http://cryptocracy.net/metrics/data.tar.bz2
With this data I can replicate the bug as follows:
03/08/11 !09:35:05 tomb@crypt:~/torperf/metrics$ R --slave -f filter.R --args -start=2011-02-01 -end=2011-02-03 ../data/*.data
Error in data.frame(started = as.POSIXct(x$V1, origin = "1970-01-01"), :
arguments imply differing number of rows: 0, 1
Calls: FilterMain -> rbind -> read -> data.frame
Execution halted
comment:11 Changed 9 years ago by
Resolution: | → implemented |
---|---|
Status: | needs_review → closed |
The bug was that by picking specific dates we could end up with partially empty data frames. Specifically, some of the vectors of a data frame had length 0 (the filtered data), but others had length 1 (the constants we were passing). We need to check whether something's left after filtering out the dates we want and return NULL, if not.
Fixed, squashed, rebased, and pushed to master. Yay! Thanks. :)
This tasks contains three subtasks:
Rscript filter.R *.data
orRscript filter.R first.data second.data
, and the arguments would be extracted usingargs <- commandArgs(trailingOnly = TRUE)
. If the script doesn't have any command-line arguments, it could simply attempt to read all files in the current directory ending with.data
.guards
,filesize
, andbytes
parameters from the filename. For example, if we parsetorperf-50kb.data
,guards
could betorperf
, andfilesize
would be50kb
;bytes
is a bit more complicated, because we need to parse the int value out of50kb
and need to know thatkb
means times 1024.Tom, after giving you the quick R overview on this code last Saturday, would you like to tackle this?