It would be great if we had some sort of output of the 15 torperf runs that are running somewhere. Right now, afaik, all that data will just go into a hole until someone decides to dig it up.
If this is too much work to be done in a reasonable amount of time, I'd settle for having just the timing graphs for each of the 15 runs in a directory I can access somewhere.
FYI: It's also pretty likely that we will very quickly conclude that many of these 15 runs are useless, with no reason to run. So the sooner I can look at at least some output, the better.
I agree that the R code posted above should be in the Torperf repository. But it's very specific to this problem of visualizing the #1919 (moved) Torperfs. I should generalize the code a bit, so that it doesn't require 15 Torperf data files and doesn't attempt to plot 3 x 5 graphs in a large PNG. I opened the new ticket #2563 (moved) to work on this.
Mike, you have the data and the R scripts for analyzing the #1919 (moved) Torperfs. Do you need anything else here?
Note that, IMHO, this ticket does not include adding these graphs to the metrics website and updating them automatically. We set up the #1919 (moved) Torperf as an experiment that we might want to stop or change at any point. We should only add graphs to the website if we know we'll want to track something over a longer period of time. And if we decide we want these graphs on the website, we should add a new ticket.
Yes, we can close this. I do think we should have these runs graphed on the metrics website in some form, possibly the same as what exists there now at https://metrics.torproject.org/performance.html, but with all 15 options in a dropdown, not just the three.
I also think we should capture the 1st and 4th quartiles in a different color, so we can keep an eye on the total spread over time (since variance is the pain point of tor use).
Does this sound right? Should this be two new tickets, or just one?
I do think we should have these runs graphed on the metrics website in some form, possibly the same as what exists there now at https://metrics.torproject.org/performance.html, but with all 15 options in a dropdown, not just the three.
I don't think that adding 12 new graphs to the metrics website will do what you want. The purpose of the current graphs is to compare Torperf results over time. The most useful Torperf graph right now is the one that combines the results from siv, moria, and ferrinii. That's a single graph telling you the story of how Tor performance has evolved over time. (Or rather, three graphs for the three file sizes.)
Now, adding 12 graphs for the custom guard node selections won't be useful without comparing them to each other. So, if I added these 3x4 graphs to the website, you'd open 4 browsers, load a graph in each of them, and compare the four graphs. We should really come up with something better than that.
Also, it's more complicated than it probably seems to add these new graphs without breaking the existing graphs.
My plan is to do the analysis offline and not touch the website code at all. If we find out that we like certain graphs and if we decide we want to track these graphs over time, then we should add new graphs to the website. But I'm not even convinced that we'll keep the Torperfs with custom guard node selections running in the future. It might be that we'll learn something from the experiment and decide we want to do a new experiment. In that case we'll be angry about wasting programming effort to get new graphs on the website.
I also think we should capture the 1st and 4th quartiles in a different color, so we can keep an eye on the total spread over time (since variance is the pain point of tor use).
I don't understand. The 1st quartile is the area below the dark line and the 3rd quartile is the area above it. What do you mean?
Does this sound right? Should this be two new tickets, or just one?
Adding more graphs to the website shouldn't be a new ticket. I'm not sure what you mean by the color thing, but that would be a new ticket.
Trac: Status: new to closed; Resolution: N/A to implemented
Replying to mikeperry:
Also, it's more complicated than it probably seems to add these new graphs without breaking the existing graphs.
My plan is to do the analysis offline and not touch the website code at all. If we find out that we like certain graphs and if we decide we want to track these graphs over time, then we should add new graphs to the website. But I'm not even convinced that we'll keep the Torperfs with custom guard node selections running in the future. It might be that we'll learn something from the experiment and decide we want to do a new experiment. In that case we'll be angry about wasting programming effort to get new graphs on the website.
Ok. How about generating these quartile-over-time graphs offline? Does code to do so exist anywhere? I would love to see these 15 graphs over time as we perform a few experiments.
I also think we should capture the 1st and 4th quartiles in a different color, so we can keep an eye on the total spread over time (since variance is the pain point of tor use).
I don't understand. The 1st quartile is the area below the dark line and the 3rd quartile is the area above it. What do you mean?
Technically you are correct, but the 1st quartile in particular also has a lower bound, which is the fastest request to complete for that timeslice. The 4th quartile has an upper bound, which is the slowest request to complete.
But what I really want to visualize over time is the variance in that top quartile, which I suppose is also a function of its density. I want to run a few CBT experiments and a handful of bandwidth authority experiments while we are gathering these metrics, and I want to observe whether they improve or worsen our variance in that top, 4th quartile. This may mean we need multiple quantiles, or maybe we just stick to density plots like we have in timematrix.png.
Ok. How about generating these quartile-over-time graphs offline? Does code to do so exist anywhere? I would love to see these 15 graphs over time as we perform a few experiments.
This code doesn't exist yet, but I can write it. Is a 3x5 graph matrix okay, too, or do you want 15 separate graphs?
I also think we should capture the 1st and 4th quartiles in a different color, so we can keep an eye on the total spread over time (since variance is the pain point of tor use).
I don't understand. The 1st quartile is the area below the dark line and the 3rd quartile is the area above it. What do you mean?
Technically you are correct, but the 1st quartile in particular also has a lower bound, which is the fastest request to complete for that timeslice. The 4th quartile has an upper bound, which is the slowest request to complete.
Aha! I was thinking about the 1st and 3rd quartiles as the data points at 25% and 75% of the values. And confusingly, I talked about them as the 25-50% and 50-75% ranges. And I think you were talking about the 1st and 4th quartiles as the 0-25% and 75-100% ranges, right? Yes, we can plot the minimum and maximum lines, too.
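To make the two readings of "quartile" concrete, here is a minimal sketch in Python (not the R code discussed in this ticket). It assumes the completion times for one timeslice are already available as a plain list of seconds; the function names `quartile_summary` and `quartile_ranges` are made up for illustration.

```python
from statistics import quantiles

def quartile_summary(times):
    """Quartile *points* plus min and max for one timeslice of completion times."""
    # quantiles() sorts the data itself; "inclusive" treats min and max as
    # the 0th and 100th percentiles of the sample.
    q1, median, q3 = quantiles(times, n=4, method="inclusive")
    return {"min": min(times), "q1": q1, "median": median, "q3": q3, "max": max(times)}

def quartile_ranges(summary):
    """Quartile *ranges*: the 0-25%, 25-50%, 50-75%, and 75-100% intervals."""
    return {
        "1st": (summary["min"], summary["q1"]),
        "2nd": (summary["q1"], summary["median"]),
        "3rd": (summary["median"], summary["q3"]),
        "4th": (summary["q3"], summary["max"]),
    }
```

Read this way, the "4th quartile" is bounded below by the 75% point and above by the slowest request, which is why plotting the minimum and maximum lines would show its full extent.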
But what I really want to visualize over time is the variance in that top quartile, which I suppose is also a function of its density. I want to run a few CBT experiments and a handful of bandwidth authority experiments while we are gathering these metrics, and I want to observe whether they improve or worsen our variance in that top, 4th quartile. This may mean we need multiple quantiles, or maybe we just stick to density plots like we have in timematrix.png.
How about I write an R script that allows you to define the percentiles that you want to have lines for? The current graphs would be percentiles 25, 50, and 75. For your experiments you could also pick 75, 85, 95 or 97, 98, 99, etc. Would that be useful?
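As a rough illustration of the configurable-percentile idea, here is a sketch in Python rather than R. It is not the script proposed above. It assumes measurements arrive as hypothetical `(unix_timestamp, completion_seconds)` pairs, which is not necessarily the Torperf data format, and it groups them by UTC day. The name `percentile_lines` is invented for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import quantiles

def percentile_lines(measurements, percentiles=(25, 50, 75)):
    """Compute one line per requested percentile, with one point per UTC day.

    measurements: iterable of (unix_timestamp, completion_seconds) pairs
                  (a hypothetical input format chosen for this sketch).
    percentiles:  integers between 1 and 99.
    Returns {percentile: [(date, value), ...]} with points sorted by date.
    """
    by_day = defaultdict(list)
    for timestamp, seconds in measurements:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        by_day[day].append(seconds)

    lines = {p: [] for p in percentiles}
    for day in sorted(by_day):
        values = by_day[day]
        if len(values) < 2:
            continue  # quantiles() needs at least two data points
        # 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(values, n=100, method="inclusive")
        for p in percentiles:
            lines[p].append((day, cuts[p - 1]))
    return lines
```

Calling `percentile_lines(data)` would give the 25/50/75 lines of the current graphs, while `percentile_lines(data, percentiles=(75, 85, 95))` or `(97, 98, 99)` would zoom in on the slow tail. Each returned line can then be handed to whatever plotting tool is in use.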