Basically, we want a standard way to graph the key metrics from before, during, and after the experiment.
In this case, we want CDF-TTFB and CDF-DL from OnionPerf results.
We also want CDF-Relay-Stream-Capacity and CDF-Relay-Utilization from the consensus, as well as from the votes, to see whether TorFlow's votes drastically differ from sbws's during the experiment.
Update from June 10, 2020: We finished the CDF-TTFB and CDF-DL portions by adding these graphs to OnionPerf's visualize mode. The remaining parts are the CDF-Relay-* graphs that are based on consensuses and votes. Keep this in mind when reading comments up to June 10, 2020.
Okay, let's continue this discussion on this ticket.
I'm attaching new graphs with CDF-TTFB for all OnionPerfs running during that time.
These graphs use colors from ColorBrewer that are supposed to be easier to distinguish for colorblind people, but we can still change the colors as we move forward.
CDF-DL requires some more processing, and I'm not yet sure how to do the other two. I'll see when I get there. Posting updates as I have them.
This is a bit embarrassing, but the reason for the 50% bump was that I mixed public and onion server results... Fixed here! That file also contains CDF-DL. Will try to do the other two later tonight.
And here's another document with both CDF-Relay-* graphs; without votes. Expect bugs!
This might be a good point for you to provide feedback on whether this is roughly going in the right direction. Setting to needs_review for this purpose. I'll pause working on this until I hear back. Thanks!
Hrmm. CDF-Relay-Capacity should have an X-axis range of [0.0, 1.0]. I just realized that there was some incorrect wording in the definition of the metric on https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics. It should be the average read/write history divided by the peak observed bandwidth, for each relay in the network. In other words, you average the read/write history over time for a relay and divide it by the peak advertised bandwidth over that period of time. This should produce a value between 0 and 1.
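For concreteness, a minimal sketch of that per-relay computation, assuming the read/write histories have already been parsed into bytes-per-second samples (the names here are illustrative, not existing tooling):

```python
def relay_utilization(read_samples, write_samples, peak_bw):
    """Average the relay's read/write history samples over the period
    and divide by its peak bandwidth over the same period. All inputs
    in bytes/s; the result should fall in [0.0, 1.0]."""
    samples = list(read_samples) + list(write_samples)
    if not samples or peak_bw <= 0:
        return None  # no usable data for this relay; skip it
    return (sum(samples) / len(samples)) / peak_bw
```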
There is also a bug in the CDF-Relay-Stream-Capacity, though I am not sure what it is. It should be centered around 1.0, not 0.01. Can you write the formula you used for this? Perhaps you just forgot to include the scale multiplier for the measured bandwidth?
There's one line per consensus entry, which is where we get the following columns from: `fingerprint,validafter,hasexitflag,hasguardflag,measured`. The `read` and `write` columns come from bandwidth histories contained in extra-info descriptors. `rate,burst,observed` come from the server descriptor referenced by the consensus entry.
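Read back in, that file could be processed along these lines (the file name and frame layout are assumptions, not the actual tooling):

```python
import pandas as pd

# Hypothetical file name; one row per consensus entry as described above.
# fingerprint/validafter identify the entry, measured comes from the
# consensus, read/write from extra-info bandwidth histories, and
# rate/burst/observed from the referenced server descriptor.
df = pd.read_csv("consensus_entries.csv")
```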
Ideally, peak would be that peak-in-30-days thing we ground out in Whistler, but for this we actually want to see what the instantaneous change to peak caused by the experiment did to the results. The plot should still be between 0 and 1.0. Any relay that has a value over 1.0 would be very interesting to look at.

> For CDF-Relay-Stream-Capacity I used:
>
> ```
> plot -> measured / observed
> ```

Yes, I think this is just off by a factor of 1000 then. It should be:

```
plot -> 1000 * measured / observed
```
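Applied to the hypothetical frame from above, the fix is just that unit conversion: the consensus `measured` value is in kilobytes per second, while the descriptor's `observed` bandwidth is in bytes per second:

```python
# Scale measured (kB/s, from the consensus) to bytes/s before dividing
# by observed (bytes/s, from the server descriptor).
df["stream_capacity"] = 1000 * df["measured"] / df["observed"]
```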
I attached a new set of graphs here. They are all cut off at the 95th percentile, and they all contain the plotted formula in the subtitle.
Regarding the `max(rate, burst, observed)` part, I'm worried that this number is not very meaningful. In theory, the operator can pick arbitrary values for rate and burst that the relay can never actually provide. I plotted one graph with that number, but I don't think we should use it.
The `min(rate, burst, observed)` number is what we typically use as advertised bandwidth. Maybe it's sufficient to ignore what the operator thought the relay could or should provide and look at observed bandwidth only. I included a plot for this, too.
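As a sketch against the same hypothetical frame, the two candidates discussed here:

```python
# min(rate, burst, observed): the usual advertised-bandwidth definition;
# the relay can't do better than its rate limit, its burst limit, or
# what it has actually been observed to push.
df["advertised"] = df[["rate", "burst", "observed"]].min(axis=1)

# max(rate, burst, observed): plotted once for comparison, but rate and
# burst are operator-chosen and may never be achievable in practice.
df["advertised_max"] = df[["rate", "burst", "observed"]].max(axis=1)
```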
I recall the peak advertised bandwidth thing we talked about in Whistler. It's significantly harder to compute than the current advertised (or observed) bandwidth, because we need to include lots of descriptors for that. We should pick a formula that we use for all experiments, not just for this one. Maybe we can start with the single value and leave it as a possible extension for the future to consider a moving window of 30 days.
Yeah, so we really need this 30-day peak of the observed value, as that gets us closer to the true network capacity and utilization. Rob's experiment is useful exactly because it forces relays closer to their peak capacity. Long term, I think metrics should be computing these 30-day maxes continually and providing them as an auxiliary CSV or other data stream for graphs like these.
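One way that continual computation could look, assuming a long-format table of daily observed-bandwidth values per relay (file and column names are made up for the sketch):

```python
import pandas as pd

# Hypothetical long-format input: one row per (fingerprint, date) with
# that day's observed bandwidth in bytes/s from the server descriptor.
obs = pd.read_csv("observed_daily.csv", parse_dates=["date"])

# 30-day rolling peak of observed bandwidth, computed per relay. The
# result is indexed by (fingerprint, date) and could be published as
# the auxiliary CSV suggested above.
peak_30d = (
    obs.sort_values("date")
       .set_index("date")
       .groupby("fingerprint")["observed"]
       .rolling("30D")
       .max()
)
peak_30d.to_csv("observed_peak_30d.csv")
```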
For this experiment, it is interesting to see the direct change from the 5-day peak observed value to Rob's new values, but if I had to pick only one graph, I would still prefer using 30-day peaks.