Now that we have our scripts in #6232 extracting useful data, here's another graph that would be useful.
On the x axis is our set of 900 Exit relays, ordered by decreasing chance of being chosen. f(x) is the chance that the user's selected exit is among the first (biggest) x relays.
We'll likely find that we should zoom in on just the x \in [0..50] range or something, since otherwise the graph will just shoot up to 1.0 and stay there.
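A minimal Python sketch of f(x). It assumes we already have one non-negative selection weight per exit relay, for example the consensus weight times the applicable exit-position bandwidth-weight. The `weights` input and the `exit_cdf` name are hypothetical and are not part of the #6232 scripts. Entry x of the returned list is f(x), so zooming in on x in [0..50] means keeping the first 51 entries.

```python
from itertools import accumulate


def exit_cdf(weights):
    """Return [f(0), f(1), ..., f(n)] for n relays.

    f(x) is the probability that the selected exit is one of the x
    relays with the largest selection weight (hypothetical input).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one relay with positive weight")
    # Put the biggest relays first, then take running totals.
    ordered = sorted(weights, reverse=True)
    return [0.0] + [running / total for running in accumulate(ordered)]


# Hypothetical toy consensus with three exits.
cdf = exit_cdf([50, 30, 20])
print(cdf)  # [0.0, 0.5, 0.8, 1.0]
zoomed = cdf[:51]  # x in [0..50]
```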
Auto-generating this graph for the current consensus, and sticking it near consensus-health, might be wise.
Once we've done the basic graph, we might find that graphing f(10) over time tells us something interesting about #6232 (for various values of 10).
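Here is a sketch of the "f(10) over time" series, which can be evaluated for various values of k. It assumes a hypothetical input mapping ISO-style consensus timestamps to per-relay exit weights. The function names are illustrative only.

```python
def f_at(weights, k):
    """Fraction of total selection weight held by the k heaviest relays."""
    total = sum(weights)
    return sum(sorted(weights, reverse=True)[:k]) / total


def f_over_time(consensuses, k=10):
    """Map each consensus timestamp to f(k), in chronological order.

    `consensuses` is assumed to be {timestamp_string: [weight, ...]} with
    ISO-style timestamps, so sorting the strings sorts by time.
    """
    return {when: f_at(weights, k) for when, weights in sorted(consensuses.items())}


# Hypothetical input: two consensuses, evaluated for a couple of k values.
history = {
    "2012-07-20 12:00": [40, 30, 20, 10],
    "2012-07-21 12:00": [25, 25, 25, 25],
}
for k in (1, 2):
    print(k, f_over_time(history, k))
```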
Graphing the CDF of exit probabilities using consensus weights, and also the CDF using descriptor bandwidths, could be a good way of visualizing the tradeoff we're making by concentrating traffic onto the faster relays.
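This is a rough sketch of the side-by-side comparison. It assumes each exit relay is given as a hypothetical `(consensus_weight, advertised_bandwidth)` pair and sorts each CDF by its own metric. The gap between the two curves at a given x is one way to read off how much extra concentration the consensus weights introduce compared with self-reported bandwidth.

```python
from itertools import accumulate


def cdf(values):
    """f(x): share of the total held by the x largest values."""
    total = sum(values)
    return [0.0] + [run / total for run in accumulate(sorted(values, reverse=True))]


def compare_cdfs(relays, max_x=50):
    """Return (x, f_consensus(x), f_descriptor(x)) rows for x in [0..max_x].

    `relays` is assumed to be a list of (consensus_weight, advertised_bandwidth)
    pairs, one per exit relay; each CDF is sorted by its own metric.
    """
    by_consensus = cdf([c for c, _ in relays])
    by_descriptor = cdf([a for _, a in relays])
    n = min(max_x, len(relays))
    return [(x, by_consensus[x], by_descriptor[x]) for x in range(n + 1)]


# Hypothetical numbers for three exits.
for row in compare_cdfs([(9000, 5000), (3000, 4000), (1000, 1000)]):
    print(row)
```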
Five relays were 30% of the network on July 20, 2012. Two of them have since gained the Guard flag, which cut their chance of being chosen in the exit position in half.
The top 10 relays were 45% of the network; top 20 relays were 60% of the network; and top 40 relays were 80% of the network.
Out of these three, the CDF is way, way easier to read.
As another attempt to graph progress over time, how about a CDF graph with four curves: a) today, b) a week ago, c) a month ago, and d) a year ago?
We should also ponder some sort of smoothing or averaging, since I don't want to know how things were on June 24, 2012 at 19:00; I want to know how things were "in June 2012". I fear most such approaches will quickly turn into garbage science, though.
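One simple option, sketched here under stated assumptions and not as a settled method, is to average the f(x) curves of all consensuses in a calendar month pointwise. For each x, the result is the mean share of exit probability that the top-x relays held across that month. The input format (`"YYYY-MM-DD HH:MM"` keys mapping to complete CDF lists) and the function name are hypothetical.

```python
from collections import defaultdict


def monthly_mean_cdf(cdfs, max_x=50):
    """Average f(x) pointwise over all consensuses in each calendar month.

    `cdfs` is assumed to map "YYYY-MM-DD HH:MM" timestamps to complete CDF
    lists [f(0), f(1), ..., 1.0]. Curves shorter than max_x + 1 are padded
    with 1.0, since f(x) stays at 1.0 once every relay is included.
    """
    by_month = defaultdict(list)
    for stamp, curve in cdfs.items():
        padded = (list(curve) + [1.0] * (max_x + 1))[: max_x + 1]
        by_month[stamp[:7]].append(padded)  # group by "YYYY-MM"
    return {
        month: [sum(column) / len(column) for column in zip(*curves)]
        for month, curves in sorted(by_month.items())
    }
```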
R and I spent the afternoon together and painted three new bitmaps for U:
This graph shows the CDF with five curves (your four plus one more for "3 months before").
This graph and that one are the timeplots from last time, but with one data point per week instead of per day, and with fewer lines overall.
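The thread doesn't say how the weekly data points were picked. This sketch assumes they are the mean of the daily values within each ISO week, although sampling one fixed weekday would also work. The `weekly_means` name and its input format are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_means(daily):
    """Collapse {date: value} into {(iso_year, iso_week): mean value}."""
    weeks = defaultdict(list)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].append(value)
    return {week: sum(vals) / len(vals) for week, vals in sorted(weeks.items())}


# Hypothetical daily f(10) values.
print(weekly_means({date(2012, 7, 2): 0.40, date(2012, 7, 3): 0.50, date(2012, 7, 9): 0.45}))
```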
Next steps:
Compute exit probabilities based on advertised bandwidths as suggested in the first comment. Make new graphs to compare probabilities based on consensus weights and advertised bandwidths.
Wait until we have a final decision on which graphs we'd want to be auto-generated, if any. Then automate generating them and add them to the metrics website.
Trac: Status: new to assigned; Owner: N/A to karsten; Cc: karsten, gsathya, robgjansen, phw to gsathya, robgjansen, phw
Here's another graph that visualizes exit probabilities. The blank space is reserved for relays coming after the top-50. I think the plot is called a "mosaic plot" or "tree map". We could group relays belonging to the same family, country, or AS together and assign a different colour to each group. We could also label rectangles with nicknames instead of probabilities, at least for the top-10 or top-20 relays. We could also add the remaining relays that didn't make it into the top-50. But this is just a quick prototype to discuss whether the graph type would be useful or not.
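For the grouping idea, each group's area in the tree map would be the summed exit probability of its members among the top-50, plus one leftover block for everyone else. This is a minimal sketch. The relay dict fields (`"prob"`, `"country"`) and the function name are hypothetical, and the actual rectangle layout is left to whatever plotting tool draws the tree map.

```python
from collections import defaultdict


def group_top_relays(relays, key, top_n=50):
    """Sum exit probabilities per group for the top_n relays.

    `relays` is assumed to be a list of dicts with a "prob" field and
    grouping fields such as "country", "family", or "as" (hypothetical names).
    Returns (per-group totals, probability left over for all other relays).
    """
    ranked = sorted(relays, key=lambda r: r["prob"], reverse=True)
    totals = defaultdict(float)
    for relay in ranked[:top_n]:
        totals[relay[key]] += relay["prob"]
    remainder = sum(r["prob"] for r in ranked[top_n:])
    return dict(totals), remainder
```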
> This graph shows the CDF with five curves (your four plus one more for "3 months before").
This graph would be more readable if we sorted the curves in the legend (so the first curve listed is the highest curve in the graph). Perhaps sorting them by the value at f(20) is a good approximation?
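The f(20) heuristic amounts to a one-line sort. This is a sketch assuming a hypothetical `{label: [f(0), f(1), ...]}` mapping. Because curves can cross, ranking by a single point only approximates "highest curve first".

```python
def legend_order(curves, x=20):
    """Order curve labels by f(x), highest first.

    `curves` is assumed to map a label (e.g. "a year ago") to a CDF list
    [f(0), f(1), ...]; past the end of a curve, f(x) is taken to be 1.0.
    """
    def value_at(curve):
        return curve[x] if x < len(curve) else 1.0

    return sorted(curves, key=lambda label: value_at(curves[label]), reverse=True)
```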
Reordering legend entries doesn't work so well for automatically generated graphs. It's technically possible, but it might be confusing for viewers. Here's a new graph that uses different shades of green, ordered by displayed date. Is that more readable?
Wow, that's subtle. It wasn't until today (when I'm prepping graphs for a meeting with the funder) that I realized that the shades of green corresponded to time. So yes, now that I've realized it, it is better -- but before I realized it, I just thought they were horrible color choices. I guess that means the answer is 'no, not more readable'. :/
I still think it would be really useful to have some version of this graph on the fast-exits metrics page. But I can't figure out which one. Anybody else have suggestions on how to visualize this data usefully?
I'm running out of ideas. If someone has suggestions for a visualization, I can try to implement that in R/ggplot2. Unassigning this ticket from me, because I'm currently not working on it.
Trac: Owner: karsten to N/A; Status: needs_information to assigned