Do something with advertised bandwidth distribution graphs

added component::metrics/website owner::metrics-team priority::medium severity::normal status::new type::enhancement labels

Trac:

Here are two examples comparing advertised bandwidth distribution (at the top of each graph) with consensus weight distribution (at the bottom of each graph). The first example shows the n-th fastest relays, the second shows percentiles.

What should we do? Retain, rewrite, remove, or replace the two advertised bandwidth distribution graphs?

Trac:
Status: new to needs_review

Planned for review party tomorrow.

Trac:
Reviewer: N/A to irl

Switching to consensus weight is a good compromise where the alternative is removing the graphs. I don't think we need both percentiles and n-th fastest. Drop the n-th fastest and just have percentiles. Can we do 100, 99, 98, 95, 75, 50, 25, 3, 2, 1, 0? These don't need to be configurable, just fixed is OK.
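A minimal sketch of how the fixed percentile list could be computed from a single consensus. It assumes the consensus weights are already extracted into a list of integers. The linear interpolation between ranks is an assumption; the actual Tor Metrics code may define percentiles differently.

```python
from typing import Dict, List

# Fixed percentile list proposed above (not configurable).
PERCENTILES = [100, 99, 98, 95, 75, 50, 25, 3, 2, 1, 0]


def weight_percentiles(weights: List[int]) -> Dict[int, float]:
    """Return the consensus weight at each fixed percentile.

    Assumption: linear interpolation between the two closest ranks.
    """
    if not weights:
        raise ValueError("need at least one consensus weight")
    ordered = sorted(weights)
    last = len(ordered) - 1
    result = {}
    for p in PERCENTILES:
        pos = p / 100 * last          # fractional index into the sorted list
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        result[p] = ordered[lo] + (ordered[hi] - ordered[lo]) * frac
    return result
```

Fixing the list keeps the graph simple; making `PERCENTILES` a parameter would be the change needed if the percentiles stay configurable.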

Trac:
Status: needs_review to new

Trac:

Replying to irl:

> Switching to consensus weight is a good compromise where the alternative is removing the graphs.

Works for me.

> I don't think we need both percentiles and n-th fastest. Drop the n-th fastest and just have percentiles.

Works for me, too.

> Can we do 100, 99, 98, 95, 75, 50, 25, 3, 2, 1, 0? These don't need to be configurable, just fixed is OK.

This one is tricky. We're looking at a distribution that is far from normal. I made a quick graph with those percentiles:

(That graph would need some more love, like using labels on the y axis that are not in scientific notation, reordering percentiles in the legend, and using more intuitive labels for the two subplots than TRUE and NA. I didn't spend the time on that yet, but those things would get fixed.)

The only really visible percentiles are 100, 99, 98, and maybe 95. All others are hard to distinguish in the graph.

I also tried a log scale, but you can imagine that it's rather unintuitive to read. Another uncool aspect of the log scale is that the minimum consensus weight (of unmeasured relays) is 0, and 0 cannot be shown on a log scale.

I'd say, if we switch to consensus weight percentiles, let's keep percentiles configurable. Maybe one person is interested in the extremes, and another person wants to look at the center. Giving them just a single graph might make at least one of them unhappy.

In fact, we could even keep the n-th fastest if that keeps folks happy. This part doesn't cost us much maintenance effort. It's the advertised bandwidth stuff that I'd really want to get rid of.

arma, what do you think?

I often use n-th fastest to work out the fastest relay(s) over time.

I expect to use n-th fastest a bit while developing PrivCount to answer questions like:

- what's the highest bandwidth?
- what do we get if we aggregate the top N relays?
- what's the minimum relay count and consensus weight we should require to create an aggregate total? (we can't have a bandwidth requirement, because we don't know the bandwidth until after we aggregate)

I am ok with the 'replace' plan, where we switch from descriptor bandwidths to consensus weights.

These graphs are all about (or at least, were started for) visualizing how centralized the Tor network is. For example, they aimed to help answer the question "how much of the Tor network are the top x relays, or the top x% of the relays?" There are many other ways we might visualize the centralization of the network over time, which for me might be at least as good as these current graphs. For example:

- "how many relays are in the top 50% of the network by bandwidth or by consensus weight?"
- "if we think of the current Tor network in terms of equally weighted relays, rather than the current wildly unbalanced weights, how many uniformly-weighted relays would it be the equivalent of?"
- "if a client builds 100 circuits, what's the expected number of relays (maybe broken out into first / second / third hop) that it will interact with?"
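The first two questions can be sketched directly from a list of consensus weights. Counting relays in the top 50% is a cumulative sum over the sorted weights. For the "uniformly-weighted equivalent", the sketch assumes the inverse Simpson index, 1 / sum(p_i^2), as one possible definition; other measures (for example the exponential of Shannon entropy) would give different numbers.

```python
from typing import List


def relays_in_top_fraction(weights: List[int], fraction: float = 0.5) -> int:
    """Smallest number of relays whose combined weight reaches `fraction` of the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    running = 0
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        running += w
        if running >= fraction * total:
            return count
    return len(weights)  # only reached if fraction > 1


def effective_relay_count(weights: List[int]) -> float:
    """Number of equally weighted relays with the same concentration.

    Assumption: inverse Simpson index 1 / sum(p_i^2), where p_i is each
    relay's share of total weight. Not an official Tor Metrics measure.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return 1.0 / sum((w / total) ** 2 for w in weights)
```

A perfectly balanced network of N relays yields N; the more unbalanced the weights, the smaller the result.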

Replying to teor:

> I often use n-th fastest to work out the fastest relay(s) over time.

Assuming we keep the parameter for n-th fastest, you'd still learn the n-th fastest relay by consensus weight, just not by advertised bandwidth. Depending on how you define a relay's speed, this would still be possible with the new graph.
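Finding the n-th fastest relay by consensus weight from a raw consensus can be sketched as follows. This assumes each relay entry has an `r` line (nickname as the first field) followed by a `w Bandwidth=...` line, as in Tor's dir-spec; `parse_weights` and `nth_fastest` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_weights(consensus_text: str) -> List[Tuple[str, int]]:
    """Extract (nickname, consensus weight) pairs from a consensus document."""
    relays = []
    nickname = None
    for line in consensus_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) > 1:
            nickname = parts[1]  # start of a new relay entry
        elif parts[0] == "w" and nickname is not None:
            for kv in parts[1:]:
                key, _, value = kv.partition("=")
                if key == "Bandwidth":
                    relays.append((nickname, int(value)))
            nickname = None
    return relays


def nth_fastest(relays: List[Tuple[str, int]], n: int) -> Tuple[str, int]:
    """Return the n-th fastest relay (1-based) by consensus weight."""
    ranked = sorted(relays, key=lambda r: r[1], reverse=True)
    return ranked[n - 1]
```

Running this over consensuses from successive hours would give the "n-th fastest over time" series, here by consensus weight rather than advertised bandwidth.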

> I expect to use n-th fastest a bit while developing PrivCount to answer questions like:

> what's the highest bandwidth?

After the suggested change, you'd have to look at descriptors yourself for this. However, this particular question is relatively easy: just grep for bandwidth lines and compute the maximum advertised bandwidth.
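The "grep for bandwidth lines" approach could look like this. It assumes server descriptor lines of the form `bandwidth <avg> <burst> <observed>` (bytes per second) and takes advertised bandwidth as the minimum of the three values, following the usual Tor Metrics definition. The function names are hypothetical.

```python
from typing import Iterable, Optional


def advertised_bandwidth(line: str) -> Optional[int]:
    """Advertised bandwidth from one descriptor line, or None if not a bandwidth line."""
    parts = line.split()
    if len(parts) == 4 and parts[0] == "bandwidth":
        avg, burst, observed = (int(x) for x in parts[1:])
        return min(avg, burst, observed)
    return None


def max_advertised_bandwidth(lines: Iterable[str]) -> int:
    """Maximum advertised bandwidth over all bandwidth lines in `lines`."""
    values = [bw for bw in map(advertised_bandwidth, lines) if bw is not None]
    if not values:
        raise ValueError("no bandwidth lines found")
    return max(values)
```

For example, `max_advertised_bandwidth(open("descriptors.txt"))` on a file of concatenated server descriptors (the file name is only illustrative).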

> what do we get if we aggregate the top N relays?

This question would require some more code. However, it's also not immediately answered by the Tor Metrics graphs, except maybe for N < 4.
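The extra code for aggregating the top N relays might be as small as this sketch, which assumes a list of consensus weights from a single consensus and reports both the summed weight and its share of the total.

```python
from typing import List, Tuple


def top_n_share(weights: List[int], n: int) -> Tuple[int, float]:
    """Combined consensus weight of the n heaviest relays and its fraction of the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    top = sum(sorted(weights, reverse=True)[:n])
    return top, top / total
```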

> what's the minimum relay count and consensus weight we should require to create an aggregate total? (we can't have a bandwidth requirement, because we don't know the bandwidth until after we aggregate)

This is another question that would probably require you to write code yourself.

In conclusion, the suggested graph would answer your questions just as well as the current graphs, right?

Replying to arma:

> I am ok with the 'replace' plan, where we switch from descriptor bandwidths to consensus weights.

Good to know!

> These graphs are all about (or at least, were started for) visualizing how centralized the Tor network is. For example, they aimed to help answer the question "how much of the Tor network are the top x relays, or the top x% of the relays?" There are many other ways we might visualize the centralization of the network over time, which for me might be at least as good as these current graphs. For example:

> - "how many relays are in the top 50% of the network by bandwidth or by consensus weight?"
> - "if we think of the current Tor network in terms of equally weighted relays, rather than the current wildly unbalanced weights, how many uniformly-weighted relays would it be the equivalent of?"
> - "if a client builds 100 circuits, what's the expected number of relays (maybe broken out into first / second / third hop) that it will interact with?"

I'll start a new ticket for doing a one-off analysis to answer these questions. Then we can decide whether we want to add any of these graphs to the website.
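For such a one-off analysis, the third question could be approximated under a deliberately simple, hypothetical model: every relay pick is an independent draw proportional to consensus weight. Under that model, the expected number of distinct relays after k draws is the sum over relays of 1 - (1 - p_i)^k. The model ignores guard selection (which in practice keeps the first hop nearly fixed), position weights, and family restrictions, so real numbers would differ.

```python
from typing import List


def expected_distinct_relays(weights: List[int], draws: int) -> float:
    """Expected number of distinct relays seen after `draws` independent
    weight-proportional picks (simplified model, see lead-in)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(1 - (1 - w / total) ** draws for w in weights)
```

For a per-hop breakdown, one would call it with `draws=100` on that hop's weights; `draws=300` treats all three hops as one pool.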

Until then, I'll move forward with this switch. I guess I'll first add the new graph and mark the two existing graphs as deprecated; two weeks later I'll remove them.

Thanks!

Removing myself as reviewer for now. I'll probably be the reviewer when it comes back around, but there's no reason a hypothetical third member of the metrics team couldn't also be a reviewer.

Trac:
Reviewer: irl to N/A

mentioned in issue #29523 (moved)
