Opened 2 years ago

Closed 2 years ago

Graph for number of relays that bwauths decided the median for

Reported by: Sebastian
Owned by: tom
Priority: Medium
Component: Metrics/Consensus Health
Severity: Normal

Description

Making a ticket as requested and discussed during the metrics team meeting today :)


comment:1 Changed 2 years ago by tom

I was thinking about this today. My plan is to create a stacked graph similar to the Fallback Dir graph. But how we handle overlaps is going to be determined by the information we want to derive from this graph.

An overlap is when, for a relay, the bwauths assign values X, Y, Z, W, X. The first and last bwauths chose X, and if X is the low-median, then the first and last bwauths are both the 'deciders'.
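A minimal Python sketch of the "decider" idea. It assumes the consensus weight is the low-median of the bwauth measurements (the element at index (n-1)//2 after sorting) and that every bwauth whose value equals it counts as a decider. The bwauth names and values are made up for illustration:

```python
def low_median(values):
    """Return the low-median: for an even count, the lower of the two middle values."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def deciders(votes):
    """Return the bwauths whose measured value equals the low-median.

    votes maps a (hypothetical) bwauth name to its measured bandwidth for one relay.
    """
    median = low_median(list(votes.values()))
    return sorted(name for name, value in votes.items() if value == median)


# Five bwauths assign X, Y, Z, W, X; here X = 20 ends up as the low-median.
votes = {"auth1": 20, "auth2": 35, "auth3": 10, "auth4": 50, "auth5": 20}
print(low_median(list(votes.values())))  # 20
print(deciders(votes))  # ['auth1', 'auth5']
```

When `deciders` returns more than one name, the relay is an overlap.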

If what we want to get out of the graph is "How often does bwauth Alice come up with the value that is chosen as the median?" then we should not graph overlaps at all, and instead double-count overlaps so each assigning bwauth gets one 'point' for an overlapping relay.

If instead what we want to get out of the graph is "How often does bwauth Alice decide the median?" then we should separate out overlaps, and for an overlapping relay no individual bwauth gets a 'point'.

Separating out overlaps can be done in two ways: we can lump all overlaps into a single category, or we can create a separate category for every possible overlap combination.

I think the best choice is to separate out overlaps but put all overlaps into a single category. There are so many combinations of overlaps that it'd be nigh-impossible to derive useful information from the tiny, tiny stacks of overlap categories without adding data legends or other fancy chart features I'm not prepared to add at this time.
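To make the counting options concrete, here is a hedged sketch that tallies "points" per bwauth either by double-counting overlaps or by putting every overlap into one "overlap" category. It assumes the consensus weight is the low-median of the measurements; the function and relay data are hypothetical:

```python
from collections import Counter


def low_median(values):
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def tally(relays, separate_overlaps):
    """Count, per bwauth, the relays for which it supplied the median value.

    relays: list of dicts mapping bwauth name -> measured value for one relay.
    separate_overlaps=False: every bwauth matching the median gets a point
        (overlaps are double-counted).
    separate_overlaps=True: a relay with several matching bwauths is counted
        once, under a single "overlap" category.
    """
    counts = Counter()
    for votes in relays:
        median = low_median(list(votes.values()))
        matching = [name for name, value in votes.items() if value == median]
        if separate_overlaps and len(matching) > 1:
            counts["overlap"] += 1
        else:
            for name in matching:
                counts[name] += 1
    return counts


relays = [
    {"alice": 20, "bob": 35, "carol": 20},  # overlap: alice and carol
    {"alice": 10, "bob": 30, "carol": 50},  # bob alone decides
]
print(dict(tally(relays, separate_overlaps=False)))  # {'alice': 1, 'carol': 1, 'bob': 1}
print(dict(tally(relays, separate_overlaps=True)))   # {'overlap': 1, 'bob': 1}
```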


comment:2 Changed 2 years ago by tom

Thinking about this more, it might also make sense to create graphs that illustrate, for each bwauth, what percentage of relays it was below, at, and above the finalized weight for.

This could be done with a stacked area graph for each bwauth, but it'd be better to make a single graph showing all bwauths... I don't think that will be feasible, though, unless anyone has a brilliant graph idea.

comment:3 Changed 2 years ago by karsten

Thanks for sharing your thoughts above! I also gave this some thought today (also related to #21883) and tried out some graphs.

I attached the graph that I found most useful, which is based on your ideas in the second comment above.

Note the categories used in the graph:

• above / below: relays for which the authority measured a bandwidth value above or below the one contained in the consensus
• shared / exclusive: relays for which the authority measured the value contained in the consensus, either shared with other authorities ("overlaps") or exclusively
• unmeasured: relays that the authority did not measure
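One way these five categories could be computed, sketched in Python. It assumes each relay's votes are available as a dict of authority name to measured value (a missing key meaning unmeasured), together with the value that ended up in the consensus. `classify` and `shares` are hypothetical helpers, not part of the consensus-health code:

```python
from collections import Counter


def classify(votes, consensus_value, authority):
    """Return the graph category for one authority's measurement of one relay.

    votes maps authority name -> measured bandwidth; an authority missing
    from votes did not measure the relay.
    """
    if authority not in votes:
        return "unmeasured"
    value = votes[authority]
    if value > consensus_value:
        return "above"
    if value < consensus_value:
        return "below"
    # The authority measured exactly the consensus value: "shared" if any
    # other authority measured the same value, otherwise "exclusive".
    others = [v for name, v in votes.items() if name != authority]
    return "shared" if consensus_value in others else "exclusive"


def shares(relays, authority):
    """Percentage of relays per category for one authority.

    relays is a list of (votes, consensus_value) pairs.
    """
    counts = Counter(classify(votes, value, authority) for votes, value in relays)
    return {category: 100.0 * n / len(relays) for category, n in counts.items()}


relays = [
    ({"alice": 20, "bob": 35, "carol": 20}, 20),
    ({"alice": 10, "bob": 30, "carol": 50}, 30),
]
print(shares(relays, "alice"))  # {'shared': 50.0, 'below': 50.0}
print(shares(relays, "dave"))   # {'unmeasured': 100.0}
```

Stacking the `shares` result for each authority and each consensus gives one column of the graph.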

If you have any other thoughts, let me know. Once I know how to graph this best, I'll heat the room by throwing a few months of descriptors into this for #21883.

comment:4 Changed 2 years ago by tom

That looks better than what I was imagining. I'm going to copy yours!

comment:5 Changed 2 years ago by karsten

Great! I also left a note on #21883 and will wait another day or so for more feedback. Thanks!

comment:6 Changed 2 years ago by teor

I had this idea, too, and I opened #21992. I've now closed it as a duplicate.

comment:7 Changed 2 years ago by Sebastian

This looks great! One thing that would also interest me is how severe the variation was. Maybe we could have a thin black line where the measurement disagrees with the median by more than 10%?
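A sketch of the proposed 10% check. It assumes "disagrees by more than 10%" means the absolute difference exceeds 10% of the median; `disagrees` and `disagreement_share` are hypothetical helpers and the numbers are made up:

```python
def disagrees(measured, median, threshold=0.10):
    """True if measured differs from median by more than threshold * median."""
    return abs(measured - median) > threshold * median


def disagreement_share(pairs, threshold=0.10):
    """Percentage of (measured, median) pairs that disagree by more than threshold."""
    flagged = sum(1 for measured, median in pairs if disagrees(measured, median, threshold))
    return 100.0 * flagged / len(pairs)


# One authority's measurements paired with each relay's median (made-up numbers).
pairs = [(23, 20), (21, 20), (95, 100), (40, 100)]
print(disagreement_share(pairs))  # 50.0
```

Comparing against `threshold * median` instead of dividing by the median keeps the check defined when the median is 0.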

comment:8 Changed 2 years ago by karsten

Sebastian, I agree that it would be interesting to see how severe the variation was. One thing I'm worried about is that we're trying to put too much information into one poor graph, which is why I want to suggest a simplification: we drop the distinction between "shared" and "exclusive", also drop the "unmeasured" area, and instead add multiple areas for how far above or below the consensus value the measurement was. Reasons:

• The distinction between "shared" and "exclusive" is rather artificial and is just the result of rounding measurements to small integers. I didn't look at the data, but I would expect most "shared" measurements to be in the 10s or 20s, where measurements are really something like 19.5, 20.4, and 20.9, truncated to 19, 20, and 20. Do we care that those two 20s are the same and the 19 is not?
• Dropping the "unmeasured" area would not cause that information to be lost; it would just be displayed differently: the total colored area would be lower for an authority. We could just say that we're displaying measured relays only. (If this is critical information, we could paint that area dark gray or something, but that makes the graph with the next suggestion a bit less intuitive.)
• These two simplifications would permit us to use a "diverging" color scheme here, like this one, where we'd use different reds for values above and different blues for values below. We could still use classes like "at most 10% above", "at most 50% above", and "more than 50% above". Or we could use fewer classes, like the two you suggest.
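The classes from the last bullet could be computed as in this sketch. The 10% and 50% thresholds follow the bullet; the "equal" class for measurements matching the consensus value, the helper name `deviation_class`, and the stacking order in `CLASS_ORDER` are assumptions:

```python
# Diverging classes, ordered from most below to most above (assumed order).
CLASS_ORDER = [
    "more than 50% below",
    "at most 50% below",
    "at most 10% below",
    "equal",
    "at most 10% above",
    "at most 50% above",
    "more than 50% above",
]


def deviation_class(measured, consensus_value):
    """Bucket one measurement relative to the consensus value."""
    if measured == consensus_value:
        return "equal"
    side = "above" if measured > consensus_value else "below"
    diff = abs(measured - consensus_value)
    # Comparing against threshold * consensus_value avoids dividing by zero.
    if diff <= 0.10 * consensus_value:
        return f"at most 10% {side}"
    if diff <= 0.50 * consensus_value:
        return f"at most 50% {side}"
    return f"more than 50% {side}"


for measured in (100, 105, 130, 40):
    print(measured, deviation_class(measured, 100))
```

Keeping the classes in this order lets a diverging palette run from dark blue through the light shades to dark red when the areas are stacked.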

I can make a sample graph if needed, though I'd first like to hear whether the graph described above would still be useful.

comment:9 Changed 2 years ago by Sebastian

It is at least interesting to note that the bwauths have different amounts of unmeasured relays. Shared and exclusive aren't that interesting indeed, so losing that distinction is totally fine with me.

comment:10 Changed 2 years ago by tom

I have enabled experimental graphs at https://consensus-health.torproject.org/graphs.html (I had been working on this for a while and the discussion this morning was not taken into account.)

Some notes:

• There are missing values in the past few days; those can/will be corrected next week.
• maatuska has phantom spikes of 100% below. Those will also be corrected. (The bug is fixed; the historical data will need to be corrected manually.)
• When a bwauth misses a single vote, the downward spike misleadingly appears as an 'above' spike on the 30- and 90-day graphs. I'm going to see what I can do about that.
• I have all the data (AFAIK) since 2015 loaded into the sqlite database (which is exported at https://consensus-health.torproject.org/historical.db ) - I just haven't made the visualizations for historical analysis.
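For historical analysis, the per-bwauth percentages behind the stacked graphs could be aggregated in SQL. The table layout below is hypothetical and the rows are made up; the actual schema of historical.db may differ:

```python
import sqlite3


def category_percentages(conn):
    """Percentage of relays per (consensus, bwauth, category)."""
    query = """
        SELECT m.valid_after, m.bwauth, m.category,
               100.0 * COUNT(*) / t.total AS pct
        FROM measurement AS m
        JOIN (SELECT valid_after, bwauth, COUNT(*) AS total
              FROM measurement
              GROUP BY valid_after, bwauth) AS t
          ON m.valid_after = t.valid_after AND m.bwauth = t.bwauth
        GROUP BY m.valid_after, m.bwauth, m.category, t.total
        ORDER BY m.valid_after, m.bwauth, m.category
    """
    return conn.execute(query).fetchall()


# Hypothetical table: one row per (consensus, bwauth, relay) with its category.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement (valid_after TEXT, bwauth TEXT, relay TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        ("2017-05-01 00:00:00", "alice", "relay1", "above"),
        ("2017-05-01 00:00:00", "alice", "relay2", "below"),
        ("2017-05-01 00:00:00", "alice", "relay3", "below"),
        ("2017-05-01 00:00:00", "alice", "relay4", "unmeasured"),
    ],
)
for row in category_percentages(conn):
    print(row)
```

The subquery computes the number of relays per consensus and bwauth, so each category's count can be turned into a percentage without window functions.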

comment:11 Changed 2 years ago by tom

Note: I fixed some of these:

• Fixed: There are missing values in the past few days; those can/will be corrected next week.
• Fixed: maatuska has phantom spikes of 100% below. Those will also be corrected. (The bug is fixed; the historical data will need to be corrected manually.)

I do need to confirm that the disappearance of gabelmoo's unmeasured relays is correct and not a bug.

comment:12 Changed 2 years ago by tom

Okay, I fixed the unmeasured part of the graph, but I don't plan on backfilling corrected data for the past 5 days.

I don't think I can address the dip problem. I'm going to let this bake for another week or two. I should also confirm that my generation code produces correct results from non-backfilled data. Then I can merge to master.

comment:13 Changed 2 years ago by tom

Resolution: → fixed
Status: new → closed

This is rebased and merged.
