Opened 9 months ago

Closed 8 months ago

#33065 closed enhancement (fixed)

metrics dirbytes graph either underreports or leaves out dir auths?

Reported by: arma
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Website
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points: 3
Parent ID:
Points: 3
Reviewer: irl
Sponsor:

The dirbytes graph on the metrics website shows that, on average, about 1 Gbit/s is spent serving directory information.

But the investigations in #33018 show that moria1 by itself is pushing about 135 Mbit/s, and I hear gabelmoo is doing even more than that.

The text under the graph says "directory mirrors", so maybe we are leaving out dir auths in this graph? In that case, it would be cool to change it to one of those layered graphs so you can see how much of the total is dir auths and how much of the total are other relays.

Or maybe we're somehow not adding up the right things? Step zero is to spot-check the current data and the current graphs and become convinced that indeed we're presenting the right data.
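A minimal sketch of the step-zero spot check, assuming per-relay directory traffic has already been extracted into simple records. `RelayDirBytes` and `split_totals` are hypothetical names, not part of the metrics code base, and the inputs are only the two rough figures quoted above (moria1's ~135 Mbit/s and the ~1 Gbit/s in the current graph), lumped into one authority entry and one mirror entry rather than measured per-relay data:

```python
from dataclasses import dataclass


@dataclass
class RelayDirBytes:
    """Hypothetical per-relay record: average rate spent serving directory data."""
    nickname: str
    is_authority: bool
    bits_per_s: float


def split_totals(relays):
    """Sum directory traffic separately for directory authorities and mirrors."""
    auths = sum(r.bits_per_s for r in relays if r.is_authority)
    mirrors = sum(r.bits_per_s for r in relays if not r.is_authority)
    return {"authorities": auths, "mirrors": mirrors, "total": auths + mirrors}


if __name__ == "__main__":
    # Illustrative inputs only: moria1's rough rate from #33018 and the
    # graph's rough mirror total, lumped into a single placeholder entry.
    relays = [
        RelayDirBytes("moria1", True, 135e6),
        RelayDirBytes("all-mirrors", False, 1e9),
    ]
    totals = split_totals(relays)
    share = totals["authorities"] / totals["total"]
    print(f"authority share: {share:.1%}")
```

With just those two figures, moria1 alone would account for roughly 12% of the combined total.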

Originally reported in


Attachments (4)

dirbytes-2020-02-23.png (100.1 KB) - added by karsten 8 months ago.
dirbytes-2020-03-03.png (109.3 KB) - added by karsten 8 months ago.
inserts-2020-03-03.png (224.7 KB) - added by karsten 8 months ago.
dirbytes-2020-03-04.png (110.5 KB) - added by karsten 8 months ago.


Change History (12)

comment:1 Changed 9 months ago by karsten

Cc: metrics-team added
Owner: changed from metrics-team to karsten
Status: changed from new to accepted
Type: changed from defect to enhancement

The graph leaves out directory authorities, as the text indicates. Turning this into an enhancement rather than a defect, because the code is doing what it's supposed to be doing.

I'll see whether we can change the graph. The biggest blocker is that we'll have to reprocess the archives; otherwise, the directory authority numbers would suddenly jump from 0 to $bignumber, which would probably cause a lot of confusion. But that reprocessing can happen in the background somewhere. Maybe it's a simple code change.

Changed 8 months ago by karsten

Attachment: dirbytes-2020-02-23.png added

comment:2 Changed 8 months ago by karsten

Actual Points: 1.5
Points: 3

The reprocessing part turned out to be harder than expected. Finally, here's a graph with dirbytes reported by directory mirrors and by directory authorities (attached as dirbytes-2020-02-23.png).

Do these numbers look plausible?

Is having this graph on the metrics website worth spending another 1.5 points on (minor code changes, code review, documentation, checking results from reprocessing)?

Changed 8 months ago by karsten

Attachment: dirbytes-2020-03-03.png added

comment:3 Changed 8 months ago by karsten

I guess I was too curious myself about how the numbers above compare to the past, so I reprocessed the archive and made another graph covering the time since 2010 (attached as dirbytes-2020-03-03.png).

Next step is to clean up the code and get it reviewed.

Changed 8 months ago by karsten

Attachment: inserts-2020-03-03.png added

comment:4 Changed 8 months ago by karsten

Status: changed from accepted to needs_review

Please review the last four commits in my task-33065-7 branch. One of these commits rewrites the insert_bwhist function from PL/pgSQL to pure SQL, which was necessary to reprocess the archives in reasonable time. The performance gain when inserting a few days of data into a fresh database is plotted in the attached inserts-2020-03-03.png.
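The ticket doesn't show the insert_bwhist code, so the following is only a generic illustration of the set-based style (one aggregate statement instead of a per-row procedural loop), using Python's built-in sqlite3 module rather than PostgreSQL. It assumes the `dirreq-write-history` line format from Tor's directory protocol specification and that successive descriptors may repeat intervals; the table names, `parse_history`, and `load` are hypothetical:

```python
import sqlite3
from datetime import datetime, timedelta


def parse_history(line):
    """Expand a dirreq-write-history line into (interval_start, bytes) pairs.

    Assumed layout: dirreq-write-history YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,...
    The timestamp ends the newest interval; values are listed oldest first.
    """
    parts = line.split()
    end = datetime.strptime(f"{parts[1]} {parts[2]}", "%Y-%m-%d %H:%M:%S")
    nsec = int(parts[3].lstrip("("))
    values = [int(v) for v in parts[5].split(",")] if len(parts) > 5 else []
    n = len(values)
    # Interval i (0 = oldest) starts (n - i) interval lengths before `end`.
    return [
        ((end - timedelta(seconds=nsec * (n - i))).isoformat(" "), v)
        for i, v in enumerate(values)
    ]


def load(conn, fingerprint, lines):
    """Hypothetical loader: bulk-insert raw intervals, then aggregate once."""
    conn.execute("""CREATE TABLE IF NOT EXISTS bwhist_raw (
        fingerprint TEXT, interval_start TEXT, written INTEGER,
        PRIMARY KEY (fingerprint, interval_start))""")
    conn.execute("""CREATE TABLE IF NOT EXISTS daily_dirbytes (
        fingerprint TEXT, day TEXT, written INTEGER,
        PRIMARY KEY (fingerprint, day))""")
    rows = [(fingerprint, start, v)
            for line in lines for start, v in parse_history(line)]
    # Repeated intervals hit the primary key and are skipped, not double-counted.
    conn.executemany("INSERT OR IGNORE INTO bwhist_raw VALUES (?, ?, ?)", rows)
    # A single set-based statement computes every daily total at once.
    conn.execute("""INSERT OR REPLACE INTO daily_dirbytes
        SELECT fingerprint, substr(interval_start, 1, 10), SUM(written)
        FROM bwhist_raw
        GROUP BY fingerprint, substr(interval_start, 1, 10)""")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, "EXAMPLEFP", [
        "dirreq-write-history 2020-03-03 12:00:00 (900 s) 100,200,300",
        "dirreq-write-history 2020-03-03 12:15:00 (900 s) 200,300,400",
    ])
    print(conn.execute("SELECT * FROM daily_dirbytes").fetchall())
```

The two sample lines overlap in two intervals, so the daily sum counts each 15-minute interval once. Whether the metrics database handles duplicates this way is not stated in the ticket.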

comment:5 Changed 8 months ago by irl

Status: changed from needs_review to merge_ready

These look OK to merge as they are, but if you have time, I'd be interested to see how much more readable these graphs would be if, instead of being split by auth/mirror, they were split by read/write and used stacked area plots for auth/mirror. The total height would then be the total bytes for both, we could see more easily how the load is distributed to mirrors, and the read and write scales would both be more appropriately sized for their actual values.

Changed 8 months ago by karsten

Attachment: dirbytes-2020-03-04.png added

comment:6 Changed 8 months ago by karsten

I gave this a try this morning (attached as dirbytes-2020-03-04.png).

I'm not convinced yet that this visualization is better. I think the January 2020 situation is a typical example of stacked area graphs not working well: it's very hard to tell whether the "Directory mirrors" area changes during that time. I know that we're using stacked area plots in other cases, but in those cases the relative fractions remain roughly the same.
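A small sketch of why the stacked version makes the mirror layer hard to read: in a stacked area plot each band's lower edge is the sum of the layers beneath it, so a swing in authority traffic moves the whole mirror band even when mirror traffic is flat. `stacked_bands` is a hypothetical helper, and the numbers below are made up for illustration, not taken from the graphs:

```python
def stacked_bands(auth, mirrors):
    """Return (lower, upper) edges for a two-layer stacked area plot.

    Authorities form the bottom layer; mirrors are stacked on top, so the
    mirror band's upper edge is the combined total.
    """
    if len(auth) != len(mirrors):
        raise ValueError("series must have the same length")
    auth_band = [(0.0, a) for a in auth]
    mirror_band = [(a, a + m) for a, m in zip(auth, mirrors)]
    return auth_band, mirror_band


if __name__ == "__main__":
    # Made-up values: authority traffic varies, mirror traffic is constant.
    _, mirror_band = stacked_bands([100.0, 300.0, 150.0], [1000.0] * 3)
    print("thickness:", [hi - lo for lo, hi in mirror_band])
    print("upper edge:", [hi for _, hi in mirror_band])
```

Here the mirror band stays 1000 units thick while its upper edge goes 1100, 1300, 1150, which is the kind of movement that makes a period like January 2020 ambiguous.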

Another issue is that, while the read and write scales are now more appropriately sized for their actual values, it's more difficult to compare the two in absolute terms.

I'll leave this here for further discussion and continue deployment of the updated database in the background. Changing the graph is easy, even in one or two weeks from now.

Thanks for the fast review!

comment:7 Changed 8 months ago by irl

Reviewer: irl

Hmm, OK, now that I see the graph, I think the graph in comment:3 is probably clearer, so the commits on your task-33065-7 branch are good.

comment:8 Changed 8 months ago by karsten

Actual Points: changed from 1.5 to 3
Resolution: fixed
Status: changed from merge_ready to closed

This is now merged and deployed. Closing.
