Modify "Total consensus weights across bandwidth authorities" graph to only include relays that end up in the consensus

added component::metrics/statistics owner::metrics-team priority::medium resolution::fixed reviewer::irl severity::normal status::closed type::enhancement labels

Replying to karsten: [...]

So, it does seem plausible that the totals by authority would be more useful if the underlying set of relays is the same.

Yes, I hadn't realized this before. I think it makes more sense to compare the same set.

[...]

Are there alternatives, like only including relays from votes that have the Running flag?

Hmm, I'm not sure how this would make the graph more useful.

Maybe we should run this analysis once and separate from metrics-web and then decide.

If this isn't too much extra work, yeah :)

[...]

I wanted the total for the relays in the consensus: I didn't realise we were getting something different.

Are there alternatives, like only including relays from votes that have the Running flag?

If checking the Running flag in a vote is faster, we could do it as an initial fix.

Okay, I might have an idea how we can implement this with reasonable effort:

We import fingerprints into a table and assign numeric identifiers that we use in other tables.
We import all fingerprints in a consensus together with the votes referenced from the consensus.
We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
When aggregating, we join votes with the consensus they refer to.
After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.
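
A minimal sketch of these steps, using Python's built-in sqlite3 module in place of the metrics-web database. The table and column names (`fingerprint`, `consensus_entry`, `consensus_vote`, `vote_entry`, `totalcw`) are hypothetical, and keying each vote by the valid-after time of the consensus it feeds into is an assumed way to "refer to the consensus coming out of it":

```python
import sqlite3

# Hypothetical schema for illustration; these are not the metrics-web tables.
SCHEMA = """
CREATE TABLE fingerprint (fingerprint_id INTEGER PRIMARY KEY,
                          fingerprint TEXT UNIQUE NOT NULL);
CREATE TABLE consensus_entry (valid_after INTEGER NOT NULL,
                              fingerprint_id INTEGER NOT NULL);
CREATE TABLE consensus_vote (valid_after INTEGER NOT NULL,  -- votes a consensus references
                             authority TEXT NOT NULL);
CREATE TABLE vote_entry (valid_after INTEGER NOT NULL,  -- consensus this vote feeds into
                         authority TEXT NOT NULL,
                         fingerprint_id INTEGER NOT NULL,
                         measured INTEGER NOT NULL);
CREATE TABLE totalcw (valid_after INTEGER NOT NULL, authority TEXT NOT NULL,
                      measured_sum INTEGER NOT NULL);
"""


def fingerprint_id(db, fp):
    """Return the numeric identifier for fp, assigning one on first sight."""
    db.execute("INSERT OR IGNORE INTO fingerprint (fingerprint) VALUES (?)", (fp,))
    row = db.execute("SELECT fingerprint_id FROM fingerprint WHERE fingerprint = ?",
                     (fp,)).fetchone()
    return row[0]


def import_consensus(db, valid_after, fingerprints, authorities):
    """Import consensus entries together with the votes the consensus references."""
    db.executemany("INSERT INTO consensus_entry VALUES (?, ?)",
                   [(valid_after, fingerprint_id(db, fp)) for fp in fingerprints])
    db.executemany("INSERT INTO consensus_vote VALUES (?, ?)",
                   [(valid_after, authority) for authority in authorities])


def import_vote(db, valid_after, authority, measured_by_fp):
    """Import vote entries, keyed by the consensus the vote feeds into."""
    db.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)",
                   [(valid_after, authority, fingerprint_id(db, fp), bw)
                    for fp, bw in measured_by_fp.items()])


def aggregate(db):
    """Sum measured bandwidth over relays that are in the consensus, then clean up."""
    db.execute("""
        INSERT INTO totalcw
        SELECT v.valid_after, v.authority, SUM(v.measured)
        FROM vote_entry v
        JOIN consensus_entry c
          ON c.valid_after = v.valid_after AND c.fingerprint_id = v.fingerprint_id
        GROUP BY v.valid_after, v.authority""")
    # Votes whose consensus is known have just been aggregated.
    db.execute("""
        DELETE FROM vote_entry
        WHERE valid_after IN (SELECT valid_after FROM consensus_vote)""")
    # Keep a consensus only while one of its referenced votes is still missing.
    db.execute("""
        DELETE FROM consensus_entry WHERE valid_after NOT IN (
            SELECT r.valid_after FROM consensus_vote r
            LEFT JOIN totalcw t
              ON t.valid_after = r.valid_after AND t.authority = r.authority
            WHERE t.authority IS NULL)""")
    db.execute("""
        DELETE FROM consensus_vote
        WHERE valid_after NOT IN (SELECT valid_after FROM consensus_entry)""")
```

For example, after importing a consensus that references moria1 and gabelmoo plus only moria1's vote, `aggregate()` writes moria1's total, deletes moria1's vote entries, and keeps the consensus until gabelmoo's vote has been imported and aggregated as well.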

This approach effectively moves the aggregation step to the database. It was more convenient to just do it after parsing, but it's doable in the database as well.

This approach also ensures that tables don't grow forever. I would expect that there are just a few votes and consensuses missing from the archives, so we'd carry just those fingerprints in the database forever.

However, this is not a trivial change to the existing "totalcw" module. It's basically a rewrite.

Are there any other changes we might want to make in the near future? For example, does #28328 (moved) maybe add something here?

Trac:
Cc: metrics-team, teor, juga, pastly, arma to metrics-team, teor, juga, pastly, arma, starlight@binnacle.cx

Alright, I implemented the idea above.

However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.

We do have these 3 weeks from my tests though, so let's look at the results:

The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.
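
To make the two definitions concrete, here is a small sketch for a single vote. The function name and the dictionary layout are illustrative assumptions, and the numbers are made up:

```python
def red_and_blue_totals(vote, consensus_fingerprints):
    """Return the (red, blue) totals for one authority's vote.

    vote maps relay fingerprint -> measured bandwidth in that vote.
    red:  every relay in the vote, as on the current graph.
    blue: only relays that also appear in the consensus.
    """
    red = sum(vote.values())
    blue = sum(bw for fp, bw in vote.items() if fp in consensus_fingerprints)
    return red, blue


# Made-up example: relay "C" was measured but did not make it into the consensus.
vote = {"A": 100, "B": 250, "C": 40}
consensus = {"A", "B"}
print(red_and_blue_totals(vote, consensus))  # (390, 350)
```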

What I'd like to try out is adding a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority may still be left out of the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, though, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
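
The proposed "Running in vote" line can be sketched the same way. This is an illustration with made-up data and a hypothetical function name; relays B and C deliberately hit the two cases described above, which is why green ends up below blue here rather than between red and blue:

```python
def vote_totals(vote, consensus_fingerprints):
    """Return the (red, blue, green) totals for one authority's vote.

    vote maps fingerprint -> (measured bandwidth, flags this authority assigned).
    """
    red = sum(bw for bw, _ in vote.values())
    blue = sum(bw for fp, (bw, _) in vote.items() if fp in consensus_fingerprints)
    # Green needs only the vote itself: no matching against consensus entries.
    green = sum(bw for bw, flags in vote.values() if "Running" in flags)
    return red, blue, green


vote = {
    "A": (100, {"Running", "Fast"}),  # Running here, in the consensus
    "B": (250, {"Fast"}),             # not Running here, but in the consensus anyway
    "C": (40, {"Running"}),           # Running here, but left out of the consensus
}
print(vote_totals(vote, {"A", "B"}))  # (390, 350, 140)
```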

Replying to karsten:

Alright, I implemented the idea above.

However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.

So the process scales non-linearly? When processing 3 days, each hour of consensus and votes takes about 4 seconds. But when processing 3 weeks, each hour of consensus and votes takes 21 seconds.
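
The per-hour figures follow from the numbers quoted above. A two-point estimate is crude, and "almost 3 hours" is rounded up to 3 hours here, but it suggests runtime growing much closer to quadratically than linearly:

```python
import math

# (hours of consensuses and votes, processing time in seconds), as quoted above;
# "almost 3 hours" is rounded up to 3 hours.
runs = {"3 days": (3 * 24, 5 * 60), "3 weeks": (21 * 24, 3 * 60 * 60)}

for label, (hours, seconds) in runs.items():
    print(f"{label}: {seconds / hours:.1f} s per hour of consensus and votes")

# Fit runtime ~ hours**k through the two data points.
(h1, s1), (h2, s2) = runs["3 days"], runs["3 weeks"]
k = math.log(s2 / s1) / math.log(h2 / h1)
print(f"k = {k:.2f}")
```

This prints about 4.2 and 21.4 seconds per hour and a k of about 1.84; linear scaling would give k = 1.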

Can you do each consensus separately?

More precisely:

For each consensus, in a set of temporary tables:

We import all fingerprints in a consensus together with the votes referenced from the consensus.
We import fingerprints into a table and assign numeric identifiers that we use in other tables.
We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
When aggregating, we join votes with the consensus they refer to, then persist relevant data in permanent tables, with permanent identifiers.
After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.
If there is any data left, we persist that data in permanent tables, with permanent identifiers.
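
A sketch of the per-consensus variant, again with SQLite and hypothetical table names. It covers the import, join, and persist steps for one consensus and the votes it references; persisting leftover data (the last step) is not shown, and fingerprints are kept as text for simplicity because the temporary tables are discarded after each consensus:

```python
import sqlite3


def process_consensus(db, valid_after, consensus_fps, votes):
    """Aggregate one consensus and the votes it references in temporary tables.

    votes maps authority -> {fingerprint: measured bandwidth}. The temporary
    tables only ever hold a single consensus, so the join stays small no
    matter how many consensuses are processed overall.
    """
    db.execute("CREATE TEMP TABLE tmp_consensus (fingerprint TEXT PRIMARY KEY)")
    db.execute("CREATE TEMP TABLE tmp_vote "
               "(authority TEXT, fingerprint TEXT, measured INTEGER)")
    try:
        db.executemany("INSERT INTO tmp_consensus VALUES (?)",
                       [(fp,) for fp in consensus_fps])
        db.executemany("INSERT INTO tmp_vote VALUES (?, ?, ?)",
                       [(authority, fp, bw)
                        for authority, entries in votes.items()
                        for fp, bw in entries.items()])
        # Only the per-authority totals go into the permanent table.
        db.execute("""
            INSERT INTO totalcw (valid_after, authority, measured_sum)
            SELECT ?, v.authority, SUM(v.measured)
            FROM tmp_vote v JOIN tmp_consensus c ON c.fingerprint = v.fingerprint
            GROUP BY v.authority""", (valid_after,))
    finally:
        # Discard the temporary tables before the next consensus.
        db.execute("DROP TABLE tmp_vote")
        db.execute("DROP TABLE tmp_consensus")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE totalcw (valid_after INTEGER, authority TEXT, measured_sum INTEGER)")
process_consensus(db, 1, ["A", "B"], {"moria1": {"A": 10, "B": 20, "C": 40}})
process_consensus(db, 2, ["A"], {"moria1": {"A": 11, "B": 21}})
```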

If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.

We do have these 3 weeks from my tests though, so let's look at the results: ... The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.

I agree: we could account for it with some documentation.

What I'd like to try out is adding a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority may still be left out of the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, though, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

Replying to teor:

Can you do each consensus separately?

I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.
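
One way to spell out "only entries with the same valid-after time" for the database is a correlated subselect. The sketch below (SQLite, hypothetical schema and made-up data) checks that it returns the same totals as the plain join; whether it actually runs faster depends on the query planner and remains an open question:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE consensus_entry (valid_after INTEGER, fingerprint_id INTEGER);
CREATE TABLE vote_entry (valid_after INTEGER, authority TEXT,
                         fingerprint_id INTEGER, measured INTEGER);
""")
# Made-up data: two consensuses (valid-after 1 and 2) and moria1's votes for them.
db.executemany("INSERT INTO consensus_entry VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])
db.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)",
               [(1, "moria1", 1, 10), (1, "moria1", 2, 20), (1, "moria1", 3, 40),
                (2, "moria1", 1, 11), (2, "moria1", 2, 21)])

# Plain join: correct, but the planner decides how to pair rows.
JOIN_QUERY = """
SELECT v.valid_after, v.authority, SUM(v.measured)
FROM vote_entry v
JOIN consensus_entry c
  ON c.valid_after = v.valid_after AND c.fingerprint_id = v.fingerprint_id
GROUP BY v.valid_after, v.authority
ORDER BY v.valid_after, v.authority
"""

# Correlated subselect: each vote entry is only checked against the consensus
# with the same valid-after time. Unlike the join, IN also ignores duplicate
# consensus rows, which a join would double-count.
SUBSELECT_QUERY = """
SELECT v.valid_after, v.authority, SUM(v.measured)
FROM vote_entry v
WHERE v.fingerprint_id IN (
    SELECT c.fingerprint_id FROM consensus_entry c
    WHERE c.valid_after = v.valid_after)
GROUP BY v.valid_after, v.authority
ORDER BY v.valid_after, v.authority
"""

joined = db.execute(JOIN_QUERY).fetchall()
subselected = db.execute(SUBSELECT_QUERY).fetchall()
print(joined)                 # [(1, 'moria1', 30), (2, 'moria1', 11)]
print(subselected == joined)  # True
```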

If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.

Right.

The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.

I agree: we could account for it with some documentation.

Okay.

What I'd like to try out is adding a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority may still be left out of the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, though, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

It's here:

I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.

I tend towards green, because it adds far less code and delivers basically the same result.

I'd also break down vote totals by Guard/Exit flag combination (#28328 (moved)). Note, however, that these Guard and Exit flags would also be taken from the vote, not from the consensus. Mostly a documentation issue again.
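
A sketch of that breakdown for a single vote, with made-up data and a hypothetical function name. It reads the Guard, Exit, and Running flags from the vote itself, which is exactly the caveat above, and it counts only relays with Running in the vote, assuming the breakdown is combined with the "Running in vote" filter:

```python
from collections import defaultdict


def totals_by_guard_exit(vote):
    """Sum measured bandwidth per (has Guard, has Exit) combination.

    vote maps fingerprint -> (measured bandwidth, flags this authority assigned).
    Flags come from the vote, not from the consensus.
    """
    totals = defaultdict(int)
    for bw, flags in vote.values():
        if "Running" not in flags:
            continue  # same filter as the "Running in vote" line
        totals[("Guard" in flags, "Exit" in flags)] += bw
    return dict(totals)


vote = {
    "A": (100, {"Running", "Guard"}),
    "B": (250, {"Running", "Guard", "Exit"}),
    "C": (40, {"Running"}),
    "D": (30, {"Guard"}),  # not Running in this vote, so skipped
}
print(totals_by_guard_exit(vote))
```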

And I'd include consensus totals (#28352 (moved)). Just mentioning those for completeness.

How does this sound?

Replying to karsten:

Replying to teor:

Can you do each consensus separately?

I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.

Or carefully constructed indexes? (For example, an index on all the join conditions, in the correct order.)
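
A sketch of the index suggestion in SQLite, with hypothetical table and index names. `EXPLAIN QUERY PLAN` is SQLite's way to show whether the join uses the composite indexes; PostgreSQL's counterpart is `EXPLAIN`, and the exact plan text varies between versions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE consensus_entry (valid_after INTEGER, fingerprint_id INTEGER);
CREATE TABLE vote_entry (valid_after INTEGER, authority TEXT,
                         fingerprint_id INTEGER, measured INTEGER);
-- Composite indexes whose column order matches the join condition:
-- first the valid-after time, then the relay.
CREATE INDEX idx_consensus_join ON consensus_entry (valid_after, fingerprint_id);
CREATE INDEX idx_vote_join ON vote_entry (valid_after, fingerprint_id);
""")

QUERY = """
SELECT v.valid_after, v.authority, SUM(v.measured)
FROM vote_entry v
JOIN consensus_entry c
  ON c.valid_after = v.valid_after AND c.fingerprint_id = v.fingerprint_id
GROUP BY v.valid_after, v.authority
"""

# Ask SQLite how it would run the join; the last column is a readable description.
plan = [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in plan:
    print(step)
```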

Or datetimes converted to integers? (I remember native datetimes being slower to match than integers, but that depends on how the DBMS stores and compares them.)

(My most recent experience is with SQL Server 2012, but I assume that DBMSs are roughly equivalent.)
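
Converting the valid-after time to an integer can happen before import. A minimal sketch, assuming the `YYYY-MM-DD HH:MM:SS` UTC format used in consensus and vote documents; the function name is hypothetical, and whether integers beat native timestamps depends on the DBMS, as noted above:

```python
from datetime import datetime, timezone


def valid_after_to_epoch(ts):
    """Convert a 'YYYY-MM-DD HH:MM:SS' valid-after time (UTC) to integer seconds."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


print(valid_after_to_epoch("2018-11-01 00:00:00"))  # 1541030400
```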

What I'd like to try out is adding a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority may still be left out of the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, though, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

It's here:

I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.

I tend towards green, because it adds far less code and delivers basically the same result.

Seems fine to me.

I'd also break down vote totals by Guard/Exit flag combination (#28328 (moved)). Note, however, that these Guard and Exit flags would also be taken from the vote, not from the consensus. Mostly a documentation issue again.

And I'd include consensus totals (#28352 (moved)). Just mentioning those for completeness.

How does this sound?

These all seem fine to me.

irl, please review my tasks-28137-28328-28352 branch that implements this ticket along with the other two mentioned tickets. (It simply made more sense to have a single branch, not three, also to reduce the review work.)

Trac:
Status: new to needs_review

Trac:
Reviewer: N/A to irl

Looks good to me.

Trac:
Status: needs_review to merge_ready

Cool, I'll start deploying the change, which requires reprocessing a fair amount of data. Should be ready early next week.

A bit earlier than expected: done! Thanks, everyone! Closing.

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

mentioned in issue #28352 (moved)
