Opened 2 months ago

Closed 3 days ago

#28137 closed enhancement (fixed)

Modify "Total consensus weights across bandwidth authorities" graph to only include relays that end up in the consensus

Reported by: karsten
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Statistics
Version:
Severity: Normal
Keywords:
Cc: metrics-team, teor, juga, pastly, arma, starlight@…
Actual Points:
Parent ID:
Points:
Reviewer: irl
Sponsor:

Description

This suggestion is based on a discussion with arma and Sebastian in #tor-dev:

07:00:41 <+armadev> would be interesting to compare to: "the sum of moria1's 
                    votes about each relay that ended up in the consensus"
07:00:51 <+armadev> since that would compare between bwauths better
07:01:12 <+armadev> right now if moria1 knows about a bunch of relays that used 
                    to be around, but aren't now, and other dir auths don't 
                    know about them, then moria1 votes a much higher total

So, it does seem plausible that the totals by authority would be more useful if the underlying set of relays is the same.

One issue is a technical one: we'd need to retain much more data in the database to implement this graph. The background is that we always need to match relays in a vote with the corresponding consensus in order to decide whether to include a relay in the total sum or not. However, we do not require descriptors to appear in a certain order, and we want the end result to be the same even if we process a consensus or vote a couple days or even weeks later.

Another, minor issue is that we'd have to reprocess the entire archive. This is doable and shouldn't stop us. Just saying that it's going to require some effort.

Are there alternatives, like only including relays from votes that have the Running flag?

Maybe we should run this analysis once, separately from metrics-web, and then decide.

teor, juga, pastly, you were all involved in #25459 which led to the original graph. What do you think about this possible modification?

Attachments (2)

totalcw-2018-11-22.png (138.7 KB) - added by karsten 4 weeks ago.
totalcw-2018-11-22a.png (147.9 KB) - added by karsten 4 weeks ago.

Change History (15)

comment:1 in reply to:  description Changed 8 weeks ago by juga

Replying to karsten:
[...]

So, it does seem plausible that the totals by authority would be more useful if the underlying set of relays is the same.

yes, i didn't realize this before, i think it makes more sense to compare the same set

[...]

Are there alternatives, like only including relays from votes that have the Running flag?

hmm, i'm not sure how this could help to make the graph more useful

Maybe we should run this analysis once, separately from metrics-web, and then decide.

if this isn't much extra work, yeah :)

[...]

comment:2 Changed 6 weeks ago by teor

I wanted the total for the relays in the consensus: I didn't realise we were getting something different.

Are there alternatives, like only including relays from votes that have the Running flag?

If checking the Running flag in a vote is faster, we could do it as an initial fix.

comment:3 Changed 6 weeks ago by karsten

Okay, I might have an idea how we can implement this with reasonable effort:

  • We import fingerprints into a table and assign numeric identifiers that we use in other tables.
  • We import all fingerprints in a consensus together with the votes referenced from the consensus.
  • We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
  • When aggregating, we join votes with the consensus they refer to.
  • After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.

This approach effectively moves the aggregation step to the database. It was more convenient to just do it after parsing, but it's doable in the database as well.
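The steps above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the actual "totalcw" module: the table names, column names, and helper functions are all assumptions made for the example, and the cleanup step (deleting aggregated votes and consensuses) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fingerprint (
  fingerprint_id INTEGER PRIMARY KEY,
  fingerprint TEXT UNIQUE NOT NULL
);
CREATE TABLE consensus_entry (
  valid_after TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL,
  PRIMARY KEY (valid_after, fingerprint_id)
);
CREATE TABLE vote_entry (
  valid_after TEXT NOT NULL,
  authority TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL,
  measured INTEGER NOT NULL,
  PRIMARY KEY (valid_after, authority, fingerprint_id)
);
""")

def fingerprint_id(fp):
    """Return the numeric identifier for a fingerprint, creating it if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO fingerprint (fingerprint) VALUES (?)", (fp,))
    return conn.execute(
        "SELECT fingerprint_id FROM fingerprint WHERE fingerprint = ?",
        (fp,)).fetchone()[0]

def import_consensus(valid_after, fingerprints):
    rows = [(valid_after, fingerprint_id(fp)) for fp in fingerprints]
    conn.executemany("INSERT INTO consensus_entry VALUES (?, ?)", rows)

def import_vote(valid_after, authority, entries):
    rows = [(valid_after, authority, fingerprint_id(fp), bw)
            for fp, bw in entries.items()]
    conn.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)", rows)

def aggregate():
    """Sum measured bandwidth per vote, counting only relays in the consensus."""
    return conn.execute("""
        SELECT v.valid_after, v.authority, SUM(v.measured)
        FROM vote_entry v
        JOIN consensus_entry c
          ON c.valid_after = v.valid_after
         AND c.fingerprint_id = v.fingerprint_id
        GROUP BY v.valid_after, v.authority
        ORDER BY v.valid_after, v.authority
    """).fetchall()

# Toy data (hypothetical): the vote is imported before its consensus to show
# that import order does not matter.
import_vote("2018-11-01 00:00:00", "moria1",
            {"AAAA": 100, "BBBB": 50, "CCCC": 70})
import_consensus("2018-11-01 00:00:00", ["AAAA", "BBBB"])
totals = aggregate()
```

In this toy run, relay "CCCC" is in the vote but not in the consensus, so it is left out of the total.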

This approach also ensures that tables don't grow forever. I would expect that there are just a few votes and consensuses missing from the archives, so we'd carry just those fingerprints in the database forever.

However, this is not a trivial change to the existing "totalcw" module. It's basically a rewrite.

Are there any other changes we might want to make in the near future? For example, does #28328 maybe add something here?

comment:4 Changed 5 weeks ago by starlight

Cc: starlight@… added

Changed 4 weeks ago by karsten

Attachment: totalcw-2018-11-22.png added

comment:5 Changed 4 weeks ago by karsten

Alright, I implemented the idea above.

However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.

We do have these 3 weeks from my tests though, so let's look at the results:

[Graph: attachment totalcw-2018-11-22.png]

The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.

What I'd like to try out is adding a third line, "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
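To make the three candidate totals concrete, here is a small sketch on made-up data. The fingerprints, bandwidth values, and function names are hypothetical; only the three definitions (all relays in the vote, relays also in the consensus, relays with the Running flag in the vote) come from the discussion above.

```python
# Hypothetical vote: fingerprint -> (measured bandwidth, Running flag in vote)
vote = {
    "AAAA": (100, True),
    "BBBB": (50, True),
    "CCCC": (70, False),  # known to the authority, but not found running
    "DDDD": (30, True),   # Running in this vote, yet missing from the consensus
}
# Hypothetical consensus: fingerprints that made it in
consensus = {"AAAA", "BBBB"}

def total_all(vote):
    """Current graph: every relay in the vote counts."""
    return sum(bw for bw, _ in vote.values())

def total_in_consensus(vote, consensus):
    """Only relays in the vote that also appear in the consensus."""
    return sum(bw for fp, (bw, _) in vote.items() if fp in consensus)

def total_running_in_vote(vote):
    """Only relays the voting authority itself flagged as Running."""
    return sum(bw for bw, running in vote.values() if running)

all_relays = total_all(vote)
in_consensus = total_in_consensus(vote, consensus)
running_in_vote = total_running_in_vote(vote)
```

Note that only `total_in_consensus` needs the consensus at all, which is why the "Running in vote" variant is the smaller change. Relay "DDDD" shows the caveat: the Running flag in one vote does not guarantee a place in the consensus.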

comment:6 in reply to:  5 Changed 4 weeks ago by teor

Replying to karsten:

Alright, I implemented the idea above.

However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.

So the process scales non-linearly?
When processing 3 days, each hour of consensus and votes takes about 4 seconds.
But when processing 3 weeks, each hour of consensus and votes takes 21 seconds.
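The per-hour figures follow from the timings quoted above. A quick check, taking "almost 3 hours" as exactly 3 hours, so the second figure is an upper bound (the function name is just for illustration):

```python
def seconds_per_hour_of_data(total_seconds, days):
    """Average processing time per hour of consensus and votes."""
    return total_seconds / (days * 24)

three_days = seconds_per_hour_of_data(5 * 60, 3)          # 300 s over 72 hours
three_weeks = seconds_per_hour_of_data(3 * 60 * 60, 21)   # 10800 s over 504 hours

# Input grew by this factor, and total run time by that factor.
data_growth = 21 / 3
time_growth = (3 * 60 * 60) / (5 * 60)
```

Seven times the data took roughly 36 times as long, which is far from linear scaling.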

Can you do each consensus separately?

More precisely:

For each consensus, in a set of temporary tables:

  • We import all fingerprints in a consensus together with the votes referenced from the consensus.
  • We import fingerprints into a table and assign numeric identifiers that we use in other tables.
  • We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
  • When aggregating, we join votes with the consensus they refer to, then persist relevant data in permanent tables, with permanent identifiers.
  • After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.
  • If there is any data left, we persist that data in permanent tables, with permanent identifiers.
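A rough in-memory sketch of the per-consensus idea, with plain Python dictionaries standing in for the temporary tables. The data layout and the function name are assumptions for illustration; the point is only that each vote is matched against its own consensus and nothing else, and that unmatched votes are kept for later.

```python
def aggregate_per_consensus(consensuses, votes):
    """Match each vote only against the consensus with the same valid-after.

    consensuses: {valid_after: set of fingerprints}
    votes: list of (valid_after, authority, {fingerprint: measured})
    Returns (totals, leftover), where leftover holds the votes whose
    consensus has not been imported yet.
    """
    totals = {}
    leftover = []
    for valid_after, authority, entries in votes:
        consensus = consensuses.get(valid_after)
        if consensus is None:
            # Keep the vote around until its consensus shows up.
            leftover.append((valid_after, authority, entries))
            continue
        totals[(valid_after, authority)] = sum(
            bw for fp, bw in entries.items() if fp in consensus)
    return totals, leftover

# Toy data (hypothetical): the second vote has no matching consensus yet.
consensuses = {"2018-11-01 00:00:00": {"AAAA", "BBBB"}}
votes = [
    ("2018-11-01 00:00:00", "moria1", {"AAAA": 100, "CCCC": 70}),
    ("2018-11-01 01:00:00", "moria1", {"AAAA": 90}),
]
totals, leftover = aggregate_per_consensus(consensuses, votes)
```

Because the work per vote depends only on the size of one consensus, total cost in this sketch grows linearly with the number of votes.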

If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.

We do have these 3 weeks from my tests though, so let's look at the results:
...
The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.

I agree: we could account for it with some documentation.

What I'd like to try out is adding a third line, "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

Changed 4 weeks ago by karsten

Attachment: totalcw-2018-11-22a.png added

comment:7 Changed 4 weeks ago by karsten

Replying to teor:

Can you do each consensus separately?

I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.

If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.

Right.

The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.

I agree: we could account for it with some documentation.

Okay.

What I'd like to try out is adding a third line, "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

It's here:

[Graph: attachment totalcw-2018-11-22a.png]

I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.

I tend towards green, because it adds way less code and delivers basically the same result.

I'd also break down vote totals by Guard/Exit flag combination (#28328). Note, however, that these Guard and Exit flags would also be taken from the vote, not from the consensus. Mostly a documentation issue again.

And I'd include consensus totals (#28352). Just mentioning those for completeness.

How does this sound?

comment:8 in reply to:  7 Changed 4 weeks ago by teor

Replying to karsten:

Replying to teor:

Can you do each consensus separately?

I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.

Or carefully constructed indexes?
(For example, an index on all the join conditions, in the correct order.)

Or datetimes converted to integers?
(I remember native datetimes being slower to match than integers, but that depends on how the DBMS stores and compares them.)

(My most recent experience is with SQL Server 2012, but I assume that DBMSs are roughly equivalent.)
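Both suggestions can be tried out in a small sketch. This one uses SQLite through Python's sqlite3 module purely for illustration; the DBMS, table names, index name, and the `to_epoch` helper are assumptions, and whether either change helps in practice depends on the actual database and its query planner.

```python
import sqlite3
from datetime import datetime, timezone

def to_epoch(text):
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC timestamp to integer seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consensus_entry (
  valid_after INTEGER NOT NULL, fingerprint_id INTEGER NOT NULL);
CREATE TABLE vote_entry (
  valid_after INTEGER NOT NULL, authority TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL, measured INTEGER NOT NULL);
-- Composite index on both join conditions, in the order the join uses them.
CREATE INDEX consensus_entry_join
  ON consensus_entry (valid_after, fingerprint_id);
""")

# Valid-after times are stored as integers rather than native datetimes.
hour = to_epoch("2018-11-01 00:00:00")
conn.executemany("INSERT INTO consensus_entry VALUES (?, ?)",
                 [(hour, 1), (hour, 2)])
conn.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)",
                 [(hour, "moria1", 1, 100), (hour, "moria1", 3, 70)])

query = """
SELECT v.authority, SUM(v.measured)
FROM vote_entry v
JOIN consensus_entry c
  ON c.valid_after = v.valid_after
 AND c.fingerprint_id = v.fingerprint_id
GROUP BY v.authority
"""
rows = conn.execute(query).fetchall()

# Ask SQLite how it intends to run the join; the exact wording of the plan
# varies between SQLite versions, so it is only collected here, not checked.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

Printing `plan` shows whether the planner picked up `consensus_entry_join` for the lookup; other systems offer an equivalent under a different command name.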

What I'd like to try out is adding a third line, "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.

Let's see the results for Running, then decide.

It's here:


I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.

I tend towards green, because it adds way less code and delivers basically the same result.

Seems fine to me.

I'd also break down vote totals by Guard/Exit flag combination (#28328). Note, however, that these Guard and Exit flags would also be taken from the vote, not from the consensus. Mostly a documentation issue again.

And I'd include consensus totals (#28352). Just mentioning those for completeness.

How does this sound?

These all seem fine to me.

comment:9 Changed 3 weeks ago by karsten

Status: new → needs_review

irl, please review my tasks-28137-28328-28352 branch that implements this ticket along with the other two mentioned tickets. (It simply made more sense to have a single branch, not three, also to reduce the review work.)

comment:10 Changed 2 weeks ago by irl

Reviewer: irl

comment:11 Changed 4 days ago by notirl

Status: needs_review → merge_ready

Looks good to me.

comment:12 Changed 4 days ago by karsten

Cool, I'll start deploying the change, which requires re-processing quite a bit of data. Should be ready early next week.

comment:13 Changed 3 days ago by karsten

Resolution: fixed
Status: merge_ready → closed

A bit earlier than expected: done! Thanks, everyone! Closing.