This suggestion is based on a discussion with arma and Sebastian in #tor-dev:
07:00:41 <+armadev> would be interesting to compare to: "the sum of moria1's votes about each relay that ended up in the consensus"
07:00:51 <+armadev> since that would compare between bwauths better
07:01:12 <+armadev> right now if moria1 knows about a bunch of relays that used to be around, but aren't now, and other dir auths don't know about them, then moria1 votes a much higher total
So, it does seem plausible that the totals by authority would be more useful if the underlying set of relays is the same.
One issue is a technical one: we'd need to retain much more data in the database to implement this graph. The background is that we always need to match relays in a vote with the corresponding consensus in order to decide whether to include a relay in the total sum or not. However, we do not require descriptors to appear in a certain order, and we want the end result to be the same even if we process a consensus or vote a couple days or even weeks later.
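To make the matching step concrete, here is a minimal Python sketch. It assumes the vote and the consensus have already been parsed into a plain dict and set; the function name, the data layout and the example values are all made up for illustration and are not taken from metrics-web.

```python
def total_in_consensus(vote_measured, consensus_fingerprints):
    """Sum a vote's measured bandwidths over relays that are also in the consensus.

    vote_measured: dict mapping relay fingerprint -> measured bandwidth in the vote
    consensus_fingerprints: set of fingerprints listed in the matching consensus
    """
    return sum(bw for fp, bw in vote_measured.items()
               if fp in consensus_fingerprints)


# Made-up example data: relay CCCC is in the vote but not in the consensus.
vote = {"AAAA": 100, "BBBB": 250, "CCCC": 40}
consensus = {"AAAA", "BBBB"}

total_all = sum(vote.values())                       # what the graph sums today
total_matched = total_in_consensus(vote, consensus)  # the suggested variant
```

The hard part described above is not this sum, but having both the vote and its consensus available at the same time, whatever order they are processed in.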
Another, minor issue is that we'd have to reprocess the entire archive. This is doable and shouldn't stop us. Just saying that it's going to require some effort.
Are there alternatives, like only including relays from votes that have the Running flag?
Maybe we should run this analysis once and separate from metrics-web and then decide.
teor, juga, pastly, you were all involved in #25459 (moved) which led to the original graph. What do you think about this possible modification?
Okay, I might have an idea how we can implement this with reasonable effort:
We import fingerprints into a table and assign numeric identifiers that we use in other tables.
We import all fingerprints in a consensus together with the votes referenced from the consensus.
We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
When aggregating, we join votes with the consensus they refer to.
After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.
This approach effectively moves the aggregation step to the database. It was more convenient to just do it after parsing, but it's doable in the database as well.
This approach also ensures that tables don't grow forever. I would expect that there are just a few votes and consensuses missing from the archives, so we'd carry just those fingerprints in the database forever.
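The steps above could be sketched roughly as follows, using Python's sqlite3 module as a stand-in for the real database. Table names, column names and helper functions are hypothetical. As a simplification, votes are linked to their consensus by valid-after time only, and the consensus rows are deleted as soon as the votes for that time have been aggregated, rather than only after all referenced votes have been seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fingerprint (
  fingerprint_id INTEGER PRIMARY KEY,
  fingerprint TEXT UNIQUE NOT NULL);
CREATE TABLE consensus_entry (
  valid_after TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL);
CREATE TABLE vote_entry (
  valid_after TEXT NOT NULL,
  authority TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL,
  measured INTEGER);
""")


def fingerprint_id(fp):
    # Assign a numeric identifier on first sight, reuse it afterwards.
    conn.execute("INSERT OR IGNORE INTO fingerprint (fingerprint) VALUES (?)", (fp,))
    row = conn.execute(
        "SELECT fingerprint_id FROM fingerprint WHERE fingerprint = ?", (fp,)).fetchone()
    return row[0]


def import_consensus(valid_after, fingerprints):
    rows = [(valid_after, fingerprint_id(fp)) for fp in fingerprints]
    conn.executemany("INSERT INTO consensus_entry VALUES (?, ?)", rows)


def import_vote(valid_after, authority, measured_by_fp):
    rows = [(valid_after, authority, fingerprint_id(fp), bw)
            for fp, bw in measured_by_fp.items()]
    conn.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)", rows)


def aggregate_and_prune(valid_after):
    # Join votes with the consensus for the same valid-after time,
    # then delete the rows that were just aggregated.
    totals = conn.execute("""
        SELECT v.valid_after, v.authority, SUM(v.measured)
        FROM vote_entry v
        JOIN consensus_entry c
          ON c.valid_after = v.valid_after
         AND c.fingerprint_id = v.fingerprint_id
        WHERE v.valid_after = ?
        GROUP BY v.valid_after, v.authority
        ORDER BY v.authority""", (valid_after,)).fetchall()
    conn.execute("DELETE FROM vote_entry WHERE valid_after = ?", (valid_after,))
    conn.execute("DELETE FROM consensus_entry WHERE valid_after = ?", (valid_after,))
    return totals


# Made-up example data.
import_consensus("2018-11-01 00:00:00", ["AAAA", "BBBB"])
import_vote("2018-11-01 00:00:00", "moria1", {"AAAA": 100, "BBBB": 250, "CCCC": 40})
totals = aggregate_and_prune("2018-11-01 00:00:00")
remaining_votes = conn.execute("SELECT COUNT(*) FROM vote_entry").fetchone()[0]
```

Only the fingerprint table keeps growing in this sketch; the entry tables shrink again after each aggregation.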
However, this is not a trivial change to the existing "totalcw" module. It's basically a rewrite.
Are there any other changes we might want to make in the near future? For example, does #28328 (moved) maybe add something here?
However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.
We do have these 3 weeks from my tests though, so let's look at the results:
The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.
What I'd like to try out is to add a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
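A minimal sketch of the "Running in vote" variant, assuming each vote entry has been parsed into a (flags, measured bandwidth) pair. The function name, the layout and the values are hypothetical. Note that it only looks at the vote itself, which is why no matching against the consensus is needed.

```python
def total_running(vote_entries):
    """Sum measured bandwidths of vote entries that carry the Running flag.

    vote_entries: iterable of (flags, measured) pairs, where flags is a set
    of flag names assigned by the voting authority.
    """
    return sum(measured for flags, measured in vote_entries
               if "Running" in flags)


# Made-up example: the third relay is listed in the vote but not Running.
entries = [
    ({"Running", "Valid", "Guard"}, 100),
    ({"Running", "Valid", "Exit"}, 250),
    ({"Valid"}, 40),
]
total_all = sum(measured for _, measured in entries)
total_run = total_running(entries)
```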
However, it turns out that matching all vote entries with all consensus entries cannot be done with reasonable effort, at least not with the current tools we use. For example, processing 3 days of descriptors takes a quite reasonable 5 minutes, but processing 3 weeks of descriptors already takes almost 3 hours. This simply doesn't scale to 3 months or 3 years.
So the process scales non-linearly?
When processing 3 days, each hour of consensus and votes takes about 4 seconds.
But when processing 3 weeks, each hour of consensus and votes takes 21 seconds.
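For reference, the arithmetic behind those per-hour figures, as a small Python calculation. It treats "almost 3 hours" as a full 3 hours, so the 3-week numbers are an upper estimate. The growth exponent at the end is only a rough indication from two data points, not a measurement.

```python
import math


def seconds_per_consensus_hour(total_seconds, days):
    # One consensus (plus its votes) per hour, 24 per day.
    return total_seconds / (days * 24)


three_days = seconds_per_consensus_hour(5 * 60, 3)      # 300 s over 72 hours
three_weeks = seconds_per_consensus_hour(3 * 3600, 21)  # 10800 s over 504 hours

input_growth = 21 / 3                    # 7 times more data
time_growth = (3 * 3600) / (5 * 60)      # 36 times more processing time
# If time grows like input**k, then k = log(time_growth) / log(input_growth).
exponent = math.log(time_growth) / math.log(input_growth)
```

With these numbers the per-hour cost comes out at roughly 4 and 21 seconds, and the exponent lands between 1 and 2, so processing time grows clearly faster than the input.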
Can you do each consensus separately?
More precisely:
For each consensus, in a set of temporary tables:
We import all fingerprints in a consensus together with the votes referenced from the consensus.
We import fingerprints into a table and assign numeric identifiers that we use in other tables.
We import all fingerprints in a vote together with a way to refer to the consensus coming out of it.
When aggregating, we join votes with the consensus they refer to, then persist relevant data in permanent tables, with permanent identifiers.
After aggregating, we delete all votes that we aggregated in the previous step and we delete all consensuses if we aggregated all votes referenced from consensuses.
If there is any data left, we persist that data in permanent tables, with permanent identifiers.
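A rough sketch of the per-consensus idea, again with Python's sqlite3 module standing in for the real database. The table names (including `totalcw_sketch`), the function and the data are hypothetical. Each call only ever joins the entries of a single consensus, so the join cannot accidentally span other valid-after times. Handling of leftover data, as in the last step above, is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Permanent table holding only the aggregated result.
conn.execute("""CREATE TABLE totalcw_sketch (
    valid_after TEXT, authority TEXT, measured_sum INTEGER)""")


def process_consensus(valid_after, consensus_fps, votes):
    """Aggregate one consensus and its votes in temporary tables.

    consensus_fps: iterable of fingerprints in the consensus
    votes: dict mapping authority -> dict of fingerprint -> measured bandwidth
    """
    conn.execute("CREATE TEMP TABLE tmp_consensus (fingerprint TEXT PRIMARY KEY)")
    conn.execute("""CREATE TEMP TABLE tmp_vote (
        authority TEXT, fingerprint TEXT, measured INTEGER)""")
    conn.executemany("INSERT INTO tmp_consensus VALUES (?)",
                     [(fp,) for fp in consensus_fps])
    conn.executemany("INSERT INTO tmp_vote VALUES (?, ?, ?)",
                     [(auth, fp, bw)
                      for auth, entries in votes.items()
                      for fp, bw in entries.items()])
    # Persist only the aggregate, then throw the temporary data away.
    conn.execute("""
        INSERT INTO totalcw_sketch
        SELECT ?, v.authority, SUM(v.measured)
        FROM tmp_vote v JOIN tmp_consensus c USING (fingerprint)
        GROUP BY v.authority""", (valid_after,))
    conn.execute("DROP TABLE tmp_vote")
    conn.execute("DROP TABLE tmp_consensus")


# Made-up example data for two consecutive consensuses.
process_consensus("2018-11-01 00:00:00", ["AAAA", "BBBB"],
                  {"moria1": {"AAAA": 100, "BBBB": 250, "CCCC": 40}})
process_consensus("2018-11-01 01:00:00", ["AAAA"],
                  {"moria1": {"AAAA": 120, "BBBB": 260}})
results = conn.execute(
    "SELECT * FROM totalcw_sketch ORDER BY valid_after").fetchall()
```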
If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.
We do have these 3 weeks from my tests though, so let's look at the results:
...
The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.
I agree: we could account for it with some documentation.
What I'd like to try out is to add a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.
If this is going to take a lot of effort, then don't worry about it: the difference isn't important in this case.
Right.
The red line is what's currently on the Tor Metrics website: it contains measured bandwidths of all relays in a vote, regardless of whether a relay made it into the consensus. The blue line only contains those relays in a vote that also appeared in the consensus. I'd say that the difference is almost negligible.
I agree: we could account for it with some documentation.
Okay.
What I'd like to try out is to add a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
Let's see the results for Running, then decide.
It's here:
I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.
I tend towards green, because it adds way less code and delivers basically the same result.
I'd also break down vote totals by Guard/Exit flag combination (#28328 (moved)). Note, however, that these Guard and Exit flags would also be taken from the vote, not from the consensus. Mostly a documentation issue again.
And I'd include consensus totals (#28352 (moved)). Just mentioning those for completeness.
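The Guard/Exit breakdown could look roughly like this in Python, assuming parsed (flags, measured bandwidth) pairs from a single vote. Names and data are hypothetical. The flags here are whatever the voting authority assigned, not the flags from the consensus, which is the documentation issue mentioned above.

```python
from collections import defaultdict


def totals_by_guard_exit(vote_entries):
    """Break a vote's measured bandwidth total down by Guard/Exit combination.

    Returns a dict keyed by (has_guard, has_exit) booleans.
    """
    totals = defaultdict(int)
    for flags, measured in vote_entries:
        totals[("Guard" in flags, "Exit" in flags)] += measured
    return dict(totals)


# Made-up example data from one vote.
entries = [
    ({"Running", "Guard"}, 100),
    ({"Running", "Exit"}, 250),
    ({"Running", "Guard", "Exit"}, 70),
    ({"Running"}, 40),
    ({"Running", "Guard"}, 10),
]
breakdown = totals_by_guard_exit(entries)
```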
I think that's a database optimization question in this case. I suspect that we're somehow matching vote entries with all consensus entries in the database, not just those entries that are connected via the same valid-after time, at least in an intermediate step. Maybe we'd have to do something with subselects here in order to assist the database.
Or carefully constructed indexes?
(For example, an index on all the join conditions, in the correct order.)
Or datetimes converted to integers?
(I remember native datetimes being slower to match than integers, but that depends on how the DBMS stores and compares them.)
(My most recent experience is with SQL Server 2012, but I assume that DBMSs are roughly equivalent.)
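Both suggestions can be tried out in a small sketch, here with Python's sqlite3 module. This only shows what they would look like. Whether they help at all depends on the actual DBMS and schema, which this sketch does not reproduce; table, column and index names are hypothetical.

```python
import calendar
import sqlite3
from datetime import datetime


def to_epoch(valid_after):
    # Interpret the valid-after time as UTC and convert it to integer seconds.
    parsed = datetime.strptime(valid_after, "%Y-%m-%d %H:%M:%S")
    return calendar.timegm(parsed.timetuple())


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consensus_entry (
  valid_after INTEGER NOT NULL,
  fingerprint_id INTEGER NOT NULL);
CREATE TABLE vote_entry (
  valid_after INTEGER NOT NULL,
  authority TEXT NOT NULL,
  fingerprint_id INTEGER NOT NULL,
  measured INTEGER);
-- One index covering both join conditions, in join order.
CREATE INDEX consensus_entry_join
  ON consensus_entry (valid_after, fingerprint_id);
""")

# Made-up example data.
va = to_epoch("2018-11-01 00:00:00")
conn.executemany("INSERT INTO consensus_entry VALUES (?, ?)", [(va, 1), (va, 2)])
conn.executemany("INSERT INTO vote_entry VALUES (?, ?, ?, ?)",
                 [(va, "moria1", 1, 100), (va, "moria1", 2, 250), (va, "moria1", 3, 40)])

query = """
    SELECT v.authority, SUM(v.measured)
    FROM vote_entry v
    JOIN consensus_entry c
      ON c.valid_after = v.valid_after
     AND c.fingerprint_id = v.fingerprint_id
    GROUP BY v.authority"""
totals = conn.execute(query).fetchall()
# The query plan shows whether the database actually picks up the index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

Looking at the query plan of the real aggregation query would be the first step either way, since it tells us whether the join is restricted to matching valid-after times.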
What I'd like to try out is to add a third line "Running in vote", which would at least kick out relays in a vote that the authority didn't find to be running. I'd expect that line to show up between red and blue. However, a relay that doesn't have the Running flag in one vote can still go into the consensus if the others think it's running. And a relay that has the Running flag from one authority can still not show up in the consensus if the others disagree. So, I'm unclear whether this really helps. Worth trying, and a much smaller change, because it doesn't require us to match vote entries with consensus entries.
Let's see the results for Running, then decide.
It's here:
I had to increase line size and make lines transparent in order to make the green line visible at all: it almost always overlaps with the blue line. One exception is maatuska at the beginning of November; possibly related to its own reachability testing not working very well? Another minor exception is Faravahar on the 4th and 10th, though the difference there is really small.
I tend towards green, because it adds way less code and delivers basically the same result.
Seems fine to me.
irl, please review my tasks-28137-28328-28352 branch that implements this ticket along with the other two mentioned tickets. (It simply made more sense to have a single branch, not three, also to reduce the review work.)