Add all bwauth measurements (from votes)

added component::metrics/onionoo owner::metrics-team parent::24834 priority::medium severity::normal status::assigned tor-bwauth-needs type::enhancement labels

Trac:
Username: tyseom
Cc: N/A to tyseom
Description: As discussed in #16020 (moved), it would be handy to have all measurements from bwauths included in onionoo data.

The main concern is the additional workload. Karsten wrote:

Including all bwauth measurements would certainly be handy, but that would require parsing votes which we don't do right now. Onionoo is already choking on parsing all the descriptors published every hour, and votes are not exactly tiny. I'd say don't expect this to happen anytime soon. But I agree that it would be really useful to have.

to

As discussed in #16020 (moved), it would be handy to have all measurements from bwauths included in onionoo data.

The main concern is the additional workload. Karsten wrote: "Including all bwauth measurements would certainly be handy, but that would require parsing votes which we don't do right now. Onionoo is already choking on parsing all the descriptors published every hour, and votes are not exactly tiny. I'd say don't expect this to happen anytime soon. But I agree that it would be really useful to have."

Capitalize summary.

Trac:
Severity: N/A to Normal
Sponsor: N/A to N/A
Summary: add all bwauth measurements (from votes) to Add all bwauth measurements (from votes)
Reviewer: N/A to N/A

Trac:
Status: new to assigned
Owner: N/A to metrics-team

There are a few tickets that want this feature, #24834 (moved) is one of them. We would really like this feature so we can map bwauth bias.

Trac:
Keywords: N/A deleted, tor-bwauth-needs added
Parent: N/A to #24834 (moved)

Replying to teor:

There are a few tickets that want this feature, #24834 (moved) is one of them. We would really like this feature so we can map bwauth bias.

Understood. However, this feature is still unrealistic to implement without doing major code changes first. Earlier today I parsed votes for something unrelated (delay between relays announcing a new version and authorities including that in their vote), and I was once more surprised how big votes are compared to other descriptors. We simply cannot handle this data in Onionoo right now. Sorry!

Replying to karsten:

Replying to teor:

There are a few tickets that want this feature, #24834 (moved) is one of them. We would really like this feature so we can map bwauth bias.

Understood. However, this feature is still unrealistic to implement without doing major code changes first. Earlier today I parsed votes for something unrelated (delay between relays announcing a new version and authorities including that in their vote), and I was once more surprised how big votes are compared to other descriptors. We simply cannot handle this data in Onionoo right now. Sorry!

Do you need more developer time, more disk, more CPU, or more RAM? Because we can try to make these things happen.

(Another alternative for #24834 (moved) is that we build a quick stem script to export the data we need, and import it into Relay Search.)

Replying to teor:

Do you need more developer time, more disk, more CPU, or more RAM? Because we can try to make these things happen.

So, I wrote a quick hack of an Onionoo that downloads and reads votes in addition to all the other descriptors. Here's what I learned:

The local descriptor cache without votes is 569M. With votes it's 3.3G. That's an increase of 494%!
The time to read 3 days of descriptors in the hourly run without votes is 10 minutes. With votes it's 16 minutes. That's an increase of 60%. (But we typically only read 1 hour of descriptors per hour.)
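The two percentages above can be checked with a few lines of Python. This is only a sanity check, and it assumes that "569M" and "3.3G" are binary units (MiB and GiB); the function name is made up for illustration.

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Assumption: 3.3G means 3.3 GiB, i.e. 3.3 * 1024 MiB.
cache_growth = percent_increase(569, 3.3 * 1024)

# Time to read 3 days of descriptors: 10 minutes without votes, 16 with.
read_time_growth = percent_increase(10, 16)

print(round(cache_growth), round(read_time_growth))
```

With decimal units (3.3G = 3300M) the cache growth would come out closer to 480%, so the quoted figure is consistent with binary units.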

So, I'd say we'd need these things to make this happen:

Some more disk space on the Onionoo hosts, but not really much more RAM or CPUs. (Reading descriptors is still done in a single thread, so reading more descriptors simply takes more time.)
Some developer time to write a specification patch for this new Onionoo feature, an implementation, and tests.
Some review time.

teor: Would you want to work on that specification patch, so that we have a better idea what we're supposed to build here?

If we want to do this properly, we should also resolve a few related issues that become a bit more important by adding this feature:

Ability to use compression when downloading descriptors from CollecTor using metrics-lib (no ticket yet)
Ability to process descriptors in parallel using metrics-lib (#21365 (moved) is related)
Ability to handle large descriptor files in metrics-lib (#20395 (moved))

(Another alternative for #24834 (moved) is that we build a quick stem script to export the data we need, and import it into Relay Search.)

For a short-term thing, sure. But long-term this sounds like a maintenance nightmare. (After all, Onionoo is the tool to export the data we need and import it into Relay Search.)

As I've just commented on #24834 (moved), if there is a concrete plan for this and I can see there are the resources to make it happen, then I'm happy to produce a short term solution using Stem for adding bandwidth votes to Relay Search.

Before making a plan we'd need to know what pieces of information we'd want to extract from votes in the near future. In particular, we'll have to decide whether we want to extract things from just the latest set of votes (like most things in details documents) or keep a history of vote parts over time (like bandwidth, weights, etc. documents).

I'm happy to work on a spec patch. Where is the spec?

Replying to karsten:

Before making a plan we'd need to know what pieces of information we'd want to extract from votes in the near future. In particular, we'll have to decide whether we want to extract things from just the latest set of votes (like most things in details documents) or keep a history of vote parts over time (like bandwidth, weights, etc. documents).

For #24834 (moved), we just need the latest bandwidth votes for each relay. Doing historical per-relay bandwidth votes would be cool, but also way out of scope.
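As a rough sketch of what "latest bandwidth votes for each relay" means in terms of vote contents: each relay entry in a vote starts with an `r` line and may carry a `w` line with a `Measured=` value. The snippet below pulls those values out of one vote using plain string handling. It is not how Onionoo or Stem parse votes; the function name and the sample text (including the placeholder identities `idA` and `idB`) are made up for illustration.

```python
def parse_measured(vote_text):
    """Return {relay identity: measured bandwidth} for one vote.

    Relays whose "w" line has no Measured= value are left out.
    """
    measured = {}
    current_identity = None
    for line in vote_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r" and len(parts) >= 3:
            # "r <nickname> <identity> ..." starts a new relay entry.
            current_identity = parts[2]
        elif parts[0] == "w" and current_identity is not None:
            for item in parts[1:]:
                key, _, value = item.partition("=")
                if key == "Measured" and value.isdigit():
                    measured[current_identity] = int(value)
    return measured


# Made-up excerpt with placeholder identities and digests.
SAMPLE_VOTE = """\
r relayA idA digestA 2018-01-01 00:00:00 192.0.2.1 9001 0
s Fast Running Valid
w Bandwidth=500 Measured=420
r relayB idB digestB 2018-01-01 00:00:00 192.0.2.2 9001 0
s Running Valid
w Bandwidth=20
"""

print(parse_measured(SAMPLE_VOTE))
```

Running this over the latest vote from each bandwidth authority would give one measurement per authority per relay, which is all that #24834 needs.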

I can't think of anything else right now, except maybe current votes for relay versions, if we confirm that the version bug is in Tor and not Onionoo.

Karsten, irl, you might be more familiar with the kinds of vote data that people ask for?

(I think we should focus on the map feature, and try not to duplicate functionality from consensus-health.)

Having flags available in Onionoo for the latest votes would be useful, but the only advantage I can see over consensus-health is not having to load a huge page to see the flags for only one relay. There are some synthetic flags that are currently only possible in RS or in consensus-health, as they both have different views of the data. In the future, we may have a consensus-health that uses Onionoo as a data source, but this is a long way off. For now, we can just have bandwidth votes, and maybe flags if karsten tells us that's not too much extra work/data storage/processing.

I'd like to see something like:

{
  "votes": [
    {
      "authority": "authority_name",
      "flags": ["flag1", "flag2"],
      "measured_bandwidth": 100
    },
    ...
  ]
}

or

{
  "votes": {
    "authority_name": {
      "flags": ["flag1", "flag2"],
      "measured_bandwidth": 100
    },
    ...
  }
}

Is using the name OK? It's not guaranteed unique by the protocol, but I think it's ok to assume the public network will have this restriction.
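To make the trade-off between the two shapes concrete, here is a small sketch that converts the list form into the form keyed by authority name. It fails loudly if two votes share a name, which is the case the keyed form cannot represent. The function name and the sample values are made up for illustration; neither shape is an existing Onionoo field.

```python
def votes_list_to_map(document):
    """Convert {"votes": [ {...}, ... ]} into {"votes": {name: {...}}}."""
    by_name = {}
    for vote in document["votes"]:
        name = vote["authority"]
        if name in by_name:
            # The keyed shape silently loses data if names collide.
            raise ValueError("duplicate authority name: " + name)
        by_name[name] = {k: v for k, v in vote.items() if k != "authority"}
    return {"votes": by_name}


# Made-up sample in the list shape proposed above.
list_shape = {
    "votes": [
        {"authority": "auth1", "flags": ["Fast", "Running"], "measured_bandwidth": 100},
        {"authority": "auth2", "flags": ["Running"], "measured_bandwidth": 90},
    ]
}

map_shape = votes_list_to_map(list_shape)
print(map_shape["votes"]["auth1"]["measured_bandwidth"])
```

The list shape tolerates duplicate names at the cost of a linear scan per lookup; the keyed shape is easier for clients to consume as long as the uniqueness assumption holds.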

How do we want to handle bridge authority votes? Do we want to exclude them from scope until there are multiple bridge authorities?

And I think we should add the IPv6 addresses and the Tor versions from the votes.

mentioned in issue #24834 (moved)

mentioned in issue #27571 (moved)
