Opened 3 years ago

Last modified 6 months ago

#16843 assigned enhancement

Add all bwauth measurements (from votes)

Reported by: cypherpunks Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords: tor-bwauth-needs
Cc: tyseom Actual Points:
Parent ID: #24834 Points:
Reviewer: Sponsor:

Description (last modified by tyseom)

As discussed in #16020, it would be handy to have all measurements from bwauths included in onionoo data.

The main concern is the additional workload. Karsten wrote:
"Including all bwauth measurements would certainly be handy, but that would require parsing votes which we don't do right now. Onionoo is already choking on parsing all the descriptors published every hour, and votes are not exactly tiny. I'd say don't expect this to happen anytime soon. But I agree that it would be really useful to have."

Change History (13)

comment:1 Changed 3 years ago by tyseom

Cc: tyseom added
Description: modified (diff)

comment:2 Changed 10 months ago by karsten

Severity: Normal
Summary: add all bwauth measurements (from votes) → Add all bwauth measurements (from votes)

Capitalize summary.

comment:3 Changed 10 months ago by karsten

Owner: set to metrics-team
Status: new → assigned

comment:4 Changed 6 months ago by teor

Keywords: tor-bwauth-needs added
Parent ID: #24834

There are a few tickets that want this feature, #24834 is one of them.
We would really like this feature so we can map bwauth bias.

comment:5 in reply to:  4 ; Changed 6 months ago by karsten

Replying to teor:

There are a few tickets that want this feature, #24834 is one of them.
We would really like this feature so we can map bwauth bias.

Understood. However, this feature is still unrealistic to implement without doing major code changes first. Earlier today I parsed votes for something unrelated (delay between relays announcing a new version and authorities including that in their vote), and I was once more surprised how big votes are compared to other descriptors. We simply cannot handle this data in Onionoo right now. Sorry!

comment:6 in reply to:  5 ; Changed 6 months ago by teor

Replying to karsten:

Replying to teor:

There are a few tickets that want this feature, #24834 is one of them.
We would really like this feature so we can map bwauth bias.

Understood. However, this feature is still unrealistic to implement without doing major code changes first. Earlier today I parsed votes for something unrelated (delay between relays announcing a new version and authorities including that in their vote), and I was once more surprised how big votes are compared to other descriptors. We simply cannot handle this data in Onionoo right now. Sorry!

Do you need more developer time, more disk, more CPU, or more RAM?
Because we can try to make these things happen.

(Another alternative for #24834 is that we build a quick stem script to export the data we need, and import it into Relay Search.)

comment:7 in reply to:  6 Changed 6 months ago by karsten

Replying to teor:

Do you need more developer time, more disk, more CPU, or more RAM?
Because we can try to make these things happen.

So, I wrote a quick hack of an Onionoo that downloads and reads votes in addition to all the other descriptors. Here's what I learned:

  • The local descriptor cache without votes is 569M. With votes it's 3.3G. That's an increase of 494%!
  • The time to read 3 days of descriptors in the hourly run without votes is 10 minutes. With votes it's 16 minutes. That's an increase of 60%. (But we typically only read 1 hour of descriptors per hour.)
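The two percentages above can be reproduced with a few lines of Python. This is only a sketch: it assumes that 3.3G means 3.3 × 1024 M (with decimal units the first figure would come out nearer 480%), and the function name is made up for illustration.

```python
def percent_increase(before, after):
    """Return the relative increase from before to after, in percent."""
    return (after - before) / before * 100


# Cache size: 569M without votes, 3.3G with votes.
# Assumption: 1G = 1024M, which is what matches the 494% figure.
cache_increase = percent_increase(569, 3.3 * 1024)

# Read time for 3 days of descriptors: 10 minutes without votes, 16 with.
time_increase = percent_increase(10, 16)

print(f"cache: +{cache_increase:.0f}%, read time: +{time_increase:.0f}%")
```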

So, I'd say we'd need these things to make this happen:

  • Some more disk space on the Onionoo hosts, but not really much more RAM or CPUs. (Reading descriptors is still done in a single thread, so reading more descriptors simply takes more time.)
  • Some developer time to write a specification patch for this new Onionoo feature, an implementation, and tests.
  • Some review time.

teor: Would you want to work on that specification patch, so that we have a better idea what we're supposed to build here?

If we want to do this properly, we should also resolve a few related issues that become a bit more important by adding this feature:

  • Ability to use compression when downloading descriptors from CollecTor using metrics-lib (no ticket yet)
  • Ability to process descriptors in parallel using metrics-lib (#21365 is related)
  • Ability to handle large descriptor files in metrics-lib (#20395)

(Another alternative for #24834 is that we build a quick stem script to export the data we need, and import it into Relay Search.)

For a short-term thing, sure. But long-term this sounds like a maintenance nightmare. (After all, Onionoo is the tool to export the data we need and import it into Relay Search.)

comment:8 Changed 6 months ago by irl

As I've just commented on #24834, if there is a concrete plan for this and I can see there are the resources to make it happen, then I'm happy to produce a short-term solution using Stem for adding bandwidth votes to Relay Search.

comment:9 Changed 6 months ago by karsten

Before making a plan we'd need to know what pieces of information we'd want to extract from votes in the near future. In particular, we'll have to decide whether we want to extract things from just the latest set of votes (like most things in details documents) or keep a history of vote parts over time (like bandwidth, weights, etc. documents).

comment:10 in reply to:  9 Changed 6 months ago by teor

I'm happy to work on a spec patch. Where is the spec?

Replying to karsten:

Before making a plan we'd need to know what pieces of information we'd want to extract from votes in the near future. In particular, we'll have to decide whether we want to extract things from just the latest set of votes (like most things in details documents) or keep a history of vote parts over time (like bandwidth, weights, etc. documents).

For #24834, we just need the latest bandwidth votes for each relay.
Doing historical per-relay bandwidth votes would be cool, but also way out of scope.
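For illustration, the per-relay data needed here could be pulled out of a single vote roughly as follows. This is a minimal sketch using only the Python standard library; it is not Stem or Onionoo code. It assumes the "r", "s" and "w" line layout of router status entries in dir-spec. The vote excerpt, relay names, identities and the function name parse_vote_entries are all invented for the example.

```python
import re

# Hypothetical, heavily shortened vote excerpt. Real votes carry many more
# lines per relay; every value below is invented for illustration.
SAMPLE_VOTE = """\
r relayA AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2018-01-01 00:00:00 192.0.2.1 9001 0
s Fast Running Stable Valid
w Bandwidth=500 Measured=420
r relayB CCCCCCCCCCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDDDDDDDDDDDDDD 2018-01-01 00:00:00 192.0.2.2 9001 0
s Running Valid
w Bandwidth=80
"""


def parse_vote_entries(text):
    """Collect nickname, identity, flags and measured bandwidth per relay."""
    entries = []
    current = None
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "r":
            # An "r" line starts a new router status entry.
            fields = rest.split()
            current = {
                "nickname": fields[0],
                "identity": fields[1],
                "flags": [],
                "measured_bandwidth": None,  # stays None if not measured
            }
            entries.append(current)
        elif current is None:
            continue  # skip anything before the first "r" line
        elif keyword == "s":
            current["flags"] = rest.split()
        elif keyword == "w":
            # Only bandwidth authorities add a Measured= value.
            match = re.search(r"\bMeasured=(\d+)", rest)
            if match:
                current["measured_bandwidth"] = int(match.group(1))
    return entries


for entry in parse_vote_entries(SAMPLE_VOTE):
    print(entry["nickname"], entry["measured_bandwidth"], entry["flags"])
```

Running this over one vote per authority and keeping only the newest result per relay would cover the "latest bandwidth votes" case; keeping history would need a different storage design.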

I can't think of anything else right now. Maybe current votes for relay versions, if we confirm that the version bug is in Tor and not Onionoo?

Karsten, irl, you might be more familiar with the kinds of vote data that people ask for?

comment:11 Changed 6 months ago by teor

(I think we should focus on the map feature, and try not to duplicate functionality from consensus-health.)

comment:12 Changed 6 months ago by irl

Having flags available in Onionoo for the latest votes would be useful, but the only advantage I can see over consensus-health is not having to load a huge page to see the flags for only one relay. There are some synthetic flags that are currently only possible in RS or in consensus-health, as they both have different views of the data. In the future, we may have a consensus-health that uses Onionoo as a data source, but this is a long way off. For now, we can just have bandwidth votes and maybe flags if karsten tells us that's not too much extra work/data storage/processing.

I'd like to see something like:

{
  "votes": [
    {
      "authority": "authority_name",
      "flags": ["flag1", "flag2"],
      "measured_bandwidth": 100
    },
    ...
  ]
}

or

{
  "votes": {
    "authority_name": {
      "flags": ["flag1", "flag2"],
      "measured_bandwidth": 100
    },
    ...
  }
}
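To make the difference between the two shapes concrete, here is a small Python sketch that builds both from the same per-authority records and looks up one authority in each. The records and function names are hypothetical; nothing here is an existing Onionoo field or API.

```python
import json

# Hypothetical per-authority records for a single relay.
records = [
    {"authority": "authority_name", "flags": ["flag1", "flag2"],
     "measured_bandwidth": 100},
]


def as_list_shape(records):
    """First proposal: a list of objects, each naming its authority."""
    return {"votes": [dict(record) for record in records]}


def as_map_shape(records):
    """Second proposal: an object keyed by authority name."""
    return {
        "votes": {
            record["authority"]: {
                key: value for key, value in record.items()
                if key != "authority"
            }
            for record in records
        }
    }


list_doc = as_list_shape(records)
map_doc = as_map_shape(records)

# List shape: a client has to scan for the authority it wants.
from_list = next(vote for vote in list_doc["votes"]
                 if vote["authority"] == "authority_name")

# Map shape: a direct lookup, but it relies on names being unique.
from_map = map_doc["votes"]["authority_name"]

print(json.dumps(list_doc, indent=2))
print(json.dumps(map_doc, indent=2))
```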

comment:13 Changed 6 months ago by teor

Is using the name OK?
It's not guaranteed unique by the protocol, but I think it's OK to assume the public network will have this restriction.
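If the name is used as a key, the uniqueness assumption could at least be checked rather than silently trusted. A minimal Python sketch, assuming vote records shaped like the list proposal in comment 12; the function name and the sample authority names are made up:

```python
def index_by_authority(votes):
    """Key vote records by authority name, refusing duplicate names."""
    indexed = {}
    for vote in votes:
        name = vote["authority"]
        if name in indexed:
            # The protocol does not guarantee unique names, so fail loudly
            # instead of letting one authority overwrite another.
            raise ValueError(f"duplicate authority name: {name}")
        indexed[name] = {
            key: value for key, value in vote.items() if key != "authority"
        }
    return indexed


# Hypothetical records; the authority names are invented.
votes = [
    {"authority": "auth_one", "measured_bandwidth": 100},
    {"authority": "auth_two", "measured_bandwidth": 120},
]
print(index_by_authority(votes))
```

An alternative would be to key on the authority's identity fingerprint and carry the name as an ordinary field, at the cost of less readable output.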

How do we want to handle bridge authority votes?
Do we want to exclude them from scope until there are multiple bridge authorities?

And I think we should add the IPv6 addresses and the tor versions from the votes.
