Opened 3 months ago

Last modified 3 weeks ago

#25999 needs_information enhancement

Build an abstraction layer over different consensus flavours

Reported by: teor
Owned by: atagar
Priority: Medium
Milestone:
Component: Core Tor/Stem
Version:
Severity: Normal
Keywords: descriptor
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

In #25979, there was a bug in sbws because the ns consensus flavour has exit policy summaries in the consensus, but the microdescriptor consensus has exit policy summaries in microdescriptors.

The code to abstract over these differences is reasonably simple, but it's hard for people to read the specs, find out all the details, and implement it correctly.

For an example, see:
https://trac.torproject.org/projects/tor/ticket/25979#comment:4

It would be great if Stem included an abstraction layer over the consensus and descriptors, and just returned the attribute regardless of where it came from. (Maybe we could include this code in Tor instead, but it would be a major effort.)
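As a rough illustration of the kind of lookup such a layer would hide, here is a minimal sketch using plain dictionaries rather than Stem's classes. The function name and the field names (`exit_policy_summary`, `microdescriptor_digest`) are assumptions made for this example, not Stem's API.

```python
from typing import Mapping, Optional


def exit_policy_summary(entry: Mapping[str, str],
                        microdescriptors: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return a relay's exit policy summary regardless of consensus flavour.

    `entry` is a hypothetical dict for one router status entry. In the ns
    flavour it carries the summary itself; in the microdesc flavour it only
    carries a digest that points at the relay's microdescriptor.
    """
    summary = entry.get("exit_policy_summary")
    if summary is not None:
        return summary  # ns flavour: the summary is in the consensus entry

    # microdesc flavour: follow the digest to the microdescriptor
    digest = entry.get("microdescriptor_digest")
    microdescriptor = microdescriptors.get(digest) if digest is not None else None
    if microdescriptor is None:
        return None  # the microdescriptor is not available
    return microdescriptor.get("exit_policy_summary")


ns_entry = {"nickname": "relayA", "exit_policy_summary": "accept 80,443"}
md_entry = {"nickname": "relayA", "microdescriptor_digest": "abc123"}
microdescs = {"abc123": {"exit_policy_summary": "accept 80,443"}}

print(exit_policy_summary(ns_entry, {}))
print(exit_policy_summary(md_entry, microdescs))
```

The caller gets the same answer from either flavour and never needs to know which document held the summary.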

Change History (6)

comment:1 Changed 2 months ago by atagar

Keywords: descriptor added
Status: new → needs_information

Hi teor. I'm a little unclear: are you asking for the consensus class to call controller methods to fetch microdescriptors?

The proper solution I suspect is for tor to provide two controller methods so it's unambiguous what kind of consensus the caller gets.

comment:2 in reply to:  1 Changed 4 weeks ago by teor

Replying to atagar:

Hi teor. I'm a little unclear: are you asking for the consensus class to call controller methods to fetch microdescriptors?

No, I'm asking for Stem (or Tor) to provide a class that abstracts over the differences between ns consensuses, full relay descriptors, microdesc consensuses, and microdescriptors.

The class should be generic, so it can take a stem.descriptor.remote instance, or a controller, or a list of directory document objects (or maybe even a data directory?).

For example, when I ask this class for a relay's IPv6 address:

  • if there is a ns consensus containing that relay, use that IPv6 address, or
  • if there is a microdesc consensus containing that relay, and the consensus method is 27 or later, use that IPv6 address, or
  • if there is a full descriptor for that relay, use that IPv6 address, or
  • if there is a microdescriptor for that relay, and the consensus method is 27 or earlier, use that IPv6 address.

IPv6 is one of the most complicated cases, because it moved between directory documents.
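The IPv6 precedence listed above could be sketched roughly as follows. This is only an illustration with plain dictionaries: the function name, the parameter names, and the `ipv6_address` key are hypothetical, and the consensus method thresholds are copied from the list above rather than checked against dir-spec.

```python
from typing import Optional

# Thresholds as stated in the list above (assumed, not verified here).
MD_CONSENSUS_MIN_METHOD = 27     # microdesc consensus has IPv6 at this method or later
MICRODESCRIPTOR_MAX_METHOD = 27  # microdescriptors have IPv6 at this method or earlier


def relay_ipv6_address(ns_entry: Optional[dict] = None,
                       md_consensus_entry: Optional[dict] = None,
                       server_descriptor: Optional[dict] = None,
                       microdescriptor: Optional[dict] = None,
                       consensus_method: Optional[int] = None) -> Optional[str]:
    """Pick a relay's IPv6 address from whichever documents are available."""
    if ns_entry is not None:
        return ns_entry.get("ipv6_address")
    if (md_consensus_entry is not None and consensus_method is not None
            and consensus_method >= MD_CONSENSUS_MIN_METHOD):
        return md_consensus_entry.get("ipv6_address")
    if server_descriptor is not None:
        return server_descriptor.get("ipv6_address")
    if (microdescriptor is not None and consensus_method is not None
            and consensus_method <= MICRODESCRIPTOR_MAX_METHOD):
        return microdescriptor.get("ipv6_address")
    return None  # no usable document for this relay


# Newer consensus method: the address comes from the microdesc consensus.
new_style = relay_ipv6_address(
    md_consensus_entry={"ipv6_address": "2001:db8::1"},
    consensus_method=28,
)

# Older consensus method: the address comes from the microdescriptor.
old_style = relay_ipv6_address(
    md_consensus_entry={},
    microdescriptor={"ipv6_address": "2001:db8::2"},
    consensus_method=26,
)

print(new_style, old_style)
```

The addresses use the 2001:db8::/32 documentation prefix and are placeholders.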

Here's a simpler example:

If I ask this class for a relay's ed25519 id:

  • if there is a ns consensus containing that relay, use that ed25519 id, or
  • if there is a full descriptor for that relay, use that ed25519 id, or
  • if there is a microdescriptor for that relay, use that ed25519 id.
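For attributes like this one, the fallback chain could be expressed as data instead of code. The sketch below assumes hypothetical document-type labels and attribute names. It also falls through to the next document when one lacks the attribute, which is one possible reading of the list above.

```python
from typing import Any, Dict, Optional, Sequence

# Search order per attribute, following the ed25519 example above.
# The labels and attribute names are made up for this sketch.
SEARCH_ORDER: Dict[str, Sequence[str]] = {
    "ed25519_id": ("ns_consensus", "server_descriptor", "microdescriptor"),
}


def lookup(attribute: str, documents: Dict[str, Dict[str, Any]]) -> Optional[Any]:
    """Return `attribute` from the first document, in search order, that has it."""
    for doc_type in SEARCH_ORDER[attribute]:
        document = documents.get(doc_type)
        if document is not None and attribute in document:
            return document[attribute]
    return None


# Only a server descriptor and a microdescriptor are available for this relay.
available = {
    "server_descriptor": {"ed25519_id": "key-from-server-descriptor"},
    "microdescriptor": {"ed25519_id": "key-from-microdescriptor"},
}

print(lookup("ed25519_id", available))
```

Adding another attribute would then mean adding one row to `SEARCH_ORDER`, assuming its lookup needs no extra conditions such as the consensus method check in the IPv6 case.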


The proper solution I suspect is for tor to provide two controller methods so it's unambiguous what kind of consensus the caller gets.

That's a good idea, but it doesn't solve the problem on clients, which only have one type of consensus. And it doesn't solve the general issue, which is that people need to know where relay attributes are located before they can write correct Stem code.

comment:3 Changed 3 weeks ago by atagar

Hi teor, interesting idea. In Stem I could provide a higher level 'Relay' class that lazy loads whatever descriptors it needs to get commonly desired data (exit policy, contact info, etc). This would need to be based on stem.descriptor.remote (the controller interface relies too much on caching and the client's torrc to be reliable).
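To make the lazy-loading idea concrete, here is a rough sketch. It does not use stem.descriptor.remote. The `fetchers` callables stand in for whatever would actually download each document, and every name in it is hypothetical.

```python
from functools import cached_property
from typing import Callable, Dict, List, Optional


class Relay:
    """Hypothetical relay view that downloads each document only on first use."""

    def __init__(self, fingerprint: str,
                 fetchers: Dict[str, Callable[[str], dict]]):
        self.fingerprint = fingerprint
        self._fetchers = fetchers

    @cached_property
    def server_descriptor(self) -> dict:
        # Fetched once, then cached on the instance.
        return self._fetchers["server_descriptor"](self.fingerprint)

    @cached_property
    def router_status(self) -> dict:
        return self._fetchers["router_status"](self.fingerprint)

    @property
    def contact(self) -> Optional[str]:
        return self.server_descriptor.get("contact")

    @property
    def flags(self) -> List[str]:
        return self.router_status.get("flags", [])


calls = []  # records which stand-in fetchers ran


def fake_fetch_server_descriptor(fingerprint: str) -> dict:
    calls.append("server_descriptor")
    return {"contact": "admin@example.com"}


def fake_fetch_router_status(fingerprint: str) -> dict:
    calls.append("router_status")
    return {"flags": ["Running", "Valid"]}


relay = Relay("A" * 40, {
    "server_descriptor": fake_fetch_server_descriptor,
    "router_status": fake_fetch_router_status,
})

print(relay.contact)
print(relay.contact)  # served from the cache, no second fetch
print(calls)
```

Here the caller asks only for `contact`, so only the server descriptor stand-in runs, and it runs once.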

Honestly I wonder if we should rethink our dir-spec more fundamentally. It's grown organically and honestly the myriad of documents is more confusing than it probably needs to be.

comment:4 in reply to:  3 Changed 3 weeks ago by teor

Replying to atagar:

Hi teor, interesting idea. In Stem I could provide a higher level 'Relay' class that lazy loads whatever descriptors it needs to get commonly desired data (exit policy, contact info, etc). This would need to be based on stem.descriptor.remote (the controller interface relies too much on caching and the client's torrc to be reliable).

Ok, that would be very helpful. And it's good to know that we can't fix this issue in Tor by modifying the control spec.

Honestly I wonder if we should rethink our dir-spec more fundamentally. It's grown organically and honestly the myriad of documents is more confusing than it probably needs to be.

But the documents in the dir-spec primarily exist for Tor clients (including relays) to efficiently use the network. They don't exist for convenient information retrieval by analysts. (That's why we have Stem, Collector, Onionoo, Relay Search, and other tools.)

Here's why we have each document type:

  • ns (original) consensus flavour - a comprehensive consensus, used by old clients, and for detailed analysis by tools and people
  • directory authority certificates - validating consensus signatures, used by all Tor instances
  • relay descriptors - a signed record of relay attributes, used by bridge clients, and for detailed analysis by tools and people
  • relay extrainfo descriptors - a signed record of relay statistics, used by metrics
  • microdescriptor consensus flavour - a smaller consensus to save bandwidth, used by all recent clients, relays, and bridges
  • microdescriptors - a smaller record of unchanging relay attributes to save bandwidth, used by all recent clients, relays, and bridges
  • bridge authority legacy v2 consensus format - used by BridgeDB

I might have missed a document type or two, but I can't see any we could remove or even combine.

I think that we could redesign the directory URL scheme, but it would be a long time before we could get rid of legacy URLs.

comment:5 Changed 3 weeks ago by atagar

I might have missed a document type or two, but I can't see any we could remove or even combine.

At the end of the day data comes from three sources...

  • From relays via a server descriptor.
  • From relays via an extrainfo descriptor.
  • From authorities via the router status entry (ex. flags, bwauth measurements, etc).

Microdescriptors are nothing more than a distillation of the server descriptor so downloads are smaller. Unless I'm missing something, there's no reason anyone besides tor itself should care about those.

The thing I think we *can* simplify is the consensus. I'm at a loss for a reason to have both a standard and microdescriptor consensus. Maybe the split's for historical backward compatibility?

ns (original) consensus flavour - a comprehensive consensus, used by old clients, and for detailed analysis by tools and people

That's what I'm unsure about. Microdescriptors were added enough years ago that we likely already cut them out of the network. As for analysis, the microdescriptor consensus and server descriptors have the same data.

Ok, that would be very helpful.

Do we have anyone eager to use such a class? It would be sad to implement such a thing only to see it go unused. ;)

comment:6 in reply to:  5 Changed 3 weeks ago by teor

Replying to atagar:

I might have missed a document type or two, but I can't see any we could remove or even combine.

At the end of the day data comes from three sources...

  • From relays via a server descriptor.
  • From relays via an extrainfo descriptor.
  • From authorities via the router status entry (ex. flags, bwauth measurements, etc).

Microdescriptors are nothing more than a distillation of the server descriptor so downloads are smaller. Unless I'm missing something, there's no reason anyone besides tor itself should care about those.

The thing I think we *can* simplify is the consensus. I'm at a loss for a reason to have both a standard and microdescriptor consensus. Maybe the split's for historical backward compatibility?

ns (original) consensus flavour - a comprehensive consensus, used by old clients, and for detailed analysis by tools and people

That's what I'm unsure about. Microdescriptors were added enough years ago that we likely already cut them out of the network.

No, relays on 0.2.8 and earlier use descriptors for their circuits, and there are still a few of them around (even though they are unsupported, they still work). So do some really old clients, which at the very least will need a consensus substitute to avoid misbehaving and bringing down the network:
https://gitweb.torproject.org/torspec.git/tree/proposals/266-removing-current-obsolete-clients.txt

Also, Torflow and now sbws depend on the ns consensus. I bet Onionoo, depictor, and doctor would also fail if we got rid of the ns consensus. If we want to migrate away from it, that's a lot of work.

As for analysis, the microdescriptor consensus and server descriptors have the same data.

Ok, that would be very helpful.

Do we have anyone eager to use such a class? It would be sad to implement such a thing only to see it go unused. ;)

If it was available, sbws would have used it.
If it was available, we could more easily migrate sbws, depictor and doctor away from using ns consensuses.