Opened 4 years ago

Closed 3 years ago

#16599 closed enhancement (user disappeared)

Onionoo feature: family ID

Reported by: virgilgriffith Owned by:
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords:
Cc: saitosean@…, tyseom Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Whereas each relay is uniquely identified (by its fingerprint), families do not have unique identifiers. We want to change that. As it is currently unclear whether family ids will be implemented (prop 242), we (Virgil Griffith and Sean Saito) propose the stopgap solution that, at least within Onionoo, each family is assigned a unique positive integer (UUID) derived from the "first_seen" field of the oldest relay within its "effective_family" (symmetric family). If two families have exactly the same oldest "first_seen" field, the families are assigned UUIDs based on asciibetical order of the oldest relay's fingerprint.

How it works
To compute the UUIDs, Onionoo would iterate over all effective families and record the earliest "first_seen" value within each effective family. The effective families are then sorted by their first_seen value (ties broken by asciibetical order), and assigned an index (UUID) starting from 1. This UUID number will be invariant under relays leaving / being added to the family. When a new effective_family is discovered it is assigned the next UUID.

Effective families
Relays of an effective family are defined as a set of relays where every relay has a
symmetric connection to every other relay, or, in other words, every relay points to every other relay. Work is currently being done (ticket #16276) to resolve asymmetric connections between relays (where one relay points to a relay but that relay does not point back).

Finally, upon receiving a request with a specific relay's fingerprint, Onionoo would return:

  1. The symmetric family, a.k.a. the "effective family" of that relay. If there is none returns an empty list.
  2. The UUID (a positive integer) of that effective family.

References

Child Tickets

Change History (15)

comment:1 Changed 4 years ago by leeroy

When the family sets are computed they are referenced by an untyped family id. I'd say this should be quite doable.

comment:2 in reply to:  description ; Changed 4 years ago by cypherpunks

Replying to virgilgriffith:

If two families have exactly the same oldest "first_seen" field, the families are assigned UUIDs based on asciibetical order of the oldest relay's fingerprint.

Multiple families can have the same oldest relay (same first_seen + same fingerprint) as one relay might be part of multiple families (depending on the point of view).

This UUID number will be invariant under relays leaving / being added to the family.

Does that mean the following scenario will link all families over time and the family id will never change?

T0: a b c 
      b c d
        c d e
T3:       d e f

(a, b, c, d, e, f are all relays, 'a' is the oldest; time is progressing vertically from top to bottom)

Finally, upon receiving a request with a specific relay's fingerprint, Onionoo would return:

  1. The symmetric family, a.k.a. the "effective family" of that relay. If there is none returns an empty list.
  2. The UUID (a positive integer) of that effective family.

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

comment:3 Changed 4 years ago by virgilgriffith

Does that mean the following scenario will link all families over time and the family id will never change?

Yes. They would have the family_id from relay a.

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

It's news to me that effective_families can overlap. Is this true?

comment:4 Changed 4 years ago by karsten

Before we discuss possible implementations of such a family ID, which is certainly algorithmically fun, I'd first want to talk more about the intended use case. What do you need a unique family ID for? Onionoo already has a family parameter that accepts a fingerprints and returns that relay plus all relays in that relay's family that also list that relay in their family. What you could do in your Roster application is accept any relay fingerprint of a user's relay family and simply hand that fingerprint over to Onionoo. Then you wouldn't have to introduce the concept of family IDs at all.

Now to a possible implementation of such a family ID, assuming there's indeed a valid use case for it: I don't think we can implement the UUID schema you describe. One issue is that two Onionoo instances, one of which having intermittend network problems, may have a slightly different view on the Tor network and therefore assign UUIDs differently. And what if a relay that is around for years is suddenly reconfigured to join a family? And more generally, what if two families are merged, or if a family falls apart? I'm afraid we'd find many more edge cases where the proposed numbering schema would fail as soon as we start implementing it. The concept seems too fragile to me.

There's also a problem with the definition of "effective family". You write that "Relays of an effective family are defined as a set of relays where every relay has a symmetric connection to every other relay, or, in other words, every relay points to every other relay." But that's not the case right now. Consider a relay A with family A, B, C. If B's family is A, B, and C's family is A, C, we'd right now say that A's effective family is A, B, C. Maybe that means that our current definition of "effective family" is flawed.

Here's an alternative idea which is, admittedly, not fleshed out yet. Can we combine relay fingerprints of family members somehow into a family fingerprint and use that to identify the family? Of course, that family fingerprint would change once family members leave or new members join, but maybe we can use some kind of distance metric and identify the family that is closest to that fingerprint: "The family fingerprint XY is unknown, but there are two families with similar fingerprints XZ and WY." I could trying out bit-wise operations or probabilistic data structures for family fingerprints and then using something like Hamming distance for the comparison.

Another option to consider here is to just go ahead and implement proposal 242. This might be a good opportunity to do the research and the coding, and it's not only Roster/Onionoo users that would benefit from that, but all relay family operators.

comment:5 in reply to:  2 ; Changed 4 years ago by seansaito

Replying to cypherpunks:

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

The issue is that asymmetric connections (where relay A points to relay B but not the other way around) do exist. This does not mean that a relay belongs to multiple families, rather, based on some test scripts I've run, the operator seems to have forgotten to refer a new relay to all its family members.

However, work is currently being done to resolve this issue.
https://trac.torproject.org/projects/tor/ticket/16276

Once this is implemented (thanks to karsten), an effective family will essentially be a graph theoretical perfect graph, where every pair of nodes has a symmetric connection. This would preclude any chances of a relay to be connected to two separate families, since an effective family can then be verified using the handshake theorem.

comment:6 in reply to:  3 Changed 4 years ago by cypherpunks

Replying to virgilgriffith:

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

It's news to me that effective_families can overlap. Is this true?

Yes, have a look at DFRI families.

comment:7 in reply to:  5 ; Changed 4 years ago by cypherpunks

Replying to seansaito:

Replying to cypherpunks:

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

The issue is that asymmetric connections (where relay A points to relay B but not the other way around) do exist. This does not mean that a relay belongs to multiple families, rather, based on some test scripts I've run, the operator seems to have forgotten to refer a new relay to all its family members.

I'm not referring to asymmetrically declared families (which I do not consider families at all) but multiple overlapping families that share some of their relays.

comment:8 in reply to:  4 Changed 4 years ago by cypherpunks

Replying to karsten:

The concept seems too fragile to me.

+1

comment:9 in reply to:  4 Changed 4 years ago by cypherpunks

Replying to karsten:

What do you need a unique family ID for?

I guess the main use case is 'identify and track a family as it changes over time'.

comment:10 in reply to:  5 ; Changed 4 years ago by karsten

Replying to seansaito:

Replying to cypherpunks:

A relay might be part of multiple families, so it would have to return 0 to N family IDs?

The issue is that asymmetric connections (where relay A points to relay B but not the other way around) do exist. This does not mean that a relay belongs to multiple families, rather, based on some test scripts I've run, the operator seems to have forgotten to refer a new relay to all its family members.

However, work is currently being done to resolve this issue.
https://trac.torproject.org/projects/tor/ticket/16276

Once this is implemented (thanks to karsten), an effective family will essentially be a graph theoretical perfect graph, where every pair of nodes has a symmetric connection. This would preclude any chances of a relay to be connected to two separate families, since an effective family can then be verified using the handshake theorem.

I'm afraid there's some confusion: #16276 adds the list of relays that are in a mutual family relationship with a given relay, but that doesn't mean that other members in that family must be in a mutual family relationship with each other.

I think what you envision is a "perfect family" where all members are in mutual family relationships with each other. But that's not what will be implemented.

Let me suggest something else: rather than hoping that I'll implement the right thing in Onionoo, let's first perform an analysis that takes, say, all Tor descriptors published in 2015 as input and computes families and family identifiers and evaluates how useful they are for your purpose. And once we come up with definitions that work well for you, I'll implement them in Onionoo. What do you think?

comment:11 Changed 4 years ago by tyseom

Cc: tyseom added
Summary: Onionoo featureOnionoo feature: family ID

comment:12 Changed 4 years ago by virgilgriffith

Let me suggest something else: rather than hoping that I'll implement the right thing in Onionoo, let's first perform an analysis that takes, say, all Tor descriptors published in 2015 as input and computes families and family identifiers and evaluates how useful they are for your purpose. And once we come up with definitions that work well for you, I'll implement them in Onionoo. What do you think?

Agreed. I'll devise a nice family definition for Roster's purposes and when those are settled will resume this discussion.

comment:13 in reply to:  10 Changed 4 years ago by seansaito

I'm afraid there's some confusion: #16276 adds the list of relays that are in a mutual family relationship with a given relay, but that doesn't mean that other members in that family must be in a mutual family relationship with each other.

I think what you envision is a "perfect family" where all members are in mutual family relationships with each other. But that's not what will be implemented.

I understand now. I apologize for the confusion.

Let me suggest something else: rather than hoping that I'll implement the right thing in Onionoo, let's first perform an analysis that takes, say, all Tor descriptors published in 2015 as input and computes families and family identifiers and evaluates how useful they are for your purpose. And once we come up with definitions that work well for you, I'll implement them in Onionoo. What do you think?

That sounds good to me as well.

comment:14 in reply to:  7 Changed 4 years ago by seansaito

I'm not referring to asymmetrically declared families (which I do not consider families at all) but multiple overlapping families that share some of their relays.

Right, thanks for the clarification. Do you know how often this occurs?

comment:15 Changed 3 years ago by virgil

Resolution: user disappeared
Severity: Normal
Status: newclosed
Note: See TracTickets for help on using tickets.