Opened 7 years ago

Closed 23 months ago

#6662 closed enhancement (wontfix)

Support grouping by family

Reported by: cypherpunks
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Relay Search
Version:
Severity: Normal
Keywords:
Cc: lunar, gsathya, nusenu@…
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Currently https://compass.torproject.org/
offers to group by country or AS number.

It would be nice to have an additional check box for family.

Child Tickets

Attachments (3)

families-2012-08-23-14-00-00.txt (15.7 KB) - added by karsten 7 years ago.
families-2012-08-23-14-00-00-fixed.txt (189.6 KB) - added by karsten 7 years ago.
families-2012-08-24-12-00-00.txt (208.9 KB) - added by karsten 7 years ago.


Change History (25)

comment:1 Changed 7 years ago by karsten

I agree that it would be useful to have such a check box, and I really want to implement it, because I'm interested in the results, too.

But I'm more and more convinced that it's not possible to group relays by family, or at least not in an unambiguous way. What is possible is looking up relays in the same family of a given relay. But looking at all relays and grouping them by family seems hard if not even impossible.

Here's an example. Assume we have three relays: A, B, and C. These relays state the following family relationships:

  • A: A, B
  • B: A, B, C
  • C: B, C

We require mutual agreement about being in the same family, so we could either come up with family A, B or with family B, C. Which one is correct?
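
The ambiguity in the A/B/C example can be reproduced with a small sketch. It is only an illustration, not the code used for this ticket: the `declared` mapping and the function names are assumptions, and the brute-force search is suitable for a toy example only.

```python
from itertools import combinations

# Declared families from the example: relay -> set of relays it lists.
declared = {
    "A": {"A", "B"},
    "B": {"A", "B", "C"},
    "C": {"B", "C"},
}


def mutual_pairs(declared):
    """Return the pairs of relays that list each other."""
    return {
        frozenset((x, y))
        for x, y in combinations(sorted(declared), 2)
        if y in declared[x] and x in declared[y]
    }


def strict_families(declared):
    """Return maximal groups in which every pair of relays agrees mutually.

    Brute force over all subsets, largest first (toy example only).
    """
    pairs = mutual_pairs(declared)
    relays = sorted(declared)
    families = []
    for size in range(len(relays), 1, -1):
        for group in combinations(relays, size):
            if all(frozenset(p) in pairs for p in combinations(group, 2)):
                # Skip groups already contained in a larger family.
                if not any(set(group) <= f for f in families):
                    families.append(set(group))
    return families


print(strict_families(declared))
```

For the example this yields two families, one with A and B and one with B and C, and B belongs to both. A and C never list each other, so no single strict family covers all three relays.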

Of course, we could apply fancy heuristics to find largest families and break ties in favor of higher overall consensus weights, smaller fingerprints, or something. But that still sounds like hacking to me. Before we enter that stage, I'd like to know if this problem can be solved otherwise. Got any ideas?

comment:2 in reply to:  1 Changed 7 years ago by cypherpunks

Hi Karsten,

thank you for getting back to me so quickly.

Replying to karsten:

I agree that it would be useful to have such a check box, and I really want to implement it, because I'm interested in the results, too.

But I'm more and more convinced that it's not possible to group relays by family, or at least not in an unambiguous way. What is possible is looking up relays in the same family of a given relay. But looking at all relays and grouping them by family seems hard if not even impossible.

Here's an example. Assume we have three relays: A, B, and C. These relays state the following family relationships:

  • A: A, B
  • B: A, B, C
  • C: B, C

We require mutual agreement about being in the same family, so we could either come up with family A, B or with family B, C. Which one is correct?

In this situation we have two overlapping families.
family1 = A,B
family2 = B,C

For compass (even if this is not the case in real path selection by tor clients) I would choose the simple approach: merge overlapping families:
family (as seen/interpreted by compass) = A, B, C

If you go down this path (merging such overlapping families), you could still add an option that requires strict mutual agreement. (Then you would have two separate families: family1 & family2.)

What do you think about this approach (merging overlapping families)?
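
One way to read "merging overlapping families" is as the connected components of the graph whose edges are mutual family declarations. A minimal sketch under that assumption follows; the `declared` mapping and `merged_families` are hypothetical names, not anything from Compass.

```python
# Declared families from the example: relay -> set of relays it lists.
declared = {
    "A": {"A", "B"},
    "B": {"A", "B", "C"},
    "C": {"B", "C"},
}


def merged_families(declared):
    """Merge overlapping families using a small union-find structure."""
    parent = {relay: relay for relay in declared}

    def find(relay):
        # Walk up to the root, shortening the path along the way.
        while parent[relay] != relay:
            parent[relay] = parent[parent[relay]]
            relay = parent[relay]
        return relay

    for relay, listed in declared.items():
        for other in listed:
            # Only link relays that list each other (mutual agreement).
            if other in declared and relay in declared[other]:
                parent[find(relay)] = find(other)

    groups = {}
    for relay in declared:
        groups.setdefault(find(relay), set()).add(relay)
    # A relay on its own is not a family.
    return [group for group in groups.values() if len(group) > 1]


print(merged_families(declared))
```

For the example this produces a single family containing A, B, and C, even though A and C do not list each other directly.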

comment:3 Changed 7 years ago by karsten

Owner: set to karsten
Status: new → accepted

Hey, I like both suggestions. I was so focused on resolving overlapping families and coming up with nicely separated families that I didn't think of either merging them into "extended families" or accepting the fact that they're overlapping. :) I'll experiment with both family definitions and let you know what I come up with. Compass integration will then be the next step.

comment:4 Changed 7 years ago by cypherpunks

Thank you, looking forward to seeing the first results!

Changed 7 years ago by karsten

comment:5 Changed 7 years ago by karsten

I just attached early results of overlapping/extended families as defined by you. These results look plausible to me, but I didn't confirm them as carefully as I'd like to, and I gotta run now and don't know if I have time today to continue working on this. Maybe you want to have a look?

comment:6 in reply to:  5 Changed 7 years ago by cypherpunks

Hi Karsten,

Replying to karsten:

Maybe you want to have a look?

Looks good.

I did some manual checks of your results, because the lines with only two family members were not clear to me,
but after checking them manually it was clear that the "overlapping node" was down and therefore not showing up in the second part (merged families) of the txt file.

3cce3a91f6a625~8DE5 bc1245cbe16d5ee9b2~2D25 overlapping node: A0EA6A3D1B4D30F5005E89501DB68D4E14A0E183 (down)
AMORPHIS~4B47 KAMAGURKA~4045   overlapping node: 30CC8E08C1B2B40B37A6FAF0E1DF08C007138135 (down)

I'm also going to contact these relay operators, so they can fix their family settings.

Configuring families with a large number of relays that are regularly extended is a PITA. Why not use this definition of families for Tor directly? This would reduce the number of nodes to reconfigure to 2, regardless of how many nodes you have, but I would be surprised if no one else has had that thought already. Anyway, I'll propose it on tor-dev.

comment:7 Changed 7 years ago by cypherpunks

I'm wondering why the following family is showing up in the results, because it seems to be a mutually set up family (one relay has been down since 2012-03, though).

CamerasInTheSky~A9F3 MicsInTrees~5CFC RingThemBells~A0C7

A9F3
$5CFC04FFB4A95CBCC25DF03BC41BE4CAE0D78870
$A0C73FF383344BFD1217EE6CCB3160B6F189D850
$BC808734C6C7A166403C69F8F42A5B6EF8065F8C

5CFC
$A9F35D9A0E1186DA29393F277C796606085FCE74
$A0C73FF383344BFD1217EE6CCB3160B6F189D850
$BC808734C6C7A166403C69F8F42A5B6EF8065F8C

A0C7
$A9F35D9A0E1186DA29393F277C796606085FCE74
$5CFC04FFB4A95CBCC25DF03BC41BE4CAE0D78870
$BC808734C6C7A166403C69F8F42A5B6EF8065F8C

BC80 (down)
$A9F35D9A0E1186DA29393F277C796606085FCE74
$5CFC04FFB4A95CBCC25DF03BC41BE4CAE0D78870 
$A0C73FF383344BFD1217EE6CCB3160B6F189D850 

Changed 7 years ago by karsten

comment:8 Changed 7 years ago by karsten

Replying to cypherpunks:

I did some manual checks of your results, because the lines with only two family members were not clear to me,
but after checking them manually it was clear that the "overlapping node" was down and therefore not showing up in the second part (merged families) of the txt file.

3cce3a91f6a625~8DE5 bc1245cbe16d5ee9b2~2D25 overlapping node: A0EA6A3D1B4D30F5005E89501DB68D4E14A0E183 (down)
AMORPHIS~4B47 KAMAGURKA~4045   overlapping node: 30CC8E08C1B2B40B37A6FAF0E1DF08C007138135 (down)

That's true, we cannot confirm mutual family relationships if one of the nodes is down. That means that the two families A-B and B-C cannot be merged to A-B-C if B is down.

But that being said, I found a bug in my code where mutual checks were not performed correctly. I attached a fixed list based on the same consensus.

comment:9 in reply to:  7 Changed 7 years ago by karsten

Replying to cypherpunks:

I'm wondering why the following family is showing up in the results, because it seems to be a mutually set up family (one relay has been down since 2012-03, though).

I'm not sure what you mean. I don't see where this is wrong, but it could be that the issue is fixed in the newly attached document. If not, can you explain where the problem lies?

Also, if you can, please go through the fixed list and let me know if there are any other problems. I looked through the list once or twice and it looked plausible (though the first one did that, too). Thanks!

comment:10 in reply to:  8 Changed 7 years ago by cypherpunks

Replying to karsten:

That's true, we cannot confirm mutual family relationships if one of the nodes is down. That means that the two families A-B and B-C cannot be merged to A-B-C if B is down.

I wouldn't mind declaring the family A-B-C even if B is down, as long as we are able to find a valid descriptor for B within the last X days.

comment:11 in reply to:  9 Changed 7 years ago by cypherpunks

Replying to karsten:

I'm not sure what you mean. I don't see where this is wrong, but it could be that the issue is fixed in the newly attached document. If not, can you explain where the problem lies?

OK, there was probably a misunderstanding on my side about what your list actually contains. I thought it only contained "imperfect" families that had been merged afterwards, but if it contains all families (even those that have complete and mutual agreements) then it is clear.

Changed 7 years ago by karsten

comment:12 Changed 7 years ago by karsten

Replying to cypherpunks:

I wouldn't mind declaring the family A-B-C even if B is down, as long as we are able to find a valid descriptor for B within the last X days.

Oh yes, we can and probably should do that. We have the descriptors of relays that have been running in the past seven days. I attached a new output file.
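
The effect of keeping descriptors of relays that are currently down can be sketched as follows. This is an illustration only: the `descriptors` layout (fingerprint mapped to a last-seen time and the listed family members) and both function names are assumptions, and the seven-day default simply mirrors the window mentioned above.

```python
from datetime import datetime, timedelta


def usable_declarations(descriptors, now, max_age=timedelta(days=7)):
    """Keep family declarations of relays last seen within max_age of now."""
    return {
        fingerprint: listed
        for fingerprint, (last_seen, listed) in descriptors.items()
        if now - last_seen <= max_age
    }


def merged_families(declared):
    """Group relays connected through mutual family declarations."""
    seen, families = set(), []
    for start in declared:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            relay = stack.pop()
            if relay in group:
                continue
            group.add(relay)
            # Follow only links that the other relay confirms.
            stack.extend(
                other for other in declared[relay]
                if other in declared and relay in declared[other]
            )
        seen |= group
        if len(group) > 1:
            families.append(group)
    return families


now = datetime(2012, 8, 24, 12, 0)
descriptors = {
    "A": (now, {"B"}),
    "B": (now - timedelta(days=2), {"A", "C"}),  # down for two days
    "C": (now, {"B"}),
}

# With the seven-day window, B's descriptor still links A and C.
print(merged_families(usable_declarations(descriptors, now)))
# Using running relays only, B is missing and nothing can be confirmed.
print(merged_families(usable_declarations(descriptors, now, timedelta(0))))
```

In this sketch the first call returns one family with all three relays, while the second returns no family at all, which matches the two-member lines discussed earlier in this ticket.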

comment:13 Changed 6 years ago by karsten

Cc: lunar gsathya added
Owner: karsten deleted
Status: accepted → assigned

Unassigning from myself and cc'ing two people who have contributed Compass patches in the past.

comment:14 Changed 5 years ago by cypherpunks

Hi Karsten,

is it possible to find your preliminary code somewhere so that others can continue your work?

thanks!

comment:15 Changed 5 years ago by tyseom

Cc: nusenu@… added

comment:16 in reply to:  14 Changed 5 years ago by karsten

Replying to cypherpunks:

is it possible to find your preliminary code somewhere so that others can continue your work?

I think it's this code: https://gitweb.torproject.org/metrics-tasks.git/tree/task-6662/Eval.java. Good luck!

comment:17 Changed 5 years ago by cypherpunks

The code to gather families is already there:
https://gitweb.torproject.org/compass.git/tree/compass.py#n39
(maybe it was not there back then?)

So we could reuse that filter and run every fingerprint against it + unique.
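
A rough sketch of that idea, for illustration only: `family_of` below is a hypothetical stand-in for a per-relay family lookup and is not the filter in compass.py, and `declared` is made-up example data.

```python
# Example data: relay -> set of relays it lists in its family.
declared = {
    "A": {"A", "B"},
    "B": {"A", "B", "C"},
    "C": {"B", "C"},
}


def family_of(fingerprint, declared):
    """Hypothetical per-relay lookup: the relay plus every relay it
    lists that also lists it back."""
    confirmed = {
        other for other in declared.get(fingerprint, ())
        if fingerprint in declared.get(other, ())
    }
    return frozenset(confirmed | {fingerprint})


def unique_families(declared):
    """Run every fingerprint through the lookup and drop duplicates."""
    families = {family_of(fingerprint, declared) for fingerprint in declared}
    # A relay on its own is not a family.
    return {family for family in families if len(family) > 1}


for family in sorted(unique_families(declared), key=sorted):
    print(sorted(family))
```

When all members list each other, every fingerprint maps to the same set and a single family remains. With overlapping declarations such as the A/B/C example, the per-relay results differ (A and B; A, B, and C; B and C), so this approach would still need a merge step or a rule for overlaps.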


comment:19 Changed 2 years ago by karsten

Severity: Normal
Summary: group by family → Support grouping by family

Tweak summary a bit.

comment:20 Changed 2 years ago by karsten

Owner: set to metrics-team

comment:21 Changed 23 months ago by irl

Component: Metrics/Compass → Metrics/Atlas

In #23517 it is planned to merge Compass functionality with Relay Search (formerly known as Atlas). These tickets may be relevant to that work and so these are being reassigned to the Metrics/Atlas component.

comment:22 Changed 23 months ago by irl

Resolution: wontfix
Status: assigned → closed

Grouping by family is a hard problem.

Relay Search supports the following queries:

Aggregate simple family data in single table row:
https://atlas.torproject.org/#aggregate/all/family:FINGERPRINT

Show all relays in a family in multiple table rows:
https://atlas.torproject.org/#search/family:FINGERPRINT

A details view with aggregated graphs may be added in #23509.

I think in general this addresses most use cases. A true grouping by family may be enabled by using a database to back Onionoo but until we get there, this is just not implementable.
