here is [2]'s relay descriptor family line from https://collector.torproject.org/recent/relay-descriptors/server-descriptors/2015-06-04-16-05-52-server-descriptors
proving that it does NOT list [1] as a member:
{{{
family $0E3DE4D738C91A8BB56432EF2DA7C28D4108F4A2 $169019BFFECBBB9A35878BCF4C7AE1FF37AD8452 $231C2B9C8C31C295C472D031E06964834B745996 $29FA22A7AEE4D8AAEA5BF9271834642EEDA849D7 $3EE7D6BEAE015552C05E6AA40EBBCC69B251EC23 $4C7AA094EF5072F08C89A72353227FACBE58092F $4C7BF55B1BFF47993DFF995A2926C89C81E4F04A $530E1038C3D2F746657B51F97831F9F422747180 $552996B383110A00DE456E010E3E74A4E54E54B0 $580A91DAE04CD402AE70E85E01C3B1FD5BE5136E $5CC55AF14792CE805CA486A2CA1EC25A4016474A $6A97482F1B228AF847A8FE83C75F68902C2DCA4D $6B7191639E179965FD694612C9B2C8FB4267B27D $7589D87E6669A04EE9EA081342006E2E5571AD43 $79D6A70B5E42DB93563074E8A385EDDB46FBB6DA $8B3D6AB6A692C8C92AC33E47C6F75C593F5AAF42 $91D23D8A539B83D2FB56AA67ECD4D75CC093AC55 $935BABE2564F82016C19AEF63C0C40B5753BA3D2 $93E8D956042F1C3572EFF87313F4794910584D3C $9643FE742A609FBE332E4D512A88DECF2E5207C2 $A37C47B03FF31CA6937D3D68366B157997FE7BCD $A55D07F6D78CE790E7D75DE665DF8365866BA870 $A703C8DB0466AB16375403D798005F568372E2D8 $A705EFD586626F690A71A215E6233D7E8581C374 $A99ED2CC51850744D8CFF194CF6B06900B8B13AD $BD5899EA64D83F7E885B2495DE0842F1A77E99AF $C9933B3725239B6FAB5227BA33B30BE7B48BB485 $CEDD9C7C371E753CDD14D292BCE1C7B8F4734D25 $E26AFC5F718E21AC502899B20C653AEFF688B0D2 $ECC17C0BA74113FB7A4F58DAF2FF1EE23CE5DF06 $FB40DC3C724E5AA6ECD3EA1F64C0F559B48F21B5 $FC37A7B1C264497BA01A08BF6D2C982A99D5B963
}}}
If you own two relays you can easily reproduce/cause this bug in onionoo by following these steps:
configure a mutual family between two relays [A] <> [B]
publish the descriptor to dir auths
wait for a bit (days)
remove the family config line on [B] + publish new descriptors
onionoo will still show the family on [A] (but removes it correctly from [B])
so apparently it doesn't verify the reverse path [A] < [B] again once it has been established once.
So I was in the area, and saw this ticket update, so I thought I would investigate. Is it possible you're not using the family parameter, but rather looking at the details document itself? Onionoo's protocol page describes the family parameter as a filter which: Return only the relay whose fingerprint matches the parameter value and all relays that this relay has listed in its family by fingerprint and that in turn have listed this relay in their family by fingerprint.
Which is different from the details document: Array of fingerprints or nicknames of relays in the same family as this relay. Omitted if empty or if a descriptor containing this information cannot be found.
The details document is a representation of the descriptor published. Which means it will match whatever is published by the OR, even if that includes an incomplete family declaration. You'll recall that such a situation is possible on tor's network. So if you look at the details page, for example by using the fingerprint, you'll see exactly what the OR declared in it's last published descriptor.
The index, which makes the filtering of data possible, is computed in NodeIndexer. If all this is true, the question(s) still remains, should details documents include the effective family or the declared family ? Should the protocol description, or the details document (produced by Atlas) be re-worded ?
(I would do more but Onionoo is down at the moment)
Yes, the description of the family field in the relay details object:
family array of strings optionalArray of fingerprints or nicknames of relays in the same family as this relay. Omitted if empty or if descriptor containing this information cannot be found.
Is definitely suggesting that it is not the raw family string from the descriptor, but the effective family (due to "in the same family").
So there are basically two (3?) options:
make the implementation do what the documentation says
update the documentation to match the implementation
or: have two explicit fields: declaredfamily and effectivefamily
Okay. One way of implementing this would be to include an (optional) effective_family_diff key to the details document. This could store the set difference between the declared family (from parsing the descriptor) and effective family (as in computing index for the family parameter).
If a node, W_OR, declares in it's descriptor a family including X_OR, Y_OR, Z_OR, the effective family for all four nodes is { W_OR, X_OR, Y_OR, Z_OR } iff a bidirectional relationship exists when checking the declared family of each. The effective_family_diff key would not exist in this case.
If, when checking a node in the family of W_OR, say Y_OR, the bidirectional relation does not exist, it is added to effective_family_diff. In this case the family key still contains { X_OR, Y_OR, Z_OR } but now effective_family_diff, for W_OR, contains { Y_OR }.
A client of Onionoo, which hasn't been updated, would only know to use the family key. An up-to-date client would be able to use effective_family_diff when rendering a view. Preserving backwards compatibility and giving the client a choice in how to visualize the family. How does it sound?
make the implementation do what the documentation says
or
or: have two explicit fields: declaredfamily and effectivefamily
In the proposed design I would have to check the diff key (or its absence) to tell me whether the family key is actually valid, I'd prefere something that is more straight forward. Your design is supperior over (3) when it comes to entry size since 3 would store more duplicate data.
Regarding backwards compatibility:
I guess there is no one seriously using family as it is documented since that would have triggered this discussion earlier. Better have something new and clean than to have some backward compatibility for something that has been broken.
Wait, doesn't (1) imply making the family key of the details document strictly effective family? Similarly, while (3) is simpler, it doesn't help an operator determine where there are problems. In comment:7, cypherpunks, mentions that (3) could be used by operators.
It's not that I'm opposed to simplicity, it's that two explicit fields require computing the difference to check for problems (or do it manually). If (1) means strictly displaying effective family, it too doesn't make it easy to find problems. For the use-case described by cypherpunks, in comment:7, an operator who checks via Onionoo wouldn't be able to easily determine where the problems are. Thoughts?
Wow, this looks like an excellent analysis above. Sorry for arriving late, but it seems you have this pretty much under control.
Regarding the bug, Onionoo indeed copies whatever relays define in their server descriptors to the details document without performing any checks. It's only the family parameter that is checked for actually mutual family relations.
As for the fix, let's start with the easy fix, which means fixing the documentation. How about:
"Array of fingerprints or nicknames of relays in the same family as this relay as reported by the relay. Omitted if empty or if descriptor containing this information cannot be found."
As long-term fix, let's think about adding separate field "effective_family" or similar for the sake of backward compatibility and to enable clients to detect misconfigurations. It's not that making backward-incompatible changes wouldn't be possible, but they would require a 1-month or better 2-month warning, knowing that some clients won't be updated at all in practice. Happy to do that if necessary, but even happier not to.
The implementation would probably go into NodeDetailsStatusUpdater, more precisely into phase 3 of the update process there. Ideally, we would be able to remove the related code from NodeIndexer and use the results from the newly added code. If anybody wants to start hacking on this, please say so here and go ahead.
Regarding the bug, Onionoo indeed copies whatever relays define in their server descriptors to the details document without performing any checks. It's only the family parameter that is checked for actually mutual family relations.
As for the fix, let's start with the easy fix, which means fixing the documentation. How about:
"Array of fingerprints or nicknames of relays in the same family as this relay as reported by the relay. Omitted if empty or if descriptor containing this information cannot be found."
I would prefer something more obvious without "in the same family as this relay".
Example:
Array of fingerprints or nicknames found in the family line of the relay's descriptor. This is basically a copy of the relay's MyFamily torrc configuration. Fingerprints and nicknames listed in this array do not necesarily have a mutual (effective) family with this relay.
As long-term fix, let's think about adding separate field "effective_family"
I would prefer something more obvious without "in the same family as this relay".
Example:
{{{
Array of fingerprints or nicknames found in the family line of the relay's descriptor. This is basically a copy of the relay's MyFamily torrc configuration. Fingerprints and nicknames listed in this array do not necesarily have a mutual (effective) family with this relay.
}}}
Let me try to rephrase that a bit by taking out technical terms as much as possible and instead explain this conceptually:
"Array of fingerprints or nicknames of relays that this relay considers to be part of its family. There are no cross-checks whether the listed relays exist or consider this relay part of their family, so that the effective family of this relay may be smaller. Omitted if empty or if descriptor containing this information cannot be found."
As long-term fix, let's think about adding separate field "effective_family" or similar for the sake of backward compatibility and to enable clients to detect misconfiguration.
is that in favor of both keys proposed? Something like an effective_family and a key indicating problems with family, like the diff key, or only the key for effective_family? Just checking first. I have to wait on archive data importing, so this would be good for testing the addition of new tracked OR data.
As long-term fix, let's think about adding separate field "effective_family" or similar for the sake of backward compatibility and to enable clients to detect misconfiguration.
is that in favor of both keys proposed? Something like an effective_family and a key indicating problems with family, like the diff key, or only the key for effective_family? Just checking first. I have to wait on archive data importing, so this would be good for testing the addition of new tracked OR data.
Well, do we need another field like "ineffective_family", if we keep both "family" and "effective_family"? In theory, if those two arrays don't have the same length, there's something wrong, and every entry that's missing in "effective_family" is something for the relay operator to look into; unless I missed a case. I'd rather not want to add redundant data but rely on Onionoo clients to be smart enough to display information in a way that's useful to users. Onionoo is not meant to be used by humans. What do you think?
"Array of fingerprints or nicknames of relays that this relay considers to be part of its family. There are no cross-checks whether the listed relays exist or consider this relay part of their family, so that the effective family of this relay may be smaller. Omitted if empty or if descriptor containing this information cannot be found."
Well, do we need another field like "ineffective_family", if we keep both "family" and "effective_family"? In theory, if those two arrays don't have the same length, there's something wrong, and every entry that's missing in "effective_family" is something for the relay operator to look into; unless I missed a case. I'd rather not want to add redundant data but rely on Onionoo clients to be smart enough to display information in a way that's useful to users. Onionoo is not meant to be used by humans. What do you think?
What about operators with really big families? Suppose effective_family is only defined if there is a difference. How are they supposed to make use of declared family being length n, and effective family being length n-1? That is by definition redundant. So maybe it would be better to just have the one key, family, and update the family key behavior. An operator would then need only to check the length against the expected value which they should know from torrc.
Alternatively, update the family key behavior to be effective family, and an optional key to indicate declared relatives that are not in the effective family. Opposite of what was proposed above, not retaining backwards compatibility.
Those all sound like plausible implementations, even though I fail to understand why they would be better than having a "family" and an "effective_family" field. Let's try to retain backwards compatibility if we can, because that allows us to deploy new features immediately vs. having to wait for 1 or 2 months for clients to be updated.
I didn't say they were better. You said you want to avoid redundancy. I merely point out that both a family and effective_family is a contradiction of your stated objectives. I'm just trying to make the implementation concrete before trying to write any code.
Detecting misconfiguration can be done with the current system. Ask for query by family, then ask for details document for one relay that was returned a member of the set. Done. You say you would prefer lack of redundancy, the ability to detect misconfiguration, retaining backwards compatibility. So I'm trying to understand how family and effective_family is better than the current implementation. If Onionoo isn't a service used by humans, then why change it at all in this case.
Furthermore, let's also look at a worst case (possibly more realistic), not n-length and n-1. A relay can declare all relatives up front, for every member. Each member may or may not be running/up/online when a consensus is taken. The end result being that on an hour-to-hour basis the declared family for a properly defined family will tend to be constant but the effective family, as seen by the network (the result of bidirectional relation), may change frequently. Do you really want to update the data stores each time? This applies equally to the current implementation and any alternatives.
If all this stems from an impression that the indexing for search is a limitation, and so developers should move away from relying on the index, I beg to differ.
First of all, you're right that having both "family" and "effective_family" fields is a contradiction to the redundancy statement. The primary idea of leaving "family" untouched was to be backwards-compatible, and the argument of avoiding redundancy was supposed to avoid yet another field with the declared family members that did not make it into the family.
Regarding your first comment about detecting misconfigurations, the goal would be to do this with a single details document for a given relay. When Atlas or Globe display the details, they should have an easy way to detect misconfigurations in the relay's family configuration. It's easy for them to compute set diffs, so we wouldn't have to pre-compute a set of unused family members for them. That's what I meant by "not a service used by humans". The approach you mention might be too complicated for Atlas/Globe. All they do is display data, and that's okay.
About your comment about considering worst cases, I agree that this might get more complex than expected. And maybe we're overengineering this. Maybe the better solution would be to make that backward-incompatible change where we only list effective family members in the "family" field. All I say is that this would require a major protocol change and 1 or 2 months time for Onionoo clients to adapt. But maybe it's the better way, because people (in this case Onionoo client developers) won't be confused by two different family fields.
By the way, my plan was indeed to update effective family sets every hour. We already update details files of all running relays and of all relays that just stopped running. Computing those sets every hour shouldn't be difficult.
How do we proceed? What's your favorite solution? And would you want to start hacking on this?
Yes, please compute the family sets every hour.
I'd would find it convenient to have declared and effective family sets in the details document - as you propopsed earlier.
That branch adds a new "effective_family" field to details documents specified as follows: "Array of $-prefixed fingerprints of relays that are in an effective, mutual family relationship with this relay. Omitted if empty or if descriptor containing this information cannot be found."
Note this is just a possible implementation and that we can still rename the field, replace the old "family" field with it, or do something else. But that should be easier now that we have a (hopefully working) implementation. Feedback welcome!
It adds effective family to details documents; it replaces the family of summary documents with effective family? I guess a good way to test it would be to compare effective family of details to a search by family on master. This branch is also an improvement for moving the computation of effective family to the document write stage, which makes more sense than during search index.
it replaces the family of summary documents with effective family?
It does, because there's no need to keep the declared family in summary documents anymore. The only thing it was used for was for the node indexer to compute effective families. But if we add effective families to summary documents, the declared families can go away.
Speaking of, I think I'll have to revise that branch and put the part back where the node indexer computes effective families itself. Otherwise, it'll require a coordinated update of hourly updater and server. I'll update the branch later today. We can always take out that code in a month from now when we're more certain that all Onionoo service instances have been updated.
I guess a good way to test it would be to compare effective family of details to a search by family on master. This branch is also an improvement for moving the computation of effective family to the document write stage, which makes more sense than during search index.
Agreed. I'll let others comment on this ticket over the weekend, and if nobody says it's a crazy idea, I'll deploy on the Onionoo mirror for people to test.
Speaking of, I think I'll have to revise that branch and put the part back where the node indexer computes effective families itself. Otherwise, it'll require a coordinated update of hourly updater and server.
What do you mean by coordinated update? The node indexer checks the store for update time then rebuilds itself. The only thing that changed is that previously Onionoo got the declared family from the store, made a copy in index, computed effective, and retained effective family in index. Now the effective family is computed during document write, after updating the store, and during index the effective family is copied.
The problem is that there's only one way to upgrade to this branch correctly: 1) upgrade the hourly updater, let it run, watch until it's done, and 2) immediately stop the server process, upgrade it, and restart it. If the operator waits with 2) after 1) is done, the server will return empty results for the family parameter. That makes upgrading unnecessarily complicated.
What we should do is wait a bit with taking out the code from the node indexer until we're certain that all Onionoo services are upgraded. Please find my updated branch with (untested) code for that.
It breaks the unit tests but not a huge deal until you're sure you want to commit. In your fixup you add back some of what was removed but you don't add back setFamilyFingerprints, which means getFamilyFingerprints will always return null. If your intention is to support both families fully you'll need to add back the setter and set it from NodeStatus (like before).
It breaks the unit tests but not a huge deal until you're sure you want to commit.
Yes, I noticed that too but didn't push the local fix yet. I just did that, together with a few other minor fixes. Thanks for pointing this out!
In your fixup you add back some of what was removed but you don't add back setFamilyFingerprints, which means getFamilyFingerprints will always return null. If your intention is to support both families fully you'll need to add back the setter and set it from NodeStatus (like before).
That should work, because Gson would access that attribute directly without using its setter.
Okay, I didn't hear any major concerns about this plan, so I'll proceed: Onionoo mirror at https://onionoo.thecthulhu.com/ will soon (1 hour from now?) have an "effective_family" field.
The attribute is private, not protected. How would GSON be able to access it? You rely on the getter in NodeIndexer. Thank you in advance for the clarification.
The attribute is private, not protected. How would GSON be able to access it? You rely on the getter in NodeIndexer. Thank you in advance for the clarification.
Gson probably uses reflection to access private attributes.
Karsten, are you sure we're talking about the same ff member data? Why would GSON be setting it? That's the task of metrics-lib not GSON. GSON is used to construct the JSON response isn't it?
The cardinality of effective_family for the relay should be 4.
Does onionoo's view on effective family match a tor clients view?
If it's working, Onionoo's view is ideal. The tor client uses family as a hint for path selection. Meaning, assuming you have exactly these latest descriptors, and your client can connect to all the relay, the path selection builds bi-directional relationships for you during circuit build time. So no, in general, it doesn't.
Karsten, are you sure we're talking about the same ff member data? Why would GSON be setting it? That's the task of metrics-lib not GSON. GSON is used to construct the JSON response isn't it?
SummaryDocument is an internal document used only by Onionoo and is serialized using Gson. metrics-lib's task is to parse descriptors produced by Tor relays and bridges and provided by CollecTor as input to Onionoo.
But I wasn't entirely right above. I overlooked that the new code would write SummaryDocuments without the ff attribute, so even if Gson is capable of deserializing objects with that attribute, if the attribute's value is null, it's null. Pushed a fix to my branch.
The cardinality of effective_family for the relay should be 4.
Hmm?
Does onionoo's view on effective family match a tor clients view?
If it's working, Onionoo's view is ideal. The tor client uses family as a hint for path selection. Meaning, assuming you have exactly these latest descriptors, and your client can connect to all the relay, the path selection builds bi-directional relationships for you during circuit build time. So no, in general, it doesn't.
I'd still say that Onionoo's view is a reasonable approximation to a client's view.
Oh, I see what you're trying to get across. Okay, thank you. You do realize reflection is horribly slow. It shouldn't be used for handling large quantities of data. Getters and setters are eligible for automatic compiler optimization. Reflection is not.
I see a total of 5 relay for the fingerprint mentioned. One of which doesn't show up, let alone running. The bidirectional relation therefore would hold for 4.
An approximation is not truth. In general, no, isn't meant as a criticism, it's just the facts.
This feature is not deployed on tpo's instance yet, because it's unclear whether this is the most useful definition of effective_family. At least I think we need another definition which could be called extended_family that would include all relays that can be reached by following mutual family relationships. It might be that we'll want both new fields in Onionoo, or it might be that we only want one of them. I could imagine that we add both for now, see which of them are useful, and later decide to remove one of them.
Since tor clients know only one definition of MyFamily I'm wondering why we need two?
I assume the extended_family arises from #16599 (moved) and they are only wanted because MyFamily setup is non-trivial without automation and most operators apparently don't make use of automated tools and therefore fail to configure them properly.
Yes, this has to do with inconsistent family definitions and #16599 (moved). The question is how we want to include relay family information in Onionoo, and that doesn't so much depend on what tor clients would do but more on how that information would be used by Onionoo clients. Suggestions:
family: contain all relays that a relay considers to be part of its family, regardless of what those relays think about this relay.
effective_family: only contain relays that have a mutual family relationship with this relay.
extended_family: include all relays that can be reached by following mutual family relationships.
I hope to hear back from Onionoo client developers what fields they would need.
Ah well, I just merged and deployed the new version on tpo, because I'd like to make other changes without having to re-base this branch onto those new ones. If we figure out later that we don't need the effective_family field anymore, we can just take it out again.
I think this concludes the ticket, which is why I'm closing it. Please re-open if anything remains to be done here, but please open a new ticket for unrelated stuff.
Trac: Status: needs_review to closed Resolution: N/Ato implemented