Based on the key pinning journal from tor26 (thanks, weasel!), it appears that since June, tor26 has seen 11254 RSA key IDs that never had a problem with key pinning, and 38 that did. Here is a list: the first column has the RSA ID digest; the second column has the number of times that the RSA ID has changed; and the third column is the total number of distinct RSA IDs that we saw:
I also tried looking at the time distribution of when the different Ed25519 keys appeared, to see if adding a grace period to the code would help. That doesn't seem to be the case: no more than a third of the problems occurred within a week.
I suggest that we email these operators (or these operators filtered by some characteristic, like "bandwidth over 1 MByte/second"), and let them know that their relay is misconfigured and that it will soon be excluded from the consensus.
One of these is a dirauth (dizum). How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.
Should we maybe throw away all the journals and email those above anyway, informing them that they would be excluded in the future if they kept doing this?
I suggest that we email these operators (or these operators filtered by some characteristic, like "bandwidth over 1 MByte/second"), and let them know that their relay is misconfigured and that it will soon be excluded from the consensus.
IMO this is fine to do, but we need to explain it right.
When we turn on pinning, the most recent journal entry will rule. So a relay will only be excluded from the consensus if its most recently pinned Ed25519 key is not the one it uses. So if somebody switched Ed keys once a few months ago, they won't get penalized here. This only affects them if they are switching frequently, or if they switch keys again.
The rule for relays becomes:
Always use the same Ed25519 identity with the same RSA identity.
So, don't switch one unless you also switch the other. If you lose one, don't try to retain the other.
Sebastian says:
One of these is a dirauth (dizum).
We should probably make sure that whatever made Dizum change its ed25519 key won't happen again.
How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.
Once key pinning is turned on, an authority will believe the latest entry for any given RSA key. It will not accept a descriptor signed with that RSA identity key unless the descriptor also has the Ed25519 identity recorded in that entry. So it only affects the voting, not the consensus.
Should we maybe throw away all the journals and email those above anyway, informing them that they would be excluded in the future if they kept doing this?
IMO we should not throw away the journals; they're all correct information.
Ah, so if someone with a relay that has key pinning data stored uploads a descriptor with a previously unknown ed key, the dirauths will refuse to vote for that descriptor. But if at a later point in time the relay uploads another descriptor again with the previously recorded ed key, then the dirauth will vote for the relay again, yes?
Ok, that's great. Then my worry was unfounded. The dirauths changed fingerprints because they upgraded too hastily, without generating a key offline first, which should be rectified for all of them now.
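As a rough illustration of the behaviour described in this exchange, here is a toy Python model of one authority's journal. It is a sketch of the rule as stated in this thread, not tor's actual implementation: the class name `PinJournal`, the `enforce` flag (standing in for AuthDirPinKeys) and the string key names are all made up for the example, and it only tracks the RSA-to-Ed25519 direction.

```python
from dataclasses import dataclass, field


@dataclass
class PinJournal:
    """Toy model of one authority's key pinning journal (hypothetical, not tor code)."""

    enforce: bool = False                       # stands in for AuthDirPinKeys
    latest: dict = field(default_factory=dict)  # RSA ID -> most recent Ed25519 key

    def upload(self, rsa_id: str, ed_key: str) -> bool:
        """Return True if the authority would accept a descriptor with this pair."""
        pinned = self.latest.get(rsa_id)
        if pinned is None or pinned == ed_key:
            self.latest[rsa_id] = ed_key
            return True
        if self.enforce:
            # Mismatch while pinning is on: reject and keep the existing pin.
            return False
        # Pinning is off: the newest pair simply becomes the latest entry.
        self.latest[rsa_id] = ed_key
        return True


journal = PinJournal()
journal.upload("RSA1", "ed-A")   # first sighting, recorded
journal.upload("RSA1", "ed-B")   # pinning off, so ed-B becomes the latest entry
journal.enforce = True
print(journal.upload("RSA1", "ed-C"))  # False: unknown key for a pinned RSA ID
print(journal.upload("RSA1", "ed-B"))  # True: matches the latest entry again
```

In this model a rejected upload leaves the pin untouched, which is why going back to the recorded key is accepted again.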
I see from the manual that AuthDirPinKeys is set on a per-authority basis, so it only affects that authority's votes (and so it's not like a consensus method, where every authority uses it at the same time).
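For reference, turning this on for a single authority would presumably be a one-line torrc change. Only the option name comes from the manual; the rest of an authority's configuration is omitted here.

```
# torrc on one directory authority (all other authority options omitted)
AuthDirPinKeys 1
```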
Activation Timing
What if I run a relay that changes ed keys during the changeover?
If authorities A, B, C, D set key pinning at hour 1,
& authorities E, F, G, H set key pinning at hour 2,
then I have a different ed key pinned on some authorities compared to others.
I guess I need to regenerate both RSA & ed keys in this instance.
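The changeover scenario above can be sketched with a small simulation. This is only a model built on assumptions: the function name `pinned_keys`, the fractional hour values and the key labels are invented, and each authority is assumed to let the newest key win until the hour it starts enforcing, after which a mismatching key is rejected and the pin stays.

```python
def pinned_keys(activation_hour, uploads):
    """Return the Ed25519 key each authority ends up pinning for one RSA ID.

    activation_hour: dict of authority name -> hour it starts enforcing pins
    uploads: list of (hour, ed_key) in time order, all for the same RSA ID
    """
    pinned = {}
    for name, start in activation_hour.items():
        current = None
        for hour, ed_key in uploads:
            if current is None or hour < start:
                current = ed_key  # not enforcing yet (or first sighting)
            # Once enforcing, a different key is rejected and the pin stays.
        pinned[name] = current
    return pinned


# Authorities A-D enforce from hour 1, E-H from hour 2 (as in the example above).
activation = {**{n: 1 for n in "ABCD"}, **{n: 2 for n in "EFGH"}}
# The relay switches Ed25519 keys between the two activation times.
pins = pinned_keys(activation, [(0, "ed-old"), (1.5, "ed-new")])

accept_new = sorted(a for a, key in pins.items() if key == "ed-new")
print(accept_new)                       # ['E', 'F', 'G', 'H']
print(len(accept_new) > len(pins) / 2)  # False: neither key has a majority
```

With the pins split four to four, neither key is accepted by more than half of the authorities in this model, which is consistent with the guess above that both keys need regenerating.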
Keeping State
Will authorities need to back up their key pinning file?
If an authority is restored with an empty pinning file, it will regenerate its key pinning file based on the descriptors it sees at that time, and those descriptors could be different after the restore. (But the other authorities will anchor the pinning, if a majority keep their files.)
Test Network / Testing
I've just set AuthDirPinKeys on some of the authorities in the test network, and asked the other operators to do the same. It seems to work fine. But we don't have any current mismatching or RSA-only relays, so this is not as good a test as it could be.
(It also works fine in chutney, but I'd like to try to match the public dirauth options in chutney going forward, see #20513.)
Requiring Ed25519
Also, what are we going to do about DISABLE_DISABLING_ED25519?
It's currently #undef, which means that a relay can drop its ed25519 key whenever it wants.
When are we going to turn it on? When 0.2.5 is no longer recommended?
Activation Timing
What if I run a relay that changes ed keys during the changeover?
If authorities A, B, C, D set key pinning at hour 1,
& authorities E, F, G, H set key pinning at hour 2,
then I have a different ed key pinned on some authorities compared to others.
I guess I need to regenerate both RSA & ed keys in this instance.
Yes. If you are a relay, you should never keep one key and change the other. The consequences for doing it during the changeover are weirder than usual.
Keeping State
Will authorities need to back up their key pinning file?
If an authority is restored with an empty pinning file, it will regenerate its key pinning file based on the descriptors it sees at that time, and those descriptors could be different after the restore. (But the other authorities will anchor the pinning, if a majority keep their files.)
IMO authorities should probably back these up, but it isn't crucial.
Requiring Ed25519
Also, what are we going to do about DISABLE_DISABLING_ED25519?
It's currently #undef, which means that a relay can drop its ed25519 key whenever it wants.
When are we going to turn it on? When 0.2.5 is no longer recommended?
That sounds plausible to me. Or another option would be to look at historical metrics data to see how often relays run a recent version for a while, then drop back to an older one. If the answer is "almost never" then we can just turn it on now.
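A check along those lines could be sketched as follows. The input format (a dict of relay fingerprint to version strings in time order) and the function names are assumptions for illustration, not the actual metrics data layout, and the sample data is made up.

```python
import re


def parse_version(version):
    """Turn '0.2.8.9-alpha' into (0, 2, 8, 9), ignoring any status suffix."""
    numeric = re.match(r"\d+(?:\.\d+)*", version).group()
    return tuple(int(part) for part in numeric.split("."))


def downgrade_fraction(history):
    """Fraction of relays that ever moved from a newer version to an older one.

    history: dict of relay fingerprint -> list of version strings in time order
    """
    if not history:
        return 0.0
    downgraded = sum(
        any(parse_version(new) < parse_version(old)
            for old, new in zip(versions, versions[1:]))
        for versions in history.values()
    )
    return downgraded / len(history)


# Hypothetical sample: only relay R2 drops back to an older version.
sample = {
    "R1": ["0.2.7.6", "0.2.8.9"],
    "R2": ["0.2.8.9", "0.2.5.12"],
    "R3": ["0.2.8.9"],
    "R4": ["0.2.8.8", "0.2.8.9", "0.2.9.4-alpha"],
}
print(downgrade_fraction(sample))  # 0.25
```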
Split off #20522. I'm in favour of doing it soon, because it makes key pinning consistent.
This seems to be working fine (and consistently) on the test network:
Nov 02 01:58:09.000 [warn] http status 400 ("Looks like your keypair does not match its older value.") response from dirserver 'REDACTED1'. Please correct.
Nov 02 01:58:09.000 [warn] http status 400 ("Looks like your keypair does not match its older value.") response from dirserver 'REDACTED2'. Please correct.
Has anyone checked that each directory authority's current key pairs are pinned consistently by every other directory authority?
When we ran into this issue in the test network, I had to delete the RSA and ed keys for the broken authority, and regenerate them (and then we had to update all the torrc authority lines). If this happened in the public network, we would have to update the tor source code.
When the first authority deploys this code, we'll find some inconsistencies, but it will take a majority of authorities (ideally with consistent pairings) to affect the consensus.