Opened 22 months ago

Closed 11 months ago

Last modified 11 months ago

#18319 closed defect (implemented)

Exclude relays that don't match pinned RSA/Ed key pairs

Reported by: teor Owned by: nickm
Priority: High Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-ed25519-proto, nickm-deferred-20160905, review-group-15
Cc: Actual Points:
Parent ID: Points: 1
Reviewer: Sponsor: SponsorU-can

Description

We need to test this code as part of the #17668 changes, but we don't plan on using it for a few releases.

Child Tickets

Change History (33)

comment:1 Changed 22 months ago by teor

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:2 Changed 22 months ago by teor

Milestone: Tor: 0.2.8.x-finalTor: 0.2.7.x-final

comment:3 Changed 22 months ago by teor

Keywords: TorCoreTeam201602 added

comment:4 Changed 22 months ago by teor

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:5 Changed 22 months ago by nickm

This code exists; it's controlled by the AuthDirPinKeys option.

comment:6 Changed 22 months ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

Throw most 0.2.8 "NEW" tickets into 0.2.9. I expect that many of them will subsequently get triaged out.

comment:7 Changed 21 months ago by isabela

Sponsor: SponsorU-can

comment:8 Changed 21 months ago by nickm

Parent ID: #17668
Points: small

comment:9 Changed 21 months ago by nickm

Priority: MediumHigh

comment:10 Changed 21 months ago by nickm

Keywords: TorCoreTeam201602 removed

comment:11 Changed 20 months ago by nickm

Keywords: tor-ed25519-proto added

comment:12 Changed 19 months ago by isabela

Points: small1

comment:13 Changed 19 months ago by andrea

Owner: set to andrea
Status: newassigned

Taking ownership for 0.2.9 triage

comment:14 Changed 16 months ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

Deferring some andrea-assigned items from 0.2.9. Andrea, please move any of these back if you disagree; this is just a first approximation.

comment:15 Changed 14 months ago by nickm

So, is it safe to turn this on? I say yes.

Based on the key pinning journal from tor26 (thanks, weasel!) it appears that since June, tor26 has seen 11254 RSA key IDs that never ever had a problem with key pinning, and 38 that did have a problem with key pinning. Here is a list: The first column has the RSA ID digest; the second column has the number of times that the RSA ID has changed, and the third column is the total number of distinct RSA IDs that we saw:

0xTX/OPySoQeQhcfYbmg7XKvPig	3	2
1fLGX0ExoUaNW2eog4qbftjASeI	3	3
1flPABP5uAKi10ISLSYoFE77lKE	2	2
2w28qPYF1huAZXHIG6ceFGBit0E	2	2
5NaImMhATrKKj2exPASMgk+Wlzg	5	2
5VWwnHcDZzPgCn1hzVNLqa9KojI	2	2
8B2NZEjIuIPYrNCNOqd4nV7ji3s	308	2
AwpuskclwF2OD84hkjy6UiPnXg4	2	2
c8lCVGJIf8a9PS37ADkWGGBPPSQ	2	2
CSNlpBRhGAUJKkKTa5iqJZmjS/o	2	2
Eam7Qvu+2koLC25/LgjTdhKEAeA	3	3
EIDZv2CSjzjtExqrPmMMGPfIY6k	971	3
eT78t3vCLFFyNpHD9Anold98rLI	3	2
fHPpbeWNhM2G4CAO6MDiMlbYwY4	2	2
FmNLMUUcPNs1DKcuv5VKmHfwzjQ	5	5
fqbq1v2DCDxTj0QDi7+gd1h911U	2	2
h4QTqd5axJeipfIA2hY/EtUD/Y0	361	2
hADNOhd+RqInQb+NI8r9/f9kPc4	4	4
hBmxi6hnwmv4EzD1Lw3lSzmqeng	5	5
IlhNU47BggOuIYwkmKnEy2VNWQ8	2	2
/kAz11CDHDKpVxdK3RHkD1WKFKk	2	2
k+wgCkGxwIa5SCA/kKyYEkehbRs	2	2
L3qCbMW35tg07KJeI2oTUdOJ0ZU	2	2
LmZJ2hypiJcX/6R7ne72F/lrAy0	6	3
LQSvfJ3GjHXJC1fPd0rcS0Uo6vM	2	2
n7VU4rU8XoC6sozAbXBUDiKuyPU	2	2
nzE2uXX/gPDcRirEaOdoL/6T9As	2	2
OENTblrXo3+jW1AeV0kI7FsuC4k	2	2
Oy4g/k61Ml3mKg91MEh93RqNYGc	2	2
pTxG9bFX3YM2bUWo6ZokSTShTEY	2	2
qj2bTEI7M31nALgV890iRN0CC+4	2	2
ur6aJJ2MxQsdUMIBBjPhlYjbQ5E	2	2
WXuzSmUToNRG7oewrDC3FuTBWG4	2	2
XcvYDQDicgqjLkAxf1FoG/UrWmM	3	2
y5iRq/UMnKjxRFqyHSfMXDGCCSI	3	2
YtLQNjkCJcHvZWIV3QZtAwSGH8M	2	2
yylv56Bq+HhTRJW/OIF/4Ip3msA	2	2

I also tried looking at the time distribution of when the different Ed25519 keys appeared, to see if adding a grace period to the code would help. That doesn't seem to be the case: no more than a third of the problems occurred within a week.

comment:16 Changed 14 months ago by teor

I suggest that we email these operators (or these operators filtered by some characteristic, like "bandwidth over 1MByte/second"), and let them know their relay is misconfigured, and they will soon be excluded from the consensus.

comment:17 Changed 14 months ago by Sebastian

One of these is a dirauth (dizum). How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.

comment:18 Changed 14 months ago by Sebastian

Should we maybe throw away all the journals and email those above anyway, informing them that they would be excluded in the future if they kept doing this?

comment:19 Changed 14 months ago by nickm

Teor says:

I suggest that we email these operators (or these operators filtered by some characteristic, like "bandwidth over 1MByte/second"), and let them know their relay is misconfigured, and they will soon be excluded from the consensus.

IMO this is fine to do, but we need to explain it right.

When we turn on pinning, the most recent journal entry will rule. So a relay will only be excluded from the consensus if its most recently pinned Ed25519 key is not the one it uses. So if somebody switched Ed keys once a few months ago, they won't get penalized here. This only affects them if they are switching frequently, or if they switch keys again.

The rule for relays becomes:

Always use the same Ed25519 identity with the same RSA identity.

So, don't switch one unless you also switch the other. If you lose one, don't try to retain the other.

Sebastian says:

One of these is a dirauth (dizum).

We should probably make sure that whatever made Dizum change its ed25519 key won't happen again.

How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.

Once key pinning is turned on, an authority will believe the latest entry for any given RSA key. They will not accept a descriptor signed with that RSA identity key unless it also has the provided Ed25519 identity. So it only affects the voting, not the consensus.

Should we maybe throw away all the journals and email those above anyway, informing them that they would be excluded in the future if they kept doing this?

IMO we should not throw away the journals; they're all correct information.

comment:20 Changed 14 months ago by Sebastian

Ah, so if someone with a relay that has key pinning data stored uploads a descriptor with a previously unknown ed key, the dirauths will refuse to vote for that descriptor. But if at a later point in time the relay uploads another descriptor again with the previously recorded ed key, then the dirauth will vote for the relay again, yes?

comment:21 Changed 14 months ago by nickm

That's correct.

comment:22 Changed 14 months ago by Sebastian

Ok, that's great. Then my worry was unfounded. The dirauths changed fingerprints because they too hastily upgraded without generating a key offline first, which should be rectified for all of them now.

comment:23 in reply to:  19 ; Changed 14 months ago by teor

Replying to nickm:

...

When we turn on pinning, the most recent journal entry will rule. So a relay will only be excluded from the consensus if its most recently pinned Ed25519 key is not the one it uses. So if somebody switched Ed keys once a few months ago, they won't get penalized here. This only affects them if they are switching frequently, or if they switch keys again.

The rule for relays becomes:

Always use the same Ed25519 identity with the same RSA identity.

So, don't switch one unless you also switch the other. If you lose one, don't try to retain the other.

Sebastian says:

...
How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.

Once key pinning is turned on, an authority will believe the latest entry for any given RSA key. They will not accept a descriptor signed with that RSA identity key unless it also has the provided Ed25519 identity. So it only affects the voting, not the consensus.
...

I see from the manual that AuthDirPinKeys is set on a per-authority basis, so it only affects that authority's votes (and so it's not like a consensus method, where every authority uses it at the same time).

Activation Timing

What if I run a relay that changes ed keys during the changeover?

If authorities A, B, C, D set key pinning at hour 1,

& authorities E, F, G, H set key pinning at hour 2,

then I have a different ed key pinned on some authorities compared to others.

I guess I need to regenerate both RSA & ed keys in this instance.

Keeping State

Will authorities need to back up their key pinning file?

If an authority is restored with an empty pinning file, it will regenerate its key pinning file based on the descriptors it sees at that time, and those descriptors could be different after the restore. (But the other authorities will anchor the pinning, if a majority keep their files.)

Test Network / Testing

I've just set AuthDirPinKeys on some of the authorities in the test network, and asked the other operators to do the same. It seems to work fine. But we don't have any current mismatching or RSA-only relays, so this is not as good a test as it could be.

(It also works fine in chutney, but I'd like to try to match the public dirauth options in chutney going forward, see #20513.)

Requiring Ed25519

Also, what are we going to do about DISABLE_DISABLING_ED25519?
It's currently #undef, which means that a relay can drop its ed25519 key whenever it wants.
When are we going to turn it on? When 0.2.5 is no longer recommended?

comment:24 in reply to:  23 ; Changed 14 months ago by nickm

Replying to teor:

Replying to nickm:

...

When we turn on pinning, the most recent journal entry will rule. So a relay will only be excluded from the consensus if its most recently pinned Ed25519 key is not the one it uses. So if somebody switched Ed keys once a few months ago, they won't get penalized here. This only affects them if they are switching frequently, or if they switch keys again.

The rule for relays becomes:

Always use the same Ed25519 identity with the same RSA identity.

So, don't switch one unless you also switch the other. If you lose one, don't try to retain the other.

Sebastian says:

...
How will this all work, by the way? My key pinning journal goes back one year and has more entries than what is written above, including more than just the dirauth above.

Once key pinning is turned on, an authority will believe the latest entry for any given RSA key. They will not accept a descriptor signed with that RSA identity key unless it also has the provided Ed25519 identity. So it only affects the voting, not the consensus.
...

I see from the manual that AuthDirPinKeys is set on a per-authority basis, so it only affects that authority's votes (and so it's not like a consensus method, where every authority uses it at the same time).

Activation Timing

What if I run a relay that changes ed keys during the changeover?

If authorities A, B, C, D set key pinning at hour 1,

& authorities E, F, G, H set key pinning at hour 2,

then I have a different ed key pinned on some authorities compared to others.

I guess I need to regenerate both RSA & ed keys in this instance.

Yes. If you are a relay, you should never keep one key and change the other. The consequences for doing it during the changeover are weirder than usual.

Keeping State

Will authorities need to back up their key pinning file?

If an authority is restored with an empty pinning file, it will regenerate its key pinning file based on the descriptors it sees at that time, and those descriptors could be different after the restore. (But the other authorities will anchor the pinning, if a majority keep their files.)

IMO authorities should probably back these up, but it isn't crucial.

Requiring Ed25519

Also, what are we going to do about DISABLE_DISABLING_ED25519?
It's currently #undef, which means that a relay can drop its ed25519 key whenever it wants.
When are we going to turn it on? When 0.2.5 is no longer recommended?

That sounds plausible to me. Or another option would be to look at historical metrics data to see how often relays run a recent version for a while, then drop back to an older one. If the answer is "almost never" then we can just turn it on now.

comment:25 in reply to:  24 Changed 14 months ago by teor

Replying to nickm:

Replying to teor:

Requiring Ed25519

Also, what are we going to do about DISABLE_DISABLING_ED25519?
It's currently #undef, which means that a relay can drop its ed25519 key whenever it wants.
When are we going to turn it on? When 0.2.5 is no longer recommended?

That sounds plausible to me. Or another option would be to look at historical metrics data to see how often relays run a recent version for a while, then drop back to an older one. If the answer is "almost never" then we can just turn it on now.

Split off #20522. I'm in favour of doing it soon, because it makes key pinning consistent.

comment:26 Changed 14 months ago by teor

This seems to be working fine (and consistently) on the test network:

Nov 02 01:58:09.000 [warn] http status 400 ("Looks like your keypair does not match its older value.") response from dirserver 'REDACTED1'. Please correct.
Nov 02 01:58:09.000 [warn] http status 400 ("Looks like your keypair does not match its older value.") response from dirserver 'REDACTED2'. Please correct.

comment:27 Changed 12 months ago by nickm

Owner: changed from andrea to nickm
Status: assignedaccepted

ticket18319 is the clear-cut change here.

comment:28 Changed 12 months ago by nickm

Status: acceptedneeds_review

comment:29 Changed 12 months ago by nickm

Keywords: review-group-14 added

comment:30 Changed 11 months ago by nickm

Keywords: review-group-15 added; review-group-14 removed

comment:31 Changed 11 months ago by dgoulet

Status: needs_reviewmerge_ready

With teor's comment about the testnet, this patch is straight forward. Let's get it merged and we'll deploy it asap in the testnet.

lgtm;

comment:32 Changed 11 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

merged

comment:33 Changed 11 months ago by teor

Just a reminder for when we deploy this code:

Has anyone checked that each directory authority's current key pairs are pinned consistently by every other directory authority?

When we ran into this issue in the test network, I had to delete the RSA and ed keys for the broken authority, and regenerate them (and then we had to update all the torrc authority lines). If this happened in the public network, we would have to update the tor source code.

When the first authority deploys this code, we'll find some inconsistencies, but it will take a majority of authorities (ideally with consistent pairings) to affect the consensus.

Note: See TracTickets for help on using tickets.