Opened 19 months ago

Closed 4 months ago

#26288 closed enhancement (implemented)

prop289: Implement authenticated SENDME

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: prop289, 035-roadmap-master, 035-triaged-in-20180711, prop289-assigned-sponsor-v, 041-proposed-on-roadmap, network-team-roadmap-2019-Q1Q2, tor-spec, postfreeze-ok, 041-should
Cc: dmr Actual Points:
Parent ID: Points: 21
Reviewer: nickm Sponsor: SponsorV

Description

This is the master ticket for implementing prop289 which is the authenticated SENDMEs cell.

https://gitweb.torproject.org/torspec.git/tree/proposals/289-authenticated-sendmes.txt

Child Tickets

TicketTypeStatusOwnerSummary
#26839enhancementcloseddgouletprop289: Make a relay remember last cell digests before SENDME
#26840enhancementcloseddgouletprop289: Modify SENDME cell to have a version and payload
#26841enhancementcloseddgouletprop289: Have tor handle the new SENDME cell format and validate
#26842enhancementcloseddgouletprop289: Add consensus parameters to control new SENDME behavior
#26846enhancementclosednickmprop289: Leave unused random bytes in relay cell payload
#26871enhancementcloseddgouletprop289: randomize the unused part of relay payloads
#29536enhancementclosednickmprng: Add a global fast PRNG object
#29541defectclosednickmRe-enable util/mmap_anon_no_fork
#30363enhancementcloseddgouletprop289: Implement FlowCtrl protover value
#30364enhancementcloseddgouletprop289: Add consensus param to use or not SENDME v1 for one-hop directory download
#30365taskcloseddgouletprop289: Update tor-spec.txt with authenticated SENDME spec
#30428defectcloseddgouletsendme: Failure to validate authenticated SENDMEs client side
#30467defectcloseddgouletsendme: Fix coverity CID 1445000
#30557taskcloseddgouletprop289: Update proposal with new deployment plan

Change History (48)

comment:1 Changed 18 months ago by dgoulet

Owner: set to dgoulet
Status: newassigned

comment:2 Changed 17 months ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:3 Changed 17 months ago by dmr

Cc: dmr added

comment:4 Changed 17 months ago by teor

Keywords: prop289-assigned-sponsor-v added
Sponsor: SponsorV

I think all prop#289 tickets are Sponsor V.

comment:5 Changed 15 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.6.x-final

Deferring various feature-y things to 0.3.6. If one of these is actually happening in 0.3.5, please let me know!

comment:6 Changed 13 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:7 Changed 11 months ago by dgoulet

Milestone: Tor: 0.4.0.x-finalTor: 0.4.1.x-final

prop289 won't make it in 040. Feature freeze is in 7 days.

comment:8 Changed 11 months ago by teor

Keywords: 041-proposed-on-roadmap added

Let's review these tickets at the next meeting using our 041-proposed process.
They're on the roadmap, so the review should focus on ticket size and team capacity (and sponsor expectations).

comment:9 Changed 11 months ago by dgoulet

On going work of the entire prop289 implementation is in: ticket26288_041_01. It contains all the child ticket code since they all depend on each other.

comment:10 Changed 10 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added
Points: 21

comment:11 Changed 10 months ago by dgoulet

I'm waiting on #29536 to be merged before submitting a PR. I've rebased the latest branch on #29536 so it can have a full integration: See ticket26288_041_02.

PR will arrive once everything is in master.

There will be one last thing to discuss: how to do this with protover. See the spec change that I propose to the proposal about this: ticket26288_01 (torspec).

comment:12 Changed 10 months ago by dgoulet

Reviewer: ahf
Status: assignedneeds_review

Ok! We got all the pieces!

PR: https://github.com/torproject/tor/pull/725
Branch: ticket26288_041_02
Spec: ticket26288_01

So a piece missing is the integration with protover. I'm not entirely sure how to proceed code wise because what I've tried with SendMe=1 and it was not working. Basically, what I need is a confirmation that what is proposed makes sense and is doable that way. If so, I'll push the commit that implements this and will ask nickm to hunt down why it is failing.

comment:13 in reply to:  12 ; Changed 10 months ago by teor

Replying to dgoulet:

...

So a piece missing is the integration with protover. I'm not entirely sure how to proceed code wise because what I've tried with SendMe=1 and it was not working. Basically, what I need is a confirmation that what is proposed makes sense and is doable that way. If so, I'll push the commit that implements this and will ask nickm to hunt down why it is failing.

SENDMEs are part of circuits and streams, so we could increment the Relay protocol version:

9.3. "Relay"

   The "relay" protocols are those used to handle CREATE/CREATE2
   cells, and those that handle the various RELAY cell types received
   after a CREATE/CREATE2 cell.  (Except, relay cells used to manage
   introduction and rendezvous points are managed with the "HSIntro"
   and "HSRend" protocols respectively.)

https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2012

comment:14 in reply to:  13 ; Changed 10 months ago by dgoulet

Replying to teor:

Replying to dgoulet:

...

So a piece missing is the integration with protover. I'm not entirely sure how to proceed code wise because what I've tried with SendMe=1 and it was not working. Basically, what I need is a confirmation that what is proposed makes sense and is doable that way. If so, I'll push the commit that implements this and will ask nickm to hunt down why it is failing.

SENDMEs are part of circuits and streams, so we could increment the Relay protocol version:

Hmmmm the only reason I created a SendMe here is because it would have made Relay a bit messier... but I guess overall that is what we've designed Protover to support anyway.

Edit: After some discussions with Nick on IRC, problem with Relay is that we would need two new versions, that is "Auth. SENDME + tap" and "Auth. SENDME + ntor"... and that means using Relay implies a large matrix of versions every time we change a different cell type.

So the suggestion would be something like FlowCtrl=, have an implicit "1" that is current situation and add the value for 2 that would be for prop289.

We already have a SENDME version (0) that all tor supports. And now we want to support v1. In order for protover to "stop" the use of v0, we then need to introduce two new versions to Relay which right now would be 3 and 4.

Then to remove the usage of v0, we would advertise Relay=1-2,4 which should effectively exit() every client that does NOT support v1 that is Relay=4.

Doable!

Last edited 10 months ago by dgoulet (previous) (diff)

comment:15 Changed 10 months ago by ahf

Status: needs_reviewneeds_revision

Currently there is a compilation failure on AppVeyor:

                 from ../src/core/or/or.h:32,
                 from ../src/core/or/sendme.c:12:
../src/core/or/sendme.c: In function 'sendme_connection_edge_consider_sending':
../src/core/or/sendme.c:326:27: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Werror=format=]
     log_debug(log_domain, "Outbuf %lu, queuing stream SENDME.",
                           ^
../src/lib/log/log.h:243:48: note: in definition of macro 'log_debug'
       log_fn_(LOG_DEBUG, domain, __FUNCTION__, args, ##__VA_ARGS__); \
                                                ^~~~

Looks like the test failures on Travis isn't related to this code.

comment:16 Changed 10 months ago by dgoulet

Status: needs_revisionneeds_review

comment:17 Changed 10 months ago by ahf

Reviewer: ahfnickm

The code and spec changes looks good to me. Moving this over to Nick for some additional review.

comment:18 in reply to:  14 Changed 10 months ago by teor

Replying to dgoulet:

Replying to teor:

Replying to dgoulet:

...

So a piece missing is the integration with protover. I'm not entirely sure how to proceed code wise because what I've tried with SendMe=1 and it was not working. Basically, what I need is a confirmation that what is proposed makes sense and is doable that way. If so, I'll push the commit that implements this and will ask nickm to hunt down why it is failing.

SENDMEs are part of circuits and streams, so we could increment the Relay protocol version:

Hmmmm the only reason I created a SendMe here is because it would have made Relay a bit messier... but I guess overall that is what we've designed Protover to support anyway.

Edit: After some discussions with Nick on IRC, problem with Relay is that we would need two new versions, that is "Auth. SENDME + tap" and "Auth. SENDME + ntor"... and that means using Relay implies a large matrix of versions every time we change a different cell type.

So the suggestion would be something like FlowCtrl=, have an implicit "1" that is current situation and add the value for 2 that would be for prop289.

You can do it this way: just like HSIntro etc.

We already have a SENDME version (0) that all tor supports. And now we want to support v1. In order for protover to "stop" the use of v0, we then need to introduce two new versions to Relay which right now would be 3 and 4.

Then to remove the usage of v0, we would advertise Relay=1-2,4 which should effectively exit() every client that does NOT support v1 that is Relay=4.

I think there's some confusion here.

The current Relay protocols are:

  1. TAP and all the features in Tor 0.2.3 (including whatever flow control was in 0.2.3)
  2. ntor and all the features in Tor 0.2.4.19, including TAP and all the features in 0.2.3 (including whatever flow control was in 0.2.4.19)

https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2012

But I think you're right overall: we don't know if we want to turn off TAP first, or the old flow control first. So a new protocol is a good idea.

comment:19 Changed 10 months ago by teor

Status: needs_reviewneeds_revision

I reviewed the protocol parts of this patch:

Phase 3 of the transition plan requires old clients and relays to download a consensus so they learn that they should stop trying to connect to the network. But since 0.2.8, clients (and censored relays that can't access any DirPorts) will try to use the ORPort to download a consensus. But ORPort circuits from legacy clients will fail during phase 3.

Here's what I think we need to do:

  1. always allow legacy sendmes for BEGINDIR for the consensus, and everything that is required to validate a consensus:
    • authority certificates,
    • relay descriptors (for bridge clients),
    • anything else?
  2. Revise the transition plan, so it includes the protover changes and the consensus parameter changes
  3. Don't remove the section about extensive testing using chutney:
    -   We'll want to do a bunch of testing in chutney before flipping the
    -   switches in the real network: I've long suspected we still have bugs
    -   in our sendme timing, and this proposal might expose some of them.
    
  4. Do the chutney tests now, and do them again when we want to implement each phase on the public network

The spec and the code are also out of sync: the spec talks about FlowCtrl, but the code doesn't have FlowCtrl.

Here are the changes I think we need to make:

  1. Add FlowCtrl=1 to the protocols advertised by relays in C
  2. Add FlowCtrl=1 to the protocols advertised by relays in Rust
  3. Clarify "FlowCtrl" in the spec:
       Tor clients and relays that don't support this protover version from the
       consensus "required-client-protocols" or "required-relay-protocols" lines
       will exit and thus not try to join the network. Here is the proposed value:
    
          "FlowCtrl"
    
          Describes the flow control protocol at the circuit and stream level.
          If there is no FlowCtrl protocol version, tor supports the unauthenticated
          flow control features from its supported Relay protocols.
    

comment:20 in reply to:  19 ; Changed 10 months ago by dgoulet

Replying to teor:

I reviewed the protocol parts of this patch:

Phase 3 of the transition plan requires old clients and relays to download a consensus so they learn that they should stop trying to connect to the network. But since 0.2.8, clients (and censored relays that can't access any DirPorts) will try to use the ORPort to download a consensus. But ORPort circuits from legacy clients will fail during phase 3.

This is what the new protover version is all about. We would flip FlowCtrl=1 _before_ the consensus param sendme_accept_min_version=1 is set. This would effectively exit() every clients/relays that can't support v1.

Then we can enforce *only* accepting v1 through the consensus (if that param is needed at all).

Here's what I think we need to do:

  1. always allow legacy sendmes for BEGINDIR for the consensus, and everything that is required to validate a consensus:
    • authority certificates,
    • relay descriptors (for bridge clients),
    • anything else?

I'm quite reluctant to keep legacy SENDMEs even when the protover _and_/_or_ the consensus param is flipped. The whole point of that protover is that we don't have to deal with clients not supporting v1. And we can NOT flip the "min accept" param before that protover.

  1. Don't remove the section about extensive testing using chutney:
    -   We'll want to do a bunch of testing in chutney before flipping the
    -   switches in the real network: I've long suspected we still have bugs
    -   in our sendme timing, and this proposal might expose some of them.
    
  2. Do the chutney tests now, and do them again when we want to implement each phase on the public network

I don't see the point of that in a proposal to be honest. Chutney tests have been done extensively to develop that branch so I'm not sure what this (4) is about here?...

Any switch we flip in the consensus, especially something like this, needs to be tested a lot. Not doing so is a bit reckless on our part and I doubt having it in the proposal saying "we need to test this" is useful at all...

What I recommend we do is actually describe the "ordering" of all the pieces in the transition plan. And then document what we should NOT do so in 10 years, we have a reminder of what to do (because I don't see Phase 3 being done until many years!).

The spec and the code are also out of sync: the spec talks about FlowCtrl, but the code doesn't have FlowCtrl.

Yes, that is what the original comment mean on my part is that I didn't do this yet because I wasn't sure it was the right approach with the SendMe.

comment:21 in reply to:  20 ; Changed 10 months ago by teor

Replying to dgoulet:

Replying to teor:

I reviewed the protocol parts of this patch:

Phase 3 of the transition plan requires old clients and relays to download a consensus so they learn that they should stop trying to connect to the network. But since 0.2.8, clients (and censored relays that can't access any DirPorts) will try to use the ORPort to download a consensus. But ORPort circuits from legacy clients will fail during phase 3.

This is what the new protover version is all about. We would flip FlowCtrl=1 _before_ the consensus param sendme_accept_min_version=1 is set. This would effectively exit() every clients/relays that can't support v1.

You're right, old clients and relays that download a consensus while FlowCtrl=1 is required, and while v1 sendmes are accepted, will cache that consensus. Then they will refuse to connect to the network as long as they have that cached consensus.

But when we start rejecting v1 sendmes, old clients will fail to download a consensus. (Most old relays will still be able to fetch consensuses, because they use DirPorts.) These old clients may be new installs of an old version, clients or relays which clear their data directories, or clients that are only launched occasionally.

How will we handle them?
We'll shut down most old clients that are already installed. Is that enough?
How much time do we need between setting the protocol version requirement, and rejecting v1 sendmes?

What will an old client do if we reject its sendmes?
How many old clients will it take to bring down the network?
Do we need to keep testing old clients without a cached consensus, but with v1 sendmes rejected?

We need to talk about these issues in the proposal.

Then we can enforce *only* accepting v1 through the consensus (if that param is needed at all).

Here's what I think we need to do:

  1. always allow legacy sendmes for BEGINDIR for the consensus, and everything that is required to validate a consensus:
    • authority certificates,
    • relay descriptors (for bridge clients),
    • anything else?

I'm quite reluctant to keep legacy SENDMEs even when the protover _and_/_or_ the consensus param is flipped. The whole point of that protover is that we don't have to deal with clients not supporting v1. And we can NOT flip the "min accept" param before that protover.

I agree: having to keep legacy code is a really bad outcome.

  1. Don't remove the section about extensive testing using chutney:
    -   We'll want to do a bunch of testing in chutney before flipping the
    -   switches in the real network: I've long suspected we still have bugs
    -   in our sendme timing, and this proposal might expose some of them.
    
  2. Do the chutney tests now, and do them again when we want to implement each phase on the public network

I don't see the point of that in a proposal to be honest. Chutney tests have been done extensively to develop that branch so I'm not sure what this (4) is about here?...

Future Tor changes can break our transition plans.

Any switch we flip in the consensus, especially something like this, needs to be tested a lot. Not doing so is a bit reckless on our part and I doubt having it in the proposal saying "we need to test this" is useful at all...

What I recommend we do is actually describe the "ordering" of all the pieces in the transition plan. And then document what we should NOT do so in 10 years, we have a reminder of what to do (because I don't see Phase 3 being done until many years!).

You're right, writing it down doesn't help that much.

Here's one way to regularly test future transition plans:

  1. We write a script that runs chutney tests that test each part of the transition plan, for old and new clients, and clients with and without cached consensuses
  2. We set up a repository on Travis with a cron job that runs once a week, to make sure that future changes don't break the transition plan

Are there other ways to make sure we don't break our future transition plans?

comment:22 in reply to:  21 Changed 10 months ago by dgoulet

Keywords: tor-spec added

Replying to teor:

We need to talk about these issues in the proposal.

Indeed. I'll make the changes as soon as I can to the proposal.

Are there other ways to make sure we don't break our future transition plans?

This current issue reminds me of the tap vs ntor problem. Once we stop using tap, what will old client do? I recall analysis being done by nickm there and I'm expecting a very similar conclusion(s) about clients who just can't use the network.

comment:23 Changed 10 months ago by dgoulet

Status: needs_revisionneeds_review

(Only spec change, no code for now)

I've force pushed quite a big fix on the last commit. I recommend a complete re-read of section 4 (Deployment).

I've expanded it considering teor's comment and a discussion I had with Nick this morning on IRC. Basically, keeping v0 support on *directory one-hop* circuit could be desirable as a buffer once v0 is not accepted anymore.

I've also added a Timeline section to try to give us a better idea of how it looks.

Last edited 10 months ago by dgoulet (previous) (diff)

comment:24 Changed 9 months ago by nickm

wow, this is a big one. I've reviewed the first N commits here. I'll do the next N commits soon.

comment:25 Changed 9 months ago by nickm

Woo, I've finished reviewing. Looks nice! I've made some suggestions on the branch.

In addition to the review comments, here is something I'd like to see: I'd like to see a unit test that makes sure that the calculated SENDME values match SENDME values that are calculated by a different process, like a python script or something. This will help us make sure that we're calculating the right function over the correct portions of the correct cells. (I didn't see a test like that, but I could have missed it.) What do you think?

Also, this branch needs a changes file.

comment:26 Changed 9 months ago by nickm

Status: needs_reviewneeds_revision

comment:27 Changed 9 months ago by nickm

Two more requests:

  • travis seems to be failing on the streamwrap unit test. That's probably related to changes here.
  • there's no coverage information; I'd like to know how the coverage is on the tests.

comment:28 Changed 9 months ago by nickm

See #29668 for a branch to make the tests pass here.

comment:29 in reply to:  25 Changed 9 months ago by dgoulet

Status: needs_revisionneeds_review

Replying to nickm:

Woo, I've finished reviewing. Looks nice! I've made some suggestions on the branch.

All pushed now.

In addition to the review comments, here is something I'd like to see: I'd like to see a unit test that makes sure that the calculated SENDME values match SENDME values that are calculated by a different process, like a python script or something. This will help us make sure that we're calculating the right function over the correct portions of the correct cells. (I didn't see a test like that, but I could have missed it.) What do you think?

So essentially, you would like a test that makes sure the digest we compute and match is valid outside of Tor code? That is basically the cell rolling digest validation?

Also, this branch needs a changes file.

I'll add it once the protover commit is done. We seems to have reached a consensus with FlowCtrl so that is what I'll go for.

travis seems to be failing on the streamwrap unit test. That's probably related to changes here.

As discussed on IRC, #29668 should be merged upstream instead of within this branch.

there's no coverage information; I'd like to know how the coverage is on the tests.

I'll get you that once I have the protover commit.

comment:30 Changed 8 months ago by nickm

Status: needs_reviewneeds_revision

Looks good! There are a few more changes I think we should do, mostly around encapsulation and efficiency.

comment:31 Changed 8 months ago by nickm

(I've commented on the PR)

comment:32 in reply to:  31 Changed 8 months ago by dgoulet

Status: needs_revisionneeds_review

Replying to nickm:

(I've commented on the PR)

Responded. Some more commits were pushed there changing many things. My integration tests still pass successfully after this.

comment:33 Changed 8 months ago by dgoulet

As a reminder:

PR: ​https://github.com/torproject/tor/pull/725
Branch: ticket26288_041_02
Spec: ticket26288_01

comment:34 Changed 8 months ago by nickm

Status: needs_reviewneeds_revision

Okay! I have some tiny requests about function names, and I still think that code outside relay_crypto shouldn't access its fields.

comment:35 in reply to:  34 Changed 8 months ago by dgoulet

Status: needs_revisionneeds_review

Replying to nickm:

Okay! I have some tiny requests about function names, and I still think that code outside relay_crypto shouldn't access its fields.

All fixed!

comment:36 Changed 8 months ago by nickm

lgtm now; let's squash & rebase, then re-run all the tests we can think of :)

comment:37 in reply to:  36 Changed 8 months ago by dgoulet

Replying to nickm:

lgtm now; let's squash & rebase, then re-run all the tests we can think of :)

Here we go! New PR based on latest master (206d28ff152f2df5). Branch has been squashed also thus free of fixups:

PR: https://github.com/torproject/tor/pull/986
Branch: ticket26288_041_03

comment:38 Changed 8 months ago by cypherpunks

does this mean, we can no longer cheat bandwidth tokens by spamming

SendMe

's ?

comment:39 Changed 8 months ago by nickm

cypherpunks: for more info proposal 289.

comment:40 Changed 8 months ago by nickm

Dgoulet: this looks good; before we merge, can you re-verify your chutney tests?

comment:41 Changed 8 months ago by dgoulet

Keywords: asn-merge added
Status: needs_reviewmerge_ready

Integration tests went well! Now, this lacks two things:

  • FlowCtrl protover value.
  • The one-hop circuit to a directory cache (download consensus) and its consensus param.

nickm and I discussed it on IRC and we'll do that once this is merged so we don't pile up more things on this already big/non-trivial branch. However, we will not release both features apart, they need to all go in the same release. I'll open these tickets and made them child of this parent ticket.

Which means that once this is merged, keep this parent ticket open until we merge the children.

Final PR: https://github.com/torproject/tor/pull/986
Branch: ticket26288_041_03
Spec: ticket26288_01 <-- be careful, has a fixup commit.

Once everything is good, we'll need a tor-spec.txt (or maybe a new sendme-spec.txt?) patch to integrate the proposal officially into the spec.

comment:42 Changed 8 months ago by asn

Merged! Keeping the ticket open. I haven't merged the spec changes since comment:41 says that there are needed additions.

comment:43 Changed 8 months ago by asn

Keywords: asn-merge removed

comment:44 Changed 7 months ago by nickm

Keywords: postfreeze-ok added

comment:45 Changed 7 months ago by nickm

Keywords: 041-should added

comment:46 Changed 7 months ago by nickm

Resolution: implemented
Status: merge_readyclosed

All child tickets are closed; authenticated sendmes are now a real thing. :)

comment:47 Changed 4 months ago by catalyst

Resolution: implemented
Status: closedreopened

reopening to edit a child ticket

comment:48 Changed 4 months ago by catalyst

Resolution: implemented
Status: reopenedclosed
Note: See TracTickets for help on using tickets.