Opened 3 years ago

Last modified 11 months ago

#17254 needs_revision enhancement

Scalable HSes by splitting intro/rendezvous

Reported by: TvdW Owned by: TvdW
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs multicore scaling
Cc: teor, special Actual Points:
Parent ID: #15251 Points: 6
Reviewer: Sponsor:

Description

asn suggested creating a ticket before the proposal is committed, so here I go.

Most relevant mail from the thread (contains the proposal): https://lists.torproject.org/pipermail/tor-dev/2015-October/009625.html

Child Tickets

Change History (36)

comment:1 Changed 3 years ago by TvdW

Had some time, implemented the rest of the proposal as well to check
feasibility and updated the proposal with some new insights. Looking
forward to all comments on this!

Patch 1, splitting the intro and rendezvous code (basically a no-op):
https://github.com/TvdW/tor/commit/0443e38d776a114458e2f56e435f324c38e7a17a

Patch 2, actually implementing the proposal:
https://github.com/TvdW/tor/commit/b8d41f66efdb856b7813c4394f8a81c82e1f2e07

Simple controller implementation that proves my proposal works:
https://gist.github.com/TvdW/3f720b9c6ffcd71967c1

N.B. Those patches obviously need some documentation, a changes file, and maybe some general sanity.

Last edited 3 years ago by TvdW (previous) (diff)

comment:2 Changed 3 years ago by TvdW

Github branch instead of separate commits (I will continue updating this branch): https://github.com/TvdW/tor/commits/split-hs-intro-rend

comment:3 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-final
Status: newneeds_review

comment:4 Changed 3 years ago by dgoulet

Keywords: tor-hs added
Points: medium
Sponsor: SponsorR

comment:5 Changed 3 years ago by nickm

Could we have a spec revision (based on tor-dev comments) before I review this? I'd like clarity on what the opaque blobs are supposed to contain so that I know whether they really do.

comment:6 Changed 3 years ago by teor

Cc: teor added

comment:7 Changed 3 years ago by TvdW

Revisions done based on comments sent to me through IRC and mail.

torspec branch :
https://github.com/TvdW/torspec/commits/split-hs-intro-rend

tor branch :
https://github.com/TvdW/tor/commits/split-hs-intro-rend

Sample controller :
https://gist.github.com/TvdW/3f720b9c6ffcd71967c1

The tor branch does not currently contain tests. I will add these as soon as I figure out how to actually test this through unit tests.

Last edited 3 years ago by TvdW (previous) (diff)

comment:8 Changed 3 years ago by nickm

Severity: Normal

Comments on the spec branch:

  • I still think "INTRODUCE" is a pretty vague name. How about INTRODUCE_HANDOFF?
  • I really don't like sending private keys around like this. Is there any way to avoid it?

Comments on the code's documentation:

  • I still believe you should specify the _actual_ contents and format of the blob. rend_service_handoff_introduce has this information, but only sort of.
  • Actually, rend_service_handoff_introduce's generation and parsing code is the kind of thing I made Trunnel for. Here is a trunnel specification that would let you get rid of 90% of the code for generating and parsing these:
    struct introduction_v0 {
       nulterm tor_version;
       u32 request_len;
       u8 request[request_len];
       u8 rend_pk_digest[20];
       u8 rsa_private_key[..];
     };
    

Though personally I would suggest something more like this instead:

struct introduction_v0 {
   u32 blob_magic;
   u16 blob_version IN [0];
   u16 request_len;
   u8 request[request_len];
   u8 rend_pk_digest[DIGEST_LEN];
   u8 rsa_privkey_len;
   u8 rsa_privkey[rsa_privkey_len];
};
Last edited 3 years ago by nickm (previous) (diff)

comment:9 Changed 3 years ago by nickm

Status: needs_reviewneeds_revision

comment:10 Changed 3 years ago by nickm

(Actually, the non-trunnel binary parsing/generating code is broken. It can overflow its buffer on bad inputs, and it checks for overflow only after copying data in some bases, and it has endianness issues. I'd suggest scrapping it for the trunnel-generated version.)

comment:11 in reply to:  8 ; Changed 3 years ago by teor

Replying to nickm:

Comments on the spec branch:

  • I really don't like sending private keys around like this. Is there any way to avoid it?

I discussed an alternative design with TvdW, where the introduction side decrypts the blob using the introduction key, then hands off the decrypted fields to the rendezvous side. This avoids sending the key, so if the decrypted fields are exposed, only that rendezvous is affected. The tradeoff is increased decryption load on the introduction side.

I think this change is worth it for the security improvement.

comment:12 Changed 3 years ago by nickm

(I went and merged the proposal to the proposals repository, so it can be under version control. We should merge the next version of it too.)

comment:13 in reply to:  11 Changed 3 years ago by asn

Cc: special added

Replying to teor:

Replying to nickm:

Comments on the spec branch:

  • I really don't like sending private keys around like this. Is there any way to avoid it?

I discussed an alternative design with TvdW, where the introduction side decrypts the blob using the introduction key, then hands off the decrypted fields to the rendezvous side. This avoids sending the key, so if the decrypted fields are exposed, only that rendezvous is affected. The tradeoff is increased decryption load on the introduction side.

I think this change is worth it for the security improvement.

It seems to me that if this new design goes forward, the feature will be almost equivalent to what #16059 needs for the "rendezvous approver" API.

comment:14 Changed 3 years ago by teor

The latest branch for this proposal leaves rend_service_relaunch_rendezvous untouched. It is called if the first rendezvous attempt fails in the middle of building the circuit to the rendezvous point.

Can you please either:

  • explain why rend_service_relaunch_rendezvous works as-is with this feature, or
  • modify rend_service_relaunch_rendezvous to work correctly with this feature.

comment:15 Changed 3 years ago by asn

Has a decision been taken here on what the final proposal should be doing? Whether the decryption happens on the introducer side or on the rendezvouer side?

comment:16 Changed 3 years ago by nickm

Priority: MediumHigh

comment:17 Changed 3 years ago by teor

Please don't review this yet, a code revision is needed.

comment:18 Changed 3 years ago by TvdW

I just pushed a new branch, now with the decryption done before the handoff, and the event renamed to INTRODUCE_HANDOFF. The blob logic is now moved to Trunnel.

https://github.com/TvdW/tor/commits/split-hs-intro-rend-v2

Spec branch will follow soon. Code branch still lacks tests (Sorry!)

@teor: I had a look at rend_service_relaunch_rendezvous. Looks like no change is needed, as rend_service_perform_rendezvous sets up the internal structures correctly.

Last edited 3 years ago by TvdW (previous) (diff)

comment:19 in reply to:  18 ; Changed 3 years ago by asn

Replying to TvdW:

I just pushed a new branch, now with the decryption done before the handoff, and the event renamed to INTRODUCE_HANDOFF. The blob logic is now moved to Trunnel.

https://github.com/TvdW/tor/commits/split-hs-intro-rend-v2

Spec branch will follow soon. Code branch still lacks tests (Sorry!)

@teor: I had a look at rend_service_relaunch_rendezvous. Looks like no change is needed, as rend_service_perform_rendezvous sets up the internal structures correctly.

Very nice!

I took a small peak at the code, and it looks nice!

Here are some things that crossed my mind in this initial review:

  • Some places are lacking documentation. e.g. rend_service_handoff_introduce()
  • It would be really great if there was some unittest that verified the correctness of the handoff-then-perform procedure.
  • As you noted with your XXX, in case of perform failure it would be better if we send a 550 or 551 control failure instead of send_control_done(conn).
  • uint8_t buffer[2048]; I wonder if this stack buffer needs to be so big. Or maybe this is premature optimization and we should just move on.

Will review more seriously after the spec branch is also updated!

comment:20 Changed 3 years ago by asn

I'm also curious to see some documentation on how HS operators can use this feature :)

comment:21 in reply to:  19 ; Changed 3 years ago by TvdW

I took a small peak at the code, and it looks nice!

Thanks!

Here are some things that crossed my mind in this initial review:

  • Some places are lacking documentation. e.g. rend_service_handoff_introduce()

You're right. I'll add some

  • It would be really great if there was some unittest that verified the correctness of the handoff-then-perform procedure.

Haven't figured out how to do this yet :-( I suppose I could make rend_service_perform_rendezvous() mock-able, and then call the rend_service_handoff_introduce()/handle_control_perform_rendezvous() combo. That wouldn't really test the behavior though, which is something I think only Chutney could test.

  • As you noted with your XXX, in case of perform failure it would be better if we send a 550 or 551 control failure instead of send_control_done(conn).

Patched and pushed!

  • uint8_t buffer[2048]; I wonder if this stack buffer needs to be so big. Or maybe this is premature optimization and we should just move on.

Allocating 2k on the stack is essentially free. memwipe() is not (I forgot a few; patched those just now) but it's still cheap compared to the other code. I've changed the stack size to 1k, as that should also be fine, though I'd rather not 'optimize' that further as it seems premature.

Will review more seriously after the spec branch is also updated!

Thanks! I'll push something later this week.

comment:22 in reply to:  21 Changed 3 years ago by teor

Replying to TvdW:

  • It would be really great if there was some unittest that verified the correctness of the handoff-then-perform procedure.

Haven't figured out how to do this yet :-( I suppose I could make rend_service_perform_rendezvous() mock-able, and then call the rend_service_handoff_introduce()/handle_control_perform_rendezvous() combo. That wouldn't really test the behavior though, which is something I think only Chutney could test.

If I understand correctly, the unit test you suggest sounds useful to make sure that rend_service_perform_rendezvous() still behaves like it used to. That's an important property.

But it would also be useful to have a unit test that synthetically performs a handoff by acting like a fake controller:

  • calls rend_service_perform_rendezvous() with "real" encrypted rendezvous response data
  • by mocking rend_service_handoff_introduce() or similar:
    • takes the parameters that would be in the control event
    • manually sends those parameters through handle_control_perform_rendezvous()
  • checks that the resulting actions and state are as expected, probably by mocking functions called by handle_control_perform_rendezvous()

(Or is this what you're suggesting?)

This will help us make sure we don't accidentally break rendezvous handoff as we refactor the hidden service code for prop224.

comment:23 in reply to:  20 Changed 3 years ago by asn

Replying to asn:

I'm also curious to see some documentation on how HS operators can use this feature :)

[answering IRC question]

I think this kind of documentation can be nicely hosted in a wiki page (or alternatively as a tor-dev post or a blog post).

If you want to do it in the Tor source code, you could use the doc/ directory but I'm not sure if it's the right place.

comment:24 Changed 3 years ago by TvdW

I've been working on other projects so progress on this has been slow, but I found a bit of time to work on this and am now close to changing the status of the ticket to needs_review :)

Spec v2: https://github.com/TvdW/torspec/blob/split-hs-intro-rend-v2/proposals/255-hs-load-balancing.txt
Branch v2: https://github.com/TvdW/tor/commits/split-hs-intro-rend-v2

I've found that it's really hard to write tests for code that doesn't already have tests. https://github.com/TvdW/tor/commit/3a8c941752ff56712633b771831aae50675a9f42 is a good example of the horror I'll have to commit to if I write these. Am I on the right track?

A chutney branch will follow... at some point.

comment:25 Changed 3 years ago by TvdW

<nickm> when you say you're close to putting it in needs_review, could you add a comment saying what parts you think are ready for review (if any) and which aren't?

The implementation is done. The spec branch is, afaik, also good. The thing that is missing are these annoying TESTS. :)

comment:26 Changed 3 years ago by nickm

Owner: set to TvdW
Status: needs_revisionassigned

Setting TvdW as the owner of this needs_revision ticket.

comment:27 Changed 3 years ago by nickm

Status: assignedneeds_revision

comment:29 Changed 3 years ago by dgoulet

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final
Priority: HighMedium
Sponsor: SponsorR

Passed feature freeze for 028, changing milestone to 029.

comment:30 Changed 3 years ago by dgoulet

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???
Sponsor: SponsorR-can

comment:31 Changed 2 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:32 Changed 23 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:33 Changed 17 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:34 Changed 17 months ago by dgoulet

Points: medium6
Sponsor: SponsorR-can

comment:35 Changed 16 months ago by nickm

Keywords: multicore scaling added

comment:36 Changed 11 months ago by teor

Parent ID: #15251

This might help with #15251

Note: See TracTickets for help on using tickets.