prop224: Implementation of ESTABLISH_INTRO cell

changed milestone to %Tor: 0.3.0.x-final

added 6.s194 TorCoreTeam201612 component::core tor/tor milestone::Tor: 0.3.0.x-final owner::asn parent::17241 points::6 priority::medium prop224 reporter::alec.heif resolution::implemented review-group-13 reviewer::asn reviewer::dgoulet severity::normal sponsor::R-must status::closed tor-hs type::enhancement labels

Trac:
Username: alec.heif
Status: new to needs_revision

Trac:
Component: - Select a component to Core Tor/Tor

Trac:
Reviewer: N/A to asn, dgoulet
Sponsor: N/A to SponsorR-must
Version: Tor: unspecified to N/A
Parent: N/A to #17241 (moved)
Keywords: N/A deleted, tor-hs added
Status: needs_revision to needs_review
Milestone: N/A to Tor: 0.2.9.x-final

Hello Alec,

thanks for all the code. Looks like a great start, and I think it will be very useful in the future!

Now for some logistics: are you planning to work more on this? If yes, when should we expect updates here? These days we are actively working on implementing prop224, and the ESTABLISH_INTRO cell is one of our next steps, so I'd like to see if your schedule is compatible with our plans.

As a first step, I took a look at your code and did a bit of review. You can find my comments and changes in my branch prop224-alec at: https://gitweb.torproject.org/user/asn/tor.git/log/?h=prop224-alec

Note that there have been some recent prop224 torspec changes (like the keylen size) that impact the ESTABLISH_INTRO cell slightly.

Here are some future steps on this area:

Address the XXXs I added to the code.
Write unit tests.
Move prop224 functions like rend_mid_establish_intro_p224() to their own file. Maybe call new file hs_intro_point.c or something. Also maybe rename that function to rend_mid_handle_establish_intro_prop224()?
Write function-level documentation.
Run make check-spaces and fix the various whitespace/style issues.

Let me know if you plan to work on these, and if you have an ETA on delivery, so that we can count you in our implementation roadmap :)

Cheers!

Trac:
Status: needs_review to needs_revision

Hey asn,

I am indeed planning to work more on this and finish up whatever I need to get this in.

Shooting to address these fixes by Monday, as well as start some unit testing, does that fit with the roadmap schedule?

Looking forward to getting this merged!

-Alec

Trac:
Username: alec.heif

Replying to alec.heif:

Hey asn,

I am indeed planning to work more on this and finish up whatever I need to get this in.

Shooting to address these fixes by Monday, as well as start some unit testing, does that fit with the roadmap schedule?

It does! I'll be alert on Monday/Tuesday to check your changes!

Looking forward to getting this merged!

Sounds good :) However we are still far away from merge time, since there are many many more components that need to be completed for prop224 to exist!

Ok, I'll comment here and/or ping you on IRC whenever it's ready (I'm alecheif on there)

Trac:
Username: alec.heif

Hello, another question: Do you have plans on working on more prop224 cells? Like INTRODUCE1 or RENDEZVOUS1? Or maybe are there other MIT students who are currently working on other cells?

Asking to know if we should move on with implementing INTRODUCE1 cells, or we should wait for more patches from you guys.

Cheers!

Trac:
Summary: Implementation of Proposal 224 Section 3.1 to Implementation of prop224 ESTABLISH_INTRO cell

Trac:
Points: N/A to 3

Trac:
Milestone: Tor: 0.2.9.x-final to Tor: 0.2.???
Keywords: N/A deleted, sponsorr-0.2.10 added

Trac:
Keywords: sponsorr-0.2.10 deleted, 0210-proposed added

OK, code architecture question here:

I'm working on the intro point code for ESTABLISH_INTRO. I'm now at the part where I need to connect intro circuits with auth keys. This is done so that when a client comes with an auth key, we can easily find the right circuit.

This is currently done with functions circuit_set_intro_point_digest() and circuit_clear_rend_token() using the digestmap_t intro_digest_map.

However, for prop224 I suspect it's bad form to use a SHA1 digestmap_t for storing ed25519 keys, so I'm trying to figure out what to do.

Here are some approaches:

a) Just define two more global maps that will store the prop224 rend circuit information. This is the cheap approach, since it's quite easy to implement. FWIW, we need two maps: one to store intro point circuit info, and another to store the rend cookies.

The downside here is that we are then left with maintaining 4 of those maps (two for old system, two for new system) which might become hairy.

b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).

This seems like an approach that will be easier to maintain since we will only have two functions and data structures, instead of four. However, it's not a trivial refactoring. For example, we should be careful to not use new-style keys with clients trying to connect to old-style intro points.

Side question: Do we have a digestmap_t that can hold ed25519 keys? Can I just use digest256map_t for that since ed25519 keys are 32bytes anyway?

Replying to asn:

a) Just define two more global maps that will store the prop224 rend circuit information. This is the cheap approach, since it's quite easy to implement. FWIW, we need two maps: one to store intro point circuit info, and another to store the rend cookies.

The downside here is that we are then left with maintaining 4 of those maps (two for old system, two for new system) which might become hairy.

b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).

This seems like an approach that will be easier to maintain since we will only have two functions and data structures, instead of four. However, it's not a trivial refactoring. For example, we should be careful to not use new-style keys with clients trying to connect to old-style intro points.

Hrm, I see circuit_set_intro_point_digest being called at only one place and circuit_get_intro_point two places. Very few call sites. I like the idea of using the same data structure and interface for this (with already working code). Refactoring to a digest256map with sha256(old_current_digest) and ed25519 keys sounds like a win. The only thing we would want I presume is a "flag" for the above function that is "Oh this is a old digest" or "New digest, do not hash".

Heck, we could even "sha256(digest)" in rend_mid_establish_intro() and do directly the query with this instead of passing a flag. In that function, pk_digest is only used for getting the intro points so we can just adapt it to a sha256 prior. Same goes for rend_mid_introduce(), hash and then call. Those functions will be deleted months/years after 224 so adding code there is a better idea imo than adding code to the circuit_ API that will actually be kept after.

Another idea, we can add two functions that specifically only takes the old digest, sha256 it and then call the circuit interface (circuit_set_intro_point_digest). As long as those functions are "old HS" specific though and not making them part of the circuit API.

Side question: Do we have a digestmap_t that can hold ed25519 keys? Can I just use digest256map_t for that since ed25519 keys are 32bytes anyway?

Yes digest256map_t is what you want here.

Replying to asn:

b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).

This is what I'd suggest. I'd suggest an underlying data structure mapping a tagged structure to a circuit pointer. If you use the handle and the ht code, it should actually be easy.

Then you can define all the accessor functions, to actually make the code safer.

Trac:
Keywords: N/A deleted, TorCoreTeam201608 added

Replying to nickm:

Replying to asn:

b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).

This is what I'd suggest. I'd suggest an underlying data structure mapping a tagged structure to a circuit pointer. If you use the handle and the ht code, it should actually be easy.

Then you can define all the accessor functions, to actually make the code safer.

Hm, I'm very unfamiliar with the HT_ and HANDLE_ code, but I'm definitely willing to get my hands dirty if this is the right way to do this. I'll start studying the HT_PROTOTYPE and HT_GENERATE2 functions.

BTW, the idea here is that we want to create a new map that maps from all sorts of HS tokens (hs_token_t) to circuits. Right? An hs_token_t in this case can hold both old-style and new-style tokens for both introduction and rendezvous.

For example here is a data structure definition:

/** Represents a token used in the HS protocol. Each such token maps to a
 *  specific introduction or rendezvous circuit. */
typedef struct hs_token_t {
  /* The HS protocol version that uses this token.
   *
   * The version value is 2 for the old HS version, and 3 for next generation
   * hidden services.
   *
   * The size of the hs_token depends on the HS protocol version and the type
   * of token:
   *  Old HS protocol uses 128bit tokens for introduction and rendezvous.
   *  New HS protocol uses 128bit tokens for rendezvous, and 256bit tokens for
   *  introductions. */
  int version;

  /* Is this a rendezvous cookie? Otherwise, it's an introduction token. */
  int is_rend_token;

  /* The token itself. Memory allocated at runtime depending on the HS version. */
  uint8_t *hs_token;
}

Does this look reasonable to you?

The alternative approach would be to remove is_rend_token from the structure, and have one map for introduction tokens, and another one for rend tokens. (Also maybe I should add an size_t hs_token_len to control the length of the token)

My two cents. I like the idea very much. Scales much better with future version. We can have a single version agnostic function that takes the token and re-route it to the right function.

Yes, tokens have different size depending on the version (well except rdv token but that might not be true for version 4 and onward). I would add an enum of the possible type (in this case only two exists) which will makes it nicer if we ever add more types. And the token length as well. You might also want to put a or_circuit_t in there? :)

Note: #19896 (moved) is relevant here

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.3.0.x-final
Summary: Implementation of prop224 ESTABLISH_INTRO cell to prop224: Implementation of ESTABLISH_INTRO cell

Moving this to 030 and part of the September core team work.

Trac:
Keywords: 0210-proposed, TorCoreTeam201608 deleted, TorCoreTeam201609 added

Hello there friends,

I'm happy to push the first revision of code that handles prop224 ESTABLISH_INTRO cells. I started with MIT Alec's branch as a base, but most of the code has since been rewritten. All errors mine, etc.

You can find the code in branch bug19043_v1 in my repo. Here is a gitlab link as well: https://gitlab.com/asn/tor/merge_requests/3

A few words about the branch:

First of all, it's based on David's #20004 (moved) branch which adds all the relevant trunnel definitions. It then introduces some functions that will come handy later when we handle v3 ESTABLISH_INTRO cells. The biggest addition here is the HS circuitmap subsystem which maps introduction tokens to circuits. The branch then proceeds with functions that generate and handle ESTABLISH_INTRO cells, which is the main contribution. Finally, it adds tests for handling both v2 and v3 ESTABLISH_INTRO cells.

There are some XXXs sprinkled around the code aimed towards early reviewers. Please browse through them as most of them should be resolved before we reach the final review stages :)

WRT unittest LoC coverage: {{{ hs_circuitmap.c : 100.00% hs_intropoint.c : 80.95% hs_service.c : 80.39% }}} The numbers here will improve further upon resolving some of the XXXs.

David, feel free to grab the code mutex here if you feel like it. :)

Cheers!

Trac:
Status: needs_revision to needs_review

Add 0.3.0 needs_review items (and ones that haven't been in needs_revision for very long) to review-group-10.

Trac:
Keywords: N/A deleted, review-group-10 added

(Claiming ownership)

Trac:
Status: needs_review to assigned
Owner: N/A to asn

Trac:
Status: assigned to needs_review

Trac:
Keywords: TorCoreTeam201609 deleted, TorCoreTeam201610 added

Note: We should probably use the trunnel feature from #20138 (moved) in this branch. Will do so along with the first round of reviews.

Ok I've done a round of review on the merge request.

Trac:
Status: needs_review to needs_revision

Taking this out of review-group-10 for now. The branch needs to be revised based on david's review.

Trac:
Keywords: review-group-10 deleted, N/A added

Replying to dgoulet:

Ok I've done a round of review on the merge request.

Thanks for the review.

I addressed most of your comments and pushed a fixup branch at bug19043_v2_dgoulet_review. I also replied to the gitlab comments. This is just a ticket for you to review the fixes. Let me know when you feel OK with them and I can squash all the fixup commits back to the original commit structure.

Also, please let me know if you'd like to get the code mutex for now to do miscellaneous improvements (e.g. try out the hs_cells API). Personally, I found it useful when I took the HSDir code from you at the very end, so maybe you also find it useful.

Putting this in needs_review again for now.

Trac:
Status: needs_revision to needs_review

Trac:
Keywords: N/A deleted, review-group-11 added

I've reviewed and squashed-rebased on master in: ticket19043_030_01

This is merge ready imo but I'll set this one in needs_revision for now as I'll do an attempt to break it down with the hs_cell.{c|h} idea and see if we like it as we discussed. If not, we'll move this one in merge_ready for nickm's review.

Trac:
Points: 3 to 6
Keywords: TorCoreTeam201610 deleted, TorCoreTeam201611 added
Status: needs_review to needs_revision

Ok, dropping the hs_cell idea, it was way to confusing with the trunnel API prefixed hs_cell_.

I've made two extra commits on top. First one is some comments/define naming cleanup and second one is some code... let's call that "code enhancement/" :).

Still in ticket19043_030_01.

@asn, if you think this is fine, I think we can move on to merge_ready.

Trac:
Status: needs_revision to needs_review

OK! I have a branch that is mature enough for Nick to take a look. It can be found in my repo at ticket19043_030_02.

Please find the gitlab review link here: https://gitlab.com/asn/tor/merge_requests/5/commits

It basically allows Tor to parse and handle ESTABLISH_INTRO cells and also map them with circuits (using the new HS circuitmap subsystem). It's also able to generate ESTABLISH_INTRO cells, but this is just done to generate unit test data.

It's squashed and rebased on git master. When the prop224 HSDir branch (#17238 (moved)) gets merged, this branch will require some minimal edit so that we move some stuff to the hs_common.c file introduced by #17238 (moved) (there is an XXX about that). I will do that and rebase it when the time comes.

I also tested it on chutney today and the current HS functionality seems to work just fine. More manual testing is always good.

WRT unit testing, here is the coverage:

File 'src/or/hs_circuitmap.c'
Lines executed:99.05% of 105

File 'src/or/hs_intropoint.c'
Lines executed:84.52% of 84

File 'src/or/hs_service.c'
Lines executed:80.39% of 51

Please note that hs_service.c is unused by real code yet (it's just there basically to generate unittest vectors).

Please let me know if you need me to do anything else here.

Trac:
Status: needs_review to merge_ready

Trac:
Keywords: review-group-11 deleted, review-group-12 added

All of the new modules look great!

A couple minor points, feel free to take/leave any:

There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines
the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.
In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.
In commit f4db1ab5fcf3e48c6b4011e9f1cbae6db1aa8c5b it says that hs_received_establish_intro is the new entry point, but I think this is meant to be hs_intro_received_establish_intro
In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.
encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.

Trac:
Status: merge_ready to needs_revision

Replying to chelseakomlo:

All of the new modules look great!

A couple minor points, feel free to take/leave any:

Thanks for the review. I force pushed the fixes to the same branch.

There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines

Hmm, that function is (very) old code. I tried to leave it alone as much as possible.

I'm inclined to suggest that any improvements to that old code should be done on a new ticket, so that reviewers of this ticket have to worry as little as possible about regressions.

the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.

Hmm, I could rename that to global_hs_circuitmap or something.

That said, this silly the_ prefix is used in multiple places in our codebase to name big file-level singletons (e.g. the_nodelist, the_evdns_base, etc.).

Please let me know if you still feel like global_hs_circuitmap or something like that would be better, and I will just rename it.

In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.

Hmm, that function is abstract on purpose. The HS circuitmap can carry both introduction and rendezvous tokens, but that function actually does not care about the token type. Do you think the function should be split into two functions handling each case separately or something?

In commit f4db1ab5fcf3e48c6b4011e9f1cbae6db1aa8c5b it says that hs_received_establish_intro is the new entry point, but I think this is meant to be hs_intro_received_establish_intro

Oops. Fixed that commit msg and force-pushed to the same branch name (naughty I know).

In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.

I wrote a unittest for one error case of that function and pushed it to the top of the branch. It even found a bug! Please check the top commit and if you like it I will squash it along with the rest of the branch so that it's clean for Nick's review.

So the bug was there because I was not checking the retval of crypto_hmac_sha3_256() correctly... Not sure why we have 1 being the error retval in crypto.c instead of -1, perhaps I should address this in crypto_hmac_sha3_256()?

encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.

I think that old function is more unittestable now indeed. However, I'd suggest we make a new ticket for unittesting that legacy function.

Putting this in needs_review, please let me know if it needs any other changes!!!

Trac:
Status: needs_revision to merge_ready

Replying to asn:

Replying to chelseakomlo:

All of the new modules look great!

There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines

Hmm, that function is (very) old code. I tried to leave it alone as much as possible.

I'm inclined to suggest that any improvements to that old code should be done on a new ticket, so that reviewers of this ticket have to worry as little as possible about regressions.

Sure, let's do that. I like the rule of thumb of refactoring/adding tests to legacy code when touching it, but there is of course a limit to how much improvement should be made/how testable the old code is.

the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.

Hmm, I could rename that to global_hs_circuitmap or something.

That said, this silly the_ prefix is used in multiple places in our codebase to name big file-level singletons (e.g. the_nodelist, the_evdns_base, etc.).

Please let me know if you still feel like global_hs_circuitmap or something like that would be better, and I will just rename it.

If this is a common convention/useful naming to the team, then the_ is good with me!

In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.

Hmm, that function is abstract on purpose. The HS circuitmap can carry both introduction and rendezvous tokens, but that function actually does not care about the token type. Do you think the function should be split into two functions handling each case separately or something?

No, that was my mistake- after closer look, this looks good!

In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.

I wrote a unittest for one error case of that function and pushed it to the top of the branch. It even found a bug! Please check the top commit and if you like it I will squash it along with the rest of the branch so that it's clean for Nick's review.

Yes, looks great.

So the bug was there because I was not checking the retval of crypto_hmac_sha3_256() correctly... Not sure why we have 1 being the error retval in crypto.c instead of -1, perhaps I should address this in crypto_hmac_sha3_256()?

Yes, we should probably have consistent retvals for errors. Do you think this should be fixed as part of this issue? I see crypto_hmac_sha3_256 is added in commit 0cfcc7f3da4a77e1162f1e011ba4954d309070fa

encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.

I think that old function is more unittestable now indeed. However, I'd suggest we make a new ticket for unittesting that legacy function.

Sure, as it looks like the behavior of encode_establish_intro_cell_legacy is largely unchanged in this ticket, then opening a new ticket to add tests sounds good to me.

Trac:
Status: merge_ready to needs_revision

OK. After another great round of review by chelsea and david, I revised my code and pushed a new branch and merge request. My revisions mainly improve the sha3 MAC code and its tests.

The new branch is ticket19043_030_03 and the merge request is here: https://gitlab.com/asn/tor/merge_requests/7/diffs

Putting this back to merge_ready. Feel free to change ticket status if you find more issues.

Trac:
Status: needs_revision to merge_ready

Trac:
Keywords: review-group-12 deleted, review-group-13 added
Status: merge_ready to needs_revision

I've left a bunch of comments in the gitlab review. Some of the issues here are cosmetic, but others are potential stack-smashers!

Trac:
Keywords: TorCoreTeam201611 deleted, TorCoreTeam201612 added

Taking over this one from asn. Made a pass on nickm's comment in https://gitlab.com/asn/tor/merge_requests/7/diffs.

The new merge request is here containing all the fixups from the review in the above request: https://gitlab.com/dgoulet/tor/merge_requests/16/commits

Branch: ticket19043_030_03

Trac:
Status: needs_revision to needs_review

comments posted (to both pull requests).

Trac:
Status: needs_review to needs_revision

Addressed all comments so back in review mode!

Trac:
Status: needs_revision to needs_review

prop224: Implementation of ESTABLISH_INTRO cell

Child items ...

Activity