This set of commits implements the functionality specified in p224 sections 3.1.1, 3.1.2, and 3.1.4 (note that there is no 3.1.3).
The code currently lacks unit tests for both the legacy versions (which were never tested in the first place) and for the p224-specific versions, so it's probably best to wait on a merge until I have had the chance to write a testing suite.
thanks for all the code. Looks like a great start, and I think it will be very
useful in the future!
Now for some logistics: are you planning to work more on this? If yes, when
should we expect updates here? These days we are actively working on
implementing prop224, and the ESTABLISH_INTRO cell is one of our next steps, so
I'd like to see if your schedule is compatible with our plans.
Note that there have been some recent prop224 torspec changes (like the keylen
size) that impact the ESTABLISH_INTRO cell slightly.
Here are some future steps on this area:
Address the XXXs I added to the code.
Write unit tests.
Move prop224 functions like rend_mid_establish_intro_p224() to their own file.
Maybe call new file hs_intro_point.c or something. Also maybe rename that function to
rend_mid_handle_establish_intro_prop224()?
Write function-level documentation.
Run make check-spaces and fix the various whitespace/style issues.
Let me know if you plan to work on these, and if you have an ETA on delivery,
so that we can count you in our implementation roadmap :)
I am indeed planning to work more on this and finish up whatever I need to get this in.
Shooting to address these fixes by Monday, as well as start some unit testing, does that fit with the roadmap schedule?
It does! I'll be alert on Monday/Tuesday to check your changes!
Looking forward to getting this merged!
Sounds good :) However we are still far away from merge time, since there are many many more components that need to be completed for prop224 to exist!
Hello, another question: Do you have plans on working on more prop224 cells? Like INTRODUCE1 or RENDEZVOUS1? Or maybe are there other MIT students who are currently working on other cells?
Asking to know if we should move on with implementing INTRODUCE1 cells, or we should wait for more patches from you guys.
I'm working on the intro point code for ESTABLISH_INTRO. I'm now at the part where I need to connect intro circuits with auth keys. This is done so that when a client comes with an auth key, we can easily find the right circuit.
This is currently done with functions circuit_set_intro_point_digest() and circuit_clear_rend_token() using the digestmap_tintro_digest_map.
However, for prop224 I suspect it's bad form to use a SHA1 digestmap_t for storing ed25519 keys, so I'm trying to figure out what to do.
Here are some approaches:
a) Just define two more global maps that will store the prop224 rend circuit information. This is the cheap approach, since it's quite easy to implement. FWIW, we need two maps: one to store intro point circuit info, and another to store the rend cookies.
The downside here is that we are then left with maintaining 4 of those maps (two for old system, two for new system) which might become hairy.
b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).
This seems like an approach that will be easier to maintain since we will only have two functions and data structures, instead of four. However, it's not a trivial refactoring. For example, we should be careful to not use new-style keys with clients trying to connect to old-style intro points.
Side question: Do we have a digestmap_t that can hold ed25519 keys? Can I just use digest256map_t for that since ed25519 keys are 32bytes anyway?
a) Just define two more global maps that will store the prop224 rend circuit information. This is the cheap approach, since it's quite easy to implement. FWIW, we need two maps: one to store intro point circuit info, and another to store the rend cookies.
The downside here is that we are then left with maintaining 4 of those maps (two for old system, two for new system) which might become hairy.
b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).
This seems like an approach that will be easier to maintain since we will only have two functions and data structures, instead of four. However, it's not a trivial refactoring. For example, we should be careful to not use new-style keys with clients trying to connect to old-style intro points.
Hrm, I see circuit_set_intro_point_digest being called at only one place and circuit_get_intro_point two places. Very few call sites. I like the idea of using the same data structure and interface for this (with already working code). Refactoring to a digest256map with sha256(old_current_digest) and ed25519 keys sounds like a win. The only thing we would want I presume is a "flag" for the above function that is "Oh this is a old digest" or "New digest, do not hash".
Heck, we could even "sha256(digest)" in rend_mid_establish_intro() and do directly the query with this instead of passing a flag. In that function, pk_digest is only used for getting the intro points so we can just adapt it to a sha256 prior. Same goes for rend_mid_introduce(), hash and then call. Those functions will be deleted months/years after 224 so adding code there is a better idea imo than adding code to the circuit_ API that will actually be kept after.
Another idea, we can add two functions that specifically only takes the old digest, sha256 it and then call the circuit interface (circuit_set_intro_point_digest). As long as those functions are "old HS" specific though and not making them part of the circuit API.
Side question: Do we have a digestmap_t that can hold ed25519 keys? Can I just use digest256map_t for that since ed25519 keys are 32bytes anyway?
b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).
This is what I'd suggest. I'd suggest an underlying data structure mapping a tagged structure to a circuit pointer. If you use the handle and the ht code, it should actually be easy.
Then you can define all the accessor functions, to actually make the code safer.
b) Somehow refactor the current system so that we can do both old-style and prop224 mappings using the same data structure. For example, we could hash the old-style keys with sha256 and store them in a digest256 map, and do the same for new style keys as well (or just store the keys intact since they are 32bytes).
This is what I'd suggest. I'd suggest an underlying data structure mapping a tagged structure to a circuit pointer. If you use the handle and the ht code, it should actually be easy.
Then you can define all the accessor functions, to actually make the code safer.
Hm, I'm very unfamiliar with the HT_ and HANDLE_ code, but I'm definitely willing to get my hands dirty if this is the right way to do this. I'll start studying the HT_PROTOTYPE and HT_GENERATE2 functions.
BTW, the idea here is that we want to create a new map that maps from all sorts of HS tokens (hs_token_t) to circuits. Right? An hs_token_t in this case can hold both old-style and new-style tokens for both introduction and rendezvous.
For example here is a data structure definition:
/** Represents a token used in the HS protocol. Each such token maps to a * specific introduction or rendezvous circuit. */typedef struct hs_token_t { /* The HS protocol version that uses this token. * * The version value is 2 for the old HS version, and 3 for next generation * hidden services. * * The size of the hs_token depends on the HS protocol version and the type * of token: * Old HS protocol uses 128bit tokens for introduction and rendezvous. * New HS protocol uses 128bit tokens for rendezvous, and 256bit tokens for * introductions. */ int version; /* Is this a rendezvous cookie? Otherwise, it's an introduction token. */ int is_rend_token; /* The token itself. Memory allocated at runtime depending on the HS version. */ uint8_t *hs_token;}
Does this look reasonable to you?
The alternative approach would be to remove is_rend_token from the structure, and have one map for introduction tokens, and another one for rend tokens.
(Also maybe I should add an size_t hs_token_len to control the length of the token)
My two cents. I like the idea very much. Scales much better with future version. We can have a single version agnostic function that takes the token and re-route it to the right function.
Yes, tokens have different size depending on the version (well except rdv token but that might not be true for version 4 and onward). I would add an enum of the possible type (in this case only two exists) which will makes it nicer if we ever add more types. And the token length as well. You might also want to put a or_circuit_t in there? :)
Trac: Milestone: Tor: 0.2.??? to Tor: 0.3.0.x-final Summary: Implementation of prop224 ESTABLISH_INTRO cell to prop224: Implementation of ESTABLISH_INTRO cell
I'm happy to push the first revision of code that handles prop224
ESTABLISH_INTRO cells. I started with MIT Alec's branch as a base, but most of
the code has since been rewritten. All errors mine, etc.
First of all, it's based on David's #20004 (moved) branch which adds all the relevant
trunnel definitions. It then introduces some functions that will come handy
later when we handle v3 ESTABLISH_INTRO cells. The biggest addition here is
the HS circuitmap subsystem which maps introduction tokens to circuits. The
branch then proceeds with functions that generate and handle ESTABLISH_INTRO
cells, which is the main contribution. Finally, it adds tests for handling
both v2 and v3 ESTABLISH_INTRO cells.
There are some XXXs sprinkled around the code aimed towards early reviewers.
Please browse through them as most of them should be resolved before we reach
the final review stages :)
WRT unittest LoC coverage:
{{{
hs_circuitmap.c : 100.00%
hs_intropoint.c : 80.95%
hs_service.c : 80.39%
}}}
The numbers here will improve further upon resolving some of the XXXs.
David, feel free to grab the code mutex here if you feel like it. :)
Ok I've done a round of review on the merge request.
Thanks for the review.
I addressed most of your comments and pushed a fixup branch at bug19043_v2_dgoulet_review. I also replied to the gitlab comments. This is just a ticket for you to review the fixes.
Let me know when you feel OK with them and I can squash all the fixup commits back to the original commit structure.
Also, please let me know if you'd like to get the code mutex for now to do miscellaneous improvements (e.g. try out the hs_cells API). Personally, I found
it useful when I took the HSDir code from you at the very end, so maybe you also find it useful.
I've reviewed and squashed-rebased on master in: ticket19043_030_01
This is merge ready imo but I'll set this one in needs_revision for now as I'll do an attempt to break it down with the hs_cell.{c|h} idea and see if we like it as we discussed. If not, we'll move this one in merge_ready for nickm's review.
Trac: Points: 3 to 6 Keywords: TorCoreTeam201610 deleted, TorCoreTeam201611 added Status: needs_review to needs_revision
Ok, dropping the hs_cell idea, it was way to confusing with the trunnel API prefixed hs_cell_.
I've made two extra commits on top. First one is some comments/define naming cleanup and second one is some code... let's call that "code enhancement/" :).
Still in ticket19043_030_01.
@asn, if you think this is fine, I think we can move on to merge_ready.
It basically allows Tor to parse and handle ESTABLISH_INTRO cells and also map them with circuits (using the new HS circuitmap subsystem). It's also able to generate ESTABLISH_INTRO cells, but this is just done to generate unit test data.
It's squashed and rebased on git master. When the prop224 HSDir branch (#17238 (moved)) gets merged, this branch will require some minimal edit so that we move some stuff to the hs_common.c file introduced by #17238 (moved) (there is an XXX about that). I will do that and rebase it when the time comes.
I also tested it on chutney today and the current HS functionality seems to work just fine. More manual testing is always good.
WRT unit testing, here is the coverage:
File 'src/or/hs_circuitmap.c'Lines executed:99.05% of 105File 'src/or/hs_intropoint.c'Lines executed:84.52% of 84File 'src/or/hs_service.c'Lines executed:80.39% of 51
Please note that hs_service.c is unused by real code yet (it's just there basically to generate unittest vectors).
Please let me know if you need me to do anything else here.
A couple minor points, feel free to take/leave any:
There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines
the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.
In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.
In commit f4db1ab5fcf3e48c6b4011e9f1cbae6db1aa8c5b it says that hs_received_establish_intro is the new entry point, but I think this is meant to be hs_intro_received_establish_intro
In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.
encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.
A couple minor points, feel free to take/leave any:
Thanks for the review. I force pushed the fixes to the same branch.
There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines
Hmm, that function is (very) old code. I tried to leave it alone as much as possible.
I'm inclined to suggest that any improvements to that old code should be done on a new ticket, so that reviewers of this ticket have to worry as little as possible about regressions.
the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.
Hmm, I could rename that to global_hs_circuitmap or something.
That said, this silly the_ prefix is used in multiple places in our codebase to name big file-level singletons (e.g. the_nodelist, the_evdns_base, etc.).
Please let me know if you still feel like global_hs_circuitmap or something like that would be better, and I will just rename it.
In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.
Hmm, that function is abstract on purpose. The HS circuitmap can carry both introduction and rendezvous tokens, but that function actually does not care about the token type. Do you think the function should be split into two functions handling each case separately or something?
In commit f4db1ab5fcf3e48c6b4011e9f1cbae6db1aa8c5b it says that hs_received_establish_intro is the new entry point, but I think this is meant to be hs_intro_received_establish_intro
Oops. Fixed that commit msg and force-pushed to the same branch name (naughty I know).
In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.
I wrote a unittest for one error case of that function and pushed it to the top of the branch. It even found a bug! Please check the top commit and if you like it I will squash it along with the rest of the branch so that it's clean for Nick's review.
So the bug was there because I was not checking the retval of crypto_hmac_sha3_256() correctly... Not sure why we have 1 being the error retval in crypto.c instead of -1, perhaps I should address this in crypto_hmac_sha3_256()?
encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.
I think that old function is more unittestable now indeed. However, I'd suggest we make a new ticket for unittesting that legacy function.
Putting this in needs_review, please let me know if it needs any other changes!!!
There are some magic numbers in encode_establish_intro_cell_legacy- some of these could be extracted into defines
Hmm, that function is (very) old code. I tried to leave it alone as much as possible.
I'm inclined to suggest that any improvements to that old code should be done on a new ticket, so that reviewers of this ticket have to worry as little as possible about regressions.
Sure, let's do that. I like the rule of thumb of refactoring/adding tests to legacy code when touching it, but there is of course a limit to how much improvement should be made/how testable the old code is.
the_hs_circuitmap might be renamed for specificity- the_ is a bit unclear.
Hmm, I could rename that to global_hs_circuitmap or something.
That said, this silly the_ prefix is used in multiple places in our codebase to name big file-level singletons (e.g. the_nodelist, the_evdns_base, etc.).
Please let me know if you still feel like global_hs_circuitmap or something like that would be better, and I will just rename it.
If this is a common convention/useful naming to the team, then the_ is good with me!
In the comment for hs_circuits_have_same_token we use introduction and rendezvous tokens, but in the function we use first_token and second_token. Maybe introduction/rendezvous naming can be consistently used.
Hmm, that function is abstract on purpose. The HS circuitmap can carry both introduction and rendezvous tokens, but that function actually does not care about the token type. Do you think the function should be split into two functions handling each case separately or something?
No, that was my mistake- after closer look, this looks good!
In general, it can be useful to write multiple unit tests per function under test if there are branches in the code. For example, it looks like we just test one path for generate_establish_intro_cell, but it might be worth writing additional unit test to cover error cases.
I wrote a unittest for one error case of that function and pushed it to the top of the branch. It even found a bug! Please check the top commit and if you like it I will squash it along with the rest of the branch so that it's clean for Nick's review.
Yes, looks great.
So the bug was there because I was not checking the retval of crypto_hmac_sha3_256() correctly... Not sure why we have 1 being the error retval in crypto.c instead of -1, perhaps I should address this in crypto_hmac_sha3_256()?
Yes, we should probably have consistent retvals for errors. Do you think this should be fixed as part of this issue? I see crypto_hmac_sha3_256 is added in commit 0cfcc7f3da4a77e1162f1e011ba4954d309070fa
encode_establish_intro_cell_legacy might be able to be unit tested directly. It looks like it is tested indirectly but we could have several unit tests to test the different branches of this directly.
I think that old function is more unittestable now indeed. However, I'd suggest we make a new ticket for unittesting that legacy function.
Sure, as it looks like the behavior of encode_establish_intro_cell_legacy is largely unchanged in this ticket, then opening a new ticket to add tests sounds good to me.
OK. After another great round of review by chelsea and david, I revised my code and pushed a new branch and merge request. My revisions mainly improve the sha3 MAC code and its tests.