In an effort to publish early and often, my bug7144branch contains an implementation of this which is not quite finished yet. There are some unittests in there for the added code (but only ~60% coverage of the new code so far).
Note that we'll need to remember to revise consider_testing_reachability() and dirserv_single_reachability_test() to not use random middle nodes to connect to bridges when testing their reachability. (Otherwise all this work for bridge guards is still a bit wasted.)
Also, I've diverged slightly from prop!#188 (closed): rather than sending a CREATE_FAST cell to the bridge guard, we use circuit_should_use_create_fast_for_circuit() (which has been slightly modified) to attempt to do use the same CREATE* cell type as a normal client.
My latest code is in my bug7144_r1branch. I know it's a lot of code, and I apologise for that… but if you review it, I'll buy you a beverage of your choice at the next dev meeting. :)
Review is happening on this experimental Phabricator instance that doesn't belong to us and is probably going to eat all the data: https://phablab.krautspace.de/D1
Because the data is going to get eaten, I just wanted to archive this:
I hope it was okay to for me to merge that! I felt like I shouldn't merge my own commits… then I thought it'd be okay because it's only documentation and it's now sort of "my" proposal. In the end, I decided that it was probably okay, since it seems easier for others to audit my code and follow my specification side-by-side — with the spec already updated — than needing to review both separately. If this was not okay, please let me know, and I'll revert my commit and never do that again.
The changes include:
ADD a new section detailing loose-source routed circuits, including:
How the circuit should appear to the OP, the bridge, and the
bridge guard,
How the hop(s) in the path is/are chosen,
How the first hop is handled and how circuit extension is
handled, in a generalised sense (i.e. not specific to a bridge
using bridge guards),
Which cell types are used and how they are chosen,
When the original create cell is answered (in relation to when
cells are sent to the other additional hops),
What should be done when relay cells are received when the
additional hops in the loose-source routed circuit are not
fully constructed,
How the circuit crypto and cell digests for incoming/outgoing
relay cells works (again, in a generalised sense, i.e. not
specific to bridges using bridge guards),
How and whether a given relay cell is treated as "recognized",
Under what circumstances (based on whether a stream is attached
and which relay command was sent) should cause a node which
uses loose-source routed circuits to drop a relay cell or mark
a circuit for close, and,
When and what statistics should be gathered for loose-source
routed circuits.
UPDATE and clarify the example loose-source routed circuit
construction (the original one from the proposal which was specific
to a client using a bridge that uses bridge guards).
CHANGE the "Make it harder to tell clients from bridges" section
which described the tradeoffs between using CREATE versus
CREATE_FAST cells for the first additional hop (i.e. the bridge
guard). Those concerns is no longer entirely relevant, since, in
the meantime, we've introduced the NTor handshake and it's
extremely likely that both the client and the bridge will use
CREATE2 for all their hops.
ADD a new section on enumerating bridges via the behaviours
inherent to bridge reachability testing.
REMOVE open question (from the very end of the proposal) regarding
whether the standard guard selection algorithms are adequate. They
are. (Even if they are convoluted and overly-complicated.)
Okay, I came here to drink coffee and review code, and my doctor tells me I shouldn't drink so much coffee. I'll look at the smaller ones first.
fbb21bbd9d21 Refactor loop over cpath for getting nicknames into separate function.
lgtm!
43670da13937 Generalise logic for whether a circuit_t supports ntor.
Yes but we should also open a ticket here for removing *_supports_ntor() entirely; we no-longer allow TAP-only relays on the network. (Opened as #17882 (moved))
05d5eaa18985 Generalize logic for calculating cpath length.
lgtm
04fd0097cd9a Refactor circuit_get_cpath_hop() to operate on just the cpath.
lgtm
1568e1449278 Redefine CIRCUIT_IS_ORIGIN to use ORIGIN_CIRCUIT_MAGIC, not purpose.
lgtm. There is a too-wide line here, I think, but please don't fix it now; I'll get it when I do "make check-spaces" after merge.
6daf9165951d Make logic for choosing create cell type be agnostic to circuit type.
hmm. I know this isn't new, but the !cpath->extend_info->onion_key check looks poor to me, since it will fail once no-TAP relays are a reality. Probably doesn't need to get fixed on this branch though.
b5546456b415 Check circuit types before casting in relay_send_command_from_edge_().
If we're going to make this change, we need to recognize that this function isn't really "from_edge" any more -- a cell sent outwards from a bridge is not sent "from_edge". Renaming this function might be overkill, but we should document its new semantics in its comments.
e81acaf5f33e Implement Bridge Guards (prop188).
I'll come back to this in my next comment. It's the big one.
45d2457abd5c Add unittests for loose.c.
Changes to non-test code all lgtm
Tests seem okay after a quick skim. If you haven't already done so, please run them under valgrind to make sure they don't leak.
Okay, now to review "e81acaf5f33e Implement Bridge Guards (prop188).".
The big files are command.c and loose.c. I'll review the others first.
circuitmux.c:
I think that the changes here should make use of a CIRCUIT_HAS_CPATH macro rather than doing origin || loose; this logic is likely to be important elsewhere.
onion.c:
What does the + // prop#188 comment mean? More detail please.
or.h:
0x13371515, huh? :) Fair enough.
p_chan_relay_cell is a little confusing. How do we know that they won't send two relay early cells, and what do we do if they do? Also consider prop#249 ("Large create cells")
I think some of the tor_asserts() in the OR_TO_LOOSE_CIRCUIT_() functions need to check for LOOSE_CIRCUIT_MAGIC, not OR_CIRCUIT_MAGIC.
Did you grep for other uses of OR_CIRCUIT_MAGIC to see if they needed to change as well?
relay.c:
I wonder whether there is really no shared code with circuit_receive_relay_cell and loose_circuit_process_relay_cell. I guess I'll find out when I review loose.c
What does the + // prop#188 comment mean? More detail please.
Now to review the command.c changes and loose.c in e81acaf5f33e7c6de4205dfdc8d797c3bd9dbd80.
command.c:
command_answer_create_cell has no documentation.
loose.c:
I recommend that after we merge we rename this to loosecirc.c or something.
Arg. Unless I miss my guess, there is waaay to much copied code from origin_circuit_t here. Cut-and-paste programming is a sad thing. I can't tell what's new and what isn't, what's different and what isn't. I wonder, can we extract the common fields from origin_circuit_t and loose_circuit_t into a new type, refactor the origin_circuit_t functions to work on that, and then redefine many/most of these functions in terms of the refactored origin_circuit_t stuff?
If the above change were done, this would be much easier to review, since I wouldn't have to hand-compare it to the origin_circuit_t code.
Okay, I came here to drink coffee and review code, and my doctor tells me I shouldn't drink so much coffee. I'll look at the smaller ones first.
Thanks, Nick! I owe you a beverage at the next meeting in Valencia. :)
[…]
43670da13937 Generalise logic for whether a circuit_t supports ntor.
Yes but we should also open a ticket here for removing *_supports_ntor() entirely; we no-longer allow TAP-only relays on the network. (Opened as #17882 (moved))
I thought it odd that we still had a function which should only ever return true, but I didn't want to go off refactoring pieces of code when it wasn't strictly necessary (since these patches are already large).
[…]
1568e1449278 Redefine CIRCUIT_IS_ORIGIN to use ORIGIN_CIRCUIT_MAGIC, not purpose.
lgtm. There is a too-wide line here, I think, but please don't fix it now; I'll get it when I do "make check-spaces" after merge.
Whoopsies.
6daf9165951d Make logic for choosing create cell type be agnostic to circuit type.
hmm. I know this isn't new, but the !cpath->extend_info->onion_key check looks poor to me, since it will fail once no-TAP relays are a reality. Probably doesn't need to get fixed on this branch though.
What if I were to move it below the if (server_mode(options)) check? That way, (at least in the future) we'll never accidentally use CREATE_FAST cells between ORs, where one has no onion_key. Separate ticket?
b5546456b415 Check circuit types before casting in relay_send_command_from_edge_().
If we're going to make this change, we need to recognize that this function isn't really "from_edge" any more -- a cell sent outwards from a bridge is not sent "from_edge". Renaming this function might be overkill, but we should document its new semantics in its comments.
I made a coccinelle script for the transformation. It's added, applied, and cleaned up (whitespace fixes and renaming the actual function declaration and implementation ) in:
The coccinelle script can easily be applied to any current/further patches which call either relay_send_command_from_edge() or relay_send_command_from_edge_().
[…]
45d2457abd5c Add unittests for loose.c.
Changes to non-test code all lgtm
Tests seem okay after a quick skim. If you haven't already done so, please run them under valgrind to make sure they don't leak.
I ran them under valgrind while writing them. There's no leaks, AFAICT.
Okay, now to review "e81acaf5f33e Implement Bridge Guards (prop188).".
The big files are command.c and loose.c. I'll review the others first.
circuitmux.c:
I think that the changes here should make use of a CIRCUIT_HAS_CPATH macro rather than doing origin || loose; this logic is likely to be important elsewhere.
Did you mean this check?
if ((circ->magic == OR_CIRCUIT_MAGIC) || (circ->magic == LOOSE_OR_CIRCUIT_MAGIC))
That check is for discovering if the circuit has a p_circ_id and maybe having a p_chan. Should I make a CIRCUIT_HAS_PCHAN macro? Also, I agree that CIRCUIT_HAS_CPATH sounds useful, but I don't understand where in circuitmux.c it should be applied.
onion.c:
What does the + // prop#188 comment mean? More detail please.
Whoops. That should not have been committed. That was me making a mark in the code at the very beginning of hacking on this, to tell myself that this is a spot I might need to pay attention to. I ended up not needing to though. The comment has been removed in:
* [https://gitweb.torproject.org/user/isis/tor.git/commit/?h=bug7144_r1&id=f076f639f685775368ba1e7b1d5d7c568a3807b9 f076f639] Remove prop!#188 comment from onion.c.
or.h:
0x13371515, huh? :) Fair enough.
It works as magic because it's highly improbable. :)
p_chan_relay_cell is a little confusing. How do we know that they won't send two relay early cells, and what do we do if they do? Also consider prop#249 ("Large create cells")
My reading of the spec and testing caused me to believe that only one relay_early should/would be sent, and that the OR should simply drop additional cells while the channel is not yet fully constructed:
5.5. Routing relay cells When an OR receives a RELAY or RELAY_EARLY cell, it checks the cell's circID and determines whether it has a corresponding circuit along that connection. If not, the OR drops the cell. Otherwise, if the OR is not at the OP edge of the circuit (that is, either an 'exit node' or a non-edge node), it de/encrypts the payload with the stream cipher, as follows: 'Forward' relay cell (same direction as CREATE): Use Kf as key; decrypt. 'Back' relay cell (opposite direction from CREATE): Use Kb as key; encrypt. Note that in counter mode, decrypt and encrypt are the same operation. The OR then decides whether it recognizes the relay cell, by inspecting the payload as described in section 6.1 below. If the OR recognizes the cell, it processes the contents of the relay cell. Otherwise, it passes the decrypted relay cell along the circuit if the circuit continues. If the OR at the end of the circuit encounters an unrecognized relay cell, an error has occurred: the OR sends a DESTROY cell to tear down the circuit. When a relay cell arrives at an OP, the OP decrypts the payload with the stream cipher as follows: OP receives data cell: For I=N...1, Decrypt with Kb_I. If the payload is recognized (see section 6..1), then stop and process the payload.
Without taking prop!#249 (moved) into account, my logic was that we should store one relay_early and drop the rest of them:
if (PREDICT_UNLIKELY(!loose_circ->has_opened)) { /* Store the first relay_early cell for later, after our loose circuit is * fully constructed. */ tor_assert(cell); if (cell_direction == CELL_DIRECTION_OUT) { log_debug(LD_CIRC, "Received %s command on loose circuit %d, but we haven't " "finished constructing the extra hops in this circuit! " "Saving for later.", cell_command_to_string(cell->command), LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); if (!loose_circ->p_chan_relay_cell) { ++num_seen; loose_circ->p_chan_relay_cell = tor_malloc_zero(sizeof(cell_t)); *loose_circ->p_chan_relay_cell = *cell; return 0; } else { /* OPs should not be sending additional relay cells until the first * has been answered, so if we already have one stored, then the OP is * misbehaving. Mark this circuit for close. */ log_warn(LD_CIRC, "Received another relay cell on loose circuit %d, which " "is not done constructing. Closing this circuit.", LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); return -END_CIRC_REASON_TORPROTOCOL; } } else { /* If the first relay cell we saw on this circuit was incoming, then * something really weird is happening. Drop it the cell. */ log_warn(LD_OR, "Received incoming relay cell with command %s on loose circuit " "%d, but we haven't finished construction the extra hops in " "this circuit! This is odd. Dropping the cell. Path is:", cell_command_to_string(cell->command), LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); loose_circuit_log_path(LOG_WARN, LD_OR, loose_circ); return 0;}
Although, now that I look at it again, it looks wrong. The part which does return -END_CIRC_REASON_TORPROTOCOL should actually do return 0 to drop extra outgoing relay_early from the OP side. Also, the return 0 part should instead do something like:
if (crypto_cipher_crypt_inplace(loose_circ->cpath->b_crypto, cell, CELL_PAYLOAD_SIZE) && (cell->recognised)) { // XXX I would need to double check that loose_circ_process_relay_cell // would still do the right thing in this case. loose_circuit_process_relay_cell(loose_circ, NULL, cell, CELL_DIRECTION_IN, 1);} else { // Avoid passing any relay_early cells from the loose hops backwards to the OP. return -END_CIRC_REASON_TORPROTOCOL;}
Aside from all this, this logic becomes totally incorrect when we have prop!#249 (moved) large/split create cells. I'll make two patches, one to fix the above mistakes, and another to revise the code to handle post-prop!#249 (moved) create cells. Does that sound okay?
I think some of the tor_asserts() in the OR_TO_LOOSE_CIRCUIT_() functions need to check for LOOSE_CIRCUIT_MAGIC, not OR_CIRCUIT_MAGIC.
Yikes! Good catch. O_o'
* [https://gitweb.torproject.org/user/isis/tor.git/commit/?h=bug7144_r1&id=749d559ac3cea1a0ab5591b94385a238a9eb40dc 749d559a] Change [CONST_]OR_TO_LOOSE_CIRCUIT to check correct magic.
Did you grep for other uses of OR_CIRCUIT_MAGIC to see if they needed to change as well?
It's used in to see if a rend_splice is actually an or_circuit_t in circuit_free(). And also in circuitmux.c (as mentioned in my last comment) to as part of a check to see if a circuit should have a p_circ_id.
relay.c:
I wonder whether there is really no shared code with circuit_receive_relay_cell and loose_circuit_process_relay_cell. I guess I'll find out when I review loose.c
There's plenty. I wasn't sure what to do about that, since otherwise I would be refactoring large sections of the cell processing code.
What does the + // prop#188 comment mean? More detail please.
Okay, I came here to drink coffee and review code, and my doctor tells me I shouldn't drink so much coffee. I'll look at the smaller ones first.
Thanks, Nick! I owe you a beverage at the next meeting in Valencia. :)
Thanks, Isis! I do enjoy me some beverage! Almost as much as I enjoy me code.
In any case where I don't respond below, either your code looks fine, or no change is needed, or a change can wait till later.
[...]
6daf9165951d Make logic for choosing create cell type be agnostic to circuit type.
hmm. I know this isn't new, but the !cpath->extend_info->onion_key check looks poor to me, since it will fail once no-TAP relays are a reality. Probably doesn't need to get fixed on this branch though.
What if I were to move it below the if (server_mode(options)) check? That way, (at least in the future) we'll never accidentally use CREATE_FAST cells between ORs, where one has no onion_key. Separate ticket?
Okay, now to review "e81acaf5f33e Implement Bridge Guards (prop188).".
The big files are command.c and loose.c. I'll review the others first.
circuitmux.c:
I think that the changes here should make use of a CIRCUIT_HAS_CPATH macro rather than doing origin || loose; this logic is likely to be important elsewhere.
Did you mean this check?
{{{
if ((circ->magic == OR_CIRCUIT_MAGIC) ||
(circ->magic == LOOSE_OR_CIRCUIT_MAGIC))
}}}
That check is for discovering if the circuit has a p_circ_id and maybe having a p_chan. Should I make a CIRCUIT_HAS_PCHAN macro? Also, I agree that CIRCUIT_HAS_CPATH sounds useful, but I don't understand where in circuitmux.c it should be applied.
CIRCUITH_HAS_PCHAN would be great.
p_chan_relay_cell is a little confusing. How do we know that they won't send two relay early cells, and what do we do if they do? Also consider prop#249 ("Large create cells")
My reading of the spec and testing caused me to believe that only one relay_early should/would be sent, and that the OR should simply drop additional cells while the channel is not yet fully constructed:
{{{
5.5. Routing relay cells
When an OR receives a RELAY or RELAY_EARLY cell, it checks the cell's
circID and determines whether it has a corresponding circuit along that
connection. If not, the OR drops the cell.
Otherwise, if the OR is not at the OP edge of the circuit (that is,
either an 'exit node' or a non-edge node), it de/encrypts the payload
with the stream cipher, as follows:
'Forward' relay cell (same direction as CREATE):
Use Kf as key; decrypt.
'Back' relay cell (opposite direction from CREATE):
Use Kb as key; encrypt.
Note that in counter mode, decrypt and encrypt are the same operation.
The OR then decides whether it recognizes the relay cell, by
inspecting the payload as described in section 6.1 below. If the OR
recognizes the cell, it processes the contents of the relay cell.
Otherwise, it passes the decrypted relay cell along the circuit if
the circuit continues. If the OR at the end of the circuit
encounters an unrecognized relay cell, an error has occurred: the OR
sends a DESTROY cell to tear down the circuit.
When a relay cell arrives at an OP, the OP decrypts the payload
with the stream cipher as follows:
OP receives data cell:
For I=N...1,
Decrypt with Kb_I. If the payload is recognized (see
section 6..1), then stop and process the payload.
}}}
Without taking prop!#249 (moved) into account, my logic was that we should store one relay_early and drop the rest of them:
{{{
if (PREDICT_UNLIKELY(!loose_circ->has_opened)) {
/* Store the first relay_early cell for later, after our loose circuit is
* fully constructed. */
tor_assert(cell);
if (cell_direction == CELL_DIRECTION_OUT) { log_debug(LD_CIRC, "Received %s command on loose circuit %d, but we haven't " "finished constructing the extra hops in this circuit! " "Saving for later.", cell_command_to_string(cell->command), LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); if (!loose_circ->p_chan_relay_cell) { ++num_seen; loose_circ->p_chan_relay_cell = tor_malloc_zero(sizeof(cell_t)); *loose_circ->p_chan_relay_cell = *cell; return 0; } else { /* OPs should not be sending additional relay cells until the first * has been answered, so if we already have one stored, then the OP is * misbehaving. Mark this circuit for close. */ log_warn(LD_CIRC, "Received another relay cell on loose circuit %d, which " "is not done constructing. Closing this circuit.", LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); return -END_CIRC_REASON_TORPROTOCOL; }} else { /* If the first relay cell we saw on this circuit was incoming, then * something really weird is happening. Drop it the cell. */ log_warn(LD_OR, "Received incoming relay cell with command %s on loose circuit " "%d, but we haven't finished construction the extra hops in " "this circuit! This is odd. Dropping the cell. Path is:", cell_command_to_string(cell->command), LOOSE_TO_CIRCUIT(loose_circ)->global_circuitlist_idx); loose_circuit_log_path(LOG_WARN, LD_OR, loose_circ); return 0;
}
}}}
Although, now that I look at it again, it looks wrong. The part which does return -END_CIRC_REASON_TORPROTOCOL should actually do return 0 to drop extra outgoing relay_early from the OP side. Also, the return 0 part should instead do something like:
{{{
if (crypto_cipher_crypt_inplace(loose_circ->cpath->b_crypto, cell,
CELL_PAYLOAD_SIZE) && (cell->recognised)) {
// XXX I would need to double check that loose_circ_process_relay_cell
// would still do the right thing in this case.
loose_circuit_process_relay_cell(loose_circ, NULL, cell, CELL_DIRECTION_IN, 1);
} else {
// Avoid passing any relay_early cells from the loose hops backwards to the OP.
return -END_CIRC_REASON_TORPROTOCOL;
}
}}}
Aside from all this, this logic becomes totally incorrect when we have prop!#249 (moved) large/split create cells. I'll make two patches, one to fix the above mistakes, and another to revise the code to handle post-prop!#249 (moved) create cells. Does that sound okay?
Yes, please!
I think some of the tor_asserts() in the OR_TO_LOOSE_CIRCUIT_() functions need to check for LOOSE_CIRCUIT_MAGIC, not OR_CIRCUIT_MAGIC.
Yikes! Good catch. O_o'
I'm a little worried that the testing didn't catch this. :/
relay.c:
I wonder whether there is really no shared code with circuit_receive_relay_cell and loose_circuit_process_relay_cell. I guess I'll find out when I review loose.c
There's plenty. I wasn't sure what to do about that, since otherwise I would be refactoring large sections of the cell processing code.
Yeah; I think we need to combine that if possible, either as part of merging this or right after. Having duplicate code leads to maintenance troubles.
(Note to self : waiting to see what Isis says about last round of comments, then I need to look over loose.c to figure out what's different, if it's not getting refactored just yet.)
T4: Generalize logic for calculating cpath length.
The code used to check if (circ && circ->cpath) and return 0 if circ is NULL.
Now circuit_get_cpath_len will dereference circ without that check.
(This might not be an issue if every caller makes sure circ is not NULL.)
T5: In loose_circuit_pick_cpath_entry, extend_info_from_node should be called with node, 1, because we're connecting to it directly. (This is a nitpick, as bridge relays don't currently use ReachableAddresses and ClientPreferIPv6ORPort to pick the preferred OR address to extend to. At the moment, only clients use the preferred address code.)
} else { /* We should pick an entry node */ node = choose_good_entry_server(CIRCUIT_PURPOSE_OR, loose_circ->build_state); if (!node) { log_warn(LD_CIRC, "Failed picking suitable first hop for loose " "circuit."); return NULL; } entry_ei = extend_info_from_node(node, 0); tor_assert(entry_ei); }
T6: In loose_circuit_populate_cpath, I think we don't care about ntor for one-hop circuits because they were originally for directory fetches only, which are authenticated by signature, and don't contain any private information. Maybe we should change this now loose source routing and (soon) single onion services will use one-hop circuits. (Or maybe it's ok as-is!)
// XXXprop#188 Why do we not care if it's ntor if it's only one hop?
I'll think about this for single onion services in #19163 (moved).
The code is great - it's well-structured and a joy to read.
But I wonder about the code copied from the base OR code:
have the changes to the OR code in 0.2.8 also been made in the copied code? (T7)
(I wonder if this could be causing subtle breakage. But I have no evidence either way.)
OK, now I'm done reading, time to debug:
Once this patch is applied, chutney clients never download a microdesc consensus, even though the authorities have one. (But this works ok in maint-0.2.8.) I need to look into the code that's executed when a client asks an authority for a consensus for the first time. And I need to look at the code that this patch adds to existing tor OR code.
A question about bridge guards in general: can't they be used by the next hop after the guard to passively verify that the bridge guard really is a bridge guard? Consider a client C and a bridge B with a bridge guard G. C doesn't know that G is a bridge guard of B, so at some point it tries to open the circuit C -> B -> R -> G where R is some arbitrary relay. B inserts a loose-source routed hop to G, since that's its bridge guard, so we get
C -> B -> G -> R -> G
Since no one will ever try to create a circuit like that directly, since R will refuse to extend through G, R now knows that the first hop through G must be loose-source routed, and thus G must be a bridge guard. Neither C nor B can prevent this since C doesn't know which guards B are using, and B can't read the encrypted extend cells intended for R.
Is this bad, or do we assume that bridge guards are detectable anyway? Or is it actually prevented by the path-choosing algorithms somehow?