circuit_can_use_tap() should only allow TAP for v2 onion services

changed milestone to %Tor: unspecified

added 034-triage-20180328 035-removed component::core tor/tor easy intro milestone::Tor: unspecified points::0.5 priority::medium prop224 security security-low severity::normal status::new tor-hs type::defect version::tor 0.3.2.1-alpha labels

Trac:
0001-Restricts-TAP-usage-for-circuits-to-only-v2-onion-se.patch

The above patch uses circ->rend_data->version to determine the onion service version in use for the circuit. There is an assert that that structure exists before trying to dereference the version from it, which I think it should always be the case that it does exist unless there are maybe Tor2Web things I'm unfamiliar with.

Trac:
Status: new to needs_review

Replying to irl:

The above patch uses circ->rend_data->version to determine the onion service version in use for the circuit. There is an assert that that structure exists before trying to dereference the version from it, which I think it should always be the case that it does exist unless there are maybe Tor2Web things I'm unfamiliar with.

Unfortunately, that won't work. We introduced the rend_data version back before we did prop224 because we were planning on using a specialized version for v3 but it turns out that it wasn't really usable the way we wanted. We should probably also add a big comment in rend_data_t to tell the world that it is ONLY for v2 even thought a version field exists and all this ABI/API business.

So v2 circuits have a rend_data object attached and v3 circuits have an hs_ident object.

Trac:
Status: needs_review to needs_revision

Trac:
0001-Restricts-TAP-usage-for-circuits-to-only-v2-onion-se.2.patch

Ok, so testing now instead for the presence of the version specific object on the origin_circuit_t. If this patch is now doing the right thing, I'd like to go back and add those comments to the rend_data in origin_circuit_t to avoid any confusion on that in the future too.

This test may be too strict, would rend_data ever be defined for v3 services? Looking at the definitions all the data stored appears to be specific to v2 in rend_data.

Trac:
Status: needs_revision to needs_review

Ok the check is the way to do it for now.

In circuit_can_use_tap(), I might suggest you to return the result of the 3 conditions instead of this if(2 cond) then return 1 cond.

  return circuit_purpose_can_use_tap_impl(circ->base_.purpose) &&
 	 extend_info_supports_tap(circ->cpath->extend_info)) &&
 	 return circuit_rend_version_can_use_tap_impl(circ);

Final thing, may I suggest also a rename of circuit_rend_version_can_use_tap_impl() to hs_circuit_is_v2() or something around those lines that you can put in hs_circuit.c to enclose anything related to HS into its subsystem?

The reason here is that this new function is really returning the circuit HS version, not if it is supporting tap per-se. With this we'll also win some extra semantic in the code which clearly shows that only HS v2 circuit can do tap. And because circuit_purpose_can_use_tap_impl() is tested both the purpose of the circuit and the version needs to match for the tap to be supported.

Thanks!

Trac:
Status: needs_review to needs_revision
Reviewer: N/A to dgoulet

There's something I'm missing here perhaps. Every time circuit_can_use_tap is called, there's never a rend_data or hs_ident defined (pointers are both nil). Maybe this has not been added yet, or maybe it's been removed before, when the function is called. This is true even when I'm requesting a v2 onion service and the check for purpose has returned correctly.

Ah, undocumented state machines, aren't they fun?

I'd suggest you look for the code that changes circuit purposes to the ones we're looking for, Hopefully these functions come on v2 and v3 varieties, or they otherwise have some context that lets us know which version we're using.

When you find those call sites, if there is no existing struct member that gives us the version information, add a 1-bit flag to the circuit struct that tells us whether it's v2 or v3.

Trac:
0001-Restricts-TAP-usage-to-only-v2-onion-services.patch

Not convinced by the naming of these variables/constants. There's a lot of "hidden services" in the code and I wonder if it's best to just go with that, instead of trying to have onion services (probably it is).

The test suite passes nicely and I've been able to access v2/v3 services. I've not tested hosting a service but it would be good to get a review to make sure I'm on the right track and to get some hints as to naming.

The only code path I haven't fully explored is "should_use_create_fast_for_circuit". Is there ever a case where a v2 onion service would be trying to use create_fast? I don't want to have it fail to use TAP and fall back to create_fast because the v2 flag wasn't present on a code path.

I'm thinking to add some assertions that is_v2 is set in any case where rend_data is being added to a circuit, which should provide some level of assurance and potentially catch any bugs that appear later on.

Trac:
Status: needs_revision to needs_review

Replying to irl:

Not convinced by the naming of these variables/constants. There's a lot of "hidden services" in the code and I wonder if it's best to just go with that, instead of trying to have onion services (probably it is).

Naming things is always a fun game. We'd typically use something like uses_tap_for_hs_v2 to be explicit. Characters are cheap, confusion is expensive.

The test suite passes nicely and I've been able to access v2/v3 services.

Are there existing unit tests for this code that you could expand with the new flag? (I suspect there might not be - if that's the case, a new unit test would be nice, but let us know if you'd like one of us to do it.)

I've not tested hosting a service

You might enjoy using chutney to run entire testing tor networks locally:

git clone https://git.torproject.org/tor.git
git clone https://git.torproject.org/chutney.git
cd tor
make test-network
# or, for comprehensive tests
make test-network-all

but it would be good to get a review to make sure I'm on the right track and to get some hints as to naming.

This looks like you have the client rend purpose here, but you need the client intro and service rend purposes. We have to use TAP for client intro because the descriptor only has TAP onion keys. We have to use TAP for service rend because the INTRODUCE cell only has TAP onion keys.

I wonder why this code still works?

You might have found another bug - does it still work if you make circuit_can_use_tap() always return 0?

The only code path I haven't fully explored is "should_use_create_fast_for_circuit". Is there ever a case where a v2 onion service would be trying to use create_fast? I don't want to have it fail to use TAP and fall back to create_fast because the v2 flag wasn't present on a code path.

CREATE_FAST should only be used on single-hop directory fetch paths during bootstrap. It's used when we don't know a relay's onion keys. I wonder if we're checking this correctly?

And I wonder if this is the bug you've found above. That would be a nice catch. Still security-low though.

I'm thinking to add some assertions that is_v2 is set in any case where rend_data is being added to a circuit, which should provide some level of assurance and potentially catch any bugs that appear later on.

We typically use non-fatal, once-off assertions, to avoid crashing or spamming logs:

if (BUG_ONCE(condition)) {
  corrective_action();
}

tor_assert_nonfatal_once(condition);

Feel free to add some for the CREATE_FAST case, too.

Trac:
Status: needs_review to needs_revision

Taking over after asking irl on IRC.

Trac:
Status: needs_revision to accepted
Owner: N/A to dgoulet
Reviewer: dgoulet to N/A

I was working on this and I started to wonder here why isn't the purpose + the presence of a TAP onion key in the extend_info_t object not enough to rule out v2 HS?

As an example, HSv3 never sets a TAP onion key so it can simply never hit true there. Is the goal to have an explicit flag that identifies the circuits HS version? Tbh, just to have that, it requires quite a bit of gymnastic and added flags for some gain I'm not sure I understand?

A straight forward way would be to add a flag to the extend_info_t so the v2 and v3 subsystem can put the right version in there. But how is that different from "setting a TAP key" and "not setting a tap key" ?

Trac:
Status: accepted to needs_information

Replying to dgoulet:

I was working on this and I started to wonder here why isn't the purpose + the presence of a TAP onion key in the extend_info_t object not enough to rule out v2 HS?

This check is a defence in depth mechanism.

As an example, HSv3 never sets a TAP onion key so it can simply never hit true there. Is the goal to have an explicit flag that identifies the circuits HS version? Tbh, just to have that, it requires quite a bit of gymnastic and added flags for some gain I'm not sure I understand?

It makes sure that our v3 code never uses TAP. And that our v2 code only uses TAP for two specific purposes: client intro, and service rend.

These checks make it easier to get rid of TAP, because we know we're not accidentally using it for anything else. And they make sure we can't be tricked into using it for anything else, if there are bugs in our code.

A straight forward way would be to add a flag to the extend_info_t so the v2 and v3 subsystem can put the right version in there. But how is that different from "setting a TAP key" and "not setting a tap key" ?

If there are bugs in our code, it is different.

Trac:
Status: needs_information to needs_revision

Ok honestly, what I'm worried about here (for 033) is the size of the change touching many layers just to have this defense in depth. I'm worry to break things or even introduce bugs in a version that we are currently stabilizing.

But this ticket also imo highlights a problem we have in tor generally which is that we don't have a good way to identify what a circuit is for exactly. For HS, because we have two versions now, the purpose is not enough. Even for a normal client, everything is GENERAL purpose, we don't know if it is for fetching a descriptor or a consensus or an HS descriptor which makes us look at the connection object to know that but that isn't set until the circuit is launched...

And being unable to distinguish HS version is annoying actually. In theory, we would look at rend_data_t and hs_ident_t that both have a version field that was put there so when the circuit layer calls back in the HS subsystem, we do know what it is for. This highlights another thing which is that we launch circuit before they are ever setup with per-subsystem metadata (which can be OK imo).

Because we set the HS data after, it means we need to pass metadata to the circuit layer to tell it how we want the circuit. And that layer imo should remain agnostic of the "why of the circuit". The subsystem that requests the circuit should be the one with insane amount of checks and passing down the details on how it wants it constructed.

If we tell the circuit layer "what the circuit is for" so that it can make decision itself (for which I'm not convinced it should but it is already doing A LOT anyway right now), it means we should think of a proper way to pass that information down to that subsystem. Circuit flags? More metadata in extend_info_t? ...

If we want to go that way that is "hey circuit layer, I want a HSv2 client circuit for rendezvous", I would be much happier if we could get in a nice interface to identify which circuit it is for exactly (origin circuit), have it tested and then we can just hook on that for this defense.

But adding a circuit flag for "v2 circuit creation" and passing it down layers or even having a version parameter in extend_info_t are hacks that aren't very sustainable options over time.

For the record, I'm strongly in favor of keeping the circuit subsystem agnostic here and having a clear separation of work between the circuit layer and the layer that wants a circuit from it. Overlapping checks and validation like that imo leads to spaghetti code and it is error prone. And we already have much infrastructure in place for this that is the extend_info_t object and the CIRCLAUNCH_ flags.

I'm happy to open a ticket for this kind of idea but for now I'm pushing this to 034.

Trac:
Milestone: Tor: 0.3.3.x-final to Tor: 0.3.4.x-final

Trac:
Keywords: N/A deleted, 034-triage-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

Trac:
Keywords: N/A deleted, 034-removed-20180328 added

circuit_can_use_tap() should only allow TAP for v2 onion services

Child items 0

Activity