Opened 3 years ago

Last modified 9 days ago

#19859 needs_revision enhancement

Expose stream isolation information to controllers

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.4.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs tor-control dns isolation needs-spec needs-design term-project
Cc: JeremyRand Actual Points:
Parent ID: Points: 3
Reviewer: nickm Sponsor:

Description

See the discussion on the "How to integrate an external name resolver into Tor" thread on tor-dev; most notably http://archives.seul.org/tor/dev/Aug-2016/msg00019.html .

Resolvers would like to know the isolation information of incoming streams so they know which streams need to be isolated from which other streams.

Semantically, this is a little tricky. The underlying rule that Tor implements is that each stream has a tuple of attributes (A_1, A_2... A_n), and a bit field (b_1, b_2... b_n). Two streams S_a and S_b may share the same circuit iff, for every i such that the OR of their b_i values is true, they have the same A_i value.

Note that this is not transitive: Stream S_a may be able to share a circuit with S_b or S_c, even if S_b cannot share with S_c. Worse

Should we (1) expose these attribute tuples and bitfields and require controllers to manipulate them correctly? That seems obnoxious and error-prone.

Or should we (2) allow controllers to ask questions like "may stream A share a circuit with stream B?" Or "what streams may A share a circuit with?" This might lead to O(n) queries, and it will still be error-prone because of the non-transitivity issue.

Or would it be better to (3) oversimplify the system above and provide each stream a 'cookie' such that any two streams with the same cookie may definitely share the same circuit? But this is problematic, and will overestimate how much isolation we need.

My current best idea is that (4) we should provide an operation of the form "make stream A have the same isolation properties as stream B". And possibly "make circuit C have isolation properties as if it had been used by stream A". So we don't expose isolation information, we just expose a way to manipulate it.

Or maybe there's a further clever way I'm not even thinking about just now.

Child Tickets

Change History (17)

comment:1 Changed 3 years ago by JeremyRand

Hey Nick,

Indeed, this does seem like a complex problem. For the specific use cases that Namecoin has, my guess is that (4) is probably the simplest and least-error-prone option. However, a concern comes to mind: if a Namecoin lookup is considered to have the same isolation properties as the stream created by the client application that generated the Namecoin lookup, this presumably reveals to someone (who? The exit relay? A Namecoin P2P node? The service being accessed? All of the above?) that the Namecoin lookup and the client application stream are linked.

Is this a problem? If the protocol is HTTP(S), then the accessed service can already tell that it's being looked up using Namecoin by checking the Host header (and if TLS isn't used, then the exit relay can also tell this). If the protocol uses TLS, then the accessed service and the exit relay can tell it's being looked up using Namecoin by checking the SNI header (although SNI could be disabled for some protocols that use TLS, and perhaps a future version of TLS will offer SNI encryption). If it's some other protocol that doesn't have a Host-like header, then I don't see any way that the use of Namecoin is inherently detectable. Since Tor is used for arbitrary TCP traffic, I am hesitant to endorse a solution that unnecessarily reveals to adversaries that Namecoin is in use, particularly since Namecoin usage is rather rare right now and therefore this narrows the anonymity set substantially.

From the point of view of the Namecoin P2P node, looking up a Namecoin name doesn't necessarily reveal everything about how it will be used. For example, it normally doesn't reveal what subdomain is being looked up, nor does it reveal what record types are being looked up (IPv4, IPv6, TorHS, TLSA, SSHFP, etc.). It's not immediately obvious to me how much of this extra data is revealed, and to whom, as a result of the service being accessed over the same circuit, but it makes me uneasy without a clear, convincing argument that it's safe.

So, possible solution: offer an operation of the form "make stream A have the same isolation properties as stream B, except that stream A must also be isolated from stream B".

Is this modification worth it? Would it put much extra load on the Tor network? Are my concerns about revealing extra information to adversaries justified, or am I being overly cautious about something that isn't really a threat?

Cheers.

comment:2 Changed 3 years ago by JeremyRand

Cc: JeremyRand added

comment:3 Changed 3 years ago by dgoulet

Keywords: triage-out-030-201612 added
Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

Triaged out on December 2016 from 030 to 031.

comment:4 Changed 3 years ago by nickm

Points: 3

comment:5 Changed 3 years ago by nickm

Keywords: triaged-out-20170308 added
Milestone: Tor: 0.3.1.x-finalTor: unspecified

Deferring all 0.3.1 tickets with status == new, owner == nobody, sponsor == nobody, points > 0.5, and priority < high.

I'd still take patches for most of these -- there's just nobody currently lined up to work on them in this timeframe.

comment:6 Changed 2 years ago by dgoulet

Keywords: tor-hs added; hidden-services triage-out-030-201612 removed

comment:7 Changed 2 years ago by nickm

Keywords: tor-control dns isolation needs-spec needs-design term-project added; needs-proposal triaged-out-20170308 removed

comment:8 Changed 4 weeks ago by JeremyRand

Tor patch at https://notabug.org/JeremyRand/tor/src/stream-socks-auth (Git commit hash 5c57583ad3efbaf711ce9a1b967ccfbac1db9e1d).

I ended up going with Nick's option (1), for the following reasons:

  1. It keeps the Tor patch simple, and allows the controller (which is written in a higher-level language (Python in my case) and is therefore less prone to memory safety bugs) to handle the isolation logic.
  2. For performance reasons, Namecoin needs to preemptively open connections before the STREAM events are received. So whatever isolation data the controller receives will be used to assign resolution commands to existing Namecoin connections; this limits the applicability of Nick's option (4).
  3. There's already precedent for this in the CIRC event (which sends the stream isolation data fields verbatim), and it seems weird to make the STREAM event have a wildly different design than the CIRC event for the same functionality.

I infer from the keywords on this ticket that you'll also want a spec patch; is that correct? Is there anything else needed for this?

comment:9 Changed 4 weeks ago by nickm

Milestone: Tor: unspecifiedTor: 0.4.3.x-final
Status: newneeds_review

Thanks for the patch!

I infer from the keywords on this ticket that you'll also want a spec patch; is that correct? Is there anything else needed for this?

That's right; the spec patch should document the new behavior in control-spec.txt.

It would also be really good to have tests for the new code here.

comment:10 Changed 4 weeks ago by JeremyRand

That's right; the spec patch should document the new behavior in control-spec.txt.

Okay, I'll work on that next. I should have a spec patch for you within a few days.

It would also be really good to have tests for the new code here.

This patch is mostly a copy/paste job of the existing code for the CIRC event. So ordinarily I'd base the tests for this patch on the CIRC tests as well, but a brief look at the tests didn't find any tests for that CIRC code. Am I correct in thinking that the CIRC stream isolation fields don't have any tests right now, or did I miss something?

If that's correct, I'm still okay with writing tests if you want, but given that this is my first time hacking on the core Tor code, it's likely that I'll ask for some minor hand-holding on IRC, as the test codebase used by core Tor is not something that I have any familiarity with.

comment:11 Changed 4 weeks ago by JeremyRand

I should have a spec patch for you within a few days.

Whoops, did I say a few days? Clearly I meant an hour. :) Spec patch at https://notabug.org/JeremyRand/torspec/src/stream-event-isolation (Git commit hash f0364ccac62cff3334d9d7bad5340fd785096009).

comment:12 Changed 4 weeks ago by asn

Reviewer: nickm

comment:13 Changed 4 weeks ago by nickm

Thanks for the patch! Before I get too deep into the code, I'll review the spec.

It would be good if the programmer who uses this feature doesn't need to know in advance a complete list of all the different things that can be isolated. So for example, if we later add isolation based on source port, or isolation based on destination address or TLD, it would be good if the programmer doesn't need to change their code accordingly.

So what if, instead of having a bunch of different ISOLATE_XYZ flags, we have one item that is a comma-separated list of entries flags, each matching the name of one of the other fields?

Also, I suggest that we mark all of these fields as optional, since older Tors will not provide them, and not all protocols necessarily have them.

Finally, the list of ClientProtocol values should mention that they can be extended in the future.

comment:14 Changed 4 weeks ago by nickm

Status: needs_reviewneeds_revision

Looking at the code, it looks plausible. I'd suggest writing unit tests mainly for the new entry_connection_describe_status_for_controller function. Also the argument to that function should be const. But before you revise the code, I'd suggest doing a new version of the spec patch so we're on the same page about the desired behavior.

Thanks for all the code; it's looking nice!

comment:15 Changed 10 days ago by JeremyRand

Thanks for the review Nick!

So what if, instead of having a bunch of different ISOLATE_XYZ flags, we have one item that is a comma-separated list of entries flags, each matching the name of one of the other fields?

If I'm understanding you correctly, you're suggesting having an ISO_FIELDS field, such that ISO_FIELDS=FOO,BAR means that the FOO and BAR fields of the STREAM event represent fields that the SOCKS port is configured to isolate on? I think this is a good idea, but there is a caveat. Tor can isolate based on destination address, destination port, and client address, but those don't have a 1:1 correspondence to STREAM fields (they're components of fields that contain both an address and a port). So in order for this to work, I'd need to add 3 additional fields to STREAM events that are arguably redundant. Is that acceptable?

Agreed on the other spec suggestions.

comment:16 Changed 10 days ago by nickm

I think that's acceptable if we have to. Alternatively, we could declare that DESTADDR, DESTPORT, and CLIENTADDR are handled specially, but that all other and future isolatable fields will be handled uniformly.

comment:17 Changed 9 days ago by JeremyRand

Alternatively, we could declare that DESTADDR, DESTPORT, and CLIENTADDR are handled specially, but that all other and future isolatable fields will be handled uniformly.

This sounds preferable to me, although I think we should specify that those special values might be replaced by correspondingly named fields in the future, i.e. controllers should fall back to the special behavior if those fields don't exist, but should use the named fields if they are present. That way it's easier to transition away from the special behavior in the future if we want to. Also might as well include CLIENTPORT even though Tor doesn't currently isolate based on it, since the client port is included in the same field as CLIENTADDR. This way we don't need to change the control protocol if Tor decides to add client-port-based stream isolation in the future.

Updated spec patch at https://notabug.org/JeremyRand/torspec/src/stream-event-isolation (Git commit hash cd916518f3e3ea5f8a390d24727eadf68cb8271f).

Note: See TracTickets for help on using tickets.