So instead of raising the timeout, let's make normal circuits behave more like hidden service circuits: keep resetting their timestamp_dirty every time a new stream is attached. This has the effect that a user will never suddenly get a new circuit in the middle of actively using a website, which will be a huge usability improvement. Still not ideal, but good enough to leave the actual circuit dirtiness timeout alone.
I am going to do this with a TBB-specific Tor patch for now so we can test this in TBB 4.5a5. We can then decide if we want to make this a torrc option after that.
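The behavior described above can be sketched roughly as follows. This is an illustrative simplification, not Tor's actual API; the struct and function names are stand-ins modeled on Tor's `circuit_t` and its `timestamp_dirty` field:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-in for Tor's circuit_t; only the
 * fields relevant to this sketch are shown. */
typedef struct {
  time_t timestamp_dirty;   /* when the circuit last began carrying traffic */
  int is_hs_circuit;        /* hidden service circuits already behave this way */
} circuit_t;

/* On every new stream attach, refresh timestamp_dirty so that an
 * actively used circuit never expires out from under the user. */
static void
mark_circuit_dirty_on_attach(circuit_t *circ, time_t now)
{
  circ->timestamp_dirty = now;
}
```

The point of the sketch: instead of letting the 10-minute dirtiness clock run from first use, each new stream restarts it, so rotation only happens during idle periods.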
I question whether this is something to do unconditionally, or whether it's only something to do when we have an external application managing circuit isolation for us.
I also wonder whether there shouldn't be some kind of randomized upper limit to how long this can keep a circuit alive.
GeKo and I both agree that the SOCKS isolation check is important here. However, I felt that the randomness, in its current form, was not the best plan: circuits could still end up very short-lived, allowing TBB to surprise the user with a new circuit at random intervals.
The way I thought I implemented the randomness didn't allow for short-lived circuits. It wasn't choosing randomly between [0,X], but instead between [X, X*9/8].
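Concretely, choosing from [X, X*9/8] means the randomized lifetime can never dip below the base value X. A minimal sketch of that selection (illustrative names, not Tor's real code):

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a circuit lifetime uniformly in [X, X*9/8]. With a base of
 * 10 minutes (600 s), results fall in [600, 675] seconds, so the
 * randomized timeout is never shorter than the base. */
static int
randomized_lifetime(int base_secs)
{
  int spread = base_secs / 8;              /* X*9/8 - X */
  return base_secs + rand() % (spread + 1);
}
```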
For me, the idea of no maximum at all is a bit scary; it means that keepalive-type stuff will keep a circuit open forever even if the user doesn't expect that it would.
Finally -- did somebody test out the isolation thing, to make sure that only authentication-isolated circuits get the new behavior? I only wrote it; it does need some testing.
https://research.torproject.org/ideas.html describes Tor circuit reuse as rotating every 10 minutes. I saw that #13766 (moved) was invalidated and closed, but if the 10-minute rotation approach is being retired, let's update that documentation as well.
> The way I thought I implemented the randomness didn't allow for short-lived circuits. It wasn't choosing randomly between [0,X], but instead between [X, X*9/8].
You're right. In my haste I misread this.
> For me, the idea of no maximum at all is a bit scary; it means that keepalive-type stuff will keep a circuit open forever even if the user doesn't expect that it would.
Are there specific attacks or types of attacks that make you nervous here?
I believe there is a potential concern that longer circuit lifespans make e2e correlation easier, but I am more worried about the case where I leave my browser open overnight and some website keeps pinging itself periodically, causing a new circuit to be built each time, in order to discover my guard node. I am also worried about non-malicious but dumb websites that do this anyway and end up exposing me to the entire Tor exit node population over the course of a few hours, during which time malicious exits get more chances to try to own my browser, sniff my cookies, perform e2e correlation, mount their own guard discovery attacks, or otherwise mess with me.
Guard discovery and exit node churn were the main security reasons I wanted a huge circuit dirtiness timeout in #13766 (moved), until Roger pointed out that a fixed-length custom circuit dirtiness would provide a 100% accurate distinguisher for Tor Browser users at the Guard node.
In my ideal world, we would find some way to make this distinguisher statistical (instead of absolute, instantaneous certainty) while still ensuring really long circuit lifetimes for websites that would otherwise cause circuit churn. Plenty of long-term semi-accurate statistical classifiers likely exist for Tor Browser traffic at the guard node, so this tradeoff seems like the right one to me.
> Finally -- did somebody test out the isolation thing, to make sure that only authentication-isolated circuits get the new behavior? I only wrote it; it does need some testing.
That's what alphas are for :). In fact, I'm revisiting this ticket right now because I'm still personally experiencing cases where my exit IP has changed in the middle of actively interacting with a website, and this change has disrupted my browsing activity. I'd like to brainstorm ways of making the circuit dirtiness larger without exposing us to attacks. However, I'm also heavily biased in the usability and experimentation direction here, due to the uncertain (and conflicting) nature of the security issues at hand...
My current (admittedly completely haphazard and half-baked) inclination is to do something like randomizing the initial per-circuit dirtiness between 10 minutes and a much larger value, reset the dirtiness to rand(0,10min) upon stream close (in circuit_detach_stream()), and still omit any maximum value on dirtiness. I'm updating this ticket to give you a chance to talk me down from this cliff, ideally within the next few days before I jump and commit a patch to do something insane like this for 4.5-stable ;).
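Sketched out, the half-baked scheme above looks something like this. The names are illustrative (not Tor's real API), and the 1-hour upper bound for the initial draw is an assumed placeholder for the "much larger value":

```c
#include <assert.h>
#include <stdlib.h>

#define TEN_MINUTES 600

/* Draw the initial per-circuit dirtiness uniformly from
 * [10 min, larger_value_secs], as described in the comment above. */
static int
initial_dirtiness(int larger_value_secs)
{
  return TEN_MINUTES + rand() % (larger_value_secs - TEN_MINUTES + 1);
}

/* On stream close (the circuit_detach_stream() hook), reset the
 * remaining dirtiness to rand(0, 10 min). No overall maximum is
 * enforced anywhere, per the "omit any maximum" part of the idea. */
static int
dirtiness_after_stream_close(void)
{
  return rand() % (TEN_MINUTES + 1);
}
```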
Ok, I attached a new version that also does the stream detach hack (which I added primarily because we're boosting our HTTP keep-alive back up to 2 minutes thanks to solving #4100 (moved)). This version also adds some debug loglines, and I tested it with TBB and torsocks 1.2 and 2.0. The SOCKS u+p check works, but I noticed that the old torsocks 1.x actually sends the UNIX username as the SOCKS username by default, but the new torsocks 2.0 has options for SOCKS u+p that are off by default.
I also noticed that there is still a subtle distinguisher here. If a non-hidserv circuit has been alive for more than 10 minutes after first use, the only way this could happen without this patch is if a stream was still open on this circuit. In that case, a normal Tor client would close that circuit immediately after receiving the RELAY_END cell from upstream. However, clients running any version of this patch will keep non-hidserv circuits open past the 10 minute mark, and then close them without necessarily receiving a cell from upstream.
I am not sure what to do about this. I think it would require a way to reliably differentiate hidserv from non-hidserv circuits to use effectively, but it might be pretty accurate after that. Does this distinguisher trump the usability win here?
I decided to create a SocksAuthCircuitRenewPeriod torrc option that governs how long we extend the lifetime of socksauth-isolated circuits each time a new stream arrives on them. I set the default value to 1 hour.
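For reference, this is roughly how the new option would look in a torrc. The option name comes from the patch described above; the interval value shown is the default mentioned there, written in Tor's usual interval syntax (assumed here, not verified against the patch):

```
## Extend the lifetime of SOCKS-auth-isolated circuits each time a
## new stream arrives on them (default per the patch: 1 hour).
SocksAuthCircuitRenewPeriod 1 hour
```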
I also added code in circuit_is_acceptable() to allow us to keep using a circuit with SOCKS u+p auth even if it was otherwise too dirty. I did this because when we enable HTTP/2 (#14952 (moved)), we'll have super-long-lived connections that may actually exceed even the 1 hour circuit lifetime extension. To preserve our circuit UI and isolation model, we'll want to keep using the same circuit for new connections with the same auth in this case.
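The reuse rule just described can be sketched as follows. This is a simplified illustration, not the actual `circuit_is_acceptable()` signature; the function and parameter names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* A circuit that would otherwise be rejected as too dirty is still
 * acceptable if the incoming stream carries the same SOCKS
 * username+password isolation as the circuit. This keeps long-lived
 * (e.g. HTTP/2-era) sessions pinned to one circuit, preserving the
 * circuit UI and isolation model. */
static int
circuit_acceptable_for_stream(const char *circ_socks_auth,
                              const char *stream_socks_auth,
                              int circ_too_dirty)
{
  if (!circ_too_dirty)
    return 1;
  /* Too dirty: only allow reuse when both sides have matching auth. */
  return circ_socks_auth && stream_socks_auth &&
         strcmp(circ_socks_auth, stream_socks_auth) == 0;
}
```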
Because the circuit_detach_stream() hack makes it easier to differentiate TBB users (since it removes any possibility of a circuit closing immediately after RELAY_END), I placed that in its own commit. I don't think I'll actually use this commit in 4.5, though I do think it will improve behavior once HTTP/2 is enabled.
Along the way, I noticed that circuit_is_better() has a serious bug where the circuit purpose value was being obtained incorrectly, causing the majority of that function body to be skipped, so I fixed that. Once that is fixed, we also need to ensure that we actually keep using SOCKS auth circuits when a stream arrives with that same SOCKS auth; otherwise we'll actually increase circuit churn.
I'm going to let that branch run on my TBB through the weekend and keep an eye on the loglines, and if it still seems good to me by Monday, I'll probably apply everything but the circuit_detach_stream() commit to the 4.5-stable release.