So instead of raising the timeout, let's make normal circuits behave more like hidden service circuits: keep resetting their timestamp_dirty every time a new stream is attached. This has the effect that a user will never suddenly get a new circuit in the middle of actively using a website, which will be a huge usability improvement. Still not ideal, but good enough to leave the actual circuit dirtiness timeout alone.
I am going to do this with a TBB-specific Tor patch for now so we can test this in TBB 4.5a5. We can then decide if we want to make this a torrc option after that.
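The behavior described above can be sketched roughly as follows. This is an illustrative simplification, not Tor's actual API; the struct and function names are stand-ins modeled on Tor's `circuit_t` and its `timestamp_dirty` field:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-in for Tor's circuit_t; only the
 * fields relevant to this sketch are shown. */
typedef struct {
  time_t timestamp_dirty;   /* when the circuit last began carrying traffic */
  int is_hs_circuit;        /* hidden service circuits already behave this way */
} circuit_t;

/* On every new stream attach, refresh timestamp_dirty so that an
 * actively used circuit never expires out from under the user. */
static void
mark_circuit_dirty_on_attach(circuit_t *circ, time_t now)
{
  circ->timestamp_dirty = now;
}
```

The point of the sketch: instead of letting the 10-minute dirtiness clock run from first use, each new stream restarts it, so rotation only happens during idle periods.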
I question whether this is something to do unconditionally, or whether it's only something to do when we have an external application managing circuit isolation for us.
I also wonder whether there shouldn't be some kind of randomized upper limit to how long this can keep a circuit alive.
GeKo and I both agree that the SOCKS isolation check is important here. However, I felt that the randomness, in its current form, was not the best plan: circuits could still end up very short-lived, allowing TBB to surprise the user with a new circuit at random intervals.
The way I thought I implemented the randomness didn't allow for short-lived circuits. It wasn't choosing randomly between [0,X], but instead between [X, X*9/8].
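Concretely, choosing from [X, X*9/8] means the randomized lifetime can never dip below the base value X. A minimal sketch of that selection (illustrative names, not Tor's real code):

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a circuit lifetime uniformly in [X, X*9/8]. With a base of
 * 10 minutes (600 s), results fall in [600, 675] seconds, so the
 * randomized timeout is never shorter than the base. */
static int
randomized_lifetime(int base_secs)
{
  int spread = base_secs / 8;              /* X*9/8 - X */
  return base_secs + rand() % (spread + 1);
}
```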
For me, the idea of no maximum at all is a bit scary; it means that keepalive-type stuff will keep a circuit open forever even if the user doesn't expect that it would.
Finally -- did somebody test out the isolation thing, to make sure that only authentication-isolated circuits get the new behavior? I only wrote it; it does need some testing.
https://research.torproject.org/ideas.html describes Tor circuit reuse as rotating every 10 minutes. I saw that #13766 (moved) was invalidated and closed, but if the 10-minute rotation approach is being retired, let's update that documentation as well.
> The way I thought I implemented the randomness didn't allow for short-lived circuits. It wasn't choosing randomly between [0,X], but instead between [X, X*9/8].
You're right. In my haste I misread this.
> For me, the idea of no maximum at all is a bit scary; it means that keepalive-type stuff will keep a circuit open forever even if the user doesn't expect that it would.
Are there specific attacks or types of attacks that make you nervous here?
I believe there is a potential concern that longer circuit lifespans make e2e correlation easier, but I am more worried about the case where I leave my browser open overnight and some website keeps pinging itself periodically, causing a new circuit to be built each time, in order to discover my guard node. I am also worried about non-malicious but dumb websites that do this anyway and end up exposing me to the entire Tor exit node population over the course of a few hours, during which time malicious exits get more chances to try to own my browser, sniff my cookies, perform e2e correlation, mount their own guard discovery attacks, or otherwise mess with me.
Guard discovery and exit node churn were the main security reasons I wanted a huge circuit dirtiness timeout in #13766 (moved), until Roger pointed out that a fixed-length custom circuit dirtiness would provide a 100% accurate distinguisher for Tor Browser users at the Guard node.
In my ideal world, we would find some way to make this distinguisher statistical (instead of absolute, instantaneous certainty) while still ensuring really long circuit lifetimes for websites that would otherwise cause circuit churn. Plenty of long-term semi-accurate statistical classifiers likely exist for Tor Browser traffic at the guard node, so this tradeoff seems like the right one to me.
> Finally -- did somebody test out the isolation thing, to make sure that only authentication-isolated circuits get the new behavior? I only wrote it; it does need some testing.
That's what alphas are for :). In fact, I'm revisiting this ticket right now because I'm still personally experiencing cases where my exit IP has changed in the middle of actively interacting with a website, and this change has disrupted my browsing activity. I'd like to brainstorm ways of making the circuit dirtiness larger without exposing us to attacks. However, I'm also heavily biased in the usability and experimentation direction here, due to the uncertain (and conflicting) nature of the security issues at hand...
My current (admittedly completely haphazard and half-baked) inclination is to do something like randomizing the initial per-circuit dirtiness between 10 minutes and a much larger value, reset the dirtiness to rand(0,10min) upon stream close (in circuit_detach_stream()), and still omit any maximum value on dirtiness. I'm updating this ticket to give you a chance to talk me down from this cliff, ideally within the next few days before I jump and commit a patch to do something insane like this for 4.5-stable ;).
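Sketched out, the half-baked scheme above looks something like this. The names are illustrative (not Tor's real API), and the 1-hour upper bound for the initial draw is an assumed placeholder for the "much larger value":

```c
#include <assert.h>
#include <stdlib.h>

#define TEN_MINUTES 600

/* Draw the initial per-circuit dirtiness uniformly from
 * [10 min, larger_value_secs], as described in the comment above. */
static int
initial_dirtiness(int larger_value_secs)
{
  return TEN_MINUTES + rand() % (larger_value_secs - TEN_MINUTES + 1);
}

/* On stream close (the circuit_detach_stream() hook), reset the
 * remaining dirtiness to rand(0, 10 min). No overall maximum is
 * enforced anywhere, per the "omit any maximum" part of the idea. */
static int
dirtiness_after_stream_close(void)
{
  return rand() % (TEN_MINUTES + 1);
}
```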
Ok, I attached a new version that also does the stream detach hack (which I added primarily because we're boosting our HTTP keep-alive back up to 2 minutes thanks to solving #4100 (moved)). This version also adds some debug loglines, and I tested it with TBB and torsocks 1.2 and 2.0. The SOCKS u+p check works, but I noticed that the old torsocks 1.x actually sends the UNIX username as the SOCKS username by default, but the new torsocks 2.0 has options for SOCKS u+p that are off by default.
I also noticed that there is still a subtle distinguisher here. If a non-hidserv circuit has been alive for more than 10 minutes after first use, the only way this could happen without this patch is if a stream was still open on this circuit. In that case, a normal Tor client would close that circuit immediately after receiving the RELAY_END cell from upstream. However, clients running any version of this patch will keep non-hidserv circuits open past the 10 minute mark, and then close them without necessarily receiving a cell from upstream.
I am not sure what to do about this. I think it would require a way to reliably differentiate hidserv from non-hidserv circuits to use effectively, but it might be pretty accurate after that. Does this distinguisher trump the usability win here?
I decided to create a SocksAuthCircuitRenewPeriod torrc option that governs how long we extend the lifetime of socksauth-isolated circuits each time a new stream arrives on them. I set the default value to 1 hour.
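For reference, this is roughly how the new option would look in a torrc. The option name comes from the patch described above; the interval value shown is the default mentioned there, written in Tor's usual interval syntax (assumed here, not verified against the patch):

```
## Extend the lifetime of SOCKS-auth-isolated circuits each time a
## new stream arrives on them (default per the patch: 1 hour).
SocksAuthCircuitRenewPeriod 1 hour
```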
I also added code in circuit_is_acceptable() to allow us to keep using a circuit with SOCKS u+p auth even if it was otherwise too dirty. I did this because when we enable HTTP/2 (#14952 (moved)), we'll have super-long-lived connections that may actually exceed even the 1 hour circuit lifetime extension. To preserve our circuit UI and isolation model, we'll want to keep using the same circuit for new connections with the same auth in this case.
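The reuse rule just described can be sketched as follows. This is a simplified illustration, not the actual `circuit_is_acceptable()` signature; the function and parameter names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* A circuit that would otherwise be rejected as too dirty is still
 * acceptable if the incoming stream carries the same SOCKS
 * username+password isolation as the circuit. This keeps long-lived
 * (e.g. HTTP/2-era) sessions pinned to one circuit, preserving the
 * circuit UI and isolation model. */
static int
circuit_acceptable_for_stream(const char *circ_socks_auth,
                              const char *stream_socks_auth,
                              int circ_too_dirty)
{
  if (!circ_too_dirty)
    return 1;
  /* Too dirty: only allow reuse when both sides have matching auth. */
  return circ_socks_auth && stream_socks_auth &&
         strcmp(circ_socks_auth, stream_socks_auth) == 0;
}
```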
Because the circuit_detach_stream() hack makes it easier to differentiate TBB users (since it removes any possibility of a circuit closing immediately after RELAY_END), I placed that in its own commit. I don't think I'll actually use this commit in 4.5, though I do think it will improve behavior once HTTP/2 is enabled.
Along the way, I noticed that circuit_is_better() has a serious bug where the circuit purpose value was being obtained incorrectly, causing the majority of that function body to be skipped, so I fixed that. Once that is fixed, we also need to ensure that we actually keep using SOCKS auth circuits when a stream arrives with that same SOCKS auth; otherwise we'll actually increase circuit churn.
I'm going to let that branch run on my TBB through the weekend and keep an eye on the loglines, and if it still seems good to me by Monday, I'll probably apply everything but the circuit_detach_stream() commit to the 4.5-stable release.