Opened 10 months ago

Closed 10 months ago

Last modified 10 months ago

#9072 closed defect (fixed)

#9063 enables Guard discovery in about an hour by websites

Reported by: arma Owned by:
Priority: major Milestone: Tor: 0.2.3.x-final
Component: Tor Version: Tor: 0.2.4.13-alpha
Keywords: Cc: robgjansen, Flo, nickm, rpw
Actual Points: Parent ID:
Points:

Description

There are classes of relay cells that don't count in the circuit windows. Specifically, only data relay cells count in the circuit windows.

So for example, a malicious website could arrange for the client to open 2200 streams to it, and then the website could close them all at once. 2200 relay end cells would work their way back towards the client. The #9063 fix would get upset and kill the circuit.

Child Tickets

Change History (33)

comment:1 Changed 10 months ago by arma

Two more variations:

A) A website just causes Alice to launch 2200 stream connections. Then she gets 2200 connected cells heading her way.

B) If Alice is using the leaky pipe design and has two circ-window-maxes heading her way, but she's also uploading stuff and triggers a sendme cell to return to her, that brings us to 2001.

comment:2 Changed 10 months ago by mikeperry

  • Cc rpw added
  • Priority changed from normal to major

The attack enabled by #9063 is extremely similar to the Guard discovery attack from rpw's paper.

It would appear to allow a malicious website to find your guards in about an hour.

Rock, meet the hard place :/.

comment:3 Changed 10 months ago by mikeperry

  • Summary changed from #9063 went too far to #9063 enables Guard discovery of 0.2.4.x clients in about an hour by websites

Attack is courtesy of our friend boboper.

Thanks bob!

comment:4 Changed 10 months ago by mikeperry

See also #9001 for potential longer-term solutions.

comment:5 Changed 10 months ago by mikeperry

  • Priority changed from major to critical
  • Summary changed from #9063 enables Guard discovery of 0.2.4.x clients in about an hour by websites to #9063 enables Guard discovery in about an hour by websites

Actually, this is *all* clients. Bob pointed out that it is a client's Guards that have to upgrade for the client to be vulnerable, not the client itself.

Pretty bad stuff.

Perhaps we should remove #9063 and save it for Guards who actually experience OOMing?

comment:6 Changed 10 months ago by andrea

Geez, fuck -- yeah, probably roll it back unless/until we can find a smarter way to implement it.

comment:7 Changed 10 months ago by andrea

  • Status changed from new to needs_review

My bug9072-023, bug9072-024 and bug9072-025 #if out the #9063 code.

comment:8 Changed 10 months ago by nickm

Hm. Targeted OOM, or guard discovery. Not a great choice.

We might need to do the 0.2.5 version of a #9063 fix that we were talking about, where we define a maximum space allocatable for cells, and do some kind of OOM killer on clogged-looking circuits if we run out of space for new queued cells. Not great, and not delightful to contemplate for an 0.2.4 backport, but the OOM situation isn't great either.

comment:9 follow-up: Changed 10 months ago by nickm

andrea: feel free to merge those for now.

comment:10 Changed 10 months ago by rransom

This doesn't make discovering a client's entry guards significantly easier. It does make failing a client's circuits from the exit/destination end significantly easier, and as soon as the path-bias detector is turned on, that becomes a full remote user-tracing attack.

comment:11 follow-up: Changed 10 months ago by nickm_mobile

Here are a few sketch oom-killer algorithms. Let's try to find flaws in them and find something better.

Algorithm 1.
1) have a max number of allocated cells configured.
2) when that number is reached, consider all circuits and kill those with the longest queues until we are at X% of the limit.

Algorithm 2.
1) as in alg 1 above.
2) when we hit the limit, consider the connections with the largest number of queued cells on circuits aiming for them.
3) kill those connections until we are at X% of the limit.

Algorithm 3.
1, 2) as in alg 2 above.
3) kill the biggest circuits on those connections until we are at X% of our limit.

Algorithms 1T, 2T, 3T:
As algorithms 1, 2, 3, but instead of longest queue, look at longest flush time, where flush time is the queue length divided by the connection's observed bandwidth.

IMO none of 1, 2, 3 is too complex for 0.2.4.
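Algorithm 1 can be sketched in a few lines (a rough illustration only; the function name, the recovery fraction, and the dict representation are my inventions, not Tor's actual data structures):

```python
def oom_kill_circuits(circuits, max_cells, recover_frac=0.75):
    """Algorithm 1 sketch: when the total number of queued cells exceeds
    max_cells, kill the circuits with the longest queues until usage is
    back down to recover_frac * max_cells (the "X%" in the comment).
    `circuits` maps circuit id -> queue length; returns the killed ids."""
    total = sum(circuits.values())
    killed = set()
    if total <= max_cells:
        return killed  # not at the limit; do nothing
    target = recover_frac * max_cells
    # Consider circuits longest-queue-first.
    for circ_id, qlen in sorted(circuits.items(), key=lambda kv: -kv[1]):
        if total <= target:
            break
        killed.add(circ_id)
        total -= qlen
    return killed
```

The key property (elaborated in later comments) is that nothing is killed until the global limit is actually hit, so merely inflating one circuit's queue is not enough to get an honest circuit killed.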

comment:12 follow-up: Changed 10 months ago by nickm_mobile

Algorithms 1U, 2U, 3U:

As algorithms 1T, 2T, and 3T, but define a queue's flush time as some function that accounts for the number of other circuits targeting the same connection. If we were using round robin, I would say: N/BW, where BW is the connection's observed bandwidth and N is the sum of all queue lengths on that connection, each queue length clipped to be no longer than the queue under consideration.
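A sketch of this round-robin-aware flush-time estimate (names are hypothetical; a real relay would plug in its observed connection bandwidth):

```python
def flush_time_u(queues, conn_bw, i):
    """Estimated time to drain queue i on a connection whose circuits
    are served round-robin at conn_bw cells/sec: N / BW, where N sums
    every queue length on the connection, each clipped to be no longer
    than queue i (longer queues only compete until i has drained)."""
    q_i = queues[i]
    n = sum(min(q, q_i) for q in queues)
    return n / conn_bw
```

For example, with queues of [100, 10, 50] cells on a 10-cell/sec connection, the 10-cell queue drains in (10+10+10)/10 = 3 seconds, while the 100-cell queue needs (100+10+50)/10 = 16 seconds.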

comment:13 Changed 10 months ago by rransom

Instead of closing circuits or connections in response to OOM, you could check whether the relay has hit its rate limit, and if it hasn't, send a bunch of cells along the circuit/connection. (Perhaps the next relay will have more room for them.)

comment:14 follow-up: Changed 10 months ago by rransom

How long are cell queues for normal clients' circuits and connections in practice?

comment:15 Changed 10 months ago by arma

"Maximum number of cells in entry's circuit queue = 8 * (1000_data + 10_sendme_circ + 20_sendme_stream + N * (1_connected + 1_end)), N is number of maximum allowed streams for client"

"so 10k is actual number for client with 100 streams limit"

(quoting from irc)

If we disable the leaky pipe feature, 2k cells is enough, with 100 stream limit.

Of course, this isn't counting drop cells, which we might want to add in at some point to help with website fingerprinting.
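As a sanity check on the quoted formula (a sketch of the arithmetic only; the multiplier of 8 is the leaky-pipe factor from the IRC quote, and N is the per-client stream limit):

```python
def max_entry_queue_cells(n_streams, leaky_pipe_factor=8):
    """Upper bound on an entry guard's circuit queue per the formula
    quoted above: factor * (1000 data cells + 10 circuit SENDMEs
    + 20 stream SENDMEs + n_streams * (1 CONNECTED + 1 END))."""
    return leaky_pipe_factor * (1000 + 10 + 20 + n_streams * (1 + 1))
```

With N = 100 this gives 9840, matching the quoted "10k is actual number"; dropping the leaky-pipe factor gives 1230, consistent with the ~2k estimate above.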

comment:16 in reply to: ↑ 12 ; follow-up: Changed 10 months ago by robgjansen

Replying to nickm_mobile:

Algorithms 1U, 2U, 3U:

As algorithms 1T, 2T, and 3T, but define a queue's flush time as some function that accounts for the number of other circuits targeting the same connection. If we were using round robin, I would say: N/BW, where BW is the connection's observed bandwidth and N is the sum of all queue lengths on that connection, each queue length clipped to be no longer than the queue under consideration.

Is there any reason not to simplify this down to the circuit level only? Anything based on connections is easily subverted by a sybil attack. And you can't consider the time-independent queue length alone because of the sybil attack.

The goal is to drive up the cost of the attack while making sure you are not killing an honest client's circuit. If you consider a function of the queue length and the waiting time of the longest waiting cell for each circuit, then a malicious client will have to read cells at a rate at least as fast as the slowest honest client *on every sybil* in order to cause the relay to target the honest client's circuit. It can be shown that the longer the attacker's queue becomes, the faster it must read from it to avoid selection because it needs to keep the waiting times of the cells in the malicious circuits lower than those of all honest circuits.

comment:17 Changed 10 months ago by robgjansen

I guess authenticated SENDMEs, where the exit inserts a 1-byte nonce into every 100th cell and won't increase the package window unless the SENDME contains the nonce, is also off the table since, like the queue-length-limit defense, it doesn't handle non-data cells.

comment:18 Changed 10 months ago by robgjansen

I imagine it's not a simple fix, but what are the other problems with making all classes of relay cells count against the circuit package window?

comment:19 Changed 10 months ago by nickm

rransom said:

Instead of closing circuits or connections in response to OOM, you could check whether the relay has hit its rate limit, and if it hasn't, send a bunch of cells along the circuit/connection. (Perhaps the next relay will have more room for them.)

That sounds plausible as a reliability improvement, but unless I misunderstand, it won't actually prevent any DoS here, since the recipient could just stop reading and thereby make the sender's TCP implementation refuse to write any more.

How long are cell queues for normal clients' circuits and connections in practice?

Less than 1000; I don't know how much less.

comment:20 Changed 10 months ago by nickm

arma wrote:

If we disable the leaky pipe feature, 2k cells is enough, with 100 stream limit.

Is there any such 100 stream limit? I'm not seeing it in the code today. So if we want to protect existing clients from this issue, 2k is too low. We'd need to write a patch to limit that, which would create another way for a hostile website to make a client open a ton of circuits. (We could mitigate that a little by applying a limit only to the number of circuits for which we've sent a BEGIN but not gotten a CONNECTED, I guess, and delaying pending streams.)

So if I'm not wrong about the code as it stands, that implies that N=65535, so the magic number is a hefty 1056816 (or merely 264204 if you don't believe in leaky-pipe). Not so good.

Or am I missing a real 100-stream limit somewhere?
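The figures above can be reproduced from the comment-15 formula under one assumption of mine about the constants plugged in: N = 65536 (the full stream-ID space), with a multiplier of 8 for leaky-pipe and 2 otherwise:

```python
# Hypothetical reconstruction of the numbers in this comment, using the
# per-circuit formula from comment 15 with N = 65536 stream slots:
per_circuit = 1000 + 10 + 20 + 65536 * (1 + 1)   # = 132102 cells
leaky_pipe = 8 * per_circuit      # multiplier 8 with leaky-pipe
no_leaky_pipe = 2 * per_circuit   # multiplier 2 without
```

This yields 1056816 and 264204 respectively, matching the comment.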

comment:21 Changed 10 months ago by nickm

robgjansen wrote:

I guess authenticated SENDMEs, where the exit inserts a 1-byte nonce into every 100th cell and won't increase the package window unless the SENDME contains the nonce, is also off the table since, like the queue-length-limit defense, it doesn't handle non-data cells.
I imagine it's not a simple fix, but what are the other problems with making all classes of relay cells count against the circuit package window?

The biggest problems I know with each of these are the migration issues. We can't require that clients and exits use them until all clients and exits support them. That doesn't make them complete non-starters, but it does keep us from deploying them any time, say, in the next two weeks. :)

comment:22 in reply to: ↑ 16 Changed 10 months ago by nickm

Replying to robgjansen:

Is there any reason not to simplify this down to the circuit level only? Anything based on connections is easily subverted by a sybil attack. And you can't consider the time-independent queue length alone because of the sybil attack.

What you say seems plausible. Though it might not be too bad to do a time-independent queue length as "good enough for now" in 0.2.4 and maybe 0.2.3, then work on the better thing in 0.2.5, targeting a possible backport to 0.2.4 if the better thing turns out to be not too hard. After all, if the #9063 solution seemed "good enough as a temporary measure" (barring this issue) then surely a queue-length-based solution is not worse.

I think this here (algorithm 1 in 0.2.3 and 0.2.4; work on better stuff for 0.2.5 for a possible backport) is my current preferred approach, assuming there aren't flaws in it beyond what I know. Finding the exactly-right approach for 0.2.5 seems like it might take a bunch of discussion and analysis.

The goal is to drive up the cost of the attack while making sure you are not killing an honest client's circuit. If you consider a function of the queue length and the waiting time of the longest waiting cell for each circuit, then a malicious client will have to read cells at a rate at least as fast as the slowest honest client *on every sybil* in order to cause the relay to target the honest client's circuit. It can be shown that the longer the attacker's queue becomes, the faster it must read from it to avoid selection because it needs to keep the waiting times of the cells in the malicious circuits lower than those of all honest circuits.

You don't want a function of the longest waiting cell for each circuit, I think: you want a function of the expected time to drain the queue.[*] That's why I was looking at connection flush rates -- the speed at which the connection is draining, and the lengths of the other circuit queues on the connection, determine how long it will take to drain the queue.

[*] Why look at the expected drain time and not age of the longest waiting cell? Imagine a slow connection where a bunch of legit circuits have had modestly long queues for a while, and a new connection has gotten a really egregious queue really quickly. The ones that have been around longer seem like they could be the ones to get killed, which isn't so great.
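The footnote's scenario can be made concrete (all numbers invented for illustration): two long-lived circuits on a slow 1-cell/sec connection have modest 50-cell queues with old cells, while a 100-cell/sec connection just acquired a 10000-cell queue. Selecting the victim by oldest-cell age targets the honest slow circuits; selecting by expected drain time (queue length over connection rate) targets the egregious newcomer:

```python
# Victim selection by oldest-cell age vs. by expected drain time.
circuits = [
    # (name, queue_len_cells, conn_bw_cells_per_sec, oldest_cell_age_sec)
    ("legit-slow-1",  50,     1, 120),
    ("legit-slow-2",  50,     1, 110),
    ("egregious-new", 10000, 100,   5),
]
by_age = max(circuits, key=lambda c: c[3])[0]           # oldest cell wins
by_drain = max(circuits, key=lambda c: c[1] / c[2])[0]  # longest drain wins
```

Here `by_age` picks "legit-slow-1" (cell age 120 s) while `by_drain` picks "egregious-new" (10000/100 = 100 s to drain versus 50 s for each slow circuit), which is the behavior the footnote argues for.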

comment:23 in reply to: ↑ 9 Changed 10 months ago by andrea

  • Resolution set to fixed
  • Status changed from needs_review to closed

Replying to nickm:

andrea: feel free to merge those for now.

Merged.

comment:24 Changed 10 months ago by nickm

  • Component changed from Tor to Thandy
  • Milestone changed from Tor: 0.2.3.x-final to Tor: 0.2.4.x-final
  • Priority changed from critical to major
  • Resolution fixed deleted
  • Status changed from closed to reopened

I'm going to suggest we keep this open for now to discuss what fix we want to do in 0.2.3/0.2.4.

comment:25 Changed 10 months ago by nickm

  • Component changed from Thandy to Tor

comment:26 Changed 10 months ago by nickm

I wrote:

I think this here (algorithm 1 in 0.2.3 and 0.2.4; work on better stuff for 0.2.5 for a possible backport) is my current preferred approach, assuming there aren't flaws in it beyond what I know. Finding the exactly-right approach for 0.2.5 seems like it might take a bunch of discussion and analysis.

See #9063 for my attempt at hacking that together. It needs review.

comment:27 Changed 10 months ago by mikeperry

From our IRC friend:

"idea with streams limiting for #9072 is wrong too. idea with limiting half open streams is better but still fragile. no way to limit queue length with current protocol. and don't forget about many relay_truncate, relay_truncated, that can be used actively with leaky pipe topology."

I asked him: what if we made a patch that was a torrc option, so we at least had something in a release we could tell people "Turn this on if you're a guard and experience OOMs"? How should we implement that option? Is there a least bad choice?

limiting streams to limit queue length to protect against killer website but protect against oom is kludge. need to think about torrc option.

comment:28 in reply to: ↑ 14 Changed 10 months ago by andrea

Replying to rransom:

How long are cell queues for normal clients' circuits and connections in practice?

Pretty short; testing the original #9063 patch I had to reduce the maximum to 5 cells to observe circuit kills.

comment:29 in reply to: ↑ 11 ; follow-up: Changed 10 months ago by andrea

Replying to nickm_mobile:

Here are a few sketch oom-killer algorithms. Let's try to find flaws in them and find something better.

Algorithm 1.
1) have a max number of allocated cells configured.
2) when that number is reached, consider all circuits and kill those with the longest queues until we are at X% of the limit.

Making sure I understand this correctly: this is better than the max queue length version because the maximum is controlled by the amount of memory available globally, so the queue fill has to be much, much larger to force a circuit kill needed for the potential exploit?

comment:30 in reply to: ↑ 29 Changed 10 months ago by nickm

Replying to andrea:

Making sure I understand this correctly: this is better than the max queue length version because the maximum is controlled by the amount of memory available globally, so the queue fill has to be much, much larger to force a circuit kill needed for the potential exploit?

Right. With all of the algorithms I outlined above, we only kill circuits once we're really about to be out of memory. So the tricks described in the original ticket here (and in the first comment) aren't sufficient to force a circuit kill.

comment:31 Changed 10 months ago by nickm

  • Milestone changed from Tor: 0.2.4.x-final to Tor: 0.2.3.x-final
  • Resolution set to fixed
  • Status changed from reopened to closed

comment:32 Changed 10 months ago by nickm

(I merged the bug9063_redux_023_squashed branch.)

comment:33 Changed 10 months ago by nickm

The place to figure out the best long-term fix is now #9093.
