Opened 7 years ago

Last modified 12 hours ago

#5707 assigned enhancement

Use end to end stream timing data to further prune circuits

Reported by: mikeperry Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: SponsorZ, performance, needs-research, tor-client, mike-0.2.5
Cc: arma, r_a@…, mikeperry Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

At FC12, a "congestion aware" path selection algorithm was proposed. http://fc12.ifca.ai/pre-proceedings/paper_46.pdf

It's broken, and nobody should ever use it. It ends up restricting the available paths to an unacceptably low and hard to analyze quantity of the network total. Probably something in the ballpark of 20%.

Instead, we should reuse the CBT code+math with end to end stream timing probes (for ex: STREAM CONNECT requests to 127.0.0.1). I will bet my "got it by accident along the way" Math degree that the resulting distribution is Pareto, which means we can accurately predict to the percentile how many paths we would like to reject at this phase using the same code CBT uses.

logder argues that this approach is better than CBT, because CBT primarily measure CPU load, not latency, and also the CREATE cell queues are different than the RELAY cell queues, which makes circuit building a poor performance metric for normal traffic. I don't see a good reason not to do both and pick the percentiles such that their product is still satisfyingly large. CPU saturated relays should be avoided, anyway, IMO.

Child Tickets

Attachments (1)

cbt_first-rtt_guard_comparison.pdf (1.6 MB) - added by ra 6 years ago.
Comparison of guards by their CBT and First-RTT

Download all attachments as: .zip

Change History (23)

comment:1 Changed 7 years ago by nickm

Keywords: needs-research added
Milestone: Tor: unspecified

comment:2 Changed 7 years ago by mikeperry

FYI: Research here doesn't mean we need to wait for an academic paper. There's a ton of circuit build time graphing scripts in torflow at https://gitweb.torproject.org/torflow.git/tree/HEAD:/CircuitAnalysis/BuildTimes

These could be adapted to record + graph stream attempts to localhost, to measure time until you get the STREAM_CLOSE REASON=EXIT_POLICY response.

A couple of graphs of this should be all we need to determine if we can reuse the math.

comment:3 Changed 7 years ago by nickm

Keywords: tor-client added

comment:4 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:5 Changed 7 years ago by mikeperry

Keywords: SponsorZ added

comment:6 Changed 7 years ago by mikeperry

Summary: Use end to end timing data to further prune circuitsUse end to end stream timing data to further prune circuits

comment:7 Changed 7 years ago by mikeperry

I wrote some example end-to-end probing code for #7691. Could easily be generalized/adapted for stream timing data collection.

Also note that for purposes of things like #7691, we do not want any activity we *always* do over a circuit to cause it to be counted as "successfully used". Otherwise, path bias attackers can anticipate this activity and let it through before failing the circuit.

comment:8 Changed 6 years ago by mikeperry

Milestone: Tor: unspecifiedTor: 0.2.5.x-final
Owner: set to mikeperry
Status: newassigned

I think I want to try writing this for 0.2.5.x.

comment:9 Changed 6 years ago by mikeperry

I've been brainstorming about how to best deploy something like this. I have the following high-level ideas about how we'd use it to best effect:

  1. We create a LowLatencyPorts torrc option to include ports where interactive latency matters more than connection setup time. This would include such things as SSH, Skype, Mumble, jitsi, etc.
  1. We set two pairs of consensus parameters: one for these sets of ports, and another for predictive-built circuits. Each pair would have a CBT part and and a Stream Ping Timeout part (SPT).
  1. If a stream request (or predictive build) comes in for a LowLatencyPort and we don't have any such circuits available, we apply the CBT value for the construction, and then run the ping and apply that. We then flag any such circuits that survive the CBT timeout and the SPT timeout as acceptable for use for such ports.
  1. For normal predictive circuits, we would apply CBT + SPT, but with higher (more lenient) values. For all other on-demand circuits, we'd apply CBT only (so you don't have to wait for a end-to-end ping before using them).

I'm thinking that the cutoffs would be something like CBT=85 SPT=85 for predictive circuits (yielding the "fastest" 72% of network paths for them), CBT=80 SPT=50 for LowLatencyPort circuits (yielding the "fastest" 40% of paths for them), and CBT=75 for all other on-demand circuits.

Of course, we'll want to tune LowLatency SPT cutoffs such that they can actually support voice traffic.

I think this hack alone will get us to the SponsorF deliverable for voice.

comment:10 Changed 6 years ago by arma

All the researchers doing Tor anonymity analysis get really agitated when we add new path selection approaches that aren't based on global information. And assuming the congestion is inside the network, where you're connecting from shouldn't make a big impact. And finally, all these "local not global" approaches raise complex questions about an adversary who influences a target user's opinions to influence her paths.

So the first question is, how well can we approximate your above plans with probers (a la bwauths)? And the followup question is, how much information do we need to put into e.g. the consensus for it to work?

Also, you should know that Micah Sherr's 'virtual coordinate system' plan has some code somewhere, though I have so far failed to publically pry it out of them.

comment:11 in reply to:  10 Changed 6 years ago by mikeperry

Replying to arma:

All the researchers doing Tor anonymity analysis get really agitated when we add new path selection approaches that aren't based on global information. And assuming the congestion is inside the network, where you're connecting from shouldn't make a big impact. And finally, all these "local not global" approaches raise complex questions about an adversary who influences a target user's opinions to influence her paths.

So the first question is, how well can we approximate your above plans with probers (a la bwauths)? And the followup question is, how much information do we need to put into e.g. the consensus for it to work?

Tor latency has a heavy weight on local characteristics in many instances.

In fact, the consensus is *exactly* the right place to put a timeout to allow a local/censorship adversary to look at the consensus, look at your traffic, and delay your circuit by just enough to make it fail when they want, so your routes go *where* they want.

This information needs to be computed locally so we can ensure that your Tor client completes a predictable percentage of the total available network paths, exactly like the CBT math does.

In fact, that math actually *works* in the current tor implementation. To within +/- 1%, 80% of your circuits build before the circuit build timeoutthat your client learns, and you can observe this fact in your logs and in the BUILDTIMEOUT_SET values captured by Torperf.

You keep telling me not to design systems optimized for just one Internet connection, and I'm telling you CBT is that system. Please read the spec and tell me how to improve it so that this is clear to you: https://gitweb.torproject.org/torspec.git/blob/master:/path-spec.txt#l309

comment:12 in reply to:  10 Changed 6 years ago by mikeperry

Replying to arma:

All the researchers doing Tor anonymity analysis get really agitated when we add new path selection approaches that aren't based on global information. And assuming the congestion is inside the network, where you're connecting from shouldn't make a big impact. And finally, all these "local not global" approaches raise complex questions about an adversary who influences a target user's opinions to influence her paths.

So the first question is, how well can we approximate your above plans with probers (a la bwauths)? And the followup question is, how much information do we need to put into e.g. the consensus for it to work?

Interpreting your suggestion another way as opposed to a globally-constructed timeout: I guess you're suggesting we could alter path selection itself based on the measured congestion/queuing information of each relay.

While I admit that this would not have the opportunities for route manipulation that a consensus timeout-based approach would, the practical problem is that the consensus updates every 2-4 hours for clients.. I don't think this is frequent enough for us to measure or model node queuing delay.

I still think we can tune the local system such that it ensures it allows within a very close tolerance to the fastest X% of paths. We can also code it to disable itself if it is consistently unable to maintain a prediction of this timeout such that the expected percentage of stream probes actually complete within that timeout. We could also add this code to CBT, and have it revert to Tor's 1 minute timeout in such a case..

Also, you should know that Micah Sherr's 'virtual coordinate system' plan has some code somewhere, though I have so far failed to publically pry it out of them.

I am also having a hard time finding a non-thesis version of this work. Is this what you're talking about: http://freehaven.net/anonbib/cache/DBLP:conf/pet/SherrBL09.pdf

However, in general, I don't think a virtual coordinate system will work, because I suspect the major issue with Tor for low-latency applications is it's own per-hop queuing delay variance, not topology..

comment:13 Changed 6 years ago by ra

Cc: r_a@… added

comment:14 Changed 6 years ago by mikeperry

Keywords: mike-0.2.5 added

Changed 6 years ago by ra

Comparison of guards by their CBT and First-RTT

comment:15 Changed 6 years ago by ra

Attached is the result of measurements of ~1.6M circuits, sorted by guards and compared their CBT and First-RTT values. Only guards that were measured at least 2k times each are included. It looks like a distribution for First-RTT is more stable then for CBT. Parameters seem to differ between guards for both CBT and First-RTT.

comment:16 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.???

comment:17 Changed 3 years ago by cass

Severity: Normal

This ticket is tagged SponsorZ, but it looks like progress stalled a while ago. Is this still a thing that needs funding?

comment:18 Changed 3 years ago by ra

I cannot comment on the sponsor issue. However, I would like to mention a paper we published on the ticket's topic - see https://naviga-tor.github.io/ for details (pre-print, code, and data).

comment:19 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:20 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:21 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:22 Changed 12 hours ago by gaba

Cc: mikeperry added
Owner: mikeperry deleted
Note: See TracTickets for help on using tickets.