Project: See if we can improve performance by throttling busy streams at guard nodes

changed milestone to %Deliverable-Sep2010

added component::core tor/tor milestone::Deliverable-Sep2010 priority::medium resolution::implemented status::closed tor-relay type::task labels

In particular see http://archives.seul.org/or/dev/Dec-2009/msg00002.html and the discussion stemming from Proposal 163. Basically, we want to keep individual clients from hammering the network too hard when we need the bandwidth to relay traffic. We have a partial implementation, and Roger says that the current next steps are performance measurements of some kind:

Look at bwconnrate and bwconnburst in the consensus. Next step is to do more rigorous performance comparisons (turn it on, off, on, off, compare torperf results).


This project is tricky for two reasons:

A) Only relays after 0.2.2.7-alpha respect these consensus params, so any performance changes we see will be highly variable.

B) If we enable these params and they actually work, then the bwauthority measurers are going to start thinking that relays are slower than they are. Right now the bwauth scripts push more bytes over relays they believe to be fast, and with these params enabled those streams will get throttled. Worse, they'll only punish the relays running 0.2.2.7-alpha or later, so those relays will get lower weights in the consensus, see less use from clients, and look like they're getting faster even if they're not. The short-term fix is to make the Tor that each bwauth uses be a relay (but advertise very little bandwidth), so it isn't throttled.

Let's pretend to be scientists, and come up with an actual experiment here.

Turning on the feature network-wide and trying to discern the results is going to be tricky, since many of the relays will ignore it.

How about we run two relays on the same machine with the same bandwidth constraints, one with the feature enabled, one without, run a special torperf for each of them that hard-codes that node as its entry, and see how things go? The problem there is that randomly assigned load from other users will cause the two relays to get different weights, and then we'll have an extra variable that we don't want.

Another option is to run one fast relay that has the Guard flag, with a specialized torperf that hard-codes it as its entry, and turn the feature on and off and compare data points. I think that will get us the more precise tests we want.

Which leaves two further questions: A) How long should each on/off period be? My first thought is 24 hours on, 24 hours off, for a week or two, which should give us some sense. B) What bwconnrate/burst should we pick? I had originally imagined something like a 5KB/s rate and a 2000KB burst, but there are other options we could pick too.

We might want to start the torperf running on that Guard for a few days first, to get a baseline.

We could use fluxe3 for the experiment. It is an overloaded guard, so I'd guess the results should be noticeable?

So what do we expect to see in the torperf results if we set those params?

For the 5MB stats, the first 2MB come at the previous pace, but after that the connection is held to 5KB/s. Since it can move at most 2000 + 5t KB in t seconds, the whole download takes at least (5000 - 2000)/5 = 600 seconds, or 10 minutes. That's a serious increase over the current 5MB results, which average 50 to 100 seconds.
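A rough way to check this arithmetic is to model the per-connection limit as an idealized token bucket that starts full (burst B KB, refill r KB/s) and to assume the stream could otherwise run at a steady pace. This is a simplification, not Tor's actual implementation: real relays refill in discrete ticks and the achievable pace varies. The function name and the 100KB/s pace used below are hypothetical.

```python
def throttled_transfer_time(size_kb, link_kbps, rate_kbps, burst_kb):
    """Seconds to move size_kb through an idealized token bucket
    (refill rate_kbps, capacity burst_kb) that starts full, assuming the
    stream could otherwise run at a steady link_kbps. Hypothetical model."""
    if link_kbps <= rate_kbps:
        return size_kb / link_kbps  # the bucket never runs dry
    # While sending at full pace, the bucket drains at the net rate below.
    drain_time = burst_kb / (link_kbps - rate_kbps)
    sent_at_full_pace = link_kbps * drain_time
    if size_kb <= sent_at_full_pace:
        return size_kb / link_kbps  # the transfer finishes inside the burst
    # Once the bucket is empty, the stream is held to the refill rate.
    return drain_time + (size_kb - sent_at_full_pace) / rate_kbps
```

With a 100KB/s pace, `throttled_transfer_time(5000, 100, 5, 2000)` comes out to about 600 seconds, while 50KB and 1MB transfers finish inside the burst and are unaffected. Once the bucket runs dry, the total time is (S - B)/r regardless of link speed, since every byte beyond the burst has to wait for a token.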

So we should certainly expect the 5MB results to jump. But what about the 50KB and 1MB results? They could stay the same, or they could improve. It really depends on how many connections pull down more than 2000 + 5n kilobytes in their first n seconds, and on how much spare bandwidth there is. That is, for relays that have plenty of bandwidth to play with, we shouldn't expect to see an improvement in 50KB or 1MB results if this is the only relay that's doing it.
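The 2000 + 5n condition can be checked for a given connection by replaying its demand second by second against the same bucket. A sketch, assuming one-second granularity and the 5KB/s and 2000KB values above; `first_throttled_second` is a hypothetical helper, not part of Tor.

```python
def first_throttled_second(demand_kb_per_sec, rate_kb=5, burst_kb=2000):
    """Return the index of the first second in which the connection wants
    more than the bucket holds, or None if it is never throttled.
    Idealized per-second model; the bucket starts full and is capped."""
    tokens = burst_kb
    for second, demand in enumerate(demand_kb_per_sec):
        if demand > tokens:
            return second
        tokens -= demand
        # Refill, but an idle connection can never bank more than the burst.
        tokens = min(burst_kb, tokens + rate_kb)
    return None
```

A bursty, browsing-like pattern such as one 500KB page, half a minute of reading, then a few more 500KB fetches never hits the limit, because the bucket refills while the user is idle. A steady 100KB/s download is throttled at second 21.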

So the first observation is that for best results, we should do it on a relay that's in the sweet spot -- not so fast that it can handle all its users (relative to their second hop relay), and not so slow that its users are getting throttled close to 5KB/s anyway. I'm not clear where that sweet spot is.

The second observation is that we are expecting network effects from this change which can't be seen just by changing one relay. In particular, we are hoping that the second and third hops will become faster too if other relays in the network stop introducing so much congestion. So just varying a single entry guard and leaving the rest of the network alone won't give us the whole picture.

The third observation is that no fixed choice of parameters is really stable to leave enabled in the consensus long-term. What we're trying to do is separate typical web-browsing users from typical bulk-download users (such that we punish essentially none of the web-browsing users), and then we're hoping that squelching the bulk downloaders will result in less congestion throughout the network. Ultimately I am guessing the params should be a function of the traffic we're seeing. This would be a great question for a researcher to help out with. It would also benefit from a Tor network simulator.

Sebastian has set up a torperf that enters through fluxe3. He is alternating between a 5KB/s rate with a 2000KB burst and no per-connection bandwidth limits, and noting the times he switches.
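Since the switch times are being recorded, one way to compare the two conditions is to split the torperf measurements by whichever setting was active and compare the empirical CDFs of completion time for each file size. A minimal sketch; the (start, duration) input format and the function names are assumptions, not torperf's actual output format.

```python
from bisect import bisect_right


def split_by_setting(measurements, switch_times, start_enabled=False):
    """Group (start, duration) samples by the throttle setting in effect.

    switch_times must be sorted; the setting flips at each one.
    Samples that straddle a switch are dropped, since they saw both settings.
    """
    groups = {"on": [], "off": []}
    for start, duration in measurements:
        flips = bisect_right(switch_times, start)
        if bisect_right(switch_times, start + duration) != flips:
            continue  # a switch happened mid-download
        enabled = start_enabled != (flips % 2 == 1)
        groups["on" if enabled else "off"].append(duration)
    return groups


def ecdf(samples):
    """Return F(x): the fraction of samples that are <= x."""
    xs = sorted(samples)
    return lambda x: bisect_right(xs, x) / len(xs)
```

Comparing the "on" and "off" curves separately for the 50KB, 1MB and 5MB downloads would show whether the small downloads improve while the 5MB ones slow down.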

(I talked to Ian briefly about doing the switches at regular intervals, and he was convinced that doing them at irregular intervals is better. Otherwise we'll always wonder if there's a weird pattern with a period of 2 weeks that we're measuring instead.)
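Following Ian's suggestion is easy if each on/off period length is drawn at random rather than fixed at 24 hours. A sketch; the 12 to 60 hour range and the function name are illustrative assumptions, chosen only to bracket the 24-hour period discussed above. Seeding the generator lets the schedule be reproduced and recorded ahead of time.

```python
import random


def irregular_schedule(start, end, min_hours=12, max_hours=60, seed=None):
    """Return sorted switch timestamps (seconds) between start and end,
    with each period length drawn uniformly from [min_hours, max_hours]."""
    rng = random.Random(seed)
    switches = []
    t = start
    while True:
        t += rng.uniform(min_hours, max_hours) * 3600
        if t >= end:
            return switches
        switches.append(t)
```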

OK, it turns out the perconnbwrate/burst options weren't doing what I wanted.

Bug 1830 is now (hopefully) resolved. Once I put out a new alpha, and weasel makes a deb, Sebastian can upgrade and then start the experiment again.

Personally, I want to say that I've never really liked this option. We should make it as light as possible, because people do send important yet big files over Tor, such as protest video or leaked surveillance video. I think 5Mbyte is way too small for this.

If our goal is to stop bittorrent users, we can probably get away with more like 100Mbyte and 50Kb/sec. We should also take the smaller value from the bw authorities. We should at least test higher values when trying to make a final decision on what to actually deploy.

Also, right now we are apparently throttling upstream and downstream independently. If the idea of this feature is to kill P2P, we should sum upstream and downstream bandwidth, and throttle orconns in both directions once the combined traffic exceeds a high limit. P2P transmits in both directions, so summing effectively halves its limit for free.
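To see why summing matters, here is a toy comparison, assuming idealized token buckets with one-second refills: a connection moving r units per second in each direction never trips two independent buckets of rate r, but does trip a single bucket of rate r charged for both directions. `TokenBucket` and `throttled_within` are hypothetical names for illustration, not Tor's code.

```python
class TokenBucket:
    """Idealized token bucket; units are whatever the caller uses (e.g. KB)."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # bucket capacity
        self.tokens = burst   # start full

    def refill(self, seconds=1.0):
        self.tokens = min(self.burst, self.tokens + self.rate * seconds)

    def try_consume(self, amount):
        """Spend tokens if available; return False (throttle) otherwise."""
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True


def throttled_within(seconds, up, down, up_bucket, down_bucket):
    """True if moving `up` and `down` units per second ever hits a limit.
    Pass the same bucket twice to charge both directions against one limit."""
    for _ in range(seconds):
        if not up_bucket.try_consume(up) or not down_bucket.try_consume(down):
            return True
        up_bucket.refill()
        if down_bucket is not up_bucket:
            down_bucket.refill()
    return False
```

With rate 50 and burst 1000, the shared bucket runs dry after about 20 seconds of 50 up plus 50 down, while the separate buckets never do. A one-way bulk transfer at the same rate is unaffected either way.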

(Figure: ECDF of request completion times)

Replying to mikeperry:

Personally, I want to say that I've never really liked this option. We should make it as light as possible, because people do send important yet big files over Tor, such as protest video or leaked surveillance video. I think 5Mbyte is way too small for this.

I think any reasonable fixed value is too inflexible for this.

See https://blog.torproject.org/blog/research-problem-adaptive-throttling-tor-clients-entry-guards for the next research steps.

I'm going to close this trac entry, labelling our early experiment as a success.

Sebastian, feel free to continue your experiment if you like, or you can repurpose fluxe3 to some new experiment. Thanks!

Trac:
Resolution: N/A to implemented
Status: new to closed

Trac:
Keywords: N/A deleted, tor-relay added

Trac:
Component: Tor Relay to Tor

closed

moved to tpo/core/tor#1750 (closed)

mentioned in issue tpo/core/tor#1750 (closed)
