Florian Tschorsch and Björn Scheuermann have written a patch to a) disable the outgoing token bucket to avoid the "double door effect" and b) reduce the token bucket refill interval from 1 second to, say, 100 milliseconds or less.
Sebastian and I discussed the patch with Florian and Björn. See the latest version of the patch in branch tokenbucket in my public repository.
Removing the outgoing token bucket entirely is not okay with me, for the reasons discussed on the or-dev thread, so I'm going to treat this as preliminary.
Is it possible to separate the more frequent refill interval part of this from the double-door part? They seem like they should be logically separable.
Trac: Milestone: N/A to Tor: 0.2.3.x-final Status: new to needs_review
Removing the outgoing token bucket entirely is not okay with me, for the reasons discussed on the or-dev thread, so I'm going to treat this as preliminary.
We propose to continue using a token bucket on the incoming side to limit the incoming rate anyway. It will also be easy to monitor (not limit!) the outgoing rate. We might make use of these two mechanisms to proceed as follows: every time traffic is generated inside our relay, we remove a corresponding amount of tokens from the bucket on the incoming side, thereby slowing down the rate at which data flows into the router.
It is easy to see that this limits the outgoing traffic to the configured limit, while at the same time it does not need any rate-limiting mechanism on the outgoing side (and therefore avoids any double-door effects). It is also reasonably simple and will thus (hopefully) not cause undesired side effects.
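As a minimal sketch of this mechanism (the names here are invented for illustration, not actual Tor code):

```c
#include <stddef.h>

/* Sketch: charge internally generated traffic to the incoming bucket. */
static int read_bucket;  /* tokens available for reading; refilled periodically */

/* Called whenever the relay itself generates num_bytes of outgoing data
 * (e.g. a directory reply) instead of relaying bytes it has read. */
static void
note_internally_generated(size_t num_bytes)
{
  /* The bucket may go negative, which simply delays future reads until
   * the periodic refill catches up again. */
  read_bucket -= (int)num_bytes;
}
```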
If you agree with this modification, I would love to implement the changes.
Is it possible to separate the more frequent refill interval part of this from the double-door part? They seem like they should be logically separable.
Both enhancements are already separated in our implementation. They can be configured by two new parameters: DisableOutgoingTokenBucket and TokenBucketRefillInterval. Nevertheless, we recommend using both, since using smaller refill intervals alone will amplify the double-door effects.
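For illustration, the corresponding torrc lines might look as follows; the option names are from our patch, but the exact value syntax shown here is an assumption:

```
# Assumed syntax; see the patch for the authoritative format.
DisableOutgoingTokenBucket 1
TokenBucketRefillInterval 100
```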
Just a small update concerning our patch:
We ran two identically configured relays (0.2.3.0-alpha-dev, non-exit, 500kbps) with and without our patch enabled for a period of one week.
The standard Tor relay queued 16% of all relayed cells for one second or more.
In contrast, with our patch and a configured refill period of 10ms, only 2% of all relayed cells get queued for one second or more.
The results thus demonstrate the substantial impact of our patch.
We would therefore like to push the discussion and review of our proposed modifications forward.
So, did you write the patch that can guarantee that a configured maximum amount of traffic is never exceeded? That sounds like the next thing to review here, because we cannot use any code that doesn't have that guarantee.
I am skeptical about the "decrement the in-bucket when traffic is 'generated internally'" approach. It seems that it would require pervasive changes throughout the codebase. I bet that the "adaptively decrease the read-rate until the write-limit is (almost) never reached" approach would yield significantly more reliable and maintainable code.
As for separating stuff: I meant separate patches, so that the features can be merged and reviewed independently. If the short-refill-interval approach yields significant improvements, it shouldn't have to wait for the double-gating fixes to get revised. I just talked to Karsten and Sebastian, and they said they'd like to try to split up the patch series into independent patches. That'd be great!
As for separating stuff: I meant separate patches, so that the features can be merged and reviewed independently. If the short-refill-interval approach yields significant improvements, it shouldn't have to wait for the double-gating fixes to get revised. I just talked to Karsten and Sebastian, and they said they'd like to try to split up the patch series into independent patches. That'd be great!
Yes, Sebastian and I can help. Where do we find the latest branch?
I am skeptical about the "decrement the in-bucket when traffic is 'generated internally'" approach. It seems that it would require pervasive changes throughout the codebase. I bet that the "adaptively decrease the read-rate until the write-limit is (almost) never reached" approach would yield significantly more reliable and maintainable code.
Is it so? With my current knowledge of the code, I would suspect that there are only two additional places where the modifications from our patch need to be implemented: (1) In connection_buckets_decrement(), when counting bytes of directory traffic, we need to subtract the num_written bytes from the read bucket. (Apart from that, a directory mirror answers requests only when capacity is available; see the usage of global_write_bucket_low.) (2) I am not exactly sure where it happens (probably in connection_edge_package_raw_inbuf), but we need to account for data that is less than a cell's payload; this difference needs to be subtracted from the read bucket, too.
Please tell me if I missed something here. Nevertheless, this approach maintains the bandwidth limits strictly, whereas the adaptive approach seems less accurate.
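To illustrate point (1), a rough sketch of the kind of change meant here; all identifiers below are stand-ins, not the actual Tor source:

```c
#include <stddef.h>

/* Sketch of point (1): internally generated directory traffic consumed
 * no read tokens on its way in, so charge the written bytes against the
 * read bucket instead. */
static int global_read_bucket;  /* tokens available for reading */

/* Would be called from connection_buckets_decrement() after writing
 * num_written bytes of locally generated directory traffic. */
static void
charge_dir_traffic_to_read_bucket(size_t num_written)
{
  /* The read bucket may go negative briefly; future reads are delayed
   * until the periodic refill catches up. */
  global_read_bucket -= (int)num_written;
}
```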
As for separating stuff: I meant separate patches, so that the features can be merged and reviewed independently. If the short-refill-interval approach yields significant improvements, it shouldn't have to wait for the double-gating fixes to get revised. I just talked to Karsten and Sebastian, and they said they'd like to try to split up the patch series into independent patches. That'd be great!
We have evidence that decreasing the refill interval alone amplifies the double-door effects and therefore queues cells even longer. On the other hand, it makes the traffic less bursty and smooths it out. This will probably also be the case with bufferevents enabled, where only shorter intervals are used. Measurements to confirm both observations are in progress.
Therefore, in order to gain maximum improvement, we advise enabling both features, though technically they can be reviewed independently.
Yes, Sebastian and I can help. Where do we find the latest branch?
Once we have agreed on a modification of the original proposal, I will finish it and upload it to my public git repository. At the moment, only the original patch is available.
I am skeptical about the "decrement the in-bucket when traffic is 'generated internally'" approach. It seems that it would require pervasive changes throughout the codebase. I bet that the "adaptively decrease the read-rate until the write-limit is (almost) never reached" approach would yield significantly more reliable and maintainable code.
Is it so? With my current knowledge of the code, I would suspect [...] I am not exactly sure [...]
Too many uncertainties. :) I think Nick has a point here. We should come up with a patch that we're 100% sure respects the configured bandwidth limit, even if it doesn't give us 100% of the performance improvement that we hope to achieve. We can still apply another patch that does what you have in mind, once that is tested more and there are fewer uncertainties.
As for separating stuff: I meant separate patches, so that the features can be merged and reviewed independently. [...]
Therefore, in order to gain maximum improvement, we advise enabling both features, though technically they can be reviewed independently.
Yes, Sebastian and I can help. Where do we find the latest branch?
Once we have agreed on a modification of the original proposal, I will finish it and upload it to my public git repository. At the moment, only the original patch is available.
Great. Where would we find the original patch again? :) We can split the branch tomorrow and you can apply your changes to the separate branches.
Thanks to all of you for your help.
Thanks for being persistent in trying to get this patch into Tor. :)
I squashed, split, and rebased your branch to have one commit per change (plus one commit for a more general change). The result is in branch tokenbucket in my public repository. Note that I messed up the author information, but I assume you don't care about the Git metadata as long as the ChangeLog says that you are the author(s). Or maybe Sebastian can fix that if you care.
If this branch looks okay to you, please apply any further patches on top of it, and we'll merge them into the existing commits before merging everything into master.
I also removed your proposal, because proposals are now in a separate repository. I'm attaching a patch to this ticket for adding your proposal there.
Trac: Owner: N/A to karsten Status: needs_review to assigned
We absolutely understand your concerns, and Nick has a point there. We are simply excited about the performance improvements obtained with our initial proposal, which is why we tried to find a way to move it forward. Nevertheless, you are right that it might be a good first step to implement a modification as you suggest, even though we expect that it will not overcome all of the performance issues under all circumstances.
In order to address your comments, we drafted an alternative design that strictly maintains the configured bandwidth limits.
Let us start from our original patch (cf. tokenbucket proposal). Thus, we have a regular token bucket on the reading side with a certain rate and a certain burst size. Let x denote the current amount of tokens in the bucket. On the outgoing side we need something appropriate that monitors and constrains the outgoing rate, but at the same time avoids holding back cells (cf. double door effects) whenever possible.
Here we propose something that adopts the role of a token bucket, but realizes this functionality in a slightly different way. We call it a "credit bucket". Like a token bucket, the credit bucket also has a current fill level, denoted by y. However, the credit bucket is refilled in a different way.
To understand how it works, let us look at the possible operations (a code sketch follows the list):
As said, x is the fill level of a regular token bucket on the incoming side and thus gets incremented periodically according to the configured rate. No changes here.
If x<=0, we are obviously not allowed to read.
If x>0, we are allowed to read up to x bytes of incoming data. If k bytes are read (k<=x), then we update x and y as follows:
x = x - k (1)
y = y + k (2)
(1) is the standard token bucket operation on the incoming side. Whenever data is admitted in, though, an additional operation is performed: (2) allocates the same number of bytes on the outgoing side, which will later on allow the same number of bytes to leave the onion router without any delays.
If y + x > -M, we are allowed to write up to y + x + M bytes on the outgoing side, where M is a positive constant. M specifies a burst size for the outgoing side. M should be larger than the number of tokens that get refilled during a refill interval; we would suggest having M on the order of a few seconds' "worth" of data. Now if k bytes are written on the outgoing side, we proceed as follows:
If k <= y then y = y - k
In this case we use "saved" credits, previously allocated on the incoming side when incoming data has been processed.
If k > y then x = x - (k-y) and then y = 0
We generated additional traffic in the onion router, so that more data is to be sent than has been read (the credit is not sufficient). We therefore "steal" tokens from the token bucket on the incoming side to compensate for the additionally generated data. This will subsequently result in correspondingly less data being read on the incoming side. As a result of such an operation, the token bucket fill level x on the incoming side may become negative (but it can never fall below -M).
If y + x <= -M then outgoing data will be held back. This may lead to double-door effects, but only in extreme cases where the outgoing traffic largely exceeds the incoming traffic, so that the outgoing burst size M is exceeded.
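Putting the operations together, here is a minimal C sketch; the names and the concrete value of M are illustrative assumptions, not the actual Tor implementation:

```c
/* Credit bucket sketch; illustrative only, not Tor source code. */

#define M (512*1024)  /* outgoing burst allowance, e.g. a few seconds' worth */

static int x;  /* incoming token bucket level; may go negative, never below -M */
static int y;  /* outgoing credit bucket level */

/* Periodic refill on the incoming side, as before (clipping to the
 * configured bucket size omitted for brevity). */
static void refill(int tokens) { x += tokens; }

/* How many bytes may be read right now? */
static int read_allowance(void) { return x > 0 ? x : 0; }

/* Account for k bytes read, k <= read_allowance(). */
static void note_read(int k) {
  x -= k;  /* (1) standard token bucket operation */
  y += k;  /* (2) earn matching credit on the outgoing side */
}

/* How many bytes may be written right now? */
static int write_allowance(void) { return y + x > -M ? y + x + M : 0; }

/* Account for k bytes written, k <= write_allowance(). */
static void note_written(int k) {
  if (k <= y) {
    y -= k;        /* spend saved credit */
  } else {
    x -= k - y;    /* steal tokens for internally generated data */
    y = 0;
  }
}
```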
Aside from short-term bursts of configurable size (as with every token bucket), this procedure guarantees that the configured rate may never be exceeded (on the application layer, that is; as with the current implementation, an attacker may easily cause the onion router to arbitrarily exceed the limits on the lower layers). Over time, we never send more data than the configured rate: every sent byte needs a corresponding token on the incoming side; this token must either have been consumed by an incoming byte before (it then became a "credit"), or it is "stolen" from the incoming bucket to compensate for data generated within the onion router.
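To see the bound more formally (a sketch of the argument, using the names above): let z = x + y. Reading k bytes leaves z unchanged (x loses k, y gains k), writing k bytes decreases z by exactly k, and the periodic refill increases z at the configured rate. Since writes are only admitted while z does not drop below -M, the bytes written in any time interval are bounded by the tokens refilled in that interval plus a constant slack.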
This modification can be implemented with moderate effort and requires changes only at the points where currently the token bucket operations are performed.
We feel that this is not the be-all and end-all solution, because it again introduces a feedback loop between the incoming and the outgoing side. We therefore still hope to arrive at a design that is both simpler and more effective in the future. However, we believe that what we proposed above is a good compromise between avoiding double-door effects to the furthest possible extent, strictly enforcing an application-layer data rate, and keeping the extent of code changes small.
This new scheme sounds like a good idea to make progress. What happens to the current notion of BandwidthBurst? That is probably what M wants to be based on somehow, yes?
What happens to the current notion of BandwidthBurst? That is probably what M wants to be based on somehow, yes?
We need to distinguish two burst sizes, in the incoming and the outgoing direction. The incoming burst size is, just like before, given by the size of the token bucket on the incoming side; let me denote this size by T here. For the outgoing side, we have, analogously, a size S for the credit bucket. Then, there is a hard limit of M+S+T on the outgoing burst size: in the extreme case, the incoming bucket is full (x=T) and the credit bucket on the outgoing side is also full (y=S), so we may send out M+S+T bytes immediately. This may happen if (a) there has been lots of incoming traffic which "vanished" inside the router before (so that the credit bucket became full) and (b) there is currently no incoming traffic (so that the incoming bucket ran full, too), and (c) lots of outgoing traffic is spontaneously generated within the router.
To reliably avoid double-dooring, S and M should both be at least a few times larger than T. To be on the absolutely safe side, we could either have separate configuration parameters for M, S, and T, or we could have one maximum burst size parameter b and internally use a fixed ratio between the sizes, e.g., T = 0.1b, S = 0.45b, M = 0.45b.
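For example, with b = 10 MB this split gives T = 1 MB, S = 4.5 MB, and M = 4.5 MB, so the hard limit on a spontaneous outgoing burst, M + S + T, is exactly the configured b = 10 MB.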
For good performance, it will of course be essential to configure a sufficiently large burst size. This should be communicated in the documentation.
This sounds promising! I think an approach like this could totally work. Let's get it written up in a proposal form, and get some code written. Since I have been so remiss in reviewing this promptly, I'll take on writing the first draft of the proposal unless somebody else wants to do it first. I think I could get it done early next week. (If somebody else wants to take a shot at it instead then I would be grateful, but at this point I really ought to do something to make up for my tardiness.)
I wonder if we can improve this algorithm in cases where we know that a given read will generate no bytes written (as is the case for when we read directory information), or when we find out early that we're going to have an imbalance (as when we pack or unpack a partial cell).
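As a sketch of that idea, building on the names from the credit bucket sketch above (again assumptions, not a concrete design):

```c
/* Variant of the read accounting that skips the credit allocation when
 * we already know the read will generate no writes (e.g. directory
 * information we consume locally).  x and y as in the sketch above. */
static void
note_read_ex(int k, int will_generate_writes)
{
  x -= k;        /* always charge the incoming bucket */
  if (will_generate_writes)
    y += k;      /* earn write credit only for bytes that will be relayed */
}
```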
Another wrinkle that we need to address is that not all connections are rate-limited: by default, for example, the read and write limits do not apply to connections from localhost. I think the answer there is to make sure that even if a connection isn't counted towards the read or write bucket, its reads have the proper effect on the write bucket and its writes the appropriate effect on the read bucket.
We should also keep an eye out for ways to instrument the logic here so that we can find out how much this is helping us.
Finally, because of our transition to libevent during 2.3.x, we need to find a way to implement this both using bufferevent rate limiting and using Tor's rate limiting code. Big fun there.
Karsten, did your branch manage to separate the finer-grained-ticks part from the rest of the code? If so, we should really just open a new ticket for that: it ought to be mergeable independently.
Karsten, did your branch manage to separate the finer-grained-ticks part from the rest of the code? If so, we should really just open a new ticket for that: it ought to be mergeable independently.
To be honest, I can't say. I think the branch needs more work before merging parts of it. Florian might know better though.