Opened 8 years ago

Closed 7 years ago

#4086 closed task (implemented)

Compare performance of TokenBucketRefillInterval params in simulated network

Reported by: arma
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity:
Keywords: performance flowcontrol
Cc: robgjansen, tschorsch@…, kevin
Actual Points:
Parent ID: #4465
Points:
Reviewer:
Sponsor:

Description

Once we merge #3630, we will start refilling our token buckets 100 times a second rather than once a second.

In theory, this feature should magnify the effect of the CircuitPriorityHalflife config option, as well as generally smoothing our TCP flows between relays.

It could have surprising adverse effects though, such as making all of our network writes be one cell long (plus IP, TCP, TLS, and MTU overhead).

So the task here is to run simulated Tor networks with various combinations of these config options, at various bandwidth rates, and see if there are sweet spots, surprising bad combinations, etc.

For example, should the TokenBucketRefillInterval value be a function of our available bandwidth?
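
To make the trade-off concrete, here is a rough sketch (in Python; the 50 KiB/s relay rate is just an illustrative value, and this is not Tor's actual code) of how the refill interval splits a relay's per-second rate into per-refill token grants:

    CELL_SIZE = 512  # bytes in a fixed-size Tor cell

    def tokens_per_refill(rate_bytes_per_sec, refill_interval_ms):
        # Bytes added to the token bucket on each refill.
        return rate_bytes_per_sec * refill_interval_ms / 1000.0

    # For a relay rate-limited to 50 KiB/s, a 10 ms refill interval grants
    # ~512 bytes per refill (about one cell), which is where the "every
    # network write is one cell long" concern above comes from.
    for interval_ms in (1000, 100, 10, 1):
        grant = tokens_per_refill(50 * 1024, interval_ms)
        print("%5d ms refill: %8.1f bytes (%.2f cells)" % (interval_ms, grant, grant / CELL_SIZE))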

Child Tickets

Attachments (7)

refill-perf-ewma0.pdf (485.7 KB) - added by robgjansen 8 years ago.
refill interval tests, ewma half life 0
refill-perf-ewma30.pdf (482.1 KB) - added by robgjansen 8 years ago.
refill interval tests, ewma half life 30
refill-timings.pdf (228.8 KB) - added by robgjansen 8 years ago.
refill interval tests, simulation timings
consensus_vs_observed.pdf (118.2 KB) - added by robgjansen 8 years ago.
distribution of bandwidth weights in the consensus vs relay observed bandwidths from server descriptors from 2012-01-31
refill_20120313.pdf (1.3 MB) - added by robgjansen 8 years ago.
shadow, refill interval tests, round 2
refill-ewma0-2012-10-01.pdf (872.1 KB) - added by karsten 7 years ago.
refill-ewma30-2012-10-01.pdf (875.3 KB) - added by karsten 7 years ago.

Change History (47)

comment:1 Changed 8 years ago by arma

I ended up choosing a value of '10 times a second' for today's 0.2.3.5-alpha release since that seemed pretty aggressive still but not crazy high.

comment:2 Changed 8 years ago by karsten

Parent ID: #4465

comment:3 Changed 8 years ago by arma

Keywords: performance added

comment:4 Changed 8 years ago by robgjansen

Cc: jansen@… added

comment:5 Changed 8 years ago by arma

Keywords: flowcontrol added

Changed 8 years ago by robgjansen

Attachment: refill-perf-ewma0.pdf added

refill interval tests, ewma half life 0

Changed 8 years ago by robgjansen

Attachment: refill-perf-ewma30.pdf added

refill interval tests, ewma half life 30

Changed 8 years ago by robgjansen

Attachment: refill-timings.pdf added

refill interval tests, simulation timings

comment:6 Changed 8 years ago by robgjansen

I've run some experiments with Shadow on EC2 and attached the performance and timing results. Before explaining things, please note that the Tor model used for these experiments is currently under development and may change in future versions of Shadow. I'd be happy to re-run the experiments once the model is "final" (for various definitions of final).

The network:
Topology forms a complete graph between all countries. Bandwidth of nodes in each country taken from netindex.com data. Latency of links between countries taken from planetlab pairwise node ping measurements, where countries are clustered by geographical region. Packet loss and jitter also taken from netindex data.

The nodes:
50 servers, 50 relays, 475 web clients, 25 bulk clients. Relay bandwidths are taken from real consensus and rate limits from real server descriptors. Web clients "think" for a time between 1 and 20 seconds, selected uniformly at random, between 320 KiB downloads. Bulk clients continuously download 5 MiB files. Servers have 100 MiB/s connections and are placed according to Alexa's reported most popular servers.

Configs:
I tested refilling every 1000 ms (1/s), 100 ms (10/s), 10 ms (100/s), and 1 ms (1000/s) under otherwise-default Tor configurations. I ran two sets of these experiments: one with a CircuitPriorityHalfLife of 30 (EWMA), and one with 0 (RR).
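
For readers less familiar with the two settings, roughly what CircuitPriorityHalfLife does (a simplified sketch, not Tor's actual implementation): with a nonzero half-life, each circuit's cell count decays exponentially and the scheduler favors the circuit with the lowest count, which prefers bursty web circuits over steady bulk circuits; with a value of 0, the counts are not used and circuits are served round-robin.

    def decay_factor(halflife_secs, elapsed_secs):
        # Exponential decay used by EWMA-style circuit prioritization.
        return 0.5 ** (elapsed_secs / halflife_secs)

    # Illustrative numbers only: a web circuit that sent 100 cells 30 s ago
    # "costs" 50 under a 30 s half-life, so it is scheduled ahead of a bulk
    # circuit that just sent 60 cells. With CircuitPriorityHalfLife=0 this
    # comparison is never made and the circuits simply alternate (RR).
    web_cost = 100 * decay_factor(30, 30)   # 50.0
    bulk_cost = 60 * decay_factor(30, 0)    # 60.0
    print(web_cost < bulk_cost)             # True: the web circuit is preferred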

Overall, the performance graphs look favorable for multiple refills per second. Notice the weird "stairstep" behavior when scheduling once per second.

Web performance: Time to first byte of payload increases regardless of the intervals tested. Time to last byte seems to improve most when jumping to refilling every 10 ms (100/s). The difference in performance for EWMA vs RR seems insignificant for web downloaders.

Bulk performance: Time to first byte improves when scheduling with EWMA, but does not improve when scheduling with RR. This is mostly because RR is outperforming EWMA quite significantly to begin with. Similarly, time to last byte only improves when scheduling with EWMA, but actually hurts performance when scheduling with RR. Again, RR appears to beat out EWMA for bulk clients.

Timing data is also attached. Experiment time is only slightly increased when refilling every 100 ms (10 times per second), but much more so when refilling every 10 ms (100 times per second). Time is insane for 1 ms refills. Note that this is aggregate simulation time, which includes overhead for handling the simulation's discrete events. I am working on tracking CPU time on a per-node basis, which should help us draw conclusions more concretely regarding timing.

The verdict:
I am not giving one ;) Although, based on these graphs alone, it would appear Tor should be scheduling with RR and refilling every 10-100 ms (maybe 50?).

comment:7 Changed 8 years ago by kevin

Summary
I ran a set of preliminary experiments on ExperimenTor with different TokenBucketRefillInterval values. In particular, the experiments compare refilling tokens once every 1000 milliseconds (once per second), once every 100 milliseconds (ten times per second), and once every 10 milliseconds (100 times per second). Also, experiments are run with EWMA on and off, to identify any interactions between token bucket refill interval and circuit-level scheduling policy.

Results
Results are available here (Note that the file is too large to upload to this page).

The network model
The network topology consists of pairwise links with delays configured by sampling from the King data set [1].

The Tor router and client model
200 destination servers, 50 relays, 950 web clients, 50 bulk clients. Relay bandwidths are taken from a real consensus document and rate limits from the corresponding relay descriptors.

The web clients "think" for a time between 1 and 30 seconds, selected uniformly at random, between 300 KiB file downloads.

Bulk clients continuously download 5 MiB files with no think time between fetches. The destination servers are configured to be faster than any of the relays or clients, so they are guaranteed not to be the bandwidth bottleneck.

All clients' bandwidths are assigned by sampling from the Ookla Speedtest data [2].

Performance metrics
To evaluate client performance, I measured time-to-first-byte and overall download time (equivalently, time-to-last-byte).

Router configurations
I tested refilling every 1000 ms (1/s), 100 ms (10/s), and 10 ms (100/s) under otherwise-default Tor client and router configurations. I ran two sets of these experiments: one with a CircuitPriorityHalfLife of 30 (EWMA), and one with 0 (RR).

High level observations
These results generally support the results obtained by Shadow, even though slightly different assumptions were made in constructing the experiments (which is in and of itself interesting!).

The performance graphs look favorable for multiple refills per second. Also, you'll see the familiar "stairstep" behavior in the time-to-first-byte graphs when scheduling once per second. More details:

Results for web clients
The most frequent token refill interval (every 10 ms) seems to offer the best time-to-first-byte and overall download times for web clients, regardless of whether EWMA is enabled.

Results for bulk clients
Bulk clients don't show an obvious improvement in time-to-first-byte with shorter refill intervals. Interestingly, 100 ms seems to offer the worst performance for the bulk clients when EWMA is disabled.

References
[1] King data set. http://pdos.csail.mit.edu/p2psim/kingdata/
[2] Ookla Netindex Data. http://www.netindex.com/

comment:8 Changed 8 years ago by Flo

Cc: tschorsch@… added

Thanks for the results and the great work. Overall it looks very good and promising: web clients benefit most from the smaller refill intervals and data transmission is smoothed. Though, I'm a little surprised by the divergence of the red line (100ms) on page 4 of Kevin's runs. To me the outlier looks like an anomaly/artifact. Are there any reasons that would explain such behavior?

comment:9 in reply to:  8 ; Changed 8 years ago by robgjansen

Replying to Flo:

Thanks for the results and the great work. Overall it looks very good and promising: web clients benefit most from the smaller refill intervals and data transmission is smoothed. Though, I'm a little surprised by the divergence of the red line (100ms) on page 4 of Kevin's runs. To me the outlier looks like an anomaly/artifact. Are there any reasons that would explain such behavior?

Comparing Kevin's page 4 to page 4 of my refill-perf-ewma0.pdf would suggest that Kevin's 10 ms line is the anomaly/artifact, not the 100 ms line. Things are better when CircuitPriorityHalflife is set.

comment:10 in reply to:  6 ; Changed 8 years ago by arma

Replying to robgjansen:

50 servers, 50 relays, 475 web clients, 25 bulk clients. Relay bandwidths are taken from real consensus and rate limits from real server descriptors

If you take relay bandwidths from the server descriptors and not from the consensus, do the results change? I would guess you'll have less overall capacity in the network, so the effect of EWMA should be even more pronounced. Though then again, since you're rate limiting to the values in the descriptor already, the only difference would be relays that have lots of extra capacity but haven't changed the rate limiting from its default, and those are probably rare.

it would appear Tor should be scheduling with RR and refilling every 10-100 ms (maybe 50?).

Is it easy to do up a graph with 50ms refill rates, to see if it's more like 10 or more like 100? That would also give us a sense of how much variation there is in simulation outcome.

comment:11 Changed 8 years ago by arma

So Rob's "bulk download time when EWMA is on" (slide 4 of refill-perf-ewma30) shows that refilling 1/s is worst, with more frequent refills giving successively faster download times.

Whereas Kevin's "bulk download time when EWMA is on" (slide 8) shows that refilling 100/s is worst, with "refill 1/s" looking best.

These would appear to be exact opposites, yes?

comment:12 Changed 8 years ago by arma

Cc: kevin added

comment:13 in reply to:  6 ; Changed 8 years ago by arma

Replying to robgjansen:

Bulk performance: Time to first byte improves when scheduling with EWMA, but does not improve when scheduling with RR. This is mostly because RR is outperforming EWMA quite significantly to begin with. Similarly, time to last byte only improves when scheduling with EWMA, but actually hurts performance when scheduling with RR. Again, RR appears to beat out EWMA for bulk clients.

Rob: how do you say that "RR appears to beat out EWMA for bulk clients" when slide 4 of refill-perf-ewma0 shows that refilling once per second is best, and slide 4 of refill-perf-ewma30 shows that refilling once per second is worst? It seems your graphs show that if we're going to refill more than once a second then the bulk clients are better off with EWMA.

comment:14 in reply to:  9 Changed 8 years ago by arma

Replying to robgjansen:

Replying to Flo:

Thanks for the results and the great work. Overall it looks very good and promising: web clients benefit most from the smaller refill intervals and data transmission is smoothed. Though, I'm a little surprised by the divergence of the red line (100ms) on page 4 of Kevin's runs. To me the outlier looks like an anomaly/artifact. Are there any reasons that would explain such behavior?

Comparing Kevin's page 4 to page 4 of my refill-perf-ewma0.pdf would suggest that Kevin's 10 ms line is the anomaly/artifact, not the 100 ms line. Things are better when CircuitPriorityHalflife is set.

Exciting. I'm curious how stable these results are when you (and Kevin) do another run. Are some of the lines we're looking at (and trying to read meaning into) just randomly drawn? :)

Changed 8 years ago by robgjansen

Attachment: consensus_vs_observed.pdf added

distribution of bandwidth weights in the consensus vs relay observed bandwidths from server descriptors from 2012-01-31

comment:15 in reply to:  10 ; Changed 8 years ago by robgjansen

Cc: robgjansen added; jansen@… removed

Replying to arma:

Replying to robgjansen:

50 servers, 50 relays, 475 web clients, 25 bulk clients. Relay bandwidths are taken from real consensus and rate limits from real server descriptors

If you take relay bandwidths from the server descriptors and not from the consensus, do the results change? I would guess you'll have less overall capacity in the network, so the effect of EWMA should be even more pronounced. Though then again, since you're rate limiting to the values in the descriptor already, the only difference would be relays that have lots of extra capacity but haven't changed the rate limiting from its default, and those are probably rare.

I'm not sure why you think there will be much less capacity. I've attached a CDF showing the distribution of consensus bandwidth weights vs relay-reported observed bandwidth. Turns out the difference between the CDFs (absolute value of the difference in the integral over the range [0,infty]) is less than 1000 KiB/s.
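
For reference, that comparison can be computed like this (a sketch of the metric as the area between the two empirical CDFs; the lognormal samples are placeholders, not the 2012-01-31 data):

    import numpy as np

    def ecdf(samples, grid):
        # Empirical CDF of `samples` evaluated at the points in `grid`.
        return np.searchsorted(np.sort(samples), grid, side="right") / float(len(samples))

    def cdf_area_difference(a, b):
        grid = np.union1d(a, b)                  # evaluation points (KiB/s)
        widths = np.diff(grid, append=grid[-1])  # spacing between points
        return np.sum(np.abs(ecdf(a, grid) - ecdf(b, grid)) * widths)

    consensus_weights = np.random.lognormal(6.0, 1.5, 1000)    # placeholder data
    observed_bandwidths = np.random.lognormal(6.0, 1.4, 1000)  # placeholder data
    print(cdf_area_difference(consensus_weights, observed_bandwidths), "KiB/s")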

it would appear Tor should be scheduling with RR and refilling every 10-100 ms (maybe 50?).

Is it easy to do up a graph with 50ms refill rates, to see if it's more like 10 or more like 100? That would also give us a sense of how much variation there is in simulation outcome.

Yes.

comment:16 in reply to:  13 Changed 8 years ago by robgjansen

Replying to arma:

Replying to robgjansen:

Bulk performance: Time to first byte improves when scheduling with EWMA, but does not improve when scheduling with RR. This is mostly because RR is outperforming EWMA quite significantly to begin with. Similarly, time to last byte only improves when scheduling with EWMA, but actually hurts performance when scheduling with RR. Again, RR appears to beat out EWMA for bulk clients.

Rob: how do you say that "RR appears to beat out EWMA for bulk clients" when slide 4 of refill-perf-ewma0 shows that refilling once per second is best, and slide 4 of refill-perf-ewma30 shows that refilling once per second is worst? It seems your graphs show that if we're going to refill more than once a second then the bulk clients are better off with EWMA.

I was looking at the overall performance regardless of the refill interval. My point was that if you e.g. combine all 4 lines into one on each of ewma0 and ewma30, RR would beat EWMA (although only slightly).

In the context of what you care about, which appears to be refilling more than once per second, then I agree that EWMA improves the situation for bulk clients (based on this set of graphs, that is). In this case you essentially want to ignore the solid black line on each graph, which means you are removing the worst case from EWMA and the best from RR. My "combine lines" approach would then only include the colored lines, and EWMA then looks better.

comment:17 in reply to:  15 ; Changed 8 years ago by arma

Replying to robgjansen:

I'm not sure why you think there will be much less capacity. I've attached a CDF showing the distribution of consensus bandwidth weights vs relay-reported observed bandwidth. Turns out the difference between the CDFs (absolute value of the difference in the integral over the range [0,infty]) is less than 1000 KiB/s.

That's really surprising!

As a couple of examples right now, we have

r Unnamed VXv0HBzh4bSCg7fpdjDkLe2Imj0 BzyZRxkk3iv/Ztj2M+RonaHxkrA 2012-03-08 20:10:55 93.182.132.103 9002 9031
s Exit Fast Guard HSDir Running Stable V2Dir Valid
v Tor 0.2.2.35
w Bandwidth=103000

where the descriptor bandwidth is more like 20000

r BigBoy n4lJHKt7A2hd6r01jhk/+nTY2DY hnylYApnhz1DYl2wcwWMzrEdYZg 2012-03-09 03:14:58 38.229.79.2 443 8080
s Fast Guard HSDir Named Running Stable V2Dir Valid
v Tor 0.2.3.12-alpha
w Bandwidth=107000

with a descriptor bandwidth more like 25460

r wau DsurM90nptpcEUGzn4Ofkx+SM0w S5HeHK3MLzA4USBE2KEaCPPuj/8 2012-03-08 21:09:58 109.163.233.200 443 80
s Exit Fast Named Running V2Dir Valid
v Tor 0.2.3.12-alpha
w Bandwidth=154000

which is rate limited to 30834.

How could the sum of the differences possibly be less than 1000 here?

comment:18 in reply to:  17 ; Changed 8 years ago by robgjansen

Replying to arma:

How could the sum of the differences possibly be less than 1000 here?

I computed the difference in the cumulative distributions of consensus bandwidth and observed bandwidth. You computed the difference in a few fast relays.

There are two issues here: total capacity and capacity distribution. I'm telling you that the shape of the distribution is important because it affects how clients select relays (and, in turn, load distribution). You are telling me that total capacity is important for somewhat obvious performance reasons.

So, we need a metric that captures both: what do you think about weighting each difference (of the y-values in the CDFs) by the bandwidth (the x-value in the CDFs) so that differences between high-bandwidth nodes mean more than differences between low-bandwidth nodes (differences in the tail of the CDF mean more than elsewhere)?
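
In code, the proposal might look something like this sketch (hypothetical, just to pin down the idea; the ecdf helper is the same empirical-CDF function as in the earlier sketch):

    import numpy as np

    def ecdf(samples, grid):
        return np.searchsorted(np.sort(samples), grid, side="right") / float(len(samples))

    def weighted_cdf_difference(a, b):
        grid = np.union1d(a, b)
        widths = np.diff(grid, append=grid[-1])
        diffs = np.abs(ecdf(a, grid) - ecdf(b, grid))
        # Weight each pointwise difference by the bandwidth value itself, so
        # mismatches among high-bandwidth relays count for more than
        # mismatches among low-bandwidth relays.
        return np.sum(diffs * grid * widths)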

In the meantime, I think it makes sense to run a set of experiments where relay capacity is based on observed bandwidth rather than consensus weights.

comment:19 in reply to:  18 Changed 8 years ago by arma

Replying to robgjansen:

There are two issues here: total capacity and capacity distribution. I'm telling you that the shape of the distribution is important because it affects how clients select relays (and, in turn, load distribution). You are telling me that total capacity is important for somewhat obvious performance reasons.

Ah ha. Yes, there are two issues here: first is how we sample relays from the total list to decide which relays your simulated network will use. Second is how big those relays should be.

I think the "how big they should be" is clearly the descriptor numbers.

But how to sample them is an interesting open problem. I could see a strong argument for sampling them based on the consensus weights.

So, we need a metric that captures both: what do you think about weighting each difference (of the y-values in the CDFs) by the bandwidth (the x-value in the CDFs) so that differences between high-bandwidth nodes mean more than differences between low-bandwidth nodes (differences in the tail of the CDF mean more than elsewhere)?

Hm. I have no good intuition about what outcome that would give us. Is it aiming to be something in between the descriptor values and the consensus values?

In the meantime, I think it makes sense to run a set of experiments where relay capacity is based on observed bandwidth rather than consensus weights.

I agree. It will at least remove some confusing variables.

comment:20 Changed 8 years ago by kevin

What experiments can we run to help answer some of these questions?

comment:21 Changed 8 years ago by kevin

To answer my question from above, I'd also like to understand how much variability the experiment results have, before we start making too many conclusions.

I'm planning to re-run my experiments and see if I can re-produce the results I posted.

Changed 8 years ago by robgjansen

Attachment: refill_20120313.pdf added

shadow, refill interval tests, round 2

comment:22 Changed 8 years ago by robgjansen

I just attached another set of experimental results. The differences from the original setup are:

  • relay capacity is based on observed bandwidth rather than consensus weights
  • run on my 2.2 GHz server instead of EC2
  • added a "refill every 50 ms" experiment

The results are a bit different than before. This is expected given the changes in relay capacity. Perhaps we want to pick a couple of refill settings in which we are most interested, do 5-10 runs each, and show the cumulative performance to reduce variances?

comment:23 in reply to:  21 Changed 8 years ago by arma

Replying to kevin:

To answer my question from above, I'd also like to understand how much variability the experiment results have, before we start making too many conclusions.

See also #4490 for a related ticket. Not something we have to solve this week though. :)

comment:24 in reply to:  22 ; Changed 8 years ago by arma

Replying to robgjansen:

I just attached another set of experimental results. The differences from the original setup are:

  • relay capacity is based on observed bandwidth rather than consensus weights
  • run on my 2.2 GHz server instead of EC2
  • added a "refill every 50 ms" experiment

The results are a bit different than before. This is expected given the changes in relay capacity.

Exciting! For your "no ewma, bulk download, refill 1/s" case, it looks like 60% of them finish in a reasonable time, and the other 40%...what? That's a high fraction of cases that look basically broken.

It looks like in the ewma case, refilling more than 1/s is the best option for bulk downloaders? Why would refilling more often slow them down so much? Are we just seeing network breakdown because we kept the load the same while reducing the capacity too much? Hm.

Perhaps we want to pick a couple of refill settings in which we are most interested, do 5-10 runs each, and show the cumulative performance to reduce variances?

Do you think there's a lot of variance from one run to the next? Is the variance from the choice of topology? You're already averaging lots of individual fetches from clients I believe. What else might be big contributing factors to variance?

comment:25 in reply to:  24 ; Changed 8 years ago by robgjansen

Replying to arma:

Exciting! For your "no ewma, bulk download, refill 1/s" case, it looks like 60% of them finish in a reasonable time, and the other 40%...what? That's a high fraction of cases that look basically broken.

I noticed that, but some results are better than no results ;-)

It looks like in the ewma case, refilling more than 1/s is the best option for bulk downloaders? Why would refilling more often slow them down so much? Are we just seeing network breakdown because we kept the load the same while reducing the capacity too much? Hm.

One thing that comes to mind is an increased CPU load. I currently model CPU for each node by measuring the actual time the experiment box takes to run the Tor parts of the simulation. This CPU delay time is then multiplied by the ratio of the node's configured CPU speed and the experiment box's CPU speed. Future events for that node are then delayed if it becomes "blocked on CPU".
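
In rough pseudocode, the scaling works out to something like this (a simplified sketch, not the actual Shadow code; the GHz values are only examples):

    def virtual_cpu_delay(measured_host_seconds, host_cpu_ghz, node_cpu_ghz):
        # Real time the host spent running the node's Tor code, scaled so that
        # a node configured with a slower CPU accrues proportionally more
        # virtual delay before its next events are allowed to run.
        return measured_host_seconds * (host_cpu_ghz / node_cpu_ghz)

    # Example: 3 ms of real host time on a 2.2 GHz box becomes 6 ms of virtual
    # delay for a node modeled with a 1.1 GHz CPU; the node is "blocked on
    # CPU" for that long.
    print(virtual_cpu_delay(0.003, 2.2, 1.1))   # 0.006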

My previous experiments were run on EC2 and with consensus weights as capacity, as opposed to my server with observed bandwidth as capacity. (I made the classic mistake here of changing too many variables.) I can turn off the CPU delay model, and we can take a look at performance under the assumption that CPU will never be a bottleneck, if you'd like.

Do you think there's a lot of variance from one run to the next? Is the variance from the choice of topology? You're already averaging lots of individual fetches from clients I believe. What else might be big contributing factors to variance?

It may depend on how clients pick paths, as that directs network congestion. Smaller networks may have more variance, since if you get unlucky and happen to clog up some important nodes, it affects a high fraction of clients.

comment:26 in reply to:  25 ; Changed 8 years ago by arma

Replying to robgjansen:

My previous experiments were run on EC2 and with consensus weights as capacity, as opposed to my server with observed bandwidth as capacity. (I made the classic mistake here of changing too many variables.) I can turn off the CPU delay model, and we can take a look at performance under the assumption that CPU will never be a bottleneck, if you'd like.

Can't hurt, might help?

Do you think there's a lot of variance from one run to the next? Is the variance from the choice of topology? You're already averaging lots of individual fetches from clients I believe. What else might be big contributing factors to variance?

It may depends on how clients pick paths, as that directs network congestion. Smaller networks may have more variance since if you get unlucky and happen to clog up some important nodes, it affects a high fraction of clients.

Ok. I'm increasingly thinking that a "run" should be k runs in a row, averaged. Even though it takes longer. Because right now, if I understand you right, there's a risk that we look at the output for a given experiment and draw a conclusion that the code change is good or bad, when in fact there's a good chance that it's just variation in runs that produced the difference. Hopefully that will be pretty easy to automate too? What's a good value of k -- 5 or 10 maybe?

comment:27 in reply to:  26 ; Changed 8 years ago by arma

Replying to arma:

Ok. I'm increasingly thinking that a "run" should be k runs in a row, averaged.

And if we want to get super fancy, we could draw bars on the data points in each case, to give a sense of variance between runs. (Does that notion even make sense for cdf graphs?)
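
One way it could make sense (a sketch with made-up data): evaluate each run at fixed percentiles, then plot the median curve across runs with a band between the per-percentile minimum and maximum, so the "bars" run horizontally along the value axis.

    import numpy as np

    # Five hypothetical runs of download times (seconds); real data would come
    # from the simulator output files.
    runs = [np.random.exponential(scale=8.0, size=500) for _ in range(5)]
    percentiles = np.linspace(1, 99, 99)

    per_run = np.array([np.percentile(r, percentiles) for r in runs])
    median_curve = np.median(per_run, axis=0)   # central curve to plot (value vs. percentile)
    band_low = per_run.min(axis=0)              # lower edge of the run-to-run variance band
    band_high = per_run.max(axis=0)             # upper edge of the run-to-run variance band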

comment:28 Changed 8 years ago by kevin

It's a fine idea to run k experiments and average, but what that means is that a single experiment that normally takes 1 to 2 hours to complete now takes k to 2k hours to complete.

While this could provide more stable/confident results, it would generally take several days longer than the current approach (particularly so, since it sometimes takes me a few runs to get the experiment right...).

comment:29 in reply to:  27 ; Changed 8 years ago by robgjansen

Replying to arma:

Replying to arma:

Ok. I'm increasingly thinking that a "run" should be k runs in a row, averaged.

And if we want to get super fancy, we could draw bars on the data points in each case, to give a sense of variance between runs. (Does that notion even make sense for cdf graphs?)

Doesn't it make more sense to just show the cumulative results for all experiments with the same configuration? I normally do this before publishing results in papers, after I am confident I understand the code changes and their effects well enough. I'm not sure we are there yet with the work in this ticket.

Though, it would be nice to be able to determine how far one CDF varies from another. I attempted to do something like this above in comment 15, but I think in that case I wasn't comparing apples to apples.

I'd like to emphasize that Shadow is already cutting out as many random variances as possible. In other words, if I run the same experiment twice without changing anything, the results are exactly the same (except for memory addresses and timestamps in log files :P). I've verified this several times.

But, it is still the case that a given experiment could get unlucky with its seed to the master PRNG, and a configuration change could change the randomness enough to avoid the "unlucky" behaviors. I have not tested "run vanilla Tor with several seeds and analyze the variances" recently, but it's probably a good idea.

comment:30 in reply to:  29 Changed 8 years ago by arma

Replying to robgjansen:

Doesn't it make more sense to just show the cumulative results for all experiments with the same configuration.
Though, it would be nice to be able to determine how far one CDF varies from another.

Both good points. I've opened #5398 to separate this topic from the "what should our refill interval be" question.

comment:31 Changed 7 years ago by karsten

Owner: set to karsten
Status: new → assigned

I'll try simulating different refill intervals with Shadow. Grabbing the ticket.

comment:32 Changed 7 years ago by robgjansen

Following arma's advice in #4486, I suggest we use the large-m2.4xlarge topology. It's both distributed with Shadow (resource/scallion-hosts/large-m2.4xlarge.tar.xz) and also exists in the current EC2 image (~/workspace/large-m2.4xlarge).

comment:33 in reply to:  32 Changed 7 years ago by robgjansen

Replying to robgjansen:

Following arma's advice in #4486, I suggest we use the large-m2.4xlarge topology. It's both distributed with Shadow (resource/scallion-hosts/large-m2.4xlarge.tar.xz) and also exists in the current EC2 image (~/workspace/large-m2.4xlarge).

I'm changing my mind.

After doing a sample experiment with the large-m2.4xlarge.tar.xz model distributed with Shadow, I'm now thinking we should use the model from #6401 instead. The reason is that in my sample experiment, bulk data only accounts for about 10% of the network load, whereas the PETS 2008 exit traffic study says it is 40% and the NSS 2010 study says it is 52%. I've already analyzed the load in the #6401 model, and there it is approximately correct.

Sorry to flip-flop. If you've already run the experiments, the results are probably useful anyway.

Thoughts?

comment:34 Changed 7 years ago by karsten

I didn't run the experiments yet. How would I use the #6401 model?

comment:35 in reply to:  34 Changed 7 years ago by robgjansen

Replying to karsten:

I didn't run the experiments yet. How would I use the #6401 model?

As usual: extract the tarfile I emailed you last week, and run scallion inside the extracted directory.

Changed 7 years ago by karsten

Attachment: refill-ewma0-2012-10-01.pdf added

Changed 7 years ago by karsten

Attachment: refill-ewma30-2012-10-01.pdf added

comment:36 Changed 7 years ago by karsten

Status: assigned → needs_review

I ran into problems with the #6401 model (simulation got stuck at around 29 minutes), so Rob and I decided to use the default large-m2.4xlarge network model that comes with Shadow's EC2 image.

Results are attached.

comment:37 Changed 7 years ago by arma

According to these results, 100ms is the clear winner on all fronts.

comment:38 in reply to:  37 Changed 7 years ago by robgjansen

Replying to arma:

According to these results, 100ms is the clear winner on all fronts.

Agreed. 100ms (10 times a second) seems to be aggressive enough without being so high that it causes large CPU performance issues. Nice guess for the default value :)

comment:39 Changed 7 years ago by karsten

Status: needs_review → needs_information

Does this mean we can close this ticket?

comment:40 Changed 7 years ago by arma

Resolution: implemented
Status: needs_information → closed

Yes. We definitely "compared performance in simulated network". It confirmed our guess about an appropriate value, and also showed that it should be helping performance.

There remain further research questions, such as "For example, should the TokenBucketRefillInterval value be a function of our available bandwidth?" but I think we can put those off.

Thanks everybody!
