Opened 8 years ago

Closed 7 years ago

#4490 closed project (implemented)

Sensitivity analysis of different ways to sample relay capacity for simulations

Reported by: arma
Owned by: arma
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity:
Keywords: performance simulation SponsorF20121101
Cc: robgjansen, kevin
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Since current Tor simulators can't handle the whole 2500 relays, they end up running simulations on a subset of relays that they hope is representative. Early papers built their samples by choosing n relays weighted by capacity. Rob and Kevin both found that they could achieve more consistent results by breaking the relays into deciles, and choosing n/10 relays from the "0-10%" bucket, n/10 from the "10-20%" bucket, and so on.
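For concreteness, here's a rough sketch of those two schemes (illustrative Python only; it assumes capacities come as a plain list of consensus weights, and the function names are made up):

{{{
#!python
import numpy as np

def sample_weighted(weights, n, seed=0):
    # Early-papers approach: draw n distinct relays with probability
    # proportional to capacity (consensus weight).
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    return rng.choice(len(weights), size=n, replace=False, p=p / p.sum())

def sample_deciles(weights, n, seed=0):
    # Decile approach: sort relays by capacity, split them into 10 equal
    # buckets, and draw n/10 relays uniformly from each bucket.
    rng = np.random.default_rng(seed)
    order = np.argsort(weights)
    return np.concatenate([rng.choice(bucket, size=n // 10, replace=False)
                           for bucket in np.array_split(order, 10)])
}}}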

But I suspect that it is not optimal to base these parameter choices on the number of fingers that primates have. In particular, Mike's graphs show that the "0-5%" bucket is quite different from the "5-10%" bucket. So I worry that different runs could see quite different outcomes.

Rob pointed out that if we want to match reality, we'll need to know what load to place on the network -- and even messier, how to scale down that load in a way that matches the scaling down of the relay population.

But I think there's still some good insight to be had here, by looking at how much variation we get across a variety of sampling algorithms for a given set of loads. If the results are consistent for a given set of loads while varying sampling algorithms, that would be a surprising and interesting result. And if the results for a given load vary with the sampling algorithm, we should get some better intuition about how much they change, and which parameters influence the changes the most.

Alas, what makes the number of simulation runs blow up is that we ought to run the tests for a variety of research questions, since some research questions may be quite sensitive to changes in the capacity distribution while others are not.

I bet we could get quite a bit of mileage here by just looking at the resulting capacity (and thus probability) distributions of various sampling algorithms, and leaving the "Tor simulation" component out entirely -- since using a full-blown Tor simulator to measure similarity of distribution is a mighty indirect approach.
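One cheap way to do that comparison, as a sketch (assuming we represent each sampling run by the list of weights it selected; nothing here is from an existing tool):

{{{
#!python
import numpy as np

def cdf_distance(full_weights, sampled_weights):
    # Largest gap between the empirical CDFs of the population's weights
    # and the sample's weights; smaller means the sample's capacity (and
    # thus selection-probability) distribution tracks the population better.
    full = np.sort(np.asarray(full_weights, dtype=float))
    samp = np.sort(np.asarray(sampled_weights, dtype=float))
    grid = np.union1d(full, samp)
    cdf_full = np.searchsorted(full, grid, side="right") / full.size
    cdf_samp = np.searchsorted(samp, grid, side="right") / samp.size
    return float(np.abs(cdf_full - cdf_samp).max())
}}}

Repeating that over many draws from each sampling algorithm would give the "how much variation" picture without running a simulator at all.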

Child Tickets

Change History (11)

comment:1 Changed 8 years ago by karsten

Milestone: Sponsor F: July 15, 2012
Type: task → project

This ticket is the same as sponsor F deliverable 21. Turning it into a project ticket and optimistically assigning it to the July milestone. Related tickets should become children of this ticket.

comment:2 Changed 8 years ago by arma

Cc: robgjansen kevin added

comment:3 Changed 8 years ago by robgjansen

I believe we solve the sampling issue in the draft of our CSET paper (Section 3.2 and Figure 4). It's not ready for distribution, so I'll describe our approach here briefly.

We sample k of n relays by breaking a list of n relays (sorted by consensus weights) into k bins, and choosing the median of each bin. This improves on the "deciles" approach (10 bins) and in fact produces the "optimal" sampling, in that the distribution of relay weights in the sample is closest to the original population. The argument is that if our sampled weight distribution is the same as in Tor, then client load will be distributed approximately the same as in Tor.
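Roughly, in Python (a sketch of the description above; the relay data structure here is just for illustration):

{{{
#!python
def sample_bin_medians(relays, k):
    # Sort relays by consensus weight, split the sorted list into k
    # nearly equal bins, and take the median relay of each bin.
    ordered = sorted(relays, key=lambda r: r["consensus_weight"])
    n = len(ordered)
    bins = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [b[len(b) // 2] for b in bins]
}}}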

We then assign bandwidth to the sampled relays using their reported observed values from the server descriptors. What's missing in our paper is an analysis of the resulting capacity when sampling, which the above algorithm does not currently try to directly optimize (because the consensus weights are sometimes bad estimates of capacity).

Should we be sampling based on weights AND observed bandwidth? What else could we do here to improve the analysis?

comment:4 Changed 8 years ago by arma

Owner: set to arma
Status: new → assigned

comment:5 Changed 8 years ago by robgjansen

In order to get a better handle on the load on the network during experiments, it would be nice to be able to get accounting information over arbitrary intervals. This can be achieved now with AccountingMax and AccountingStart, but there are some problems with this:

  1. The minimum interval is in days, whereas I'd like it in minutes or seconds
  2. I don't want to impose an accounting limit
  3. New interval data may blow away old interval data

Often my experiments don't run long enough for the authorities to collect the desired info and do something useful with it. Perhaps I want something more like AccountingHeartbeat, where accounting information is logged every period. It would be similarly nice to gather CellStatistics, but this also suffers from problem 3.
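For reference, the closest I can get with the current options looks something like this (torrc sketch; the numbers are just placeholders):

{{{
# Days are the finest granularity AccountingStart allows (problem 1).
AccountingStart day 00:00
# Accounting only kicks in if AccountingMax is set, so I have to pick a
# limit high enough that the relay never actually hibernates (problem 2).
AccountingMax 100 GBytes
}}}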

Is there another way to get this information besides the above?

comment:6 Changed 8 years ago by robgjansen

After checking the source code, I realize that HeartbeatPeriod might take care of the load part (though not CellStatistics). The manual doesn't specify what actually gets logged in the heartbeat message, but total bytes read and written seem to be among them.

comment:7 Changed 8 years ago by karsten

Milestone: Sponsor F: July 1, 2012 → Sponsor F: November 1, 2012

Moving this ticket to the November milestone after talking to Roger. We should focus on the earlier performance tasks first.

comment:8 Changed 8 years ago by karsten

Adding a note from the #tor-dev meeting on April 12:

  • Roger said this task has turned more into Rob's CSET paper. The bigger question is: "how do you model a big Tor network when all you can run is a little Tor network?" It has to do with selecting relays and relay capacities and rate limiting (a.k.a. this deliverable), but it also has to do with selecting client load. #5398 is very related. Roger is trying to rope in Chris, Micah's grad student, since Kevin is leaving us.

comment:9 Changed 7 years ago by karsten

Roger and I talked about this deliverable at the Florence dev meeting. We have a paper that was accepted at CSET. We (read: Roger) should blog 2-3 paragraphs about this paper. Once that's done, we can call this sponsor F deliverable done. If it doesn't happen, that's only half as nice, but in that case we can call the deliverable done, too.

comment:10 Changed 7 years ago by karsten

Keywords: SponsorF20121101 added
Milestone: Sponsor F: November 1, 2012 deleted

Switching from using milestones to keywords for sponsor deliverables. See #6365 for details.

comment:11 Changed 7 years ago by karsten

Resolution: implemented
Status: assigned → closed

November 1 has passed and the blog post didn't happen. Calling the deliverable done with the following summary (stolen from the paper's abstract): "We co-authored the paper Methodically Modeling the Tor Network that was published at CSET '12. That paper methodically models the Tor network by exploring and justifying every modeling choice required to produce accurate Tor experimentation environments. We find that our model enables experiments that characterize Tor's load and performance with reasonable accuracy." Closing.
