Opened 7 years ago

Closed 7 years ago

#6401 closed task (fixed)

Update Tor network models from CSET paper

Reported by: robgjansen
Owned by: robgjansen
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity:
Keywords: simulation, performance, network model
Cc: arma, karsten
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by robgjansen)

Our CSET paper explains techniques for modeling the Tor network. Our model is verified against data from server descriptors, etc., and compared in both Shadow and ExperimenTor. Because of ExperimenTor's limitations, we were restricted to 100 relays and 1000 clients.

We should update the model as follows:

  1. Expand the client model to include P2P-type client swarms that both upload and download
  2. Expand the client model to include small IM-type (ping-like) clients, to better measure expected performance for clients that don't ever download large amounts of data
  3. Expand the client model to use browser-type clients that are capable of downloading real HTML files, parsing them, and fetching the various embedded objects
  4. Re-adjust the ratios of the various client types so that load and expected performance are similar to live Tor
  5. Update Shadow's various sized topology files so we can run experiments beyond ~1000 nodes.

This is a blocker for much of the performance simulation work, such as #5336, #4086, #4486, #4487, #5190, and #6341.

Child Tickets

Attachments (4)

20120727-ec2-torperf-combined.pdf (158.5 KB) - added by robgjansen 7 years ago.
First run of network model scaled for EC2
20120729-ec2-torperf-combined.pdf (165.9 KB) - added by robgjansen 7 years ago.
Second run of network model scaled for EC2
20120802-ec2-torperf-combined.pdf (172.7 KB) - added by robgjansen 7 years ago.
Third run of network model scaled for EC2
20120802-ec2-combined.pdf (238.7 KB) - added by robgjansen 7 years ago.
Third run - all client timings


Change History (14)

comment:1 Changed 7 years ago by robgjansen

For 1, see this GitHub commit.
For 2, an IM client can be configured with the existing filetransfer library.
For 3, see this GitHub pull request.

Topology configurations, as well as 4 and 5, are in progress. I'll post performance graphs here once I have them.

comment:2 Changed 7 years ago by robgjansen

For our "browser" clients, we'll need websites for them to download (using something like "wget -H -p http://www.wikipedia.org"). Any better ideas than the Alexa top X websites? Do they just download the index.html file for each site?

Also, this won't be precise, since images, etc., that come from a CDN server when I issue the wget command would probably come from different CDN servers for most of the Tor clients. Is it worth excluding the browser client until we get better statistics from real Tor clients?
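For concreteness, here is a minimal Python sketch (standard library only) of what such a browser-type client would do, in the spirit of "wget -H -p": download the HTML, parse it, and fetch the embedded objects it references. This is not Shadow's browser plug-in; the class and function names are hypothetical.

{{{#!python
# Illustrative sketch of a browser-type fetch (not Shadow's browser plug-in):
# download a page, parse it, then fetch the embedded objects it references,
# roughly what "wget -H -p" does.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class EmbeddedObjectParser(HTMLParser):
    """Collect URLs of embedded objects (images, scripts, stylesheets)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.objects.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.objects.append(urljoin(self.base_url, attrs["href"]))


def fetch_page_and_objects(url):
    """Fetch a page plus its embedded objects; return total bytes transferred."""
    html = urlopen(url).read()
    parser = EmbeddedObjectParser(url)
    parser.feed(html.decode("utf-8", errors="replace"))
    total = len(html)
    for obj_url in parser.objects:
        try:
            total += len(urlopen(obj_url).read())
        except OSError:
            pass  # skip objects on unreachable hosts
    return total
}}}

Running fetch_page_and_objects("http://www.wikipedia.org") from one vantage point gives a rough per-page byte count, but as noted above, the CDN-hosted objects would resolve to whichever mirrors are closest to wherever the command is run, not to the mirrors a Tor client would see.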

comment:3 Changed 7 years ago by robgjansen

Description: modified (diff)

comment:4 Changed 7 years ago by robgjansen

Update: we are holding off on the browser-type clients, because the browser plug-in is not yet ready for 'production' (see this GitHub issue).

All that is left is 4. above, for which simulations are currently running. I expect I'll have graphs tomorrow, barring more runtime problems or misconfigurations ;)

Changed 7 years ago by robgjansen

First run of network model scaled for EC2

comment:5 Changed 7 years ago by robgjansen

The first stab

The network model was updated to utilize an EC2 m2.4xlarge instance (~68 GiB of RAM), using Tor data (server descriptors, extra-info documents, etc.) from June 2012. Clients that more closely match TorPerf statistics were also added for a cleaner comparison to the live network data. The resulting network is 2.5 times larger (in both clients and relays) than the one we used in the CSET paper.

The resulting network model:

* 250 high bandwidth http file servers
* 1 high bandwidth torrent/swarm tracker authority
* of the 250 relays:
  -100 exits
  -150 non-exits (1 authority)
* of the 2500 clients (each download is from a random server):
  -50 im: upload 1 KiB, download 1 KiB, wait (0-5] seconds, repeat
  -2301 web: request and download 320 KiB, wait (0-60] seconds, repeat
  -100 bulk: request and download 5 MiB, repeat
  -9 p2p: from each other p2p client: download and upload 16 KiB, repeat
  -20 torperf 50 KiB: request and download 50 KiB, wait 60 seconds, repeat
  -20 torperf 1 MiB: request and download 1 MiB, wait 60 seconds, repeat
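Roughly, each of these simple client types repeats a fixed transfer-then-wait loop. The Python sketch below is only illustrative of that behavior; Shadow actually drives these clients through its filetransfer plug-in and configuration files, the transfer() helper is hypothetical, and the p2p swarm behavior is omitted for brevity.

{{{#!python
# Illustrative behavior loops for the simple client types listed above.
# Shadow drives these via its filetransfer plug-in and topology/config files;
# the transfer() helper and function names here are hypothetical.
import random
import time

KiB = 1024
MiB = 1024 * KiB


def transfer(up_bytes, down_bytes):
    """Placeholder for one request/response to a random server over Tor."""
    raise NotImplementedError


def im_client():
    while True:
        transfer(up_bytes=1 * KiB, down_bytes=1 * KiB)
        time.sleep(random.uniform(0, 5))          # wait (0-5] seconds


def web_client():
    while True:
        transfer(up_bytes=0, down_bytes=320 * KiB)
        time.sleep(random.uniform(0, 60))         # wait (0-60] seconds


def bulk_client():
    while True:
        transfer(up_bytes=0, down_bytes=5 * MiB)  # no pause between downloads


def torperf_client(size_bytes):                   # 50 KiB or 1 MiB variants
    while True:
        transfer(up_bytes=0, down_bytes=size_bytes)
        time.sleep(60)                            # fixed 60 second pause
}}}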

Load on the network was distributed as follows:

TYPE    #XFERS  GiB     %
im      19046   0.018   0.052
web     96170   29.349  84.135
bulk    945     4.614   13.228
p2p     21517   0.328   0.941
perf50k 1024    0.049   0.140
perf1m  537     0.524   1.503
TOTAL   139239  34.883  100.000
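For reference, the % column is each type's share of the total GiB transferred; a quick check with the numbers from the table (small differences versus the table are due to rounding of the GiB column):

{{{#!python
# Sanity check of the load-share column: each type's share of the total GiB.
gib_by_type = {
    "im": 0.018,
    "web": 29.349,
    "bulk": 4.614,
    "p2p": 0.328,
    "perf50k": 0.049,
    "perf1m": 0.524,
}
total = sum(gib_by_type.values())                 # ~34.88 GiB
for name, gib in gib_by_type.items():
    print(f"{name:8s}{100 * gib / total:7.3f}%")
}}}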

Client performance is shown in the attached graph (20120727-ec2-torperf-combined.pdf).

Overall, it appears we are slightly overloaded compared to Tor's state in June. This is inferred from both graphs. Based on the graphs and the load breakdown, I'd like to cut out some web clients. I'd also like to see a higher percentage of the load coming from the P2P and bulk clients.

So I'll run another experiment, this time on EC2, and post the results.

comment:6 Changed 7 years ago by robgjansen

After thinking about this, I think we have too many web clients.

We originally wanted ~95% of our connections to be web, according to the now-outdated "Shining Light" paper from McCoy et al. Since each of our simple clients creates only one connection, we simply configured ~95% of our clients as web and moved on.

However, a web page download usually requires many, many GET requests for all the embedded images, CSS files, etc. - about 45 on average, according to Google. Google also says those GETs are directed to 8 different hosts on average. So a single web page would have counted as about 8 separate connections in the McCoy study.

But each page counts as only a single connection in our model, i.e. each web client in our simulation actually represents about 8 connections and all the data they would download, over a single connection. So the ~95% of clients we configured as web could be off by a factor of roughly 8. This might explain why the load balance is slightly off in the experiment above.

So, what do we really want?

Ideally, we want a client that does what we actually want, i.e. a web browser that downloads embedded objects over several connections in parallel. That's item 3 in the ticket description. Since the browser client isn't ready yet, we'll make do with our simpler single-GET client. The connection balance might not be right, but hopefully the load will be.

We don't want to simply cut our web clients by a factor of 8, because each of our web clients is likely representing several real clients anyway (we are likely downloading more web pages than a real client would). But it seems reasonable to drop them from 2301 down to 1500. This should help move the load where we actually want it.
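To make the back-of-envelope arithmetic explicit, here is a rough conversion of the per-connection web share into a per-client (per-page) share, using the ~95% and ~8-hosts figures above. The calculation is only a sketch under those assumptions; the 1500 figure is still a judgment call rather than the output of this calculation.

{{{#!python
# Back-of-envelope: convert the per-connection web share into a per-client
# (per-page) share, using the ~95% and ~8-hosts figures discussed above.
web_conn_share = 0.95       # fraction of observed connections that are web
conns_per_page = 8          # ~8 hosts (connections) per page, per Google
other_conn_share = 1 - web_conn_share

# One web client stands in for ~8 connections; other client types map 1:1.
web_client_weight = web_conn_share / conns_per_page
web_client_share = web_client_weight / (web_client_weight + other_conn_share)

print(f"web clients: {web_client_share:.0%} of all clients")    # ~70%
print(f"of 2500 clients: ~{2500 * web_client_share:.0f}")       # ~1759
# We settle on 1500 below, partly to shift more load toward bulk and p2p.
}}}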

Sorry this got long. Not sure if anyone even reads it... but I'd love comments.

Changed 7 years ago by robgjansen

Second run of network model scaled for EC2

comment:7 Changed 7 years ago by robgjansen

The second stab

We used the Tor network model as described above, but changed it by reducing the web clients from 2301 down to 1500, and increasing the P2P clients from 9 to 50.

Load on the network was distributed as follows:

TYPE    #XFERS  GiB     %
im      20320   0.019   0.063
web     63464   19.368  63.073
bulk    1021    4.985   16.235
p2p     374430  5.713   18.606
perf50k 1023    0.049   0.159
perf1m  586     0.572   1.864
TOTAL   460844  30.707  100.000

Client performance is shown in the attached graph (20120729-ec2-torperf-combined.pdf).

The overall load was reduced by about 4 GiB, and the load distribution looks better (closer to ~40% bulk). I expected Shadow client performance to be a bit faster, i.e. those dashed lines to move a little farther to the left than they did.

I noticed the gap between TorPerf and Shadow seems to expand with the download length. Intuition tells me that either Shadow is miscalculating packet delays somewhere, or our latency model changed (or was off in the first place). Though I continue to be impressed with Shadow's accuracy here in the face of so many things that could go wrong.

I'm going to review Shadow's packet code and perhaps run one more experiment here.

comment:8 in reply to:  7 Changed 7 years ago by robgjansen

Replying to robgjansen:

    I noticed the gap between TorPerf and Shadow seems to expand with the download length. Intuition tells me that either Shadow is miscalculating packet delays somewhere, or our latency model changed (or was off in the first place). Though I continue to be impressed with Shadow's accuracy here in the face of so many things that could go wrong.

    I'm going to review Shadow's packet code and perhaps run one more experiment here.

I analyzed the Shadow code and tested a change that I thought might be the culprit, but the results were not significantly different from those previously reported.

Also, the most recent network setup uses only 42 GiB of RAM. Our EC2 instance has 68 GiB available, so I am now scaling up the network slightly. I'm hoping to be able to launch a series of real experiments by the end of the week. Roger, Nick Hopper, and I can then poke holes in the results some time during USENIX Security next week.
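As a rough sizing guide, assuming Shadow's memory footprint grows roughly linearly with network size (an approximation, since the footprint also depends on traffic load):

{{{#!python
# Rough scale-up estimate, assuming memory grows ~linearly with network size.
ram_available_gib = 68      # EC2 m2.4xlarge
ram_used_gib = 42           # observed for the second-run network
scale = ram_available_gib / ram_used_gib            # ~1.62x headroom

# Second-run network: 250 relays; the client counts above with web reduced
# to 1500 and p2p raised to 50 (other types unchanged) -> 1740 clients.
relays, clients = 250, 50 + 1500 + 100 + 50 + 20 + 20
print(f"~{round(relays * scale)} relays, ~{round(clients * scale)} clients")
# -> ~405 relays, ~2817 clients; the third run below uses 400 relays and
#    2600 clients, leaving some slack.
}}}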

Changed 7 years ago by robgjansen

Third run of network model scaled for EC2

Changed 7 years ago by robgjansen

Attachment: 20120802-ec2-combined.pdf added

Third run - all client timings

comment:9 Changed 7 years ago by robgjansen

The third stab

We scaled up our network model to utilize all of EC2's 68 GiB of RAM:

* 400 high bandwidth http file servers
* 1 high bandwidth torrent/swarm tracker authority
* 400 relays:
  -160 exits
  -240 non-exits (1 authority)
* 2600 clients (each download is from a random server):
  -80 im: upload 1 KiB, download 1 KiB, wait (0-5] seconds, repeat
  -2200 web: request and download 320 KiB, wait (0-60] seconds, repeat
  -160 bulk: request and download 5 MiB, repeat
  -80 p2p: from each other p2p client: download and upload 16 KiB, repeat
  -40 torperf 50 KiB: request and download 50 KiB, wait 60 seconds, repeat
  -40 torperf 1 MiB: request and download 1 MiB, wait 60 seconds, repeat

Load on the network was distributed as follows:

TYPE    #XFERS  GiB     %
im      30648   0.029   0.065
web     88069   26.877  59.507
bulk    1913    9.341   20.681
p2p     510250  7.786   17.238
perf50k 1888    0.090   0.199
perf1m  1068    1.043   2.309
TOTAL   633836  45.165  100.000

TorPerf results are shown in the attached graph (20120802-ec2-torperf-combined.pdf).
Client performance is shown in the attached graph (20120802-ec2-combined.pdf).

The load distribution looks good. I still expected Shadow client performance to be a bit faster, i.e. those dashed lines to move a little farther to the left. For now I think this looks good enough to start experiments. After all, it's the relative performance between experiments that's really more important than how well we match TorPerf.

comment:10 Changed 7 years ago by robgjansen

Description: modified (diff)
Resolution: fixed
Status: new → closed