wiki:org/meetings/2019BrusselsNetworkTeam/Notes/SBWSRoadmap

sbws bandwidth scanner presentation

See also juga's slides: https://juga0.github.io/tor_hackweek_bandwidth_slides/

Why bandwidth scanners?

Relays self-report their bandwidth, and a relay can lie about it, so the network needs independent measurements.

How sbws works

Threads

sbws runs these threads:

  • main thread
  • Tor event listener (stem)
  • ResultDump (stores measurements to result files)
  • standard Python threads
  • scanner threads: a target of 3 threads, measuring 3 relays at a time

Critical sections (each operation and the shared data it accesses):

  • refresh: relay list
  • relay priority: relay list
  • measure relay: relay list, etc.
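As a rough illustration of why the relay list is a critical section, here is a minimal sketch using only the Python standard library. It assumes a shared list guarded by one `threading.Lock` and a pool of 3 worker threads. `RelayList`, `measure_relay` and `scan_once` are hypothetical names, not sbws's code, and the stem listener and ResultDump threads are not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RelayList:
    """Hypothetical relay list shared by the refresh and measurement code.

    Every read or write happens while holding the lock, so a refresh
    cannot replace the list while a scanner thread is reading it.
    """

    def __init__(self, fingerprints):
        self._lock = threading.Lock()
        self._relays = list(fingerprints)

    def refresh(self, fingerprints):
        # Critical section: swap in a newer consensus snapshot.
        with self._lock:
            self._relays = list(fingerprints)

    def snapshot(self):
        # Critical section: copy the list so callers can iterate unlocked.
        with self._lock:
            return list(self._relays)


def measure_relay(fingerprint):
    # Placeholder for building a circuit and timing a download.
    return fingerprint, 0.0


def scan_once(relay_list, max_workers=3):
    # Three worker threads measure three relays at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(measure_relay, relay_list.snapshot()))
```

Copying the list under the lock keeps each critical section short. The slow part, the measurement itself, then runs without blocking a concurrent refresh.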

Measurement

sbws gets a list of relays from the consensus and scans those relays. It refreshes the list every few minutes.

sbws builds two-hop paths from the scanner to a web server, via an entry and an exit:

  • select a target relay
  • select the other half of the 2-hop path (exit for entry, or entry for exit)
    • choose a faster relay than the target
  • a relay is used as an exit only if it can exit to port 443 and is not a bad exit; otherwise it is used as an entry
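The path-selection rules above can be sketched as follows. This is a simplified illustration, not sbws's implementation. The `Relay` fields and `build_two_hop_path` are hypothetical, "faster" is taken to mean a strictly larger `bandwidth` value, and the function returns `None` when no suitable helper exists. How the real scanner falls back in that case may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class Relay:
    fingerprint: str
    bandwidth: int       # hypothetical: the value used to compare speeds
    exits_to_443: bool   # exit policy allows port 443
    bad_exit: bool       # relay has the BadExit flag


def can_exit(relay):
    # Only relays that exit to 443 and are not bad exits are used as exits.
    return relay.exits_to_443 and not relay.bad_exit


def build_two_hop_path(target, candidates, rng=random):
    """Return an (entry, exit) pair for measuring `target`, or None."""
    # The helper (the other half of the path) must be faster than the target.
    faster = [r for r in candidates
              if r.fingerprint != target.fingerprint
              and r.bandwidth > target.bandwidth]
    if can_exit(target):
        # Measure the target as the exit; any faster relay can be the entry.
        return (rng.choice(faster), target) if faster else None
    # Otherwise measure the target as the entry, so the helper must exit.
    exits = [r for r in faster if can_exit(r)]
    return (target, rng.choice(exits)) if exits else None
```

Requiring a faster helper is meant to keep the helper from becoming the bottleneck, so the measured speed reflects the target relay.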

Measure the speed

  • find the right file size to get a reasonable measurement (16 MB - 1 GB)
  • measure and store results
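One way to "find the right file size" is to adjust the next download size from how long the previous download took, then clamp it to the allowed range. The sketch below is an assumption-laden illustration in that spirit. The time thresholds (`too_fast`, `low`, `target`, `high`) are made-up values, not sbws's configured defaults. Reading "16 MB" and "1 GB" as MiB and GiB is also an assumption.

```python
MIN_SIZE = 16 * 1024 ** 2  # 16 MiB (the notes say "16 MB")
MAX_SIZE = 1024 ** 3       # 1 GiB (the notes say "1 GB")


def next_download_size(size, seconds, too_fast=1.0, low=5.0,
                       target=6.0, high=10.0):
    """Pick the next download size from the duration of the last download.

    All thresholds are in seconds and are illustrative assumptions.
    """
    if seconds < too_fast:
        # Far too quick to be meaningful: grow the download aggressively.
        size *= 5
    elif seconds < low or seconds > high:
        # Outside the acceptable window: rescale so that, at the rate just
        # observed, the next download should take about `target` seconds.
        size = int(size * target / seconds)
    # Keep the size within the allowed range.
    return max(MIN_SIZE, min(MAX_SIZE, size))
```

For example, a 16 MiB download that finishes in 0.5 s leads to an 80 MiB request next time. A 100 MiB download that takes 12 s is cut to 50 MiB.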

The results are stored as lines of JSON.
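Storing one JSON object per line lets measurements be appended as they arrive and read back line by line. A minimal sketch follows. The field names in `result_line` are hypothetical and do not reproduce sbws's actual result schema.

```python
import json
import time


def result_line(fingerprint, nickname, bandwidths, rtts, timestamp=None):
    """Serialise one measurement as a single line of JSON (hypothetical fields)."""
    record = {
        "fingerprint": fingerprint,
        "nickname": nickname,
        "time": time.time() if timestamp is None else timestamp,
        "downloads": bandwidths,  # bytes per second, one entry per download
        "rtts": rtts,             # round-trip times in seconds
    }
    return json.dumps(record, sort_keys=True) + "\n"


def append_result(path, line):
    # Appending keeps earlier results intact; each line is a complete record.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)


def parse_results(lines):
    """Parse an iterable of JSON lines (such as an open file), skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]
```

Because `json.dumps` escapes newlines inside strings, every record stays on exactly one line. That makes line-by-line reading safe.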

Generate

Every hour, the scanner generates a bandwidth file according to the Tor bandwidth file spec.

The results are filtered:

  • ignore measurements older than 5 days
  • ignore relays with fewer than 2 measurements
  • ignore relays where the first and last measurements are less than 24 hours apart
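The three filters above can be expressed directly. This sketch assumes a hypothetical input shape: a dict mapping each relay fingerprint to a list of `(datetime, bandwidth)` measurements. Measurements exactly 5 days old, or relays whose measurements span exactly 24 hours, are kept.

```python
from datetime import datetime, timedelta


def filter_results(results, now, max_age=timedelta(days=5), min_count=2,
                   min_span=timedelta(hours=24)):
    """Apply the age, count and time-span filters.

    results: fingerprint -> list of (datetime, bandwidth) (hypothetical shape)
    Returns fingerprint -> recent measurements, sorted by time.
    """
    kept = {}
    for fingerprint, measurements in results.items():
        # Drop measurements older than max_age.
        recent = sorted(m for m in measurements if now - m[0] <= max_age)
        # Drop relays with too few remaining measurements.
        if len(recent) < min_count:
            continue
        # Drop relays whose first and last measurements are too close together.
        if recent[-1][0] - recent[0][0] < min_span:
            continue
        kept[fingerprint] = recent
    return kept
```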

Scaling

Scale each relay's self-reported bandwidth by its measured bandwidth, relative to the measurements of the other relays.

See the bandwidth file spec: "Torflow Scaling".
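The core idea can be sketched as a ratio: a relay that measures faster than the network average gets its self-reported bandwidth scaled up, and a slower one gets scaled down. This is a deliberately simplified sketch. It omits the spec's filtered-mean term, its choice of which descriptor bandwidth to use, and its rounding rules, so consult the "Torflow Scaling" section for the real procedure. Dividing by 1000 to get kilobytes per second is an assumption about units.

```python
from statistics import mean


def scale_bandwidths(measured, self_reported):
    """Simplified Torflow-style scaling (not the full spec procedure).

    measured: fingerprint -> measured bandwidth in bytes/s
    self_reported: fingerprint -> self-reported bandwidth in bytes/s
    Returns fingerprint -> scaled bandwidth in KB/s (integer, at least 1).
    """
    network_mean = mean(measured.values())
    if network_mean <= 0:
        raise ValueError("need at least one positive measurement")
    scaled = {}
    for fingerprint, bw in measured.items():
        # ratio > 1 means the relay measured faster than the average relay.
        ratio = bw / network_mean
        scaled[fingerprint] = max(1, round(ratio * self_reported[fingerprint] / 1000))
    return scaled
```

For example, with measurements of 3000 and 1000 bytes/s the mean is 2000. The first relay's self-reported bandwidth is therefore multiplied by 1.5 and the second's by 0.5.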

Format the bandwidth file

Header:

  • Timestamp
  • optional metadata

Results:

  • one relay per line: node ID, bandwidth, and other key-value pairs
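A sketch of rendering that layout: a timestamp line, optional `key=value` header lines, a terminator, then one space-separated line of `key=value` pairs per relay. The `=====` terminator and the `node_id`/`bw` key names follow the bandwidth file spec as we understand it for versions 1.1.0 and later. Check the spec for the version you target. Putting `node_id` and `bw` first and sorting the remaining keys is just a stylistic choice here.

```python
def format_bandwidth_file(timestamp, header, relays):
    """Render a bandwidth file (a sketch of the v1.x layout).

    timestamp: integer Unix time, written on the first line
    header: dict of optional metadata, e.g. {"version": "1.2.0"}
    relays: list of dicts, each with "node_id" and "bw" plus optional keys
    """
    lines = [str(int(timestamp))]
    lines += [f"{key}={value}" for key, value in header.items()]
    lines.append("=====")  # assumed header terminator (spec v1.1.0 and later)
    for relay in relays:
        keys = ["node_id", "bw"] + sorted(
            k for k in relay if k not in ("node_id", "bw"))
        lines.append(" ".join(f"{k}={relay[k]}" for k in keys))
    return "\n".join(lines) + "\n"
```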

Questions

It takes 24 hours to scan the entire network.

How many measurements should we have for a relay before we vote for it?

  • Against: one result can be inaccurate, and we don't want to send lots of clients to a new relay
  • For: it takes a long time to measure a new relay, and relay operators are disappointed
  • Proposal: vote for all relays, but cap early measurements (and cap few measurements?)
  • Proposal: start with a file size that depends on the relay bandwidth
  • Proposal: stop the download when you have learned enough, or the file takes too long
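The "vote for all relays, but cap early measurements" proposal could look like the hypothetical policy below. Every relay gets a vote, but a relay with too few measurements, or too short a measurement history, has its bandwidth capped. The function name and all thresholds (`new_relay_cap`, `min_measurements`, `min_hours`) are illustrative assumptions, not agreed values.

```python
def capped_bandwidth(scaled_bw, num_measurements, hours_measured,
                     new_relay_cap=100, min_measurements=2, min_hours=24):
    """Hypothetical policy: vote for every relay, but cap new ones.

    scaled_bw and new_relay_cap share a unit (e.g. KB/s); thresholds are
    illustrative only.
    """
    if num_measurements < min_measurements or hours_measured < min_hours:
        # Too little evidence: vote, but limit how many clients we send.
        return min(scaled_bw, new_relay_cap)
    return scaled_bw
```

This addresses both sides of the question: new relays get some traffic straight away, and a single inaccurate result cannot attract a large load.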

What is the minimum number of relays in a bandwidth file?

  • Against: A network with one measured relay is a sad network
  • For: A network with no bandwidth votes is a sad network
  • Proposal: the scanner must run for at least 24 hours before it publishes a bandwidth file
  • Proposal: try to keep a result for every download, even if it was too fast or too slow
  • Proposal: speed up relay measurements by reducing retries
  • Proposal: increase the number of threads, based on the available bandwidth
  • Proposal: deploy sbws on every directory authority
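A publication rule combining the 24-hour proposal with a minimum share of measured relays might look like this. The function name and the `min_fraction` value are hypothetical; the notes do not settle on a number.

```python
def should_publish(uptime_hours, measured_relays, consensus_relays,
                   min_uptime_hours=24, min_fraction=0.6):
    """Hypothetical gate for publishing a bandwidth file.

    min_fraction is an illustrative threshold, not an agreed value.
    """
    if consensus_relays <= 0:
        return False
    if uptime_hours < min_uptime_hours:
        # The scanner has not been running long enough.
        return False
    # Require a minimum share of the consensus to have been measured.
    return measured_relays / consensus_relays >= min_fraction
```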

What diagnostic information do we need for failed relays?

  • list categories of failures in relay bandwidth lines
  • votes contain bandwidth file headers and hash
  • DirPort URL for downloading the current bandwidth file
  • Proposal: a tool to analyse OnionPerf logs and Bandwidth files to tell relay operators what is wrong with their relay

How much bandwidth?

  • 100 Mbps peak, scaling compensates for higher-bandwidth relays
  • One scanner per directory authority, multiple servers per scanner

How can we...

  • Hide scanning from relays
  • Make sure exit and non-exit bandwidths are equivalent, because they're measured differently
  • Remove reliance on self-reported relay bandwidths
    • That's hard, because we measure residual bandwidth, but we want to know overall capacity

Last modified on Feb 4, 2019, 2:36:08 PM