
So You Want to Fix the Tor Network

- or -

How to Run and Troubleshoot a Bandwidth-Measuring Directory Authority

These instructions were written against commit e268151aaa1436a8ce2d4959d1a48e69368dbf3d, but they probably apply to later commits as well.

Setup

Check out the readme for setup instructions. On an Ubuntu 16.04 machine, setup.sh worked quite well. (Minor hiccups were encountered due to missing packages, and the recovery from those was only moderately confusing - it amounted to rm-ing directories and just starting the script over again.)

Configuring and Running

Are you using your own data file server, or the default? It would be better to run your own. To do this you'll have to set up a server and then edit this line in bwauthority_child.py:

urls = ["https://38.229.72.16/bwauth.torproject.org/"]

As the README says, run ./run_scan.sh to start everything up. You'll also want to add it to your crontab so that it runs at boot:

@reboot /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/run_scan.sh

Copy cron.sh to cron-mine.sh and run it from cron every hour:

45 0-23 * * * /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/cron-mine.sh

I also make all of the historical files available in my Apache directory with a crontab:

10 * * * * ln -s /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/bwscan.* /var/www/html/bwauth > /root/cron-ln-command.log

This list of files gets very big. I manually tar and compress them once a month; I don't have a script to do that yet.
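The monthly tar-and-compress step could be scripted along these lines. This is only a minimal sketch and not part of torflow: the function name archive_old_scans, the 30-day cutoff, and the archive name are assumptions, and it assumes the historical files match bwscan.* in the data directory, as in the crontab entry above.

```python
import tarfile
import time
from pathlib import Path


def archive_old_scans(data_dir, archive_path, max_age_days=30, now=None):
    """Move bwscan.* files older than max_age_days into a gzip'd tarball.

    Returns the names of the archived files. The function name, the
    cutoff, and the file pattern are illustrative assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = sorted(
        p for p in Path(data_dir).glob("bwscan.*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
    if not old:
        return []
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for p in old:
            tar.add(str(p), arcname=p.name)
    # Delete the originals only after the archive has been written and closed.
    for p in old:
        p.unlink()
    return [p.name for p in old]
```

Note that deleting the originals leaves dangling symlinks in the Apache directory, so those would need cleaning up as well.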

Sanity Checking

Watch the output of 'data/aggregate-debug.log' - you should see the percentages creep upwards over time, and when you hit 60% you'll start producing a file.

So you've got bandwidth values, but how do you know if they're accurate?

You can check your top 25 relays and see if they come close to what Atlas has.

tail -n +2 bwscan.V3BandwidthsFile | cut -d " " -f 2,1,3 | awk -F" " '{ print $2 " " $1 " " $3; }' | sed 's/bw=//g' | sort -n | tail -n 25
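If the pipeline above is hard to read, the same top-25 listing can be sketched in Python. This assumes, as the shell version does, that every line after the first is a space-separated list of key=value pairs that includes node_id, bw, and nick. The function name top_relays is made up for this example.

```python
def top_relays(lines, count=25):
    """Return the `count` highest-bandwidth relays as (bw, node_id, nick).

    `lines` is the content of the bandwidths file as a list of lines.
    The first line (the timestamp) is skipped, as `tail -n +2` does.
    """
    relays = []
    for line in lines[1:]:
        # Each relay line is assumed to be space-separated key=value pairs.
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        if "bw" not in fields:
            continue
        relays.append(
            (int(fields["bw"]), fields.get("node_id", ""), fields.get("nick", ""))
        )
    # Largest bandwidth first.
    return sorted(relays, reverse=True)[:count]
```

Feed it the result of readlines() on bwscan.V3BandwidthsFile and print each tuple to compare against Atlas.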

A long time ago, I downloaded votes and analyzed the difference between each bwauth, but this requires the bwauth to be included in the vote.

for i in "moria1" "gabelmoo" "longclaw" "maatuska" "faravahar"; do grep "Measured" -B 4 $i | grep -v "^s" | grep -v "^v " | grep -v "^pr" | grep -v -- "--" | awk '!(NR%2){print p$0}{p=$0}' | cut -d " " -f 3,11 | sed 's/Measured=//' > $i.data; done
analyze_bwauth_thing() { echo $1 $2 `join $1 $2 | cut -d " " -f 2- | sort -n -r | head -n 200 | python -c 'import sys; d=lambda l : (abs(l[0]-l[1]) / ((l[0]+l[1])/2))*100; lines = [l.split(" ") for l in sys.stdin.readlines()]; lines = [(float(l[0]), float(l[1])) for l in lines]; print("\n".join([str(d(l)) for l in lines]));' | awk '{a+=$1} END{print a/NR}'`; }
\ls *.data | python -c 'import sys; import itertools; fi = [f.strip() for f in sys.stdin.readlines()]; c= [l for l in itertools.combinations(fi, 2)]; print("\n".join(["analyze_bwauth_thing " + i[0] + " " + i[1] for i in c]))'
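The comparison those one-liners perform is the average symmetric percent difference between two bwauths over their top 200 shared relays. A rough sketch of the same calculation as plain Python follows. The function names and the dict-based input are assumptions for illustration, and the ordering only approximates `sort -n -r`.

```python
from itertools import combinations


def mean_percent_difference(a, b, top=200):
    """Average symmetric percent difference between two bwauths.

    `a` and `b` map relay identity -> measured bandwidth. Only relays
    present in both are compared, like `join` in the shell version.
    """
    pairs = [(float(a[k]), float(b[k])) for k in a if k in b]
    # Keep the `top` largest pairs, roughly like `sort -n -r | head -n 200`.
    pairs = sorted(pairs, reverse=True)[:top]
    diffs = [abs(x - y) / ((x + y) / 2) * 100 for x, y in pairs if x + y > 0]
    return sum(diffs) / len(diffs) if diffs else 0.0


def compare_all(measurements, top=200):
    """Compare every pair of bwauths in {name: {identity: bandwidth}}."""
    return {
        (m, n): mean_percent_difference(measurements[m], measurements[n], top)
        for m, n in combinations(sorted(measurements), 2)
    }
```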

You can also look at the Consensus Health graphs and see whether your bwauth seems sane based on them. (Again, this requires your bwauth to be voting.)

Monitoring

After the bwauth has been running for a few days, you might wish to set up some sanity checks for it. Tom Ritter uses checker for his, specifically with this script. The script checks five things:

  1. Is the bwauth machine still running? (This checks Apache.)
  2. Does the bwauth bandwidths file have a sufficiently recent timestamp?
  3. Does the bwauth bandwidths file have a sufficient number of relays?
  4. Is the percentage of the network measured sufficiently high?
  5. Have all scanners looped recently?

More details:

Timestamp and Number of Relays

Symlink ~/bwauth/torflow/NetworkScanners/BwAuthority/bwscan.V3BandwidthsFile out to your Apache directory. The top line is a timestamp; I make sure it falls within the last four hours. For the relay count, I choose a number a bit below the number of relays currently measured by the other bwauths (currently 7600). This number ebbs and flows, so I might edit it 5-6 times a year.
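Both checks can be sketched in a few lines of Python. This is an illustration and not the actual checker script: the function name check_bandwidth_file is hypothetical, and it assumes the first line of the file is a Unix timestamp in seconds and that every later non-empty line is one relay.

```python
import time


def check_bandwidth_file(lines, min_relays=7600, max_age_hours=4, now=None):
    """Return a list of problems; an empty list means the file looks fine."""
    now = time.time() if now is None else now
    if not lines:
        return ["file is empty"]
    try:
        # Assumption: the top line is a Unix timestamp in seconds.
        timestamp = int(lines[0].strip())
    except ValueError:
        return ["first line is not a Unix timestamp"]
    problems = []
    age_hours = (now - timestamp) / 3600
    if age_hours > max_age_hours:
        problems.append("timestamp is %.1f hours old" % age_hours)
    # Every non-empty line after the timestamp counts as one relay.
    relays = sum(1 for line in lines[1:] if line.strip())
    if relays < min_relays:
        problems.append("only %d relays measured" % relays)
    return problems
```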

Percentage of the network measured

I have a crontab entry:

10 * * * * grep "of all tor nodes" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/aggregate-debug.log > /var/www/html/bwauth/AA_percent-measured.txt

That outputs the percent measured to https://bwauth.ritter.vg/bwauth/AA_percent-measured.txt and I check the last line to make sure it is reasonably high (> 96).
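A check on that last line might look like the sketch below. The exact wording of the log line is not reproduced here, so the code only assumes that the line contains a number followed by a percent sign. The function names are made up for this example.

```python
import re


def last_percent_measured(text):
    """Return the percentage found on the last non-empty line, or None."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    # Assumption: the percentage is written as a number followed by "%".
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", lines[-1])
    return float(match.group(1)) if match else None


def percent_ok(text, threshold=96.0):
    """True when the most recent percentage is above the threshold."""
    percent = last_percent_measured(text)
    return percent is not None and percent > threshold
```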

Scanner Loop Time

This one is less intuitive. There are 9 scanners, and sometimes a scanner gets stuck. It's very hard to detect when this happens from the data output: by the time any of the above checks would fire, the data is excessively stale. So this check is pretty important.

The crontab entry to generate this info is:

10 * * * * for i in 1 2 3 4 5 6 7 8 9; do echo "Scanner $i"; egrep "Starting slice for percentiles [0-9]+.0-" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/scanner.$i/bw.log; done

The output is published at https://bwauth.ritter.vg/bwauth/AA_scanner_loop_times.txt. I check that the last line for each scanner is within a reasonable time frame (6 days).
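The staleness test on that output could be sketched as follows. The timestamp format is an assumption: the code expects each log line to carry a bracketed timestamp such as [Sat Aug 26 04:00:00 2017], so adjust the pattern if your bw.log differs. The function name stale_scanners is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed log timestamp, e.g. "[Sat Aug 26 04:00:00 2017]".
STAMP = re.compile(r"\[(\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4})\]")


def stale_scanners(text, now, max_age=timedelta(days=6)):
    """Return scanners whose last logged loop start is missing or too old."""
    last_seen = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Scanner "):
            # Header written by the `echo "Scanner $i"` in the crontab entry.
            current = line.strip()
            last_seen[current] = None
            continue
        match = STAMP.search(line)
        if current and match:
            # Later lines overwrite earlier ones, so the last line wins.
            last_seen[current] = datetime.strptime(
                match.group(1), "%a %b %d %H:%M:%S %Y"
            )
    return sorted(
        name for name, seen in last_seen.items()
        if seen is None or now - seen > max_age
    )
```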

Debugging

Bandwidth Authority Tor Fails to Start

  1. Make the Log, DataDirectory, and PidFile paths absolute paths (#20456)
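To spot relative paths before starting tor, a small check like this sketch can help. It is not part of torflow, and the function name relative_torrc_paths is made up. It only looks at Log ... file, DataDirectory, and PidFile lines and does not handle paths that contain spaces.

```python
import os


def relative_torrc_paths(torrc_text):
    """Return (option, path) pairs for Log, DataDirectory and PidFile
    entries whose path is not absolute."""
    found = []
    for raw in torrc_text.splitlines():
        # Drop comments and surrounding whitespace.
        parts = raw.split("#", 1)[0].split()
        if len(parts) < 2:
            continue
        option = parts[0].lower()  # torrc option names are case-insensitive
        if option in ("datadirectory", "pidfile"):
            path = parts[1]
        elif option == "log" and "file" in parts[1:-1]:
            # "Log notice file /path/to/log": the path follows the word "file".
            path = parts[parts.index("file", 1) + 1]
        else:
            continue
        if not os.path.isabs(path):
            found.append((parts[0], path))
    return found
```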

Bandwidth Authority Scripts Fail on BSD / OS X

  1. Install a readlink that supports -f

OR

  1. Manually install dependencies from the BwAuthority instructions
  2. Manually set SCANNER_DIR in cron.sh and run_scan.sh

Bandwidth Authority Fails on Small (Test) Networks

  1. Small networks might be missing Guards, Guards+Exits, Middles, or Exits (#20467)
  2. Small networks might have bandwidths below the minimum of 1MByte/second (#20505)

On small networks, the following features can lead to no measured bandwidths:

  • bandwidth authorities measure the bandwidth of directory authorities, but don't aggregate them in the results,
  • the consensus does not include any measured bandwidths until there are at least 3 bandwidth authorities.

Bandwidth Authorities use an Old Tor Version

  1. Update the bwauth to use something more recent than tor 0.2.6 (#20453)
  2. If using tor 0.3.0 or later, add "UseMicrodescriptors 0" to the torrc (#20621)

Scanner Fails to Import Required Python Libraries

  1. Change the PYTHONPATH in the scripts (#20466)

Excessive Log Entries

  1. Remove the download URL that doesn't work (#20580)
  2. Turn pathbias off (#20457)
  3. Fix the NEWCONSENSUS event code (#20619)
Last modified on Aug 26, 2017, 4:23:57 AM