So You Want to Fix the Tor Network

- or -

How to Run and Troubleshoot a Bandwidth-Measuring Directory Authority

These instructions were written against commit e268151aaa1436a8ce2d4959d1a48e69368dbf3d, but they probably apply to later commits as well.

Setup

Check out the readme for setup instructions. On an Ubuntu 16.04 machine, setup.sh worked quite well. (Minor hiccups were encountered due to missing packages, and the recovery from those was only moderately confusing - it amounted to rm-ing directories and just starting the script over again.)

On *BSD or macOS, setup.sh probably won't work at all.

Configuring and Running

Are you using your own data file server, or the default? It would be better to run your own. To do this you'll have to set up a server and then edit this line in bwauthority_child.py:

urls = ["https://38.229.72.16/bwauth.torproject.org/"]

As the README says, run ./run_scan.sh to start everything up. You'll also want to add it to your crontab so that it runs at boot:

@reboot /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/run_scan.sh

Copy cron.sh to cron-mine.sh and run the copy from cron every hour:

45 0-23 * * * /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/cron-mine.sh

I also make all of the historical files available in my Apache directory with a crontab:

10 * * * * ln -s /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/bwscan.* /var/www/html/bwauth > /root/cron-ln-command.log

This list of files gets very big. I manually tar and compress them once a month; I don't have a script to do that yet.
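A minimal sketch of what such a monthly script could look like, in Python 3. Nothing here comes from torflow: the function name archive_old_files, the 30-day cutoff, and the archive path are my own assumptions. It only assumes that the historical files match bwscan.* in the data directory, as in the crontab entry above.

```python
import os
import tarfile
import time


def archive_old_files(data_dir, archive_path, max_age_days=30, now=None):
    """Move bwscan.* files older than max_age_days into a .tar.gz archive.

    Returns the sorted list of file names that were archived.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = sorted(
        name for name in os.listdir(data_dir)
        if name.startswith("bwscan.")
        and os.path.isfile(os.path.join(data_dir, name))
        and os.path.getmtime(os.path.join(data_dir, name)) < cutoff
    )
    if not old:
        return []
    # Write the archive completely before deleting any original.
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in old:
            tar.add(os.path.join(data_dir, name), arcname=name)
    for name in old:
        os.remove(os.path.join(data_dir, name))
    return old
```

Note that removing the originals leaves any symlinks to them in the Apache directory dangling, so those would need cleaning up as well.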

Sanity Checking

Watch the output of 'data/aggregate-debug.log' - you should see the percentages creep upwards over time, and when you hit 60% you'll start producing a file.

So you've got bandwidth values, but how do you know if they're accurate?

You can check your top 25 relays and see if they come close to what Atlas has.

tail -n +2 bwscan.V3BandwidthsFile | cut -d " " -f 2,1,3 | awk -F" " '{ print $2 " " $1 " " $3; }' | sed 's/bw=//g' | sort -n | tail -n 25
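If the pipeline gets unwieldy, the same listing can be sketched in Python 3. This assumes, as the pipeline does, that the first line of the file is the timestamp and that every other line is a run of space-separated key=value pairs including node_id, bw, and nick. The function name top_relays is mine.

```python
def top_relays(lines, count=25):
    """Return the `count` highest (bw, node_id, nick) tuples, ascending."""
    relays = []
    for line in lines[1:]:  # the first line is the timestamp, skip it
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        if "bw" in fields:
            relays.append(
                (int(fields["bw"]), fields.get("node_id", ""), fields.get("nick", ""))
            )
    return sorted(relays)[-count:]
```

Feed it the result of open("bwscan.V3BandwidthsFile").read().splitlines() and print each tuple; the largest relays come last, as with sort -n | tail.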

A long time ago, I downloaded votes and analyzed the difference between each bwauth, but this requires the bwauth to be included in the vote.

for i in "moria1" "gabelmoo" "longclaw" "maatuska" "faravahar"; do grep "Measured" -B 4 $i | grep -v "^s" | grep -v "^v " | grep -v "^pr" | grep -v -- "--" | awk '!(NR%2){print p$0}{p=$0}' | cut -d " " -f 3,11 | sed 's/Measured=//' | sort > $i.data; done
analyze_bwauth_thing() { echo $1 $2 `join $1 $2 | cut -d " " -f 2- | sort -n -r | head -n 200 | python -c 'import sys; d=lambda l : (abs(l[0]-l[1]) / ((l[0]+l[1])/2))*100; lines = [l.split(" ") for l in sys.stdin.readlines()]; lines = [(float(l[0]), float(l[1])) for l in lines]; print("\n".join([str(d(l)) for l in lines]));' | awk '{a+=$1} END{print a/NR}'`; }
\ls *.data | python -c 'import sys; import itertools; fi = [f.strip() for f in sys.stdin.readlines()]; c= [l for l in itertools.combinations(fi, 2)]; print("\n".join(["analyze_bwauth_thing " + i[0] + " " + i[1] for i in c])) '
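The one-liners above are hard to read, so here is a rough Python 3 sketch of the same comparison: for each pair of bwauths, take the relays both measured, keep the 200 largest, and average the symmetric percent difference. It assumes each .data file holds one "identity bandwidth" pair per line, as produced by the first command. All function names are mine, and the handling of ties and of zero-bandwidth pairs is my choice, not something the shell version specifies.

```python
import itertools


def load_data(path):
    """Read a .data file of 'identity bandwidth' lines into a dict."""
    with open(path) as handle:
        rows = (line.split() for line in handle if line.strip())
        return {ident: float(bw) for ident, bw in rows}


def mean_percent_difference(a, b, top=200):
    """Mean symmetric percent difference over relays measured by both.

    a and b map relay identity -> measured bandwidth. Only the `top`
    largest shared relays (ordered by a's value) are compared.
    Returns None when there is nothing to compare.
    """
    shared = sorted(((a[k], b[k]) for k in a if k in b), reverse=True)[:top]
    # Skip pairs where both values are zero to avoid dividing by zero.
    diffs = [abs(x - y) / ((x + y) / 2.0) * 100 for x, y in shared if x + y > 0]
    return sum(diffs) / len(diffs) if diffs else None


def compare_all(measurements):
    """Compare every pair; measurements maps bwauth name -> {identity: bw}."""
    return {
        (m, n): mean_percent_difference(measurements[m], measurements[n])
        for m, n in itertools.combinations(sorted(measurements), 2)
    }
```

For example, compare_all({name: load_data(name + ".data") for name in ["moria1", "gabelmoo"]}) returns one number per pair; smaller means the two bwauths agree more closely.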

You can also look at the Consensus Health graphs and see if your bwauth seems sane based on those. (Again, this requires your bwauth to be voting.)

Monitoring

After the bwauth has been running for a few days, you might wish to set up some sanity checks for it. Tom Ritter uses checker for his, specifically with this script. The script checks five things:

  1. Is the bwauth machine still running (checks Apache)?
  2. Does the bwauth bandwidths file have a sufficiently recent timestamp?
  3. Does the bwauth bandwidths file have a sufficient number of relays?
  4. Is the percentage of the network measured sufficiently high?
  5. Have all scanners looped recently?

More details:

Timestamp and Number of Relays

Symlink ~/bwauth/torflow/NetworkScanners/BwAuthority/bwscan.V3BandwidthsFile out to your Apache directory. The top line is a timestamp. I make sure it is within the last four hours. For the relay count, I choose a number a bit below the number of relays currently measured by the other bwauths (currently 7600). This number ebbs and flows. I might edit it 5-6 times a year.
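These two checks could be sketched in Python 3 roughly as follows. This is not the script linked above; the function name check_bandwidth_file and the problem messages are made up for illustration. It assumes the first line is a Unix timestamp in seconds and that each relay line starts with node_id=.

```python
import time


def check_bandwidth_file(text, min_relays, max_age_hours=4, now=None):
    """Return a list of problems found in a V3BandwidthsFile; empty if OK."""
    now = time.time() if now is None else now
    lines = text.splitlines()
    try:
        timestamp = int(lines[0].strip())
    except (IndexError, ValueError):
        return ["first line is not a Unix timestamp"]
    problems = []
    if now - timestamp > max_age_hours * 3600:
        problems.append("file is older than %d hours" % max_age_hours)
    # Count relay lines; anything not starting with node_id= is ignored.
    relays = sum(1 for line in lines[1:] if line.startswith("node_id="))
    if relays < min_relays:
        problems.append(
            "only %d relays, expected at least %d" % (relays, min_relays)
        )
    return problems
```

Fetch the file over HTTP (so the check also notices when Apache is down), pass the body in as text, and alert whenever the returned list is not empty.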

Percentage of the network measured

I have a crontab entry:

10 * * * * grep "of all tor nodes" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/aggregate-debug.log > /var/www/html/bwauth/AA_percent-measured.txt

That outputs the percent measured to https://bwauth.ritter.vg/bwauth/AA_percent-measured.txt and I check the last line to make sure it is reasonably high (> 96).
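A sketch of that last-line check in Python 3. The exact wording of the log line is an assumption on my part: the code only relies on a number followed by "% of all tor nodes" appearing somewhere on each line, which is what the grep above selects for. The function names and the default threshold parameter are mine.

```python
import re

# Assumed shape: "... <number>% of all tor nodes ..." on each grepped line.
PERCENT_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)% of all tor nodes")


def last_percent_measured(text):
    """Return the percentage from the last matching line, or None."""
    matches = PERCENT_RE.findall(text)
    return float(matches[-1]) if matches else None


def percent_ok(text, threshold=96.0):
    """True if the most recent percentage is above the threshold."""
    percent = last_percent_measured(text)
    return percent is not None and percent > threshold
```

An empty or unparseable file counts as a failure here, which is usually what you want from a monitoring check.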

Scanner Loop Time

This one is less intuitive. There are 9 scanners. Sometimes a scanner gets stuck. It's very hard to detect when this happens based on the data output: by the time any of the above checks would fire, the data is excessively stale. So this check is pretty important.

The crontab entry to generate this info is:

10 * * * * for i in 1 2 3 4 5 6 7 8 9; do echo "Scanner $i"; egrep "Starting slice for percentiles [0-9]+.0-" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/scanner.$i/bw.log; done

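The output is published at https://bwauth.ritter.vg/bwauth/AA_scanner_loop_times.txt (the entry as shown prints to standard output; redirect it to a file in your Apache directory, as in the percentage check). I check that the last line for each scanner is within a reasonable time frame (6 days).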
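Here is a rough Python 3 sketch of that check, working on the text the loop above produces: a "Scanner N" header followed by that scanner's matching log lines. The timestamp layout is an assumption: I'm assuming each log line carries a bracketed timestamp like [Fri Nov 10 04:00:00 2017] in the machine's local time, so adjust STAMP_FORMAT if your logs differ. The function name stale_scanners is mine.

```python
import re
from datetime import datetime, timedelta

STAMP_RE = re.compile(r"\[([^\]]+)\]")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumed log timestamp layout


def stale_scanners(text, now, max_age=timedelta(days=6)):
    """Return sorted names of scanners whose last loop start is too old.

    A scanner with no parseable log line at all is also reported.
    """
    last_seen = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Scanner "):
            current = line.strip()
            last_seen.setdefault(current, None)
            continue
        match = STAMP_RE.search(line)
        if current is None or match is None:
            continue
        try:
            # Later lines overwrite earlier ones, so the last line wins.
            last_seen[current] = datetime.strptime(match.group(1), STAMP_FORMAT)
        except ValueError:
            pass  # not a timestamp in the assumed format; ignore the line
    return sorted(
        name for name, seen in last_seen.items()
        if seen is None or now - seen > max_age
    )
```

Call it with now=datetime.now() and alert if the returned list is not empty.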

Debugging

Bandwidth Authority Tor Fails to Start

  1. Make the Log, DataDirectory, and PidFile paths absolute paths (#20456)

Bandwidth Authority Scripts Fail on BSD / OS X

  1. Install a readlink that supports -f

OR

  1. Manually install dependencies from the BwAuthority instructions
  2. Manually set SCANNER_DIR in cron.sh and run_scan.sh

Bandwidth Authority Fails on Small (Test) Networks

  1. Small networks might be missing Guards, Guards+Exits, Middles, or Exits (#20467)
  2. Small networks might have bandwidths below the minimum of 1MByte/second (#20505)

On small networks, the following features can lead to no measured bandwidths:

  • bandwidth authorities measure the bandwidth of directory authorities, but don't aggregate them in the results,
  • the consensus does not include any measured bandwidths until there are at least 3 bandwidth authorities.

Bandwidth Authorities use an Old Tor Version

  1. Update the bwauth to use tor 0.2.9, because it's LTS (#20453)
  2. If using tor 0.3.0 or later, add "UseMicrodescriptors 0" to the torrc (#20621) (fixed in tor)
  3. You might get some errors using Tor 0.3.0 or later (#24110)

Scanner Fails to Import Required Python Libraries

  1. Change the PYTHONPATH in the scripts (#20466)

Excessive Log Entries

  1. Remove the download URL that doesn't work (#20580)
  2. Turn pathbias off (#20457)
  3. Fix the NEWCONSENSUS event code (#20619)

Debian Stretch Setup

1: Create a new user to run the bwscanner (I call mine bwscanner)

(root) adduser --system bwscanner

2: Check out torflow from https://git.torproject.org/torflow

(root) apt-get install git ca-certificates
(root) su - bwscanner -s /bin/bash
git clone https://git.torproject.org/torflow
cd torflow
git rev-parse HEAD

The last command shows you what you actually got. There are no signed tags for torflow, so try to verify as best as you can.

git submodule init
git submodule update

3: Install a system tor

(root) apt-get install tor

4: Provide virtualenv (the sql dependencies are too new on stretch)

(root) apt-get install virtualenv python-virtualenv gcc libpython2.7-dev libsqlite3-dev
cd NetworkScanners/BwAuthority
virtualenv -p python2.7 bwauthenv
source bwauthenv/bin/activate

Replace the requirements file. This file instructs pip to download the exact versions of the dependencies and includes their sha256 hashes. The provided one is outdated: it sets a deprecated option (--no-use-wheel), which non-ancient pip versions set automatically, and another option (--no-index) that stops it from working because the packages aren't available locally. This is the content of the file for me:

pysqlite==2.6.3 --hash=sha256:fe9c35216bf56c858b34c4b4c8be7e34566ddef29670e5a5b43f9cb8ecfbb28d
SQLAlchemy==0.7.10 --hash=sha256:77aa39d65c9d043eba6ba329b359ff867424fd6c403b7c0cb112b65e507e1d66
Elixir==0.7.1 --hash=sha256:a7ef437f25b544e4f74fb3236fc43cd25f5d6feb6037dd7c66931046d75439e9

Install the dependencies:

pip install -r requirements.txt
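pip verifies the --hash values itself during the install. If you ever need to check a downloaded package file by hand, or compute the value for an updated pin, the hash is just the SHA-256 of the file. A small Python sketch (the function name sha256_of_file is mine):

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result against the part after sha256: in the requirements file.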

5: Update config

In bwauthority_child.py, find the urls line. I replaced the URL there with my own bw server.

In run_scan.sh, find the TOR_EXE line and make it look like this:

TOR_EXE=/usr/sbin/tor

6: Update crontab

Two entries for the crontab (run crontab -e):

@reboot cd /home/bwscanner/torflow/NetworkScanners/BwAuthority && /home/bwscanner/torflow/NetworkScanners/BwAuthority/run_scan.sh
45 * * * * cd /home/bwscanner/torflow/NetworkScanners/BwAuthority/ && /home/bwscanner/torflow/NetworkScanners/BwAuthority/cron.sh

Last modified on Nov 11, 2017, 4:13:18 AM