Measure connectivity patterns between relays
https://lists.torproject.org/pipermail/tor-relays/2014-May/004598.html makes me wonder how many relays are firewalling certain outbound ports (and thus messing with connectivity inside the Tor network). It would be great if somebody would start scanning pairs of relays to see which of them can reach each other and which can't, with the goal of understanding how far from a clique our network topology actually is, and then helping with an awareness campaign to correct it if it's a problem.
Tools that might be helpful building blocks here:
- Meejah's exitscanner builds circuits while limiting how many it builds at once. It uses txtorcon and thus Twisted. https://github.com/meejah/txtorcon/blob/exit_scanner/apps/exit_scanner/guard-exit-coverage.py
- phw's exitmap does something similar, but with stem rather than txtorcon. https://gitweb.torproject.org/user/phw/exitmap.git/tree
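As a rough illustration of what a pair scanner built on stem could look like, here is a minimal sketch. It is not taken from exitmap or exitscanner; the names `stem_prober` and `scan_pairs` are made up for this example, and it assumes a local tor with a control port you can authenticate to. Note that a failed two-hop build can also mean the client could not reach the first hop, so a real scanner would need to tell those cases apart.

```python
from itertools import permutations


def stem_prober(controller):
    """Wrap a stem Controller as a probe(first_hop, second_hop) -> bool callable.

    Hypothetical helper for illustration only.
    """
    import stem  # lazy import: only needed when talking to a real tor

    def probe(first_hop, second_hop):
        try:
            # Build a two-hop circuit and wait until it is built or fails.
            circuit_id = controller.new_circuit(
                [first_hop, second_hop], await_build=True
            )
        except stem.CircuitExtensionFailed:
            # Could be the link between the relays, or a problem at either relay.
            return False
        controller.close_circuit(circuit_id)
        return True

    return probe


def scan_pairs(probe, fingerprints):
    """Probe every ordered pair of relays; return {(first, second): reachable}."""
    return {(a, b): probe(a, b) for a, b in permutations(fingerprints, 2)}


# Usage against a real tor (assumes ControlPort 9051, an assumption of this sketch):
#   from stem.control import Controller
#   with Controller.from_port(port=9051) as controller:
#       controller.authenticate()
#       results = scan_pairs(stem_prober(controller), fingerprints)
```

Keeping the probe separate from the pair enumeration means the same loop could be driven by txtorcon instead, or by a fake probe when testing the bookkeeping.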
Other thoughts:
- You likely want to turn on FastFirstHopPK on the client, so it doesn't waste CPU time on public-key handshakes at the first relay.
- If you make each relay connect to 6000 other relays in succession, and some of the relays can't handle 6000 open file descriptors at once, then you might misinterpret "could not extend to that relay" as a property of the link between the relays when actually it's a property of the first relay. One option is to scan 500 and then move on to another first hop. Another option is to declare this a feature, and try to detect which relays can and which can't handle 6000 open file descriptors at once.
- n^2^ where n is 5000 is actually a heck of a lot of circuits. Should you just build circuits forever in the background, or are there some smarter algorithms for finding interesting patterns without making all 25 million circuits? In particular, there will be a background failure rate anyway, from e.g. relays that happen to be overloaded at that moment. So even a single pass of 25 million circuits won't be enough to tell a persistent block from a transient failure.
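To put rough numbers on the last two points, here is a small sketch of the arithmetic and of the "scan 500, then move on" idea. Everything in it is an assumption for illustration: the function names are hypothetical, the 5% background failure rate and the one-in-a-million threshold are placeholders rather than measurements, and failures are treated as independent, which real overloaded relays probably violate.

```python
import math


def pair_count(n):
    """Ordered relay pairs (first hop, second hop), excluding self-pairs."""
    return n * (n - 1)


def batched_schedule(fingerprints, batch_size=500):
    """Yield (first_hop, targets) with at most batch_size targets per first hop.

    After each batch the schedule moves on to another first hop, so no relay
    is asked to extend to more than batch_size relays in a row.
    """
    for offset in range(0, len(fingerprints), batch_size):
        window = fingerprints[offset:offset + batch_size]
        for first_hop in fingerprints:
            targets = [t for t in window if t != first_hop]
            if targets:
                yield first_hop, targets


def retries_needed(background_failure_rate, max_false_positive):
    """Consecutive failures needed before chance alone is an unlikely cause.

    Assumes independent failures: k failures in a row happen by chance with
    probability background_failure_rate ** k.
    """
    return math.ceil(
        math.log(max_false_positive) / math.log(background_failure_rate)
    )


print(pair_count(5000))            # ordered pairs for n = 5000
print(retries_needed(0.05, 1e-6))  # assumed 5% background failure rate
```

Under these assumptions, each suspicious pair would need to fail several times in a row before it is worth reporting, which suggests re-probing only the failed pairs instead of repeating the full scan.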