https://arthuredelstein.net/exits/
lists a pile of exit relays, including some very fast exit relays, that are failing all of their DNS queries. That is, they claim to be exits but Tor clients probably rarely use them, yet clients still try to use them, contributing to the long tail of low-probability high-impact misery of being a Tor client.
We should verify that we agree with his scripts, and also make sure we are comfortable running the checks on our own.
Then we should contact the affected relays, and either get them to fix their DNS, or figure out what the bug is, or, failing all of that, set the badexit flag for them to save clients the trouble of trying them and failing.
Then once we've done a round of that, we should come up with a process by which we repeat it regularly.
So, I started looking into this, but so far I have not gotten a single successful run (I tried twice). After a while, during the third round of the exit relay loop, the script throws an exception and aborts:
main function encountered error
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
    self._startRunCallbacks(fail)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
    _inlineCallbacks(r, g, status)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 125, in _main
    exit_results = await test_relays(reactor, state, socks, [guard1], exits, 10, bareIP)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 105, in test_relays
    result = await time_two_hop(reactor, state, socks, relay, exit_node, bareIP)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 76, in time_two_hop
    circuit_results = await build_two_hop_circuit(state, guard, exit_node)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 54, in build_two_hop_circuit
    return { "circuit" : circuit,
builtins.UnboundLocalError: local variable 'circuit' referenced before assignment
I wonder how Arthur is running that and whether he encountered similar bugs. This is with Tor 0.3.5.8, Python 3.7.3, python3-txtorcon 18.3.0-1 on a Debian 10 system.
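For reference, an UnboundLocalError like the one above typically shows up when a name is only bound inside a `try` block that raised before the assignment happened. The following is a minimal sketch of that pattern and one way to guard against it. It is not the actual code from relay_perf.py: `flaky_build`, `CircuitBuildError` and `build_two_hop_circuit_sketch` are hypothetical stand-ins, and the real function may be structured differently.

```python
import asyncio


class CircuitBuildError(Exception):
    """Hypothetical error type standing in for a failed circuit build."""


async def flaky_build(path):
    # Hypothetical stand-in for the real circuit-building call; always fails.
    raise CircuitBuildError("circuit build timed out")


async def build_two_hop_circuit_sketch(build, path):
    # Bind the names before the try block so they exist on every code path.
    circuit = None
    error = None
    try:
        circuit = await build(path)
    except Exception as exc:
        # Record the failure instead of falling through with 'circuit' unbound.
        error = repr(exc)
    return {"circuit": circuit, "error": error}


result = asyncio.run(build_two_hop_circuit_sketch(flaky_build, ["guard", "exit"]))
print(result)
```

With this shape the caller gets a result dictionary for a failed build as well and can decide how to count it, rather than the whole survey run aborting.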
Trac: Cc: nusenu, ggus to nusenu, ggus, arthuredelstein
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 127, in _main
    exit_results["_relays"] = relay_data(True)
  File "/home/gk/exit-dns/tor_dns_survey/relay_perf.py", line 28, in relay_data
    response = urllib.request.urlopen(req).read()
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
The problem seems to be gone on my box, but I am not sure exactly what the issue was (except that it was a local one). I don't see a similar error on a different, freshly set up Debian Buster machine. Hrm.
I fixed a bunch of issues (patches attached) and have this running now. As a next step I need to think about a good analysis of the results (while starting the contact/badexit process in parallel).
Awesome that you are looking into this Georg! I have the script running daily to generate the results on the website and I haven't run into the errors you saw. But your first patch makes sense to me and I applied it to master.
The other two patches I won't apply because I don't want to break the live site, but I'm happy to try to help with any problems you might be running into.
Thanks, you are welcome. The exit relay you specified is down, no? See: https://metrics.torproject.org/rs.html#details/7BD7B547676257EF147F5D5B7A5B15F840F4B579. So you need to pick another one, which my third patch does. Ideally, we would not hard-code a relay here, as this breaks from time to time. (It broke for me, hence the patch.) I guess a better solution would be to pick a proper exit relay from the relays you have been testing anyway before testing the non-exit ones. But for now I don't see why you can't take my third patch, like how would it break the live site?
For the second one, yeah, I can see it. If you like I can try to rewrite it in a way that better fits your needs.
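The idea of not hard-coding the exit relay could be sketched roughly as follows. This is only an illustration, not one of the attached patches: `pick_running_exit` is a hypothetical helper, and it assumes relay entries shaped like Onionoo details documents (with `fingerprint`, `running`, `flags` and `exit_probability` fields). Whether the script's relay data has exactly that shape is an assumption.

```python
def pick_running_exit(relays):
    """Return the fingerprint of a usable exit, or None if there is none.

    Assumes each entry is a dict with Onionoo-style fields:
    'fingerprint', 'running', 'flags' and (optionally) 'exit_probability'.
    """
    candidates = [
        relay for relay in relays
        if relay.get("running")
        and "Exit" in relay.get("flags", [])
        and "BadExit" not in relay.get("flags", [])
    ]
    if not candidates:
        return None
    # Prefer the exit that clients are most likely to pick anyway.
    best = max(candidates, key=lambda relay: relay.get("exit_probability", 0.0))
    return best["fingerprint"]


# Made-up sample data, for illustration only.
sample_relays = [
    {"fingerprint": "A" * 40, "running": True,
     "flags": ["Exit", "Running"], "exit_probability": 0.001},
    {"fingerprint": "B" * 40, "running": False,
     "flags": ["Exit"], "exit_probability": 0.01},
    {"fingerprint": "C" * 40, "running": True,
     "flags": ["Exit", "BadExit"], "exit_probability": 0.02},
    {"fingerprint": "D" * 40, "running": True,
     "flags": ["Guard", "Running"], "exit_probability": 0.0},
]
chosen = pick_running_exit(sample_relays)
print(chosen)
```

A stricter variant would only accept exits that already passed the DNS test in the current run, which is closer to the suggestion above.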
If you have some scripts to group the results by some parameters (like "all relays with a DNS error in 80% of the cases during the last n days"), I'd be happy to hear about them. It would probably be smart to have some automated way of at least extracting all the info needed for bad-exit decisions.
For the scripting part, I played a bit with jq and will start using that for now. We should be more clear about the longer term plan here before investing in a more robust solution but I feel the script(s) I think about writing could easily be re-usable even in that scenario.
In parallel, I am starting to reach out to relay operators to get their setup fixed and/or the relays badexited.
FWIW, I wrote a script that gives me the fingerprints of relays that fail to connect to https://eff.org at least a given number of times (each relay is tried 10 times) and contacted the affected relay operators as far as contact information is available (I started with relays failing 10/10 times, comparing the results of Arthur's test run with my own). I'll start bad-exiting relays later this week and will post some statistics in this ticket as well.
I'll test the script I have further (and probably fine-tune it a bit more) during this week, too. The plan is to have it as part of the helper-scripts repo later on.
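For illustration, the threshold filtering described above could look roughly like this. It is a sketch under assumptions, not the script from this ticket: the result layout (a JSON object mapping each fingerprint to a list of per-attempt outcome strings, with "SUCCEEDED" marking success) is hypothetical. Only the `_relays` metadata key is taken from the traceback earlier in this ticket.

```python
import json


def failing_relays(results, min_failures=10):
    """Return sorted fingerprints with at least min_failures failed attempts.

    Assumes 'results' maps fingerprint -> list of outcome strings, where
    "SUCCEEDED" marks a successful attempt (hypothetical layout).
    """
    failing = []
    for fingerprint, attempts in results.items():
        if fingerprint.startswith("_"):
            continue  # skip metadata entries such as "_relays"
        failures = sum(1 for outcome in attempts if outcome != "SUCCEEDED")
        if failures >= min_failures:
            failing.append(fingerprint)
    return sorted(failing)


# Made-up sample blob, for illustration only.
sample_blob = json.dumps({
    "A" * 40: ["TIMEOUT"] * 10,
    "B" * 40: ["SUCCEEDED"] * 9 + ["TIMEOUT"],
    "_relays": {},
})
sample_results = json.loads(sample_blob)
always_failing = failing_relays(sample_results)
ever_failing = failing_relays(sample_results, min_failures=1)
print(always_failing)
```

Keeping the threshold as a parameter makes it possible to start with the clear-cut 10/10 cases and lower the bar later.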
Okay, a final note here: I created a script that pulls exit relays failing DNS queries at or above a certain threshold (by default only relays failing 10/10 times are shown) out of the JSON blob created either by Arthur's exit DNS check tool or by my own run. I contacted the respective exit relay operators last week (that's the "[s]" below, where "[sf]" means "mails sent but bounced") and did not really hear back (just one replied that they are looking into it). So, now is the time to actually start the badexiting process. I pushed a rule to mark all the exits below as badexit:
(FWIW: As said previously, I slightly modified Arthur's script to use https://eff.org to check for exits, as comparing the results with Arthur's allows us to differentiate between DNSSEC-only issues and more general ones. That's useful when contacting relay operators, in particular until #33179 (moved) is solved.)
Marking this ticket as needs_review for the script I want to add to the helper-scripts repo.
Trac: Status: assigned to needs_review Reviewer: N/A to dgoulet