Opened 5 years ago

Last modified 12 months ago

#9765 new defect

TorDNSel exit lists are missing expected data

Reported by: Ry Owned by:
Priority: Medium Milestone:
Component: Core Tor/TorDNSEL Version:
Severity: Normal Keywords: TorCheck
Cc: tup, lunar, arlo@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I've run some checks on the exit lists that we're using for TorCheck; here are some stats I'm seeing for a recent file:
Missing: 629 /* IPs not in the exit list */
Same: 463 /* IP in exit list matches IP in consensus */
OK(Updated): 19 /* IP from TorDNSel was helpful! */
Total: 1111 /* Total count of IPs in the current consensus */

This comes from taking the routers listed in the exit lists and comparing their fingerprints to those listed in the consensus. There are 2196 lines in the latest exit list (ignoring header lines), which means the exit list has 549 routers listed (some of them not in the active consensus). This is a far cry from the 1111 listed in the consensus.
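
For reference, the comparison is roughly along the lines of the sketch below (written against Stem rather than being the exact script I ran; the file names are placeholders, and the Exit-flag filter is my assumption about how the consensus gets narrowed down to exits):

from stem.descriptor import parse_file

# Map fingerprint -> set of exit addresses TorDNSEL has measured.
exit_addrs = {}
for d in parse_file('exit-addresses', descriptor_type='tordnsel 1.0'):
    exit_addrs[d.fingerprint] = set(addr for addr, seen in d.exit_addresses)

missing = same = updated = total = 0
for r in parse_file('cached-consensus',
                    descriptor_type='network-status-consensus-3 1.0'):
    if 'Exit' not in r.flags:
        continue                      # only consider relays flagged as exits
    total += 1
    addrs = exit_addrs.get(r.fingerprint)
    if not addrs:
        missing += 1                  # relay absent from the exit list
    elif r.address in addrs:
        same += 1                     # TorDNSEL agrees with the consensus IP
    else:
        updated += 1                  # TorDNSEL saw a different exit IP

print('Missing: %d  Same: %d  OK(Updated): %d  Total: %d'
      % (missing, same, updated, total))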

So the problem is that we're having to use 629 self-published IPs (from the latest consensus doc), which aren't always accurate, as you know. This is a pretty big problem for a project like TorCheck.

For reference, the exit list I'm looking at is the one marked 'Downloaded 2013-09-18 03:02:02', but this has been a problem since I first noticed it a week or so ago. (We assumed it was intermittent at the time.)

Can you investigate why TorDNSel is missing so many IPs? If you're not the right guy/gal, please pass this back and I'll find a new owner!

Child Tickets

Change History (10)

comment:1 Changed 5 years ago by karsten

TorCheck had issues in the past months (https://blog.torproject.org/blog/tor-check-outage-03-and-04-july-2013), so maybe it's related to that. Can you re-run your analysis on the archives of, say, all of 2013? It would be interesting to know whether this is a new problem or not.

comment:2 Changed 5 years ago by phobos

Owner: phobos deleted
Priority: major → normal
Status: new → assigned

comment:3 Changed 5 years ago by phobos

tordnsel runs a standard tor client; if the tor client isn't seeing the whole consensus, then we have far larger issues.

comment:4 Changed 5 years ago by arma

Cc: tup lunar added

Cc'ing tup as the original tordnsel developer, and Lunar as somebody who's looked at it more recently.

I guess step zero is to confirm or deny phobos's theory above that the Tor client it's using somehow doesn't inform it about all the relays.

comment:5 Changed 5 years ago by arlolra

Cc: arlo@… added

comment:6 in reply to:  1 Changed 5 years ago by Ry

Replying to karsten:

TorCheck had issues in the past months (https://blog.torproject.org/blog/tor-check-outage-03-and-04-july-2013), so maybe it's related to that. Can you re-run your analysis on the archives of, say, all of 2013? It would be interesting to know whether this is a new problem or not.

AFAIK that was due to implementation details of the older (still current) TorCheck version, which didn't rely on published exit lists as much. I can check across 2013 at the weekend, perhaps (I don't really have the connection to download that much data right this moment).


As for the Tor client not seeing the whole consensus, I wasn't exactly sure what you were suggesting, so I did two things. For my Tor client (built off master), I ran two scripts:

1. Comparing the cached-consensus with the latest consensus on the metrics server that I used for the OP. There have been no differences for the last two hours I tried. (If you need to test, there's a comment with a helpful rsync command that will pull the latest consensus doc if you change the date/hour.)

2. Comparing the fingerprints present in the cached-consensus document against a merge of both cached-descriptors and cached-descriptors.new. In this comparison I found a really tiny number of servers missing (3 when I started testing; it seemed to stick at 2 for an hour across 2 different consensus files). If you don't want to compare against .new files, supply a -s flag, but I think it would be an error to do so. (A rough sketch of this second comparison is below.)
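
Roughly what that second comparison looks like, sketched with Stem (not the exact script I used; the paths assume you're pointing it at a Tor data directory):

from stem.descriptor import parse_file

# Fingerprints of relays listed in the cached consensus.
consensus_fps = set(
    r.fingerprint
    for r in parse_file('cached-consensus',
                        descriptor_type='network-status-consensus-3 1.0'))

# Fingerprints we hold a server descriptor for, merging the .new journal.
descriptor_fps = set()
for path in ('cached-descriptors', 'cached-descriptors.new'):
    for d in parse_file(path, descriptor_type='server-descriptor 1.0'):
        descriptor_fps.add(d.fingerprint)

missing = consensus_fps - descriptor_fps
print('%d consensus relays lack a cached descriptor' % len(missing))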

From this, I conclude that the client likely sees enough of the consensus, so the discrepancy in the OP is probably not caused by that specifically.

It's likely worth running the scripts against the version of the Tor client being used by TorDNSEL in production, just as a sanity check. I think arma might be on the right track in that it's potentially a problem between TorDNSEL and Tor, or TorDNSEL is perhaps timing out connections and never retrying/logging them?

comment:7 Changed 5 years ago by arlolra

Some speculation:

TorDNSEL's conf currently looks like:

TestDestinationAddress 38.229.72.22:8080,8443,110,5190,6667,6697,9030

I wonder how many of the above relays allow exiting, just not to those ports or that IP? From the false-negative work we've been doing on check, there are at least two that can only exit on 443 and will never be picked up. What's the best set of ports to run on? Is this going to account for half the exits?
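
One way to put a number on that (a rough sketch with Stem against a recent batch of server descriptors; the address and ports are just the ones from the conf above, and the file name is a placeholder):

from stem.descriptor import parse_file

TEST_ADDR = '38.229.72.22'
TEST_PORTS = [8080, 8443, 110, 5190, 6667, 6697, 9030]

reachable = unreachable = 0
for desc in parse_file('cached-descriptors',
                       descriptor_type='server-descriptor 1.0'):
    policy = desc.exit_policy
    if not policy.is_exiting_allowed():
        continue                  # not an exit at all
    if any(policy.can_exit_to(TEST_ADDR, port) for port in TEST_PORTS):
        reachable += 1            # TorDNSEL's test destination should work
    else:
        unreachable += 1          # exits, but never to our test IP/ports (e.g. 443-only)

print('%d exits can reach the test destination, %d cannot'
      % (reachable, unreachable))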

The data used in the above investigation was from the metrics project. I noticed the cron that collects the data runs at around 2 minutes past the hour, but TorDNSEL is busy collecting new data for what looks like 20 minutes. Is it getting everything in exit-addresses.new? We should probably rerun the tests with a file straight from TorDNSEL in production to confirm the above.

comment:8 in reply to:  7 Changed 5 years ago by karsten

Replying to arlolra:

The data used in the above investigation was from the metrics project. I noticed the cron that collects the data runs at around 2 minutes past the hour, but TorDNSEL is busy collecting new data for what looks like 20 minutes. Is it getting everything in exit-addresses.new? We should probably rerun the tests with a file straight from TorDNSEL in production to confirm the above.

That's correct; the cronjob runs at 2 minutes past the hour. If there's a better time to fetch the file, I can change that to any minute past the hour. I ran a quick experiment where I downloaded the file every minute today between 11:04 and 12:09 UTC. Here are the file sizes and last-modified times:

116408 11:04
116408 11:05
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116408 11:06
116721 11:29
116721 11:30
116721 11:31
117034 11:32
117034 11:33
117034 11:34
117034 11:35
117191 11:36
117191 11:37
117350 11:38
117350 11:39
117351 11:40
117509 11:41
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117509 11:42
117666 12:02
117822 12:03
117822 12:04
117822 12:05
117822 12:06
117822 12:06
117822 12:06
117822 12:06

It seems that the :06 and :42 files are left unchanged for long enough to fetch them. So, should I change the cronjob to either 25 or 55 minutes past the hour?

Or should I learn more about the exit-addresses.new file and maybe fetch that one, too?
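
(Another option, rather than hard-coding a minute, would be to poll and only fetch once the file has stopped changing; a rough sketch, assuming the exit list is served over HTTP with a Last-Modified header, and with the URL as a placeholder:)

import time
import requests

EXIT_LIST_URL = 'https://example.org/exit-addresses'  # placeholder URL

def fetch_when_stable(stable_for=300, poll_every=60):
    """Download the exit list once Last-Modified has stopped changing."""
    last_seen, unchanged_since = None, time.time()
    while True:
        modified = requests.head(EXIT_LIST_URL).headers.get('Last-Modified')
        if modified != last_seen:
            last_seen, unchanged_since = modified, time.time()
        elif time.time() - unchanged_since >= stable_for:
            return requests.get(EXIT_LIST_URL).text   # stable long enough; fetch it
        time.sleep(poll_every)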

comment:9 Changed 12 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:10 Changed 12 months ago by teor

Status: assigned → new

Mark all tickets that are assigned to nobody as "new".
