Opened 8 years ago

Closed 7 years ago

#5247 closed enhancement (implemented)

Include reverse DNS lookup results in details

Reported by: karsten
Owned by: karsten
Priority: Medium
Component: Metrics/Onionoo

Description

We should run reverse DNS lookups and include their results in details documents. What's the best way to run these lookups in Java? Also, do we have to run them every hour for every relay?

I wrote a simple Java application that looks up host names using the following line of code:

InetAddress.getByName(address).getHostName()

The application also measures how long each lookup took. I ran it for the first 1000 relays in the consensus published on 2012-02-18 at 03:00:00. Here are some simple statistics of the lookup times, in milliseconds:

 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
114.0   688.8  1032.0  1906.0  1628.0 81120.0
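A minimal sketch of such a measurement, assuming the relay addresses are passed as command-line arguments (the original application's input handling isn't shown). `LookupTimer` is a hypothetical name. The quartiles use linear interpolation between order statistics, which matches the default (type 7) of R's `summary()`, the apparent source of the table above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a lookup-timing tool; not the original application.
class LookupTimer {

    /** Performs one reverse lookup and returns the elapsed time in milliseconds. */
    static long timeLookup(String address) {
        long start = System.nanoTime();
        try {
            // getHostName() on an address created from an IP literal triggers a reverse lookup.
            InetAddress.getByName(address).getHostName();
        } catch (UnknownHostException e) {
            // getByName() only throws if the argument is not a literal and cannot be resolved.
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Quantile with linear interpolation (R type 7); expects sorted, non-empty input. */
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = (int) Math.ceil(h);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Returns {min, 1st quartile, median, mean, 3rd quartile, max}, like R's summary(). */
    static double[] summary(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        double mean = Arrays.stream(s).average().orElse(Double.NaN);
        return new double[] {
            s[0], quantile(s, 0.25), quantile(s, 0.5), mean, quantile(s, 0.75), s[s.length - 1]
        };
    }

    public static void main(String[] args) {
        List<Long> times = new ArrayList<>();
        for (String address : args) {
            times.add(timeLookup(address));
        }
        if (times.isEmpty()) {
            return; // nothing to summarize
        }
        double[] ms = times.stream().mapToDouble(Long::doubleValue).toArray();
        System.out.println(Arrays.toString(summary(ms)));
    }
}
```

Note that this times a sequential loop, so the total run time is the sum of all lookup times.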

So, looking up all 2759 relays in the consensus would have taken about 1.5 hours (2759 lookups at a mean of 1.9 seconds each is roughly 5,260 seconds). Sequentially looking up reverse DNS entries for all relays in a consensus every hour is simply not possible. We'll need to make some optimizations before even starting. Questions are:

  • Is there a faster way to look up reverse DNS entries than the one used in this simple Java application?
  • Can we group multiple lookups and make a single request for them?
  • How often do we need to refresh a reverse DNS lookup result? In theory we could cache results for an arbitrary time, but would they still be accurate after 3, 6, 12, 24 hours?
  • How many requests can we make in parallel using Java threads? The Java side is easy and probably doesn't use much CPU time, but would making 100 requests at a time trigger some rate-limiting or abuse-detection mechanism at our ISP?
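One candidate answer to the first question, not measured here: query PTR records directly through the JDK's JNDI DNS provider instead of calling `InetAddress.getHostName()`. In OpenJDK, `getHostName()` also resolves the returned name forward to check that it maps back to the original address, so it may issue more than one DNS query per relay. The sketch assumes IPv4 addresses; `PtrLookup` and its methods are hypothetical names, while `com.sun.jndi.dns.DnsContextFactory` and the `com.sun.jndi.dns.timeout.initial`/`retries` properties belong to the JDK's DNS provider.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical sketch: direct PTR queries via JNDI, IPv4 only, not benchmarked.
class PtrLookup {

    /** Builds the in-addr.arpa name for an IPv4 address, e.g. 1.2.3.4 -> 4.3.2.1.in-addr.arpa. */
    static String reverseName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 3; i >= 0; i--) {
            int value = Integer.parseInt(octets[i]);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("octet out of range: " + ipv4);
            }
            sb.append(value).append('.');
        }
        return sb.append("in-addr.arpa").toString();
    }

    /** Creates a DNS context using the system's resolvers, with a bounded per-query timeout. */
    static DirContext dnsContext(int timeoutMillis, int retries) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put("com.sun.jndi.dns.timeout.initial", Integer.toString(timeoutMillis));
        env.put("com.sun.jndi.dns.timeout.retries", Integer.toString(retries));
        return new InitialDirContext(env);
    }

    /** Returns the first PTR record for the address, or null if there is none. */
    static String lookupPtr(DirContext ctx, String ipv4) {
        try {
            Attributes attrs = ctx.getAttributes(reverseName(ipv4), new String[] {"PTR"});
            Attribute ptr = attrs.get("PTR");
            if (ptr == null || ptr.size() == 0) {
                return null;
            }
            String name = ptr.get(0).toString();
            // PTR values are fully qualified and usually end with a dot.
            return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        } catch (NamingException e) {
            // NXDOMAIN, timeouts and server failures all end up here.
            return null;
        }
    }

    public static void main(String[] args) throws NamingException {
        DirContext ctx = dnsContext(1000, 1);
        for (String address : args) {
            System.out.println(address + " -> " + lookupPtr(ctx, address));
        }
        ctx.close();
    }
}
```

Unlike `getHostName()`, this skips the forward confirmation, so whether a name that doesn't map back to the address should be published is a separate decision.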

Here are some comments after talking to George and Damian:

  • An average lookup time of 1.9 seconds per request isn't unusual.
  • Using a thread pool with 5 lookup threads should be a fine start.
  • Caching results for 12 hours should work fine. It's much more likely that a relay's IP address changes than that its host name changes. We could also keep some simple statistics on how often host names actually change when looking them up again; if the fraction is higher than we'd like, we can still reduce the caching period to 6 hours or less. We should document in protocol.html how often host names are looked up.
  • Performing multiple lookups per request would be cool, but is probably not supported by Java libraries.
  • I re-ran the analysis above, but this time with the host tool instead of Java. Lookup times (in seconds) are much lower, so something in Java must be slowing down the lookup. More research needed.
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.0320  0.1800  0.3780  0.4252  0.5420 12.0300
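The caching rule from the comments above (keep a result for 12 hours, and count how often a refreshed host name differs from the cached one so the period can be shortened if needed) might look roughly like this. `HostNameCache` and its methods are hypothetical; the lookup function and clock are passed in so the class can be exercised without network access, and a `null` host name stands for a failed lookup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of a time-limited host name cache with change statistics.
class HostNameCache {

    private static final class Entry {
        final String hostName; // null if the lookup found nothing
        final Instant fetchedAt;

        Entry(String hostName, Instant fetchedAt) {
            this.hostName = hostName;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Duration maxAge;
    private final Function<String, String> lookup;
    private final Supplier<Instant> clock;
    private int refreshes = 0;
    private int changes = 0;

    HostNameCache(Duration maxAge, Function<String, String> lookup, Supplier<Instant> clock) {
        this.maxAge = maxAge;
        this.lookup = lookup;
        this.clock = clock;
    }

    /** Returns the cached host name, looking it up again only if missing or older than maxAge. */
    String get(String address) {
        Instant now = clock.get();
        synchronized (this) {
            Entry cached = entries.get(address);
            if (cached != null && now.isBefore(cached.fetchedAt.plus(maxAge))) {
                return cached.hostName;
            }
        }
        // The slow lookup runs outside the lock so several threads can look up in parallel.
        String fresh = lookup.apply(address);
        synchronized (this) {
            Entry previous = entries.put(address, new Entry(fresh, now));
            if (previous != null) {
                // Statistics on how often host names actually change between refreshes.
                refreshes++;
                if (!Objects.equals(previous.hostName, fresh)) {
                    changes++;
                }
            }
        }
        return fresh;
    }

    /** Fraction of refreshes that returned a different host name than the cached one. */
    synchronized double changeFraction() {
        return refreshes == 0 ? 0.0 : (double) changes / refreshes;
    }
}
```

A caller would construct it as `new HostNameCache(Duration.ofHours(12), lookupFunction, Instant::now)` and check `changeFraction()` from time to time to decide whether 12 hours is too long.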

(This was issue 7 in my GitHub repository.)


Change History (1)

comment:1 Changed 7 years ago by karsten

Resolution: implemented
Status: new → closed

Implemented an initial version that uses a thread pool of 5 threads, caches results for 12 hours, and uses timeouts for single requests and the overall request process. If this approach turns out to be too slow or if we run into problems with lookups taking place only every 12 hours, let's open a new ticket. Closing.
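A rough sketch of the approach described in this comment, using `java.util.concurrent`: five worker threads, a timeout per lookup, and a timeout for the whole run. It is not the Onionoo implementation; `ParallelReverseLookups`, the timeout values a caller would pick, and recording timed-out or failed lookups as `null` are assumptions. If lookups overlapped perfectly, five threads would bring 2759 lookups at 1.9 seconds each down from about 1.5 hours to roughly 17.5 minutes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch: names and timeout policy are assumptions, not Onionoo's code.
class ParallelReverseLookups {

    /**
     * Looks up all addresses with {@code threads} workers. A lookup that takes longer than
     * {@code perLookupMillis}, fails, or is unfinished after {@code overallMillis} maps to null.
     */
    static Map<String, String> lookupAll(List<String> addresses, Function<String, String> lookup,
            int threads, long perLookupMillis, long overallMillis) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        // Helper threads run the blocking lookups so a worker can stop waiting; they are
        // daemon threads because resolver calls generally cannot be interrupted.
        ExecutorService helpers = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        List<Callable<String>> tasks = new ArrayList<>();
        for (String address : addresses) {
            tasks.add(() -> {
                Future<String> f = helpers.submit(() -> lookup.apply(address));
                try {
                    // Per-lookup timeout, measured from when this worker starts the lookup.
                    return f.get(perLookupMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true);
                    return null;
                }
            });
        }
        Map<String, String> results = new HashMap<>();
        try {
            // invokeAll cancels every task still unfinished when the overall timeout expires.
            List<Future<String>> futures =
                workers.invokeAll(tasks, overallMillis, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                String hostName = null;
                try {
                    hostName = futures.get(i).get();
                } catch (CancellationException | ExecutionException e) {
                    // Overall timeout hit or the lookup threw; record no host name.
                }
                results.put(addresses.get(i), hostName);
            }
        } finally {
            workers.shutdownNow();
            helpers.shutdownNow();
        }
        return results;
    }
}
```

A real caller would pass a function wrapping `InetAddress.getByName(address).getHostName()`. Note that `getHostName()` returns the textual IP address rather than throwing when no host name is found, so the caller may want to map that case to `null`.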
