#18177 closed enhancement (implemented)

Check Fallback Directory IPv4 and IPv6 addresses using DocTor

Reported by: teor
Owned by: atagar
Priority: Medium
Milestone:
Component: Core Tor/DocTor
Version:
Severity: Normal
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

In #17158, we added a list of default fallback directories to Tor. Most clients will use these fallback directories to bootstrap in preference to the authorities.

After #17840 is merged, tor clients can bootstrap over IPv6. If the fallback directory has an IPv6 address, IPv6 clients will use it.

Can we check fallback IPv4 and IPv6 addresses regularly using DocTor?

It would be nice to report a summary figure, something like:
"200/250 fallback directories (80%) are reachable"
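A minimal sketch of how such a summary line could be formatted, assuming the reachable and total counts have already been gathered by some check. The function name is hypothetical and is not part of DocTor or Stem:

```python
def reachability_summary(reachable, total):
    """Format a one-line summary of fallback directory reachability."""
    if total <= 0:
        # Nothing was checked, so a percentage would be meaningless.
        return "no fallback directories were checked"

    percent = round(100.0 * reachable / total)
    return "%i/%i fallback directories (%i%%) are reachable" % (reachable, total, percent)


print(reachability_summary(200, 250))
```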

Change History (19)

comment:1 Changed 23 months ago by atagar

Nice idea! The IPv6 part is blocked on #17298 but we can move forward on the IPv4 addresses.

comment:2 Changed 23 months ago by atagar

Oops, forgot to ask - DocTor is a tool to help directory authority operators. I'm fine with adding a check for the fallback directories but who should be notified about it and what action will be taken? I'm against generating notices unless someone is volunteering to be responsible to resolve the issue.

comment:3 in reply to:  2 ; Changed 23 months ago by teor

Replying to atagar:

> Oops, forgot to ask - DocTor is a tool to help directory authority operators. I'm fine with adding a check for the fallback directories but who should be notified about it and what action will be taken? I'm against generating notices unless someone is volunteering to be responsible to resolve the issue.

In general, it's good to know how the figure is trending, and be transparent about it, so can it be logged to #tor-bots every hour?

It's an issue if the figure drops below, say, 50%. We need to update the list of fallback directories in the next point release of every supported tor version. (Much like we update GeoIP.)

That's something I can do, but for redundancy, we should also notify nickm as the Core Tor lead.

comment:4 in reply to:  3 Changed 23 months ago by teor

Replying to teor:

> Replying to atagar:
>
> > Oops, forgot to ask - DocTor is a tool to help directory authority operators. I'm fine with adding a check for the fallback directories but who should be notified about it and what action will be taken? I'm against generating notices unless someone is volunteering to be responsible to resolve the issue.
>
> In general, it's good to know how the figure is trending, and be transparent about it, so can it be logged to #tor-bots every hour?

Hang on, am I confusing consensus-health with DocTor?
I don't know how the DirAuth infrastructure works.

comment:5 Changed 23 months ago by atagar

consensus-health is a website, DocTor is automated alarms. Once upon a time they were a single Java codebase but they're now separate. To add confusion, though, the email list DocTor notifies is still called consensus-health@. ;)

Ok. Sounds like this should be a new daily check that notifies consensus-health@ if we drop below 50%, then someone can make a ticket about it.
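As a rough illustration of the proposed daily check, the sketch below only decides whether a notice is warranted. The function name is hypothetical, the 50% figure comes from the discussion above, and actually sending mail to consensus-health@ is left out:

```python
def should_notify(reachable, total, threshold=0.5):
    """Return True when the reachable fraction has dropped below the threshold."""
    if total <= 0:
        # Assumption: with nothing measured there is nothing to report.
        return False

    return float(reachable) / total < threshold


print(should_notify(120, 250))
print(should_notify(200, 250))
```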

comment:6 Changed 22 months ago by atagar

Hi teor, where is the list of fallback directories? Sounds from the other ticket that it isn't merged into tor yet?

Without something to monitor this ticket is unactionable.

comment:7 in reply to:  6 Changed 22 months ago by teor

Replying to atagar:

> Hi teor, where is the list of fallback directories? Sounds from the other ticket that it isn't merged into tor yet?
>
> Without something to monitor this ticket is unactionable.

See src/or/fallback_dirs.inc in master for the current list in 0.2.8.1-alpha.
There's a minor update on the ticket that we'll merge later in the alpha series, once the set of changes is significant enough.

comment:8 Changed 22 months ago by atagar

Resolution: implemented
Status: new → closed

Thanks teor! Done, but with a lot of expansion over what you asked. Stem now has a FallbackDirectory class with two methods for getting this information...

  • FallbackDirectory.from_cache() provides the latest fallback directories Stem has cached. This is only as up-to-date as your Stem release but is quicker and avoids relying on gitweb.

Advantages are...

  • Stem's descriptor.remote module now puts less load on the directory authorities since it uses fallback directories as well.
  • Much, much easier to add further scripts that take advantage of the fallback directories.
  • Running Stem's integ tests with the ONLINE target includes a test that exercises all the fallback directories, notifying us if any are down.

Here's an example script to check the performance of the fallback directories...

import time

from stem.descriptor.remote import DescriptorDownloader, FallbackDirectory

downloader = DescriptorDownloader()

# Time a consensus download from each fallback directory in Stem's cached list.
for fallback_directory in FallbackDirectory.from_cache().values():
  start = time.time()

  # Only query this fallback's DirPort; run() blocks until the download is done.
  downloader.get_consensus(endpoints = [(fallback_directory.address, fallback_directory.dir_port)]).run()
  print('Downloading the consensus took %0.2f from %s' % (time.time() - start, fallback_directory.nickname))

% python example.py
Downloading the consensus took 5.07 from Doedel22
Downloading the consensus took 3.59 from tornoderdednl
Downloading the consensus took 4.16 from Logforme
Downloading the consensus took 6.76 from Doedel21
Downloading the consensus took 5.21 from kitten4
Downloading the consensus took 3.25 from kili
Downloading the consensus took 4.23 from wagner
Downloading the consensus took 3.30 from BabylonNetwork03
Downloading the consensus took 3.50 from kitten2
Downloading the consensus took 3.31 from coby
Downloading the consensus took 5.61 from GrmmlLitavis
Downloading the consensus took 5.05 from Doedel24
Downloading the consensus took 3.60 from BabylonNetwork02
Downloading the consensus took 3.61 from Unnamed
Downloading the consensus took 2.71 from Binnacle
Downloading the consensus took 30.80 from eriador
Downloading the consensus took 6.91 from Doedel26
Downloading the consensus took 3.30 from fluxe4
Downloading the consensus took 3.16 from PedicaboMundi
Downloading the consensus took 3.33 from kitten1
Downloading the consensus took 3.39 from fluxe3

Feel free to reopen if you need anything else.

comment:9 in reply to:  8 Changed 22 months ago by teor

Resolution: implemented
Status: closed → reopened

Replying to atagar:

> Thanks teor! Done, but with a lot of expansion over what you asked. Stem now has a FallbackDirectory class with two methods for getting this information...
>
>   • FallbackDirectory.from_cache() provides the latest fallback directories Stem has cached. This is only as up-to-date as your Stem release but is quicker and avoids relying on gitweb.

In #16774, we added the fallback directories to GETINFO defaults. Tor 0.2.8.1-alpha and later should be able to tell stem the fallback directories this way as well.

>   • Stem's descriptor.remote module now puts less load on the directory authorities since it uses fallback directories as well.

FYI, tor currently tries to connect to 3 fallback directories in the first few seconds, then tries an authority. It downloads from the first one that connects, and cancels the others. See #4483.

> Downloading the consensus took 30.80 from eriador

That's not good, can doctor please report any fallback directories that take a relatively long amount of time to serve a consensus (like doctor does for the authorities), and report any that take more than 10 seconds?

How can I get on a list that gets this output, or will it appear on IRC in #tor-bots?
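A minimal sketch of the requested 10 second cutoff, assuming the download times have already been collected into a dict keyed by nickname (for instance by a timing script like the one in comment:8). The helper name is hypothetical:

```python
def slow_fallbacks(timings, limit=10.0):
    """Return (nickname, seconds) pairs that exceeded the limit, slowest first."""
    slow = [(nickname, seconds) for nickname, seconds in timings.items() if seconds > limit]
    return sorted(slow, key=lambda entry: entry[1], reverse=True)


# A few of the timings reported in comment:8.
timings = {'Binnacle': 2.71, 'kili': 3.25, 'eriador': 30.80, 'Doedel26': 6.91}

for nickname, seconds in slow_fallbacks(timings):
    print('%s took %0.2f seconds to serve a consensus' % (nickname, seconds))
```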

comment:10 Changed 22 months ago by atagar

> In #16774, we added the fallback directories to GETINFO defaults. Tor 0.2.8.1-alpha and later should be able to tell stem the fallback directories this way as well.

That's fine. But this implementation doesn't require an active tor instance. For DocTor and other scripts dealing with descriptors having a tor process is an unnecessary hassle.

> That's not good, can doctor please report any fallback directories that take a relatively long amount of time to serve a consensus (like doctor does for the authorities), and report any that take more than 10 seconds?

I doubt Nick wants a ticket every time a fallback directory is sluggish. If you're interested in avoiding slow fallback directories any reason not to simply run the script I gave above when picking them?

comment:11 in reply to:  10 Changed 22 months ago by teor

Resolution: fixed
Status: reopened → closed

Replying to atagar:

> > That's not good, can doctor please report any fallback directories that take a relatively long amount of time to serve a consensus (like doctor does for the authorities), and report any that take more than 10 seconds?
>
> I doubt Nick wants a ticket every time a fallback directory is sluggish. If you're interested in avoiding slow fallback directories any reason not to simply run the script I gave above when picking them?

Sure, split off into #18398.
Thanks for implementing the IPv4 checks, the IPv6 checks are awaiting #17298.

comment:12 in reply to:  8 Changed 21 months ago by tscpd

> Feel free to reopen if you need anything else.

Should there be an IPv6 DirPort check, too?


comment:13 Changed 21 months ago by atagar

> Should there be an IPv6 DirPort check, too?

See #17298

comment:14 in reply to:  10 Changed 20 months ago by teor

Resolution: fixed
Status: closed → reopened

Replying to atagar:

> > That's not good, can doctor please report any fallback directories that take a relatively long amount of time to serve a consensus (like doctor does for the authorities), and report any that take more than 10 seconds?
>
> I doubt Nick wants a ticket every time a fallback directory is sluggish. If you're interested in avoiding slow fallback directories any reason not to simply run the script I gave above when picking them?

So I'm doing that when I pick them, but what if they become slow some time after the release?

Also, in #18812, we realised that we'd like to check that the fallback's current key matches the one in the source code.

So can you modify DocTor to call a fallback "failed" if:

  • it doesn't respond to an ORPort request, or
    • (almost all clients will connect to the ORPort and issue a begindir request)
  • the key doesn't match the one in the fallback list, or
  • it takes longer than 15 seconds to serve a consensus

(Are these doable? Is the amount of effort ok?
The current checks are still quite useful.)

It's ok to have a few fallbacks fail.
But I'd like to know when 25% of fallbacks are failing, so that we can update the list in the next point release.
How do I get that email/notification?
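The three "failed" criteria and the 25% trigger can be sketched as below. This is only an illustration under stated assumptions: each result is a hypothetical (orport_reachable, fingerprint_matches, consensus_seconds) tuple produced by checks that are not shown, and neither function is part of DocTor or Stem.

```python
def fallback_failed(orport_reachable, fingerprint_matches, consensus_seconds, limit=15.0):
    """Apply the three criteria; consensus_seconds is None if the download never finished."""
    if not orport_reachable or not fingerprint_matches:
        return True

    return consensus_seconds is None or consensus_seconds > limit


def failing_fraction(results):
    """Fraction of results (between 0.0 and 1.0) that count as failed."""
    if not results:
        return 0.0

    failed = sum(1 for result in results if fallback_failed(*result))
    return float(failed) / len(results)


# Illustrative measurements, not real data.
results = [
    (True, True, 3.25),    # healthy
    (True, True, 30.80),   # too slow to serve a consensus
    (False, True, None),   # ORPort did not respond
    (True, False, 4.16),   # fingerprint differs from the hard-coded list
]

if failing_fraction(results) >= 0.25:
    print('%i%% of fallbacks are failing' % round(100 * failing_fraction(results)))
```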

comment:15 Changed 20 months ago by atagar

Hi teor, sounds good.

> it doesn't respond to an ORPort request

Hmmm. We can ping the ORPort but that's about it. DocTor can exercise the DirPort, but nothing besides tor knows how to talk the ORPort protocol. Capability I'd love to have in Stem though. :)

So to be clear are you asking for an ORPort ping? DirPort usage? Both?

> the key doesn't match the one in the fallback list

Which key doesn't match? fallback_dirs.inc includes the address, dir_port, orport, fingerprint, and weight. Not spotting any keys.

> it takes longer than 15 seconds to serve a consensus

Sure, can do. I probably won't be getting to this for a while though (pretty busy with nyx).

> How do I get that email/notification?

Specify an address and we'll have DocTor send the notices there.

comment:16 in reply to:  15 Changed 20 months ago by teor

Replying to atagar:

> Hi teor, sounds good.
>
> > it doesn't respond to an ORPort request
>
> Hmmm. We can ping the ORPort but that's about it. DocTor can exercise the DirPort, but nothing besides tor knows how to talk the ORPort protocol. Capability I'd love to have in Stem though. :)
>
> So to be clear are you asking for an ORPort ping? DirPort usage? Both?

Please ping the IPv4 ORPort, download a consensus from the IPv4 DirPort, and, when #17298 is done, ping the IPv6 ORPort. (Downloading a consensus from the IPv6 DirPort will be unreliable, and should wait for #18394. But that's OK, because almost all clients use the ORPort.)

> > the key doesn't match the one in the fallback list
>
> Which key doesn't match? fallback_dirs.inc includes the address, dir_port, orport, fingerprint, and weight. Not spotting any keys.

The fingerprint.
I guess stem can't check the fingerprint unless it speaks the ORPort protocol?

> > it takes longer than 15 seconds to serve a consensus
>
> Sure, can do. I probably won't be getting to this for a while though (pretty busy with nyx).
>
> > How do I get that email/notification?
>
> Specify an address and we'll have DocTor send the notices there.

teor2345@…, and someone else as a backup. I think this should be nickm. He needs to know about failing fallbacks so we can decide whether to do a new point release, even if I'm the one that updates the fallback list.
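For the ORPort "ping", one plausible reading is a plain TCP connection attempt, since nothing here speaks the ORPort protocol itself. The sketch below assumes that reading; the function name is hypothetical and the same call works for IPv4 and IPv6 addresses.

```python
import socket


def port_is_reachable(address, port, timeout=10.0):
    """Return True if a TCP connection to address:port can be opened in time."""
    try:
        # create_connection() resolves the address and handles IPv4 or IPv6.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

This only shows that something is accepting connections on the port. It says nothing about whether the relay behind it still has the expected fingerprint.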

comment:17 Changed 20 months ago by atagar

> The fingerprint.
> I guess stem can't check the fingerprint unless it speaks the ORPort protocol?

What value are you hoping to get from this? Would checking that the fingerprint matches what's in the consensus do what you're after? Stem validates signatures of a few descriptor types but I don't think that's really what you're after here.

comment:18 in reply to:  17 Changed 20 months ago by teor

Replying to atagar:

> > The fingerprint.
> > I guess stem can't check the fingerprint unless it speaks the ORPort protocol?
>
> What value are you hoping to get from this? Would checking that the fingerprint matches what's in the consensus do what you're after? Stem validates signatures of a few descriptor types but I don't think that's really what you're after here.

Clients will refuse to connect to a fallback if it's changed its fingerprint from the fingerprint in the hard-coded list.

So yes, comparing the fingerprint in the fallback list to the current one for that IPv4:ORPort and IPv6:IPv6ORPort (if present) would discover this kind of failure. (And the IPv6 check wouldn't need any IPv6 connectivity!)
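The comparison described here needs no network access once both lists are in hand. A minimal sketch, assuming both the hard-coded list and the consensus have already been reduced to dicts mapping (address, ORPort) to fingerprint. The function name, addresses and fingerprints below are placeholders, not real relays.

```python
def mismatched_fingerprints(fallbacks, consensus):
    """
    Return the (address, or_port) endpoints whose hard-coded fingerprint is
    missing from the consensus or differs from the one listed there.
    """
    return sorted(
        endpoint for endpoint, fingerprint in fallbacks.items()
        if consensus.get(endpoint, '').upper() != fingerprint.upper()
    )


# Placeholder data using documentation-only addresses.
fallbacks = {
    ('192.0.2.1', 9001): 'A' * 40,
    ('192.0.2.2', 443): 'B' * 40,
    ('2001:db8::1', 9001): 'C' * 40,
}

consensus = {
    ('192.0.2.1', 9001): 'A' * 40,  # unchanged
    ('192.0.2.2', 443): 'D' * 40,   # relay now has a different fingerprint
}                                   # the IPv6 endpoint is no longer listed

for address, or_port in mismatched_fingerprints(fallbacks, consensus):
    print('fingerprint mismatch for %s port %i' % (address, or_port))
```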

comment:19 Changed 19 months ago by atagar

Resolution: implemented
Status: reopened → closed

This is now a thing. Feel free to reopen if ya need anything else.
