Opened 5 months ago

Closed 5 months ago

#32842 closed defect (implemented)

Give notifications if tor26 is missing from the consensus

Reported by: Sebastian Owned by: atagar
Priority: Medium Milestone:
Component: Archived/DocTor Version:
Severity: Normal Keywords:
Cc: metrics-team Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

It should be possible to infer whether tor26 is down or not by looking at the consensus. Would be nice to have that if it isn't too much work :)

Child Tickets

Attachments (2)

doctor-ticket32842.patch (4.8 KB) - added by starlight 5 months ago.
git-am patch to resolve ticket
doctor-ticket32842-fix.patch (1.4 KB) - added by starlight 5 months ago.
touch-up to patch

Download all attachments as: .zip

Change History (25)

comment:1 Changed 5 months ago by Sebastian

Component: Metrics/Consensus HealthCore Tor/DocTor
Owner: changed from tom to atagar

comment:2 Changed 5 months ago by atagar

Resolution: wontfix
Status: newclosed

Hi Sebastian. tor26 intentionally sandtraps DirPort requests without a '.z' suffix, in turn breaking DocTor. Excluding it is intentional...

https://gitweb.torproject.org/doctor.git/commit/?id=5bae2023

comment:3 Changed 5 months ago by arma

atagar: would it be possible to look at the consensus and let us know if tor26 is missing from it? That has nothing to do with asking tor26 any direct questions.

comment:4 Changed 5 months ago by atagar

We could. I'd prefer for tor26 to remove its block, but a patch is welcome if someone would care to make that adjustment.

comment:5 Changed 5 months ago by starlight

why not have DocTor request compressed documents and/or obtain them from the OR port? re: #29186 #29240

comment:6 Changed 5 months ago by atagar

Hi starlight. DocTor does download compressed documents. tor26 simply breaks DirPort requests without a '.z' suffix.

comment:7 Changed 5 months ago by starlight

interesting, so compression can be requested either via .z suffix or a HTTP attribute, why not just tack on the .z anyway?

comment:8 Changed 5 months ago by atagar

Could. There's many hacks that can work around it. I just don't see merit in tor26's block.

If someone cares enough to submit a patch I'll merge it.

Last edited 5 months ago by atagar (previous) (diff)

comment:9 Changed 5 months ago by starlight

I can write one, an adjustment to _download_descriptors() in stem/descriptor/remote.py to append .z when compression is enabled covers it? Or more to it?

Before the making the change, I notice tor26 appears to have no problem serving descriptors and full consensus in PLAIN / GZIP / ZSTD form on both the dir and or ports to STEM. Perhaps the patch or iptables rule suppressing requests absent a .z suffix has been left out in recent deploys or eliminated?

Last edited 5 months ago by starlight (previous) (diff)

comment:10 Changed 5 months ago by atagar

an adjustment to _download_descriptors() in stem/descriptor/remote.py

Sorry, a patch to account for tor26's dysfunction will need to reside within DocTor rather than Stem.

comment:11 Changed 5 months ago by starlight

I see

commit 5bae2023
Date: Wed Apr 11 08:58:31 2018 -0700
To cut down on abuse tor26 rejects DirPort requests without a '.z' suffix (technically it throttles them, but I've never seen one work so effectively it's blocked). Stem no longer appends this suffix, using headers to indicatee the compression it wants instead.

This could be solved by reverting 5bae2023 and 93c3a9b4 and making the change described above to STEM to add .z when any compression is requested, but it seems you dislike the approach. Or by checking the consensus for Tor26 as first indicated above, which is probably more work though I have not looked at at it. Which is "less bad?" If the .z hack were applied only when an option in STEM was present would it be more palpable? Alternately if DocTor were to check the consensus for tor26, it might as well check for the presence of all authorities and report any missing.

Seems to me nothing special should be required for testing DocTor except changing the email notification address, is is that true?

comment:12 Changed 5 months ago by atagar

Which is "less bad?

The best option is for tor26 to remove its block. Second best is one of the following...

  1. Do nothing (which is why I closed this ticket).
  2. Adjust DocTor's _get_documents() to monkey patch or do other hackery work around tor26's dysfunction.
  3. Adjust the has_authority_flag test to specially handle tor26.

Alternately if DocTor were to check the consensus for tor26, it might as well check for the presence of all authorities and report any missing.

Actually, DocTor already does...

https://gitweb.torproject.org/doctor.git/tree/consensus_health_checker.py#n632

Seems to me nothing special should be required for testing DocTor except changing the email notification address, is is that true?

You shouldn't need to do anything. When ran locally DocTor prints its notification rather than emailing it.

https://gitweb.torproject.org/doctor.git/tree/util.py#n21

comment:13 in reply to:  12 Changed 5 months ago by starlight

Replying to atagar:

Alternately if DocTor were to check the consensus for tor26, it might as well check for the presence of all authorities and report any missing.

Actually, DocTor already does...

Would a new flag in the authority config that suppresses direct checking, in place of the present hard-coded exclusion for tor26 work? The config for tor26 goes back in with the flag set, direct checks are disabled and the existing in-consensus check will report when it's missing?

comment:14 Changed 5 months ago by starlight

oops, I incorrectly projected 'data' had a config for authorities, so instead perhaps a dict with authorities to not check directly while eliminating the hard-coded tor26 removal from the list acquired from STEM via stem.directory.Authority.from_cache()?

comment:15 Changed 5 months ago by starlight

next above seems a direct an reasonably clean way to solve it, easy to tweak to exclude different authorities that might become difficult to reach; will give that a try

Changed 5 months ago by starlight

Attachment: doctor-ticket32842.patch added

git-am patch to resolve ticket

comment:16 Changed 5 months ago by starlight

Resolution: wontfix
Status: closedreopened

Uploaded fix, please review.

comment:17 Changed 5 months ago by starlight

running 2to3 on all .py scripts updates DocTor to Python 3 successfully

comment:18 Changed 5 months ago by atagar

Resolution: fixed
Status: reopenedclosed

Thanks starlight! Patch merged and deployed with a small adjustment.

comment:19 Changed 5 months ago by starlight

I object to removing DIRAUTH_SKIP_REACHABLE and DIRAUTH_SKIP_SEEN:

a) spent several hours testing and making certain they work (touch-up fix attached)
b) they are based on the code you had running in production
c) because they are tested it will save time if/when needed, just add dirauth names
d) required in particular when an authority is completely offline for several days or weeks

The touch-up patch contains:

Trailing comma in the single entry DIRAUTH_SKIP_CHECKS tuple def, otherwise it becomes a scalar and set() converts it to a set of characters. Important.

Fix to the seen-in-consensus test, must subtract the skip list from both known_authorities and seen_authorities.

Last edited 5 months ago by starlight (previous) (diff)

comment:20 Changed 5 months ago by starlight

Resolution: fixed
Status: closedreopened

Changed 5 months ago by starlight

touch-up to patch

comment:21 Changed 5 months ago by atagar

Thanks starlight, good catch. Pushed a fix for the tuples.

As for the unused constants I'd be amenable to a 'skip all checks for realz' constant for when a dirauth is down but as its operator I didn't find the 'skip reachable/seen' separation desirable.

comment:22 Changed 5 months ago by starlight

Have you considered the case where an Authority is under prolonged DDoS
and, while functional cannot be reliably contacted?  That would be

DIRAUTH_SKIP_REACHABLE  yes
DIRAUTH_SKIP_SEEN       no
DIRAUTH_SKIP_CHECKS     yes

The case where an authority aggressively rate limits connections?

DIRAUTH_SKIP_REACHABLE  yes
DIRAUTH_SKIP_SEEN       no
DIRAUTH_SKIP_CHECKS     no

The recent case where dannenberg consensus could not sign the consensus?

DIRAUTH_SKIP_REACHABLE  no
DIRAUTH_SKIP_SEEN       no
DIRAUTH_SKIP_CHECKS     yes

A situation where an authority is not visible to the majority of
other authorities but is visible to DocTor?

DIRAUTH_SKIP_REACHABLE  no
DIRAUTH_SKIP_SEEN       yes
DIRAUTH_SKIP_CHECKS     yes

Of course the case where an authority is offline for a extended interval:

DIRAUTH_SKIP_REACHABLE  yes
DIRAUTH_SKIP_SEEN       yes
DIRAUTH_SKIP_CHECKS     yes

comment:23 Changed 5 months ago by atagar

Resolution: implemented
Status: reopenedclosed

Hi starlight. I'm in the process of moving DocTor tickets to GitHub (https://github.com/torproject/doctor/). Your skipping mechanism is great, but at the end of the day it is a tool for DocTor's operator and as such their decision if they want it.

Would you like responsibility for running and administering DocTor? I wrote DocTor as a favor to Karsten, and I'm willing to transfer ownership if you want it. If you do then please file a ticket at https://github.com/torproject/doctor/issues and I'll discuss it with the dirauth operators.

Note: See TracTickets for help on using tickets.