Opened 7 months ago

Closed 5 months ago

#20539 closed defect (implemented)

Make sure fallback directories aren't running buggy versions / can deliver a recent consensus

Reported by: teor
Owned by:
Priority: Medium
Milestone:
Component: Core Tor/Fallback Scripts
Version:
Severity: Normal
Keywords: fallback
Cc:
Actual Points:
Parent ID: #18828
Points: 0.5
Reviewer:
Sponsor:

Description

After #20499, we should reject fallback directories that deliver a consensus outdated by more than N hours, where N is one of [1, 2, 3].

Change History (13)

comment:1 Changed 7 months ago by arma

Now that 0.2.9.5-alpha is out (which we think fixes #20499, right?), it might make sense to look through #20501 and see which relays are buggy, and contact their operators to get them to upgrade?

That said, the question I think is not whether it is delivering a bad consensus right now. The question is whether it is running the buggy versions.

Maybe a stem script that takes in the list of fallback dirs and outputs the ones that are running a buggy version would be what you want? Then we can re-run the script periodically until we reach the point where we want to un-fallbackdir the ones that stubbornly remain. (And somewhere in there we should do the "send mail to all the operators running buggy versions to let them know that they need to upgrade" step. Pretty soon I think, right? Once 0.2.9.5-alpha packages are out and ready?)

comment:2 Changed 7 months ago by teor

There are a few separate issues here:

  1. We don't want to ever add any fallback directories running the buggy versions,
  2. We don't want to ever add any fallback directories that can't deliver a recent consensus, regardless of version, and
  3. We want to remove any fallback directories that do either of the above things.

If we modify the fallback selection script to check 1 and 2, 3 will happen automatically when we next rebuild the list (or when we remove failed fallbacks from the existing list).
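A minimal sketch of that idea, not the actual updateFallbackDirs.py code: the function name `rebuild_fallbacks`, the `(fingerprint, version)` candidate format, and the two predicate callbacks are all hypothetical. It only illustrates that once checks 1 and 2 are applied at selection time, an existing fallback that fails either one simply drops out on the next rebuild.

```python
def rebuild_fallbacks(candidates, is_buggy_version, delivers_recent_consensus):
    """Return the fingerprints that pass both selection checks.

    candidates: iterable of (fingerprint, version_string) pairs (hypothetical format).
    is_buggy_version: callable taking a version string, True if it must be excluded.
    delivers_recent_consensus: callable taking a fingerprint, True if the relay
        served a sufficiently recent consensus when it was last checked.
    """
    kept = []
    for fingerprint, version in candidates:
        # Check 1: never keep a relay running a buggy version.
        if is_buggy_version(version):
            continue
        # Check 2: never keep a relay that can't deliver a recent consensus.
        if not delivers_recent_consensus(fingerprint):
            continue
        kept.append(fingerprint)
    return kept
```

Running this over the existing list plus any new candidates covers item 3 without a separate removal pass: anything in the old list that is missing from the result has been removed.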

comment:3 follow-up: ↓ 4 Changed 7 months ago by arma

Sounds good. In terms of 3, we might be happiest if we contact the buggy relay operators and give them some chance to upgrade, rather than immediately cutting out all of the fallback operators who were kind enough to test the alpha version for us. :)

comment:4 in reply to: ↑ 3 Changed 7 months ago by teor

Replying to arma:

> Sounds good. In terms of 3, we might be happiest if we contact the buggy relay operators and give them some chance to upgrade, rather than immediately cutting out all of the fallback operators who were kind enough to test the alpha version for us. :)

I try to mail fallback operators before removing fallbacks, as most issues are resolvable by the operator, and it helps me determine whether the issue is permanent (lost keys, lost IPs) or temporary (changed ports).

comment:5 Changed 7 months ago by arma

  • Summary changed from Make sure fallback directories deliver a recent consensus to Make sure fallback directories aren't running buggy versions / can deliver a recent consensus

comment:6 Changed 6 months ago by atagar

#20501 has a listing of relays with this issue and a small script to check if relays are serving a stale consensus or not. If you wrap that check into your fallback selection script ya should be good to go.

comment:7 Changed 6 months ago by teor

Bug #20499 affects versions from 0.2.9.1-alpha-dev to 0.2.9.4-alpha-dev and version 0.3.0.0-alpha-dev, so we need to exclude these versions as fallbacks. We can't rely on the authorities to do this, as #20509 has not been deployed to directory authorities yet.
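As a rough illustration of the version exclusion, here is a sketch in plain Python. The helper names (`version_key`, `is_buggy_version`) are hypothetical, and the ordering used (four numeric fields, then a plain string comparison of the status tag) is a simplifying assumption rather than tor's own version comparison. Under that assumption, "0.2.9.1-alpha" sorts before "0.2.9.1-alpha-dev" and so falls outside the affected range.

```python
import re

# MAJOR.MINOR.MICRO.PATCHLEVEL with an optional "-tag" suffix, e.g. "0.2.9.4-alpha-dev".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def version_key(version):
    """Turn a tor version string into a tuple that sorts in release order.

    Assumption: comparing the tag as a plain string is good enough here
    ("alpha" sorts before "alpha-dev", and no tag sorts before both).
    """
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognised tor version: %r" % version)
    numbers = tuple(int(part) for part in match.groups()[:4])
    tag = match.group(5) or ""
    return numbers + (tag,)


# Versions affected by #20499, as listed in this comment.
BUGGY_RANGE = (version_key("0.2.9.1-alpha-dev"), version_key("0.2.9.4-alpha-dev"))
BUGGY_EXACT = {version_key("0.3.0.0-alpha-dev")}


def is_buggy_version(version):
    """True if the version is in the affected range or is the affected 0.3.0 dev build."""
    key = version_key(version)
    return BUGGY_RANGE[0] <= key <= BUGGY_RANGE[1] or key in BUGGY_EXACT
```

With this ordering, `is_buggy_version("0.2.9.4-alpha")` is true and `is_buggy_version("0.2.9.5-alpha")` is false, which matches the fix landing in 0.2.9.5-alpha.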

We should also exclude authorities that can't deliver a recent microdesc consensus, based on the script in #20501. We already download a consensus from every authority to check download times. If we download a microdesc consensus instead, we get the same document that clients will be downloading. And it's slightly faster.
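To make the "recent consensus" check concrete, here is a sketch that reads the `valid-until` line from an already-downloaded consensus and compares it against the current time. This is not the script from #20501; the function names are hypothetical, the download itself is left out, and the text is assumed to be already decompressed. The resource path and the `valid-until YYYY-MM-DD HH:MM:SS` header line follow the directory protocol.

```python
from datetime import datetime

# Directory resource that serves the current microdesc-flavoured consensus.
MICRODESC_CONSENSUS_PATH = "/tor/status-vote/current/consensus-microdesc"

_VALID_UNTIL_PREFIX = "valid-until "


def consensus_valid_until(consensus_text):
    """Return the consensus valid-until time as a naive UTC datetime."""
    for line in consensus_text.splitlines():
        if line.startswith(_VALID_UNTIL_PREFIX):
            timestamp = line[len(_VALID_UNTIL_PREFIX):].strip()
            return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    raise ValueError("no valid-until line found in consensus")


def consensus_is_expired(consensus_text, now):
    """True if `now` (naive UTC datetime) is past the consensus valid-until time."""
    return now > consensus_valid_until(consensus_text)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test against a saved consensus.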

comment:8 Changed 6 months ago by teor

  • Milestone changed from Tor: 0.2.9.x-final to Tor: 0.3.0.x-final
  • Status changed from new to needs_review

We can't rely on the recommended versions check in the script, because 0.2.9.4-alpha is still recommended (#20896).

This is part of the branch in #18828.

comment:9 Changed 6 months ago by teor

  • Status changed from needs_review to needs_revision

I think I might need to change this check to tolerate consensuses as old as REASONABLY_LIVE_CONSENSUS (24 hours), because of #20909.

Ideally, we should only tolerate RELAY_CLOCK_SKEW (3 hours) or maybe even RELAY_CLOCK_SKEW - 2 hours = 1 hour, because directory mirrors should update in the first hour. (Check this with the dir-spec.)

Edit: see #20942 for details.

Last edited 6 months ago by teor

comment:10 Changed 6 months ago by teor

(We'll make the fallback CONSENSUS_EXPIRY_TOLERANCE lower in #20942, after #20909 is fixed.)
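For illustration, the tolerances discussed in comment:9 could be expressed like this. It is only a sketch: the constant values are the ones quoted above, but the function name `consensus_too_old` and the choice to measure age as time elapsed past the consensus valid-until time are assumptions, not necessarily how the fallback script does it.

```python
from datetime import datetime, timedelta

# Values quoted in comment:9.
REASONABLY_LIVE_CONSENSUS = timedelta(hours=24)
RELAY_CLOCK_SKEW = timedelta(hours=3)

# Lenient for now because of #20909; expected to be lowered in #20942.
CONSENSUS_EXPIRY_TOLERANCE = REASONABLY_LIVE_CONSENSUS


def consensus_too_old(valid_until, now, tolerance=CONSENSUS_EXPIRY_TOLERANCE):
    """True if the consensus expired more than `tolerance` ago.

    Assumption: age is measured from valid-until, so a consensus that has
    not expired yet is never too old.
    """
    return now - valid_until > tolerance
```

Tightening the check later would then be a one-line change, for example setting the tolerance to `RELAY_CLOCK_SKEW` or to `RELAY_CLOCK_SKEW - timedelta(hours=2)`.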

comment:11 Changed 5 months ago by nickm

  • Component changed from Core Tor/Tor to Core Tor/Fallback Scripts
  • Milestone Tor: 0.3.0.x-final deleted

Batch-move updateFallbackDirs.py tickets into a new component, and remove them from maint-0.3.0.

I'm doing this as a separate component, after discussion with teor, mainly because development here seems to be decoupled from development on tor itself: they don't need to have the same release schedules, for example.

comment:12 Changed 5 months ago by nickm

Merged teor/fallbacks-20161219, which included a patch for this.

comment:13 Changed 5 months ago by nickm

  • Resolution set to implemented
  • Status changed from needs_revision to closed