Opened 11 years ago

Closed 10 years ago

Last modified 8 years ago

#1300 closed defect (fixed)

Authority that doesn't make a consensus never fetches one from elsewhere

Reported by: arma Owned by:
Priority: Medium Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: easy
Cc: arma, Sebastian, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

If moria1 fails to participate in making the consensus (e.g. it's down, or
it doesn't support the consensus method that is chosen for the consensus),
then it doesn't have a consensus for that period. It should realize it doesn't
have the newest consensus, and try to fetch one from the other authorities.

This bug is especially relevant while we have authorities that don't like
the consensus method in use -- currently dizum and dannenberg haven't upgraded,
so they never sign the consensus, so any clients (or relays!) who ask them for
a consensus have to retry elsewhere. I imagine the bug slows down bootstrapping

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (17)

comment:1 Changed 11 years ago by arma

For example, now shows

network-status-version 3
vote-status consensus
consensus-method 8
valid-after 2010-03-11 01:00:00
fresh-until 2010-03-11 02:00:00
valid-until 2010-03-11 04:00:00

whereas shows

network-status-version 3
vote-status consensus
consensus-method 9
valid-after 2010-03-12 03:00:00
fresh-until 2010-03-12 04:00:00
valid-until 2010-03-12 06:00:00

It's giving out a consensus that's 22 hours old -- the last one it managed to make?

comment:2 Changed 10 years ago by nickm

Milestone: Tor: 0.2.2.x-final

We should at least investigate fixing this for 0.2.2.x, if it isn't super hard.

comment:3 Changed 10 years ago by nickm

Description: modified (diff)
Keywords: easy added

This could be as easy as just removing the

  if (authdir_mode_v3(options))
    return; /* Authorities never fetch a consensus */

from update_consensus_networkstatus_downloads(). But maybe instead we should bulletproof it a little more and only fetch a consensus if either we have no consensus at all, or a certain minimum time has passed since our old one expired? Or is that unnecessary? Code investigation will be needed.

comment:4 Changed 10 years ago by nickm

Status: newneeds_review

See branch bug1300 in my public repo

comment:5 Changed 10 years ago by Sebastian

Looks good and gives newly bootstrapping authorities a consensus immediately in my test network.

In the patch, I'm wondering why we use valid_until here:

+ if (live_con && now < live_con->valid_until + AUTHORITY_NS_DOWNLOAD_DELAY)

wouldn't fresh_until be the right thing? That would mean the authority will fetch a new consensus if it didn't participate in making the current one, which seems to be what we want iiuic.

comment:6 Changed 10 years ago by nickm

Hrm. I was thinking that we probably wanted to make authorities more reluctant to use any consensus they hadn't signed themselves, and prefer using a non-fresh consensus to a non-signed-by-us consensus, and prefer that to a non-live consensus.

But now I can't remember why. Absent a compelling reason, making this "valid_until" is probably ok.

comment:7 Changed 10 years ago by arma

Priority: majornormal

Triage: it would be great to fix this for 0.2.2.x, but it's just as much a bug on for 0.2.1, so there's no reason to block 0.2.2. Also, it's an authority fix, so we don't need to time it to releases so much.

comment:8 Changed 10 years ago by nickm

bug1300 branch tweaked with suggested change.

comment:9 Changed 10 years ago by Sebastian

Looks good to me now, thanks!

comment:10 Changed 10 years ago by arma

defintely -> definitely

comment:11 Changed 10 years ago by arma

To be clear, the logic is to stick with our current consensus if these are all true:

  now >= current_consensus->valid_after 
  now <= current_consensus->valid_until
  now < current_consensus->fresh_until + AUTHORITY_NS_DOWNLOAD_DELAY

Sounds plausible.

We should be aware though that there's an interaction here with update_consensus_networkstatus_fetch_time(). Once we get a consensus, we set

time_to_download_next_consensus = start + crypto_rand_int((int)dl_interval);


    start = c->fresh_until + min_sec_before_caching;
    interval = c->fresh_until - c->valid_after;
    dl_interval = interval/2;

So if we have a consensus, we won't even think about fetching one until a random time between :02 and :32 in the next period.

That said, it looks like we try once per minute to get a new consensus, once both a) we're past that random point between :02 and :32, and b) we're past :15.

If I have my math right, for testing tor networks "b" trumps in all cases.

All of this said, what's the reasoning for the "wait at least 15 minutes" part? If it is the next voting period and we don't have a consensus, we're not going to suddenly retroactively compute one. And there's currently no other way for us to learn one besides fetching it -- that's what this bug is about. So the only merit I can see is that we spent at least 1/4 of the next period giving out our old consensus. That seems like unnecessary complexity, especially if (and we should check this) we reject the new consensus we've fetched if we don't like its signatures.

comment:12 Changed 10 years ago by arma

Another way to say "we spent at least 1/4 of the next period giving out our old consensus" is "we spend at least half the period where relays are asking us for an updated consensus stubbornly giving out the old one". That's no good.

If the "+ 15 minutes" thing is to handle clock skew, it's our own clock that's the only one that matters. We make the consensus, when we think we should. Nobody else is going to send us one.

Seems to me that we should put authorities in the "extra early" schedule category:

      /* Some clients may need the consensus sooner than others. */
      if (options->FetchDirInfoExtraEarly) {
        dl_interval = 60;
        if (min_sec_before_caching + dl_interval > interval)
          dl_interval = interval/2;

That is, add a "and authdir_mode_v3()" clause to the if.

comment:13 Changed 10 years ago by nickm

Hm. Do you mean "or authdir_mode_v3", as in

if (options->FetchDirInfoExtraEarly || authdir_mode_v3(...)) { ... }

comment:14 Changed 10 years ago by nickm

Dumping the "at least 15 minutes" thing is fine by me. We'll want to mess with it if we ever try to get better recovery for consensus-build failures, but that's not now.

comment:15 Changed 10 years ago by nickm

Tried to clean it up in bug1300. Like it now? Did I miss anything?

comment:16 Changed 10 years ago by nickm

Resolution: Nonefixed
Status: needs_reviewclosed

Merge alternate version (after IRC discussion with arma) from bug1300_alt. Thanks, all.

comment:17 Changed 8 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.