Opened 7 years ago

Closed 2 years ago

#5506 closed defect (worksforme)

Do we just keep downloading a consensus if our clock is wrong?

Reported by: arma
Owned by:
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-client
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

In #4011 I pointed out that deciding we need a new consensus if "c->valid_after > now" freaks me out. If my clock is set 5 hours in the past and I get a consensus that's not valid yet, do I just keep fetching a new one? I guess the same question holds for when my clock is 5 hours in the future and I keep getting consensuses that are no longer valid.

(Scheduling for 0.2.4 so we don't add to our 0.2.3 pain. Feel free to fix earlier since it's likely a real bug that adds load to our dir authorities / mirrors.)
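For reference, a minimal sketch of the liveness check the description worries about, assuming a simplified networkstatus_t; this is illustrative, not the actual Tor source:

    #include <time.h>

    /* Simplified stand-in for Tor's networkstatus_t; the real struct
     * has many more fields. */
    typedef struct networkstatus_t {
      time_t valid_after;   /* consensus not usable before this time */
      time_t fresh_until;   /* a fresher consensus is expected after this */
      time_t valid_until;   /* consensus unusable after this time */
    } networkstatus_t;

    /* A consensus is "live" only if the local clock falls inside its
     * validity window.  With a clock 5 hours slow, every fresh consensus
     * fails the first test; with a clock 5 hours fast, it fails the
     * second -- either way the client thinks it needs a new one. */
    static int
    consensus_is_live(const networkstatus_t *c, time_t now)
    {
      if (!c)
        return 0;
      if (c->valid_after > now)   /* "from the future": clock too slow? */
        return 0;
      if (c->valid_until < now)   /* expired: clock too fast, or just old */
        return 0;
      return 1;
    }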

Child Tickets

Change History (14)

comment:1 Changed 7 years ago by arma

I'm not actually sure what we should *do* in these cases. We don't want to just quit fetching, because then a jerk can give us a wrong consensus and we'll freeze in place. Some sort of "if the one we just got is the same as the one we already had, stop resetting time_to_download_next_consensus[i]" logic might be a good start. Another approach might be to reset time_to_download_next_consensus[i] if !c but be more lenient if the one we have is merely wrong. (Wrong like the consensus is from the future, I mean -- if the consensus is expired we need another one rsn.)
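A rough sketch of that second, more lenient approach, reusing the simplified networkstatus_t from the sketch above; the retry constants and the helper itself are invented for illustration, not real Tor code:

    /* Hypothetical scheduling split: retry aggressively when we hold no
     * consensus at all, but back off when we hold one that merely looks
     * future-dated. */
    #define AGGRESSIVE_RETRY_INTERVAL  (5 * 60)   /* invented value */
    #define LENIENT_RETRY_INTERVAL     (60 * 60)  /* invented value */

    static time_t
    next_consensus_fetch_time(const networkstatus_t *c, time_t now)
    {
      if (!c) {
        /* Nothing cached: a jerk could still starve us, so keep trying. */
        return now + AGGRESSIVE_RETRY_INTERVAL;
      }
      if (c->valid_after > now) {
        /* We *have* a consensus; it just claims to be from the future.
         * Refetching quickly won't help if our clock is what's wrong. */
        return now + LENIENT_RETRY_INTERVAL;
      }
      /* Expired consensus: we genuinely need a new one soon. */
      return now + AGGRESSIVE_RETRY_INTERVAL;
    }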

comment:2 Changed 7 years ago by arma

For added fun, if our user counting metrics are looking at number of consensus fetches, a few buggy clients here could be disproportionately affecting our numbers.

comment:3 Changed 7 years ago by nickm

Keywords: tor-client added

comment:4 Changed 7 years ago by nickm

Component: Tor Client → Tor

comment:5 Changed 7 years ago by arma

Priority: normal → major

There's some early evidence that this might be happening in practice.

comment:6 in reply to: comment:1 Changed 7 years ago by sysrqb

Replying to arma:

I'm not actually sure what we should *do* in these cases. We don't want to just quit fetching, because then a jerk can give us a wrong consensus and we'll freeze in place. Some sort of "if the one we just got is the same as the one we already had, stop resetting time_to_download_next_consensus[i]" logic might be a good start.

We can add this catch during networkstatus_set_current_consensus. We already check whether the digest of the current consensus and the digest of the fetched one are the same, so a flag can be flipped there.
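A hedged sketch of where that flag could be flipped; everything here except the name networkstatus_set_current_consensus is hypothetical:

    #include <string.h>
    #include <stddef.h>

    /* Invented flag, not a real Tor variable: remembers that the last
     * fetch handed us the consensus we already had. */
    static int refetched_same_consensus = 0;

    /* Hypothetical helper, imagined as being called from inside
     * networkstatus_set_current_consensus() where the digests are
     * already compared. */
    static void
    note_consensus_digest_match(const char *current_digest,
                                const char *fetched_digest,
                                size_t digest_len)
    {
      if (current_digest && fetched_digest &&
          !memcmp(current_digest, fetched_digest, digest_len)) {
        /* Same consensus again: don't reset the download schedule,
         * just remember we've already seen this one. */
        refetched_same_consensus = 1;
      } else {
        refetched_same_consensus = 0;
      }
    }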

Another approach might be to reset time_to_download_next_consensus[i] if !c but be more lenient if the one we have is merely wrong. (Wrong like the consensus is from the future, I mean -- if the consensus is expired we need another one rsn.)

And by more lenient you mean that we should provide some flexibility +/- around the valid-after/valid-until times?

On a slightly different note, and possibly requiring a new ticket, is there a reason we don't try to learn the current clock skew based on the statuses we receive and connections we make? We determine our potential skew and possibly print out warnings, but then we throw that info away. We're always left assuming our clock is correct and not knowing how it compares to the rest of the network. If we kept an average/"probable" skew based on what we've seen, we may be able to handle a malicious consensus more wisely. (and I just realized this directly relates to #2628)
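As an illustration of what keeping a "probable" skew might look like, here is an invented running-estimate helper; nothing like it exists in Tor, and the midpoint heuristic and smoothing factor are assumptions:

    #include <time.h>

    /* Invented sketch: an exponentially weighted estimate of apparent
     * clock skew, fed by each verified consensus we receive. */
    static double estimated_skew = 0.0;   /* seconds; positive = we're ahead */
    static int have_skew_estimate = 0;

    static void
    update_skew_estimate(time_t local_now, time_t consensus_valid_after,
                         time_t consensus_fresh_until)
    {
      /* Guess "network now" as the midpoint of the fresh interval. */
      double network_now =
        (double)consensus_valid_after +
        (double)(consensus_fresh_until - consensus_valid_after) / 2.0;
      double sample = (double)local_now - network_now;
      if (!have_skew_estimate) {
        estimated_skew = sample;
        have_skew_estimate = 1;
      } else {
        /* Smooth across samples so a single malicious consensus moves
         * the estimate less. */
        estimated_skew = 0.8 * estimated_skew + 0.2 * sample;
      }
    }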

Back to this issue: I think that if we live in either the past or the future, then per arma's comments we can check whether we've already been here/done this. Once we know the clock skew, we can assume that if every consensus we fetch has a valid-after date that increases monotonically from the current one (and valid-after < fresh-until < valid-until, of course), then we should be able to trust them (assuming sigs, digests, etc. check out). However, a side effect of this is that we will need to track the last time we successfully fetched and accepted a consensus, and only refetch after (now + (fresh-until - valid-after)), until we decide we're living in the present (for whatever reason: we receive a consensus that tells us we're actually correct, or the clock is set correctly).
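A sketch of that acceptance rule, again reusing the simplified networkstatus_t from earlier; the function names are hypothetical, and signature/digest checking is assumed to happen elsewhere:

    /* Accept a new consensus only if its timestamps are internally
     * ordered and its valid-after moves forward from the one we hold. */
    static int
    consensus_acceptable(const networkstatus_t *cur,
                         const networkstatus_t *incoming)
    {
      if (incoming->valid_after >= incoming->fresh_until ||
          incoming->fresh_until >= incoming->valid_until)
        return 0;   /* internally inconsistent */
      if (cur && incoming->valid_after <= cur->valid_after)
        return 0;   /* not monotonically newer than what we have */
      return 1;     /* sigs/digests assumed checked elsewhere */
    }

    /* Per the comment above: after accepting, wait roughly one fresh
     * interval before refetching. */
    static time_t
    next_fetch_after_accept(const networkstatus_t *c, time_t now)
    {
      return now + (c->fresh_until - c->valid_after);
    }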

comment:7 Changed 6 years ago by nickm

In #4011 I pointed out that deciding we need a new consensus if "c->valid_after > now" freaks me out.

So, that's only part of the check. time_to_download_next_consensus gets updated when we have no live consensus, but we don't actually launch the request if it's failed recently:

    if (!download_status_is_ready(&consensus_dl_status[i], now,
                                  CONSENSUS_NETWORKSTATUS_MAX_DL_TRIES))
      continue; /* We failed downloading a consensus too recently. */

I've looked around at the rest of the code to see whether we reset consensus_dl_status[i] too aggressively, and I can't see anything offhand, though some of the callers of routerlist_retry_directory_downloads() and router_reset_descriptor_download_failures() (which reset it indirectly) seem a little sketchy to me.

Of particular note is the once-an-hour router_reset_descriptor_download_failures() call in main.c. This isn't the right answer here, I think: the download_dl_status logic is already supposed to make failing things get retried periodically. Resetting it once an hour will make it use the early, aggressive retry logic once an hour. (!) But that could be another ticket.
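As a toy model of the schedule-based retry behaviour described here (the schedule values are made up, and Tor's real download_status_t machinery is more involved), an hourly blanket reset of the failure count would snap every stalled download back to the fast end of the table:

    #include <time.h>
    #include <stddef.h>

    /* Each consecutive failure indexes further into a backoff schedule,
     * so retries slow down on their own.  Values are invented. */
    static const int retry_schedule[] = { 0, 60, 300, 600, 1800, 3600 };
    #define SCHEDULE_LEN (sizeof(retry_schedule)/sizeof(retry_schedule[0]))

    /* Simplified stand-in for Tor's download_status_t. */
    typedef struct download_status_sketch_t {
      int n_failures;
      time_t next_attempt_at;
    } download_status_sketch_t;

    static void
    download_failed(download_status_sketch_t *dls, time_t now)
    {
      size_t i = dls->n_failures < (int)SCHEDULE_LEN
                   ? (size_t)dls->n_failures : SCHEDULE_LEN - 1;
      dls->next_attempt_at = now + retry_schedule[i];
      dls->n_failures++;
    }

    static int
    download_is_ready(const download_status_sketch_t *dls, time_t now)
    {
      return now >= dls->next_attempt_at;
    }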

comment:8 Changed 6 years ago by nickm

Just made a tor that can become confused about its time in branch "timewarp." (It's not a general solution for computers whose clocks are set wrong, since those really want a distinction between the system time and the network time. It looks at a #define in compat.c to see what time offset to use. It is not expected to *work* -- the interesting bit is about how it breaks.)

It doesn't appear that we download like mad. I only ran it for a few minutes in either case, but it behaved consistently with the download_status_t schedules when I put it 72 hours in the past or 72 hours in the future.

comment:9 Changed 6 years ago by nickm

Milestone: Tor: 0.2.4.x-final → Tor: 0.2.5.x-final

Deferring to 0.2.5 since according to my experiment, a huge skew doesn't seem to actually break Tor in the way we're worried about. More experiments welcome of course.

comment:10 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-final → Tor: 0.2.???

The tentative answer from above is "no", so this isn't a fix-it-in-0.2.5 thing.

comment:11 Changed 3 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:12 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:13 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:14 Changed 2 years ago by nickm

Resolution: worksforme
Severity: Normal
Status: new → closed

Looks like we tested this and found out "no", so closing.
