Opened 3 years ago

Closed 3 years ago

#20533 closed defect (fixed)

Each download request should only increment the failure count once

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: regression, CoreTorTeam201611, 029-proposed
Cc: Actual Points: 0.2
Parent ID: #20499 Points: 1
Reviewer: Sponsor:

Description

From #20499:

At startup (or reload) the relay fetches the microdesc consensus

1 minute later it tries to fetch it again (update_consensus_networkstatus_downloads() is called) and receives a 304 response as it hasn't been modified

download_status_increment_failure() gets called with a status_code of 304

update_consensus_networkstatus_downloads() gets called again, this time it stops at the call to connection_dir_count_by_purpose_and_resource() which returns 1 (equal to max_in_progress_conns)

download_status_increment_failure() gets called again, this time with a status_code of 0 (as a result each 304 response results in the fail count being increased by 2)

Child Tickets

Change History (10)

comment:1 Changed 3 years ago by teor

Parent ID: #20499

comment:2 Changed 3 years ago by teor

This also affects #19969

comment:3 Changed 3 years ago by nickm

I am trying to understand what you're saying in the description above. What causes the second call to download_status_increment_failure() ? Does this have something to do with parallel requests?

comment:4 in reply to:  3 Changed 3 years ago by teor

Replying to nickm:

I am trying to understand what you're saying in the description above. What causes the second call to download_status_increment_failure() ? Does this have something to do with parallel requests?

Not really.

If check_consensus_waiting_for_certs is called when we've been waiting longer than DELAY_WHILE_FETCHING_CERTS, it calls download_status_failed(dls, 0);.

This is the source of the second failure, and in most cases it should be counted as a failure - we successfully downloaded a consensus, but could not download enough certificates to validate it.

That said, we should probably fail check_consensus_waiting_for_certs if the consensus we're waiting for has expired, too. But this could wait for 0.3.0.

comment:5 Changed 3 years ago by teor

Hmm, I'm still thinking about this:

If the consensus was current when we received it, but has expired before we received all the certificates, and it's less than DELAY_WHILE_FETCHING_CERTS since we received it, then I'm not sure if we should still call download_status_failed(). Otherwise, a mirror (one of our directory guards, or our bridge) could feed us a series of almost-stale consensuses, and cause us increment our failure count exponentially. But this only works if we ask for consensuses within 20 minutes of the hour, or if our receipt of the full consensus can be delayed using a slow-delivery attack.

If we fail because the consensus is not modified, we shouldn't even get this far, we should instead wait before calling update_consensus_networkstatus_downloads().

And if the mirror feeds us a consensus that has already expired, we should call the mirror bad, ignore the consensus, and try another one.

comment:6 Changed 3 years ago by teor

Actual Points: 0.2
Keywords: CoreTorTeam201611 029-proposed added
Version: Tor: unspecified

Please see my github branch bug20533_029 for a fix for this issue.

It does two things:

  • fail a consensus if it expires while we are waiting for certificates for it
  • only count a certificate download failure as a consensus schedule failure if we actually had a chance to download certificates (currently 1 minute after requesting certificates)

The risk in this patch is that we keep trying for certificates, but this is unlikely, given that we still stop waiting for certificates for the consensus.

Fix on commit e0204f21 in 0.2.0.9-alpha.

comment:7 in reply to:  6 Changed 3 years ago by teor

Status: newneeds_review

Replying to teor:

...

  • fail a consensus if it expires while we are waiting for certificates for it

Hmm, should we allow some leeway here? Particularly if the client clock is off?

OLD_ROUTER_DESC_MAX_AGE (5 days) seems excessive, maybe REASONABLY_LIVE_TIME?

comment:8 Changed 3 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

OLD_ROUTER_DESC_MAX_AGE (5 days) seems excessive, maybe REASONABLY_LIVE_TIME?

Sure, that would be okay.

Merging your patch above.

comment:9 Changed 3 years ago by teor

Resolution: fixed
Status: closedreopened

Hmm, actually, for consistency with update_networkstatus_downloads, we can't wait any longer than valid_until. Otherwise we would start downloading a new consensus while waiting for certificates on the old one.

    if (! (c && c->valid_after <= now && now <= c->valid_until)) {
      /* No live consensus? Get one now!*/
      time_to_download_next_consensus[i] = now;
    }

comment:10 Changed 3 years ago by teor

Resolution: fixed
Status: reopenedclosed

So this is done.

Note: See TracTickets for help on using tickets.