Opened 3 years ago

Last modified 2 years ago

#20535 new defect

A 304 "Not Modified" should update the time to when we next expect a modification

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: regression, tor-client tor-directory-protocol retry
Cc: Actual Points:
Parent ID: Points: 0.5
Reviewer: Sponsor:

Description

In #20499, we discovered that when a 304 "Not Modified" is received, relays try too hard when 09a0f2d0b is reverted, and don't try enough when it is applied.

Instead, we should retry when we next expect the document to be modified.
And if we don't know, we should retry on some sensible schedule.

Child Tickets

Change History (12)

comment:1 Changed 3 years ago by teor

Parent ID: #20499

comment:2 Changed 3 years ago by teor

This also affects #19969

comment:3 Changed 3 years ago by teor

Here's my analysis:

On bootstrap:

  • download_status_increment_attempt increments the schedule for each attempt
  • each attempt causes the delay to be increased exponentially (rather than using the actual hard-coded Bootstrap schedules)

After downloading the consensus:

  • download_status_increment_failure doesn't increase the schedule on 503 (server unavailable), even though it probably should, rather than retrying immediately
  • download_status_increment_failure increases the schedule exponentially on 304 (not modified), or perhaps doesn't increase the schedule at all (see #20499), even though it should probably only increase it up to the next time we expect the document to be modified (1 hour)
  • download_status_schedule_get_delay uses the schedule to increase the backoff, if the schedule isn't increased, the backoff isn't either (rather than using the actual hard-coded Bootstrap schedules)

comment:4 Changed 3 years ago by teor

We took schedules carefully tuned in 0.2.8 to make sure that it could survive 7 relay failures and still bootstrap in 30 seconds with 99.9% reliability, and implemented exponential backoff in 0.2.9 in a way that causes retries 5 times in 10 seconds in some cases, and in other cases retries twice in the first 30 seconds.

comment:5 Changed 3 years ago by teor

Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

I don't think this is easy to fix, so it shouldn't go in 0.2.9.

There are far too many edge cases here - what happens when the client's clock is wrong, or if a relay lies (or is wrong) about the document not being modified?

comment:6 in reply to:  4 Changed 3 years ago by teor

Replying to teor:

After downloading the consensus:

  • download_status_increment_failure doesn't increase the schedule on 503 (server unavailable), even though it probably should, rather than retrying immediately

Part of my code review in #20499.

  • download_status_schedule_get_delay uses the schedule to increase the backoff, if the schedule isn't increased, the backoff isn't either (rather than using the actual hard-coded Bootstrap schedules)

Part of my code review in #20499.

Replying to teor:

We took schedules carefully tuned in 0.2.8 to make sure that it could survive 7 relay failures and still bootstrap in 30 seconds with 99.9% reliability, and implemented exponential backoff in 0.2.9 in a way that causes retries 5 times in 10 seconds in some cases, and in other cases retries twice in the first 30 seconds.

I'm going to suggest some schedule tweaks in #20534.

I can't see anything else in this bug that we would ever fix. Perhaps someone else can work out how to deal with 304s sensibly.

comment:7 Changed 3 years ago by teor

Parent ID: #20499

comment:8 Changed 3 years ago by dgoulet

Keywords: triage-out-030-201612 added
Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

Triaged out on December 2016 from 030 to 031.

comment:9 Changed 3 years ago by teor

A bridge operator has reported an upload/download disparity that may result from an 0.2.9.8 bridge repeatedly trying to download a microdesc consensus even though it gets 304 statuses.
(The ns consensus is only downloaded occasionally.)

comment:10 Changed 2 years ago by nickm

Keywords: triaged-out-20170308 added
Milestone: Tor: 0.3.1.x-finalTor: unspecified

Deferring all 0.3.1 tickets with status == new, owner == nobody, sponsor == nobody, points > 0.5, and priority < high.

I'd still take patches for most of these -- there's just nobody currently lined up to work on them in this timeframe.

comment:11 Changed 2 years ago by nickm

Keywords: triage-out-030-201612 removed

comment:12 Changed 2 years ago by nickm

Keywords: tor-client tor-directory-protocol retry added; triaged-out-20170308 removed
Note: See TracTickets for help on using tickets.