In #20499 (moved), we discovered that when a 304 "Not Modified" is received, relays retry too aggressively when 09a0f2d0b is reverted, and don't retry often enough when it is applied.
Instead, we should retry when we next expect the document to be modified.
And if we don't know, we should retry on some sensible schedule.
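To make the proposal concrete, here is a minimal sketch of that retry policy. It is written in Python for illustration, not Tor's C, and it is not an existing Tor function: the name `next_retry_after_304`, its parameters, and the one-hour fallback are all assumptions. A stale or missing expected-change time is treated as "we don't know".

```python
from typing import Optional

# Assumed fallback: one hour, in seconds. This is a placeholder for
# "some sensible schedule", not a value taken from Tor.
FALLBACK_DELAY = 3600


def next_retry_after_304(now: int,
                         next_expected_change: Optional[int],
                         fallback_delay: int = FALLBACK_DELAY) -> int:
    """Return the absolute time (in seconds) of the next retry after a 304.

    Hypothetical helper: if we know when the document should next change,
    and that time is still in the future, wait until then. Otherwise
    (unknown, or already in the past) fall back to a fixed delay.
    """
    if next_expected_change is not None and next_expected_change > now:
        return next_expected_change
    return now + fallback_delay
```

This sketch does nothing about a wrong local clock or a relay that wrongly reports "not modified"; a real fix would need to bound the wait in both directions.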
download_status_increment_attempt increments the schedule for each attempt
each attempt causes the delay to be increased exponentially (rather than using the actual hard-coded Bootstrap schedules)
After downloading the consensus:
download_status_increment_failure doesn't increase the schedule on 503 (server unavailable), even though it probably should, rather than retrying immediately
download_status_increment_failure increases the schedule exponentially on 304 (not modified), or perhaps doesn't increase the schedule at all (see #20499 (moved)), even though it should probably only increase it up to the next time we expect the document to be modified (1 hour)
download_status_schedule_get_delay uses the schedule to increase the backoff, so if the schedule isn't increased, the backoff isn't either (rather than using the actual hard-coded Bootstrap schedules)
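The coupling described in this list can be illustrated with a toy model. This is a sketch in Python, not Tor's code: the class `ToyDownloadStatus`, the `advance_on_503` flag, and the schedule values are all made up for illustration. It only shows that when a failure does not advance the schedule position, the delay derived from that position does not grow either.

```python
from dataclasses import dataclass

# Hypothetical delays in seconds; these are NOT Tor's bootstrap schedules.
HYPOTHETICAL_SCHEDULE = (0, 2, 10, 60, 3600)


@dataclass
class ToyDownloadStatus:
    """Toy stand-in for a download status: just a schedule position."""
    position: int = 0

    def current_delay(self) -> int:
        # The delay is looked up from the schedule position, clamped to
        # the last entry once the schedule is exhausted.
        index = min(self.position, len(HYPOTHETICAL_SCHEDULE) - 1)
        return HYPOTHETICAL_SCHEDULE[index]

    def record_failure(self, status_code: int, advance_on_503: bool) -> int:
        # Advance the position on every failure, except a 503 when
        # advance_on_503 is False. Return the delay before the next try.
        if status_code != 503 or advance_on_503:
            self.position += 1
        return self.current_delay()


# Three consecutive 503s without advancing: the delay never grows.
stuck = ToyDownloadStatus()
stuck_delays = [stuck.record_failure(503, advance_on_503=False) for _ in range(3)]

# The same three 503s with advancing: the delay backs off.
backing_off = ToyDownloadStatus()
backoff_delays = [backing_off.record_failure(503, advance_on_503=True) for _ in range(3)]
```

In this model `stuck_delays` stays at the first schedule entry while `backoff_delays` walks down the schedule. The 304 case would need a further rule that caps the delay at the next expected modification time.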
We took schedules that were carefully tuned in 0.2.8 so that bootstrap could survive 7 relay failures and still complete in 30 seconds with 99.9% reliability, and then implemented exponential backoff in 0.2.9 in a way that retries 5 times in 10 seconds in some cases, and only twice in the first 30 seconds in others.
I don't think this is easy to fix, so it shouldn't go in 0.2.9.
There are far too many edge cases here - what happens when the client's clock is wrong, or if a relay lies (or is wrong) about the document not being modified?
Trac: Milestone: Tor: 0.2.9.x-final to Tor: 0.3.0.x-final
I'm going to suggest some schedule tweaks in #20534 (moved).
I can't see anything else in this bug that we would ever fix. Perhaps someone else can work out how to deal with 304s sensibly.
A bridge operator has reported an upload/download disparity that may result from an 0.2.9.8 bridge repeatedly trying to download a microdesc consensus even though it gets 304 statuses.
(The ns consensus is only downloaded occasionally.)