Opened 23 months ago

Closed 8 weeks ago

#17750 closed enhancement (fixed)

Make bootstrapping clients wait before trying an authority

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version: Tor:
Severity: Normal Keywords: tor-bootstrap
Cc: Actual Points: 0.3
Parent ID: Points: 2
Reviewer: Sponsor:


If this happens, 0 (or uninitialised data) is used instead of the zeroth schedule item. (This matters most for bridges, which have schedules that start at 60 rather than 0.)

I wonder if any of the existing Tor code does this?

We could set a flag in download_status_t to 1 in download_status_reset, then assert it's set in the low-level functions that take a download_status_t:

  • find_dl_schedule_and_len
  • download_status_increment_failure
  • (not download_status_reset, it sets the flag)
  • download_status_get_n_failures
  • download_status_get_next_attempt_at

Child Tickets

#22812defectclosedfind_dl_min_and_max_delay's DL_SCHED_DETERMINISTIC case is never used

Change History (20)

comment:1 Changed 21 months ago by nickm

Points: small/medium

comment:2 Changed 20 months ago by teor

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

No need for this in 028.

comment:3 Changed 16 months ago by nickm

Points: small/medium2

small/medium => 2.

comment:4 Changed 15 months ago by isabela

Keywords: isaremoved added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

comment:5 Changed 11 months ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:6 Changed 10 months ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:7 Changed 5 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:8 Changed 5 months ago by nickm

Keywords: isaremoved removed

comment:9 Changed 5 months ago by teor

Actual Points: 0.3
Keywords: tor-bootstrap added; easy review removed
Milestone: Tor: unspecifiedTor: 0.3.1.x-final
Status: newneeds_review
Summary: A download_status_t can be used before calling download_status_reset on itMake bootstrapping clients wait before trying an authority
Version: Tor:

This bug means that bootstrapping clients an authority and a fallback immediately, and then try another fallback 1-3 seconds later, and another 4-9 seconds after that. The intention was to try an authority after we'd tried the first 2-3 fallbacks.

(arma discovered this was happening in #22400.)

Please see my branch bug17750_029 for a general fix for this.
It could go all the way back to 0.2.9 if we wanted it to, but we should definitely test it in master first.

This bug could only ever have affected ClientBootstrapConsensusAuthorityDownloadSchedule and TestingBridgeDownloadSchedule, because every other schedule starts with 0 (the default).

And TestingBridgeDownloadSchedule is already initialised correctly.

I opened #22403 as a follow up for those cases where we directly access a download_status_t's fields rather than using an accessor function.

comment:10 Changed 5 months ago by arma

I'm not so good with this code, but I can confirm that the patch seems to have the behavior that teor describes. (The code used to launch two consensus fetches in parallel, one to an authority and one to a guard or fallback. Now it launches only one, not to an authority, and 1 second later if needed it launches another, also not to an authority. I didn't try triggering stuff after that, but hey, what could go wrong. :)

Agreed it is smart to put this patch into 0.3.1. In fact, we might not need/want to backport it before that -- since it's the sort of thing where it will be a long time until we realize our mistake, if there is one.

comment:11 Changed 5 months ago by nickm

Keywords: 030-backport 029-backport added
Milestone: Tor: 0.3.1.x-finalTor: 0.3.0.x-final

merged to master; marking for possible backport.

comment:12 Changed 5 months ago by nickm

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final
Status: needs_reviewneeds_revision

I spoke too soon; this patch makes the unit tests fail with:

  FAIL src/test/test_dir.c:3809: assert(mock_get_options_calls == 0)
  [download_status_increment FAILED]

Not merging yet.

comment:13 Changed 4 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

comment:14 Changed 4 months ago by teor

Keywords: 031-backport added
Status: needs_revisionneeds_review

Please see my branch bug17750_029, which also adds some regression tests for this bug and #20534.

I marked this for 0.3.1 backport. If we do backport, we only need to backport the squashed commit "Make clients try fallbacks before authorities".

And we might not want to backport at all, given the risk of bugs in older versions. The only thing this bug causes is more load on the directory authorities.

comment:15 Changed 4 months ago by nickm

Keywords: review-group-20 added

Creating review-group-20

comment:16 Changed 4 months ago by nickm

Owner: set to teor
Status: needs_reviewassigned

setting owner

comment:17 Changed 4 months ago by nickm

Status: assignedneeds_review

comment:18 Changed 4 months ago by nickm

Keywords: review-group-20 removed
Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final
Status: needs_reviewmerge_ready

Squashed as "bug17750_029_squashed" and merged to master. Let's let it cook there for a little while before we decide about a backport; it's a bit large for stable or post-freeze.

comment:19 Changed 3 months ago by nickm

I have added another fix to the bug17750_029_squashed branch to fix a unit test failure, as #22924. I've re-merged the branch to master. This branch is still cooking in master, with a possibility of an eventual backport.

comment:20 Changed 8 weeks ago by teor

Keywords: 031-backport 030-backport 029-backport removed
Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final
Resolution: fixed
Status: merge_readyclosed

We can't backport this: it is too high-risk. (It caused #23347.)

Note: See TracTickets for help on using tickets.