Using bridges or switching to bridges sometimes does not work with tor 0.3.2

changed milestone to %Tor: 0.3.2.x-final

added actualpoints::1 component::core tor/tor milestone::Tor: 0.3.2.x-final owner::teor points::0.5 priority::very high resolution::fixed reviewer::isis severity::normal status::closed tor-bootstrap tor-bridge-client type::defect labels

(I am a bit confused about the Milestone usage: #17750 (moved) has "Tor: 0.3.1.x-final" even though the code is not on maint-0.3.1, and reading the comments of this ticket, this might not even happen. I chose "Tor: 0.3.2.x-final" as the problematic code is only on master right now. If that's wrong please readjust.)

#17750 (moved) made download schedules use the specified initial delay, rather than 0. The two schedules with a non-zero delay are the fallback authority schedule and the bridge schedule.

So my guess is that the bridge schedule's initial delay of 1 hour is wrong, and tor needs to download a bridge descriptor to bootstrap. It is possible that this bug was introduced in Tor 0.3.0 with the guard algorithm changes.
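To illustrate the suspected cause, here is a minimal Python sketch of how honouring the first schedule entry changes the time of the first bridge descriptor fetch. It is a simplified model, not tor's code: `first_attempt_delay` and `BRIDGE_SCHEDULE` are hypothetical names, and only the 3600-second first entry comes from the guess above; the remaining entries are placeholders.

```python
def first_attempt_delay(schedule, use_initial_delay):
    """Return the seconds to wait before the first download attempt.

    use_initial_delay=False models the behaviour before #17750 (start at 0);
    use_initial_delay=True models honouring the first schedule entry.
    """
    if not schedule:
        raise ValueError("schedule must not be empty")
    return schedule[0] if use_initial_delay else 0


# Assumption: only the first entry (1 hour) is taken from the ticket text.
BRIDGE_SCHEDULE = [3600, 900, 900, 3600]

before = first_attempt_delay(BRIDGE_SCHEDULE, use_initial_delay=False)
after = first_attempt_delay(BRIDGE_SCHEDULE, use_initial_delay=True)
print(before, after)  # prints "0 3600"
```

If tor cannot bootstrap without a bridge descriptor, the second case would leave a bridge client stuck at 0% for the whole initial delay.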

I'll work out how to reproduce this on the command-line, and submit a schedule patch. This will probably involve re-thinking the entire schedule. (Because if we really need the bridge descriptor, we shouldn't wait 2 hours if we fail the first time.)

I have closed #17750 (moved), because we really shouldn't backport it.

To reproduce this, you can use:

src/or/tor DataDirectory `mktemp -d` UseBridges 1 Bridge ...

where ... is any bridge on any transport. (I used a bridge from bridges.torproject.org, but any bridge or relay should work.)

I have confirmed that this works on 0.3.0.10, but fails on master with:

Bootstrapped 0%: Starting
Delaying directory fetches: No running bridges

Trac:
Points: N/A to 0.5
Keywords: N/A deleted, tor-bootstrap, tor-bridge-client added

Please see my branch bug23347 on https://github.com/teor2345

It fixes this issue, makes bridge bootstraps more reliable by trying each bridge 3 times (rather than once), and brings the man page up to date with the latest schedules.

Trac:
Summary: Switching from direct connection to a pluggable transport is not working anymore with tor on master to Using bridges is not working anymore with tor on master
Priority: Medium to Very High
Actualpoints: N/A to 0.5

(As an aside, we didn't catch this with chutney, because the first scheduled download in chutney was 30s, and chutney waits 60s. Now the chutney schedules are consistent with each other, too.)
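The arithmetic behind that aside can be sketched as follows. This is an illustrative model only: `first_download_fits` is a hypothetical helper, and it assumes a test run looks healthy whenever the first scheduled download fits inside the wait window. The 30s and 60s figures are from the comment above; the 3600s figure is the 1 hour initial delay guessed earlier in this ticket.

```python
def first_download_fits(first_delay_s, wait_s):
    """True if the first scheduled download happens within the wait window."""
    return first_delay_s <= wait_s


CHUTNEY_FIRST_DOWNLOAD_S = 30
CHUTNEY_WAIT_S = 60
SUSPECTED_INITIAL_DELAY_S = 3600  # assumption: the 1 hour delay guessed above

# In chutney the delayed first download still lands inside the wait window,
# so the network bootstraps and the regression stays hidden.
hidden_in_chutney = first_download_fits(CHUTNEY_FIRST_DOWNLOAD_S, CHUTNEY_WAIT_S)

# With a 1 hour initial delay, the same 60s window would not be enough.
visible_with_long_delay = not first_download_fits(
    SUSPECTED_INITIAL_DELAY_S, CHUTNEY_WAIT_S
)
```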

Trac:
Status: new to needs_review

Trac:
Status: needs_review to assigned
Owner: N/A to teor

Trac:
Status: assigned to needs_review

I pushed a fixup to this branch to fix a bug that arma identified on IRC: bridges reset their download schedule when they're successfully downloaded, and then want to do the next download after an hour. (Maybe the way I fixed it isn't the best design, but it does maintain the pre-#17750 behaviour.)

I also added a commit that refactors bridge downloads to use the "increment on attempt" functions. (They were using the "increment on failure" functions to increment on each attempt, and never incrementing on failure, which was confusing.)
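For readers unfamiliar with the two increment styles, here is a toy Python model of the difference. It is a sketch under assumptions, not tor's implementation: the `DownloadStatus` class, its method names, and the example schedule are all hypothetical, and it assumes the last schedule entry repeats once the schedule is exhausted.

```python
class DownloadStatus:
    """Toy model of a schedule position for one download (not tor's struct)."""

    def __init__(self, schedule, increment_on):
        if increment_on not in ("attempt", "failure"):
            raise ValueError("increment_on must be 'attempt' or 'failure'")
        self.schedule = list(schedule)
        self.increment_on = increment_on
        self.position = 0

    def _advance(self):
        # Assumption: stay on the last entry once the schedule runs out.
        self.position = min(self.position + 1, len(self.schedule) - 1)

    def note_attempt(self):
        # Only "increment on attempt" schedules move when we launch a request.
        if self.increment_on == "attempt":
            self._advance()

    def note_failure(self):
        # Only "increment on failure" schedules move when a request fails.
        if self.increment_on == "failure":
            self._advance()

    def next_delay(self):
        return self.schedule[self.position]


EXAMPLE_SCHEDULE = [0, 10, 20]  # hypothetical delays in seconds

on_attempt = DownloadStatus(EXAMPLE_SCHEDULE, "attempt")
on_attempt.note_attempt()
on_attempt.note_attempt()
on_attempt.note_failure()  # ignored in this mode

on_failure = DownloadStatus(EXAMPLE_SCHEDULE, "failure")
on_failure.note_attempt()  # ignored in this mode
on_failure.note_attempt()
on_failure.note_failure()
```

Calling the failure-style function on every attempt, as the old bridge code did, gives the attempt-style progression under a misleading name, which is why the refactor makes the intent explicit.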

Trac:
Reviewer: N/A to isis

435952538 LGTM
61227b7b0 LGTM
ea662f00d LGTM
836cc60d4 LGTM

FWIW, clang builds are broken, but that's not your fault: it appears to be from commit 6eb9de1b8c, where there are now two typedef struct response_handler_args_t declarations in src/or/directory.h. I made #23358 (moved) for this.

Forgot to change state after review.

Trac:
Status: needs_review to merge_ready

I'm wondering whether we shouldn't extract the magic pair of calls to download_status_implement() and turn them into some other "adjust bridge download schedule" function? Or use two separate schedules?

Replying to nickm:

> I'm wondering whether we shouldn't extract the magic pair of calls to download_status_implement() and turn them into some other "adjust bridge download schedule" function? Or use two separate schedules?

I think we should use two separate schedules, like we do for bootstrapping consensuses and regular consensuses. This makes the different behavior explicit, rather than relying on magic numbers.

It also allows us to have more fine-grained control over how often we retry missing bridge descriptors.

We could use schedules that match the old behaviour:

TestingMissingBridgeDownloadSchedule 0, 1200, 900, 900, 3600
TestingBridgeDownloadSchedule 1200, 900, 900, 3600
/* And in a test network */
TestingMissingBridgeDownloadSchedule 0, 60, 30, 30, 60
TestingBridgeDownloadSchedule 60, 30, 30, 60

But we probably want something more like this:

/* If we can't get a bridge descriptor, back off exponentially, just like authority consensus downloads */
TestingMissingBridgeDownloadSchedule 0, 3, 7, 3600, 10800, 25200, 54000, 111600, 262800
/* If the bridge keeps giving us a valid descriptor, it's ok to keep asking for one every 6 hours (this gives a bridge client 4 attempts per day to refresh each bridge descriptor) */
TestingBridgeDownloadSchedule 21600, 21600
/* And in a test network, match authority consensus downloads */
TestingMissingBridgeDownloadSchedule 0, 0, 5, 10, 15, 20, 30, 60
TestingBridgeDownloadSchedule 30, 30
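As a rough check of the proposed non-test values, the following Python sketch computes when each attempt would fire and how many refreshes per day the 6 hour schedule allows. It assumes each schedule entry is a delay measured from the previous attempt; `attempt_times` and `attempts_per_day` are hypothetical helpers, not tor functions.

```python
from itertools import accumulate

# Values copied from the proposal above (non-test network).
MISSING_BRIDGE_SCHEDULE = [0, 3, 7, 3600, 10800, 25200, 54000, 111600, 262800]
BRIDGE_SCHEDULE = [21600, 21600]

SECONDS_PER_DAY = 24 * 60 * 60


def attempt_times(schedule):
    """Cumulative seconds from start at which each attempt would fire.

    Assumption: every entry is a delay relative to the previous attempt.
    """
    return list(accumulate(schedule))


def attempts_per_day(interval_s):
    """Whole number of attempts that fit in one day at a fixed interval."""
    return SECONDS_PER_DAY // interval_s


missing_times = attempt_times(MISSING_BRIDGE_SCHEDULE)
refreshes = attempts_per_day(BRIDGE_SCHEDULE[0])
print(missing_times[:4], refreshes)  # prints "[0, 3, 10, 3610] 4"
```

Under that assumption, a client with no descriptor makes three attempts in the first ten seconds before falling back to hour-scale retries.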

Is there any reason that a bridge client with a valid bridge descriptor should re-download it every hour?

I'm happy to make this change, but it will probably be towards the end of the week. Feel free to grab this ticket if you want to do it before then.

Trac:
Status: merge_ready to needs_revision

I updated the branch with a new design that:

- runs quick checks when we don't know any bridge descriptors
- doesn't check a descriptor again for a few hours once we get a (new/uncached) copy of it

This splits the standard bridge schedule into a Bridge and a BridgeBootstrap schedule.
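The split can be pictured with a small Python sketch. It is a simplified model rather than the code on the branch: `next_bridge_delay` is a hypothetical helper, the schedule values are the ones proposed earlier in this ticket and may differ from what was merged, and it assumes the last entry repeats once a schedule is exhausted.

```python
# Assumption: values taken from the proposal earlier in this ticket.
BRIDGE_BOOTSTRAP_SCHEDULE = [0, 3, 7, 3600, 10800, 25200, 54000, 111600, 262800]
BRIDGE_SCHEDULE = [21600, 21600]


def next_bridge_delay(have_descriptor, attempts_so_far):
    """Pick the schedule by descriptor state, then look up the next delay.

    have_descriptor: True once we hold a descriptor for this bridge.
    attempts_so_far: number of attempts already made on the chosen schedule.
    """
    if attempts_so_far < 0:
        raise ValueError("attempts_so_far must be non-negative")
    schedule = BRIDGE_SCHEDULE if have_descriptor else BRIDGE_BOOTSTRAP_SCHEDULE
    # Assumption: clamp to the last entry instead of running off the end.
    index = min(attempts_so_far, len(schedule) - 1)
    return schedule[index]


quick_first_try = next_bridge_delay(have_descriptor=False, attempts_so_far=0)
slow_refresh = next_bridge_delay(have_descriptor=True, attempts_so_far=0)
```

Keeping the choice in one place makes the two behaviours explicit instead of encoding them as a pair of adjustments to a single schedule.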

Trac:
Status: needs_revision to needs_review

Trac:
Actualpoints: 0.5 to 1

Nice! This looks much cleaner this way. I'm testing and merging.

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

I ran into a crash bug here: if you have Bridge lines in your configuration, but UseBridges is 0, then the download_status_reset() in 1b5e34badb06bb1a844a6e70164fc5c894d95d0a will fail. I commented it out for now, in 63af663b8c83d771ed8fd29802e9a4c5cb074c70, so that my Tor works.

Trac:
Status: closed to reopened
Resolution: fixed to N/A

Ok, so there are a few issues here:

- the reset is too early: it should only happen when we actually go to use bridges
- there's no way to check if a bridge download status has already been reset

So it's ok to rely on the existing code to do this.

I added comments explaining this in bug23347-extra.
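The "reset is too early" problem can be sketched in Python as follows. This is a hypothetical model, not the tor code path: `BridgeDownloadState` and its methods are invented for illustration, the Bridge line is a placeholder, and the failure is modelled as an exception because the ticket does not say exactly how download_status_reset() fails.

```python
class BridgeDownloadState:
    """Toy model of a configured bridge's download state (hypothetical)."""

    def __init__(self, use_bridges, bridge_lines):
        self.use_bridges = use_bridges
        self.bridge_lines = list(bridge_lines)
        self.position = None  # None means "never reset"

    def reset(self):
        # Assumption: a reset needs a bridge schedule, which this model only
        # has while bridges are in use. The real failure mode may differ.
        if not self.use_bridges:
            raise RuntimeError("bridge schedule unavailable: UseBridges is 0")
        self.position = 0

    def reset_if_using_bridges(self):
        """Deferred variant: only reset when we actually go to use bridges."""
        if self.use_bridges and self.bridge_lines:
            self.reset()
            return True
        return False


# Bridge lines present, but UseBridges 0: the configuration that crashed.
state = BridgeDownloadState(use_bridges=False, bridge_lines=["<bridge line>"])

try:
    state.reset()  # the eager reset
    early_reset_failed = False
except RuntimeError:
    early_reset_failed = True

deferred_reset_ran = state.reset_if_using_bridges()
```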

Trac:
Status: reopened to needs_review

I added #23524 (moved) and #23525 (moved) to clean up some related issues.

I've merged the patches for the child tickets, so I'm closing this one too. Thanks for the fast response!

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

Trac:
Summary: Using bridges is not working anymore with tor on master to Switching from direct to bridges is not working anymore with tor on master

Updating title due to feedback from gk on #24367 (moved). This fix may have caused #24367 (moved).

Trac:
Summary: Switching from direct to bridges is not working anymore with tor on master to Using bridges or switching to bridges sometimes does not work with tor 0.3.2