Opened 8 months ago

Closed 7 months ago

Last modified 5 months ago

#23347 closed defect (fixed)

Using bridges or switching to bridges sometimes does not work with tor 0.3.2

Reported by: gk Owned by: teor
Priority: Very High Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-bootstrap, tor-bridge-client
Cc: teor Actual Points: 1
Parent ID: Points: 0.5
Reviewer: isis Sponsor:

Description

Trying to switch from a direct connection to using a bridge is not working anymore with tor on master (found while using commit f2f1cab2b3c6a56f93862c424663f083b79c7bc3) on my Linux box.

Steps to reproduce:

1) Take a recent Tor Browser (e.g. an alpha version)
2) Make sure you replace tor shipped in your Tor Browser instance with one compiled from a recent master commit
3) Start Tor Browser and choose direct connection
4) Shut Tor Browser down after you got greeted with the about:tor page
5) Restart Tor Browser and press the "Open Settings" button before the bootstrapping is finished
6) Select e.g. the recommended obfs4 default bridge option
7) The start-up stalls (I quit Tor Browser after 5 minutes waiting)

The first bad commit is c21cfd28f43a969229ede02e20c6b554c1b88aae which fixed #17750. Without that one Tor Browser resumes bootstrapping after a couple of seconds and is using an obfs4 bridge.

Child Tickets

Change History (23)

comment:1 Changed 8 months ago by gk

(I am a bit confused about the Milestone usage: #17750 has "Tor: 0.3.1.x-final" even though the code is not on maint-0.3.1, and reading the comments of this ticket, this might not even happen. I chose "Tor: 0.3.2.x-final" as the problematic code is only on master right now. If that's wrong please readjust.)

comment:2 Changed 8 months ago by teor

#17750 made download schedules use the specified initial delay, rather than 0.
The two schedules with a non-zero delay are the fallback authority schedule and the bridge schedule.

So my guess is that the bridge schedule's initial delay of 1 hour is wrong, and tor needs to download a bridge descriptor to bootstrap. It is possible that this bug was introduced in Tor 0.3.0 with the guard algorithm changes.

I'll work out how to reproduce this on the command-line, and submit a schedule patch.
This will probably involve re-thinking the entire schedule. (Because if we really need the bridge descriptor, we shouldn't wait 2 hours if we fail the first time.)

I have closed #17750, because we really shouldn't backport it.

comment:3 Changed 8 months ago by teor

To reproduce this, you can use:

src/or/tor DataDirectory `mktemp -d` UseBridges 1 Bridge ...

where ... is any bridge on any transport.
(I used a bridge from bridges.torproject.org, but any bridge or relay should work.)

I have confirmed that this works on 0.3.0.10, but fails on master with:

Bootstrapped 0%: Starting
Delaying directory fetches: No running bridges

comment:4 Changed 8 months ago by teor

Keywords: tor-bootstrap tor-bridge-client added
Points: 0.5

comment:5 Changed 8 months ago by teor

Actual Points: 0.5
Priority: MediumVery High
Summary: Switching from direct connection to a pluggable transport is not working anymore with tor on masterUsing bridges is not working anymore with tor on master

Please see my branch bug23347 on https://github.com/teor2345

It fixes this issue, makes bridge bootstraps more reliable by trying each bridge 3 times (rather than once), and brings the man page up to date with the latest schedules.

comment:6 Changed 8 months ago by teor

Status: newneeds_review

(As an aside, we didn't catch this with chutney, because the first scheduled download in chutney was 30s, and chutney waits 60s. Now the chutney schedules are consistent with each other, too.)

comment:7 Changed 8 months ago by teor

Owner: set to teor
Status: needs_reviewassigned

comment:8 Changed 8 months ago by teor

Status: assignedneeds_review

comment:9 Changed 8 months ago by teor

I pushed a fixup to this branch to fix a bug that arma identified on IRC: bridges reset their download schedule when they're successfully downloaded, and then want to do the next download after an hour. (Maybe the way I fixed it isn't the best design, but it does maintain the pre-#17750 behaviour.)

I also added a commit that refactors bridge downloads to use the "increment on attempt" functions. (They were using the "increment on failure" functions to increment on each attempt, and never incrementing on failure, which was confusing.)

comment:10 Changed 8 months ago by isis

Reviewer: isis

comment:11 Changed 8 months ago by isis

  • 435952538 LGTM
  • 61227b7b0 LGTM
  • ea662f00d LGTM
  • 836cc60d4 LGTM

FWIW, clang builds are broken, but that's not your fault, it appears to be from commit 6eb9de1b8c where there's now two typedef struct response_handler_args_ts in src/or/directory.h. I made #23358 for this.

comment:12 Changed 8 months ago by isis

Status: needs_reviewmerge_ready

Forgot to change state after review.

comment:13 Changed 8 months ago by nickm

I'm wondering whether we shouldn't extract the magic pair of calls to download_status_implement() and turn them into some other "adjust bridge download schedule" function? Or use two separate schedules?

comment:14 in reply to:  13 Changed 8 months ago by teor

Status: merge_readyneeds_revision

Replying to nickm:

I'm wondering whether we shouldn't extract the magic pair of calls to download_status_implement() and turn them into some other "adjust bridge download schedule" function? Or use two separate schedules?

I think we should use two separate schedules, like we do for bootstrapping consensuses and regular consensuses. This makes the different behavior explicit, rather than relying on magic numbers.

It also allows us to have more fine-grained control over how often we retry missing bridge descriptors.

We could use schedules that match the old behaviour:

TestingMissingBridgeDownloadSchedule 0, 1200, 900, 900, 3600
TestingBridgeDownloadSchedule 1200, 900, 900, 3600
/* And in a test network */
TestingMissingBridgeDownloadSchedule 0, 60, 30, 30, 60
TestingBridgeDownloadSchedule 60, 30, 30, 60

But we probably want something more like this:

/* If we can't get a bridge descriptor, backoff exponentially, just like authority consensus downloads */
TestingMissingBridgeDownloadSchedule 0, 3, 7, 3600, 10800, 25200, 54000, 111600, 262800
/* If the bridge keeps giving us a valid descriptor, it's ok to keep asking for one every 6 hours (this gives a bridge client 4 attempts per day to refresh each bridge descriptor) */
TestingBridgeDownloadSchedule 21600, 21600
/* And in a test network, match authority consensus downloads */
TestingMissingBridgeDownloadSchedule 0, 0, 5, 10, 15, 20, 30, 60
TestingBridgeDownloadSchedule 30, 30

Is there any reason that a bridge client with a valid bridge descriptor should re-download it every hour?

I'm happy to make this change, but it will probably be towards the end of the week. Feel free to grab this ticket if you want to do it before then.

comment:15 Changed 7 months ago by teor

Status: needs_revisionneeds_review

I updated the branch with a new design that:

  • runs quick checks when we don't know any bridge descriptors
  • when we get a (new/uncached) bridge descriptor, don't check that descriptor again for a few hours

This splits the standard bridge schedule into a Bridge and a BridgeBootstrap schedule.

comment:16 Changed 7 months ago by teor

Actual Points: 0.51

comment:17 Changed 7 months ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Nice! This looks much cleaner this way. I'm testing and merging.

comment:18 Changed 7 months ago by nickm

Resolution: fixed
Status: closedreopened

I ran into a crash bug here: if you have Bridge lines in your configuration, but UseBridges is 0, then the download_status_reset() in 1b5e34badb06bb1a844a6e70164fc5c894d95d0a will fail. I'm going to comment it out for now so that my Tor works. I commented it out in 63af663b8c83d771ed8fd29802e9a4c5cb074c70

comment:19 Changed 7 months ago by teor

Status: reopenedneeds_review

Ok, so there are a few issues here:

  • the reset is too early: it should only happen when we actually go to use bridges
  • there's no way to check if a bridge download status has already been reset

So it's ok to rely on the existing code to do this.

I added comments explaining this in bug23347-extra.

comment:20 Changed 7 months ago by teor

I added #23524 and #23525 to clean up some related issues.

comment:21 Changed 7 months ago by nickm

Resolution: fixed
Status: needs_reviewclosed

I've merged the patches for the child tickets, so I'm closing this one too. Thanks for the fast response!

comment:22 Changed 5 months ago by teor

Summary: Using bridges is not working anymore with tor on masterSwitching from direct to bridges is not working anymore with tor on master

comment:23 Changed 5 months ago by teor

Summary: Switching from direct to bridges is not working anymore with tor on masterUsing bridges or switching to bridges sometimes does not work with tor 0.3.2

Updating title due to feedback from gk on #24367.
This fix may have caused #24367.

Note: See TracTickets for help on using tickets.