#27080 closed defect (fixed)

bridges fail on Tor 0.3.4.1-alpha and later

Reported by: teor Owned by:
Priority: High Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version: Tor: 0.3.4.1-alpha
Severity: Blocker Keywords: 034-must, 035-must, regression, tor-bridge 034-backport
Cc: dgoulet, intrigeri, rl1987 Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

When I run the chutney bridges-min and bridges+ipv6-min tests on my mac, they fail on every release from 0.3.4.1-alpha through to master (837f11a532). But they succeed on maint-0.3.3, maint-0.3.2, and maint-0.2.9.

Child Tickets

TicketStatusOwnerSummaryComponent
#27211closedteorAdd chutney-git-bisect.sh to Tor's scripts directoryCore Tor/Tor
#27236closedteorWhen checking usable exit descriptors, always check exit policiesCore Tor/Tor

Attachments (4)

27080_bisect.log (2.1 KB) - added by nickm 15 months ago.
git bisect log (nickm)
git_bisect.log (4.6 KB) - added by teor 14 months ago.
#27080 bisect log using chutney-git-bisect.sh
chutney-git-bisect.sh (842 bytes) - added by teor 14 months ago.
git bisect run script for chutney
chutney-git-bisect-27080.sh (867 bytes) - added by teor 14 months ago.
updated chutney bisect script

Download all attachments as: .zip

Change History (24)

comment:1 Changed 15 months ago by teor

To run chutney on master, you'll need the fix in #27067.

comment:2 Changed 15 months ago by nickm

Keywords: 034-backport added

Changed 15 months ago by nickm

Attachment: 27080_bisect.log added

git bisect log (nickm)

comment:3 Changed 15 months ago by nickm

Cc: dgoulet added
Priority: MediumHigh
Severity: NormalMajor

Somebody should find out whether this affects bridges and/or bridge clients on the real network, or whether it is a chutney-only bug. That will determine whether it is super-high-priority or just regular high-priority.

I've reproduced this on a Fedora system, and tried to bisect. My result was that the first bad commit is ed89bb32535fbf354b406a36f3176380a4e226bf" "Specialize the periodic events on a per-role basis." So I would guess that this does affect real users. I've attached my "git bisect" log, in case I messed up somewhere.

comment:4 Changed 15 months ago by teor

Status: newneeds_information

Thanks!

I had a quick look through that commit. The code changes shouldn't change tor's behaviour. I wonder if the bug is in this commit, but the behaviour actually starts happening in the commit after it (a4fcdc5dec).

In a4fcdc5dec, the following roles seem inconsistent:

+  CALLBACK(retry_dns, PERIODIC_EVENT_ROLE_ROUTER),
+  CALLBACK(check_dns_honesty, PERIODIC_EVENT_ROLE_RELAY),

It's probably unrelated, but I don't think clients write stats:

+  CALLBACK(write_stats_file, PERIODIC_EVENT_ROLE_ALL),

(Do onion services write stats?)

I'll do some testing over the next 2 days, and try to work out what's happening.

comment:5 Changed 14 months ago by intrigeri

FWIW the Tails automated integration tests that exercise regular bridges and obfs4 pass just fine on an ISO that has tor installed from the deb.tpo tor-nightly-0.3.4.x-stretch APT suite. We use a rather old version of Chutney and not the real network. The host (that runs Chutney) has tor 0.2.9.15-1.

comment:6 Changed 14 months ago by intrigeri

Cc: intrigeri added

comment:7 in reply to:  5 ; Changed 14 months ago by teor

Replying to intrigeri:

FWIW the Tails automated integration tests that exercise regular bridges and obfs4 pass just fine on an ISO that has tor installed from the deb.tpo tor-nightly-0.3.4.x-stretch APT suite. We use a rather old version of Chutney and not the real network. The host (that runs Chutney) has tor 0.2.9.15-1.

I'm not sure which Tor versions pass your bridge tests. Are you running Tor 0.3.4 or Tor 0.2.9 in chutney?

comment:8 in reply to:  7 Changed 14 months ago by intrigeri

Replying to teor:

Replying to intrigeri:

FWIW the Tails automated integration tests that exercise regular bridges and obfs4 pass just fine on an ISO that has tor installed from the deb.tpo tor-nightly-0.3.4.x-stretch APT suite. We use a rather old version of Chutney and not the real network. The host (that runs Chutney) has tor 0.2.9.15-1.

I'm not sure which Tor versions pass your bridge tests. Are you running Tor 0.3.4 or Tor 0.2.9 in chutney?

The client (in the Tails system under test, that's running in a VM and exercises bridges and obfs4 usage) runs tor 0.3.4.6-rc-dev-20180815T162312Z-1~d90.stretch+1. The host, that runs the Chutney network, has tor 0.2.9.15-1.

Changed 14 months ago by teor

Attachment: git_bisect.log added

#27080 bisect log using chutney-git-bisect.sh

Changed 14 months ago by teor

Attachment: chutney-git-bisect.sh added

git bisect run script for chutney

comment:9 Changed 14 months ago by teor

I get the following result from git bisect:

1f739e9b06974566685c662c3526384efd68ed32 is the first bad commit
commit 1f739e9b06974566685c662c3526384efd68ed32
Author: David Goulet <dgoulet@torproject.org>
Date:   Wed May 2 13:42:24 2018 -0400

    dirauth: Move authdir_mode_v3() to module
    
    This function must return false if the module is not compiled in. In order to
    do that, we move the authdir_mode_v3() function out of router.c and into the
    dirauth module new header file named mode.h.
    
    It is always returning false if we don't have the module.
    
    Closes #25990
    
    Signed-off-by: David Goulet <dgoulet@torproject.org>

I used the attached script to generate the attached log.

comment:10 Changed 14 months ago by teor

Hmm, using a slightly different method I get:

488e2b00bf881b97bcc8e4bbe304845ff1d79a03 is the first bad commit
commit 488e2b00bf881b97bcc8e4bbe304845ff1d79a03
Author: Nick Mathewson <nickm@torproject.org>
Date:   Tue Apr 17 11:39:16 2018 -0400

    Refactor the "block the connection on bandwidth" logic
    
    Right now, this patch just introduces and exposes some new
    functions. Later, these functions will get a little more complexity.

Time to look at the tor logs, I think.

comment:11 Changed 14 months ago by nickm

Severity: MajorBlocker

comment:12 Changed 14 months ago by nickm

I found one issue, which might or might not be a complete explanation here. This is from a client in the bridges-min network:

Aug 20 13:35:57.787 [notice] No Tor server allows exit to 127.0.0.1:4747. Rejecting.

Looking at the cached-microdescs in the authority's directory, I saw that there is one with a correct policy line p accept 1-65535 -- corresponding to the network's exit, 002r. But the client did not have that microdescriptor cached: it had an older one one.

Digging deeper, it appears that the client is receiving new conensuses, but it is not re-launching microdescriptor downloads fast enough. The router_have_minimum_dir_info() function is returning 1, so directory_info_has_arrived() is not launching any new descriptor downloads here.

comment:13 Changed 14 months ago by nickm

Ah. And the reason that router_have_minimum_dir_info() is returning 1 here is twofold: It thinks that we have 44% of paths available (which seems suspect), and we have set PathsNeededToBuildCircuits 0.25 in the chutney torrc.

As an experiment, I tried increasing PathsNeededToBuildCircuits to .75 in common.i. With that change, the bridge-min chutney test passes on master.

Now, here's the question: what did we change in 0.3.4 that broke this?

And is it truly affecting real bridges?

comment:14 Changed 14 months ago by teor

This code in router_have_minimum_dir_info() is part of the problem, and we should fix it:

/* if the consensus has no exits, treat the exit fraction as 100% */
if (router_have_consensus_path() != CONSENSUS_PATH_EXIT) {
  f_exit = 1.0;
}

The code is intended to allow onion service connections in Tor networks with no exits.

But if Tor actually needs exit circuits, it should keep trying to download new descriptors, in case any of them have better exit info.

And if we want to use router_have_minimum_dir_info() to delay downloading descriptors, we need to check the exit policies in those descriptors, not just the flags.

I also opened a child ticket to remove the chutney PathsNeeded option.

comment:15 in reply to:  14 Changed 14 months ago by teor

Replying to teor:

This code in router_have_minimum_dir_info() is part of the problem, and we should fix it:

/* if the consensus has no exits, treat the exit fraction as 100% */
if (router_have_consensus_path() != CONSENSUS_PATH_EXIT) {
  f_exit = 1.0;
}

The code is intended to allow onion service connections in Tor networks with no exits.

Hang on, this code should never trigger in the public network, because its consensus always has exits.
Since it's using consensus flags, I don't think we need to change it.

But if Tor actually needs exit circuits, it should keep trying to download new descriptors, in case any of them have better exit info.

And if we want to use router_have_minimum_dir_info() to delay downloading descriptors, we need to check the exit policies in those descriptors, not just the flags.

I think this might be one of the bugs that we're looking for.

comment:16 Changed 14 months ago by teor

I did a manual git bisect using a script which I'll attach.

> 50 good
> 25 bad
> 37 bad
> 43 good
> 40 skip
> 40 bbf0b92b1c bad
> 42 fdc01cb40e skip
> 42 d8509b450a bad - dgoulet/ticket25610_034_01-squashed
> 42 b0224bf728 skip - dgoulet/ticket25610_034_01-squashed~2 (before merge)
> 42 b0224bf728 + cherry pick bbf0b92b1c good - dgoulet/ticket25610_034_01-squashed~2 (before merge) + fix assertion failure
> 42 bbf0b92b1c bad - dgoulet/ticket25610_034_01 + fix assertion failure

So I think that dgoulet/ticket25610_034_01 revealed the existing bugs, but our bisects couldn't find it due to the assertion failures around that merge.

I might try bisecting dgoulet/ticket25610_034_01 tomorrow to find the problematic commit.

I added some child tickets for various specific bugs. I'll also try fixing some of the child ticket bugs to see if they are causing the issue.

Changed 14 months ago by teor

Attachment: chutney-git-bisect-27080.sh added

updated chutney bisect script

comment:17 Changed 14 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.4.x-final

comment:18 Changed 14 months ago by rl1987

Cc: rl1987 added

comment:19 Changed 14 months ago by teor

We can close this ticket once all the children close.

comment:20 Changed 14 months ago by nickm

Resolution: fixed
Status: needs_informationclosed

Children are closed; calling this done.

Note: See TracTickets for help on using tickets.