Opened 8 weeks ago

Last modified 5 weeks ago

#32165 needs_revision defect

On first boot, Tor mistakenly tells me "The current consensus has no exit nodes"

Reported by: arma Owned by:
Priority: Medium Milestone: Tor: 0.4.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: bootstrap, BugSmashFund, 042-backport-maybe, 041-backport-maybe, 040-backport-maybe, 035-backport-maybe
Cc: Actual Points:
Parent ID: Points: 1
Reviewer: ahf Sponsor:

Description

Starting up 0.4.3.0-alpha-dev (git-71daad1692bc3f24) without any cached-* files in my DataDirectory, I get:

Oct 20 04:44:56.026 [notice] Bootstrapped 30% (loading_status): Loading networkstatus consensus
Oct 20 04:44:56.636 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Oct 20 04:44:56.758 [notice] Bootstrapped 40% (loading_keys): Loading authority key certs
Oct 20 04:44:56.936 [notice] The current consensus has no exit nodes. Tor can only build internal paths, such as paths to onion services.
Oct 20 04:44:56.936 [notice] Bootstrapped 45% (requesting_descriptors): Asking for relay descriptors
Oct 20 04:44:56.936 [notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/5841, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of end bw (no exits in consensus, using mid) = 0% of path bw.)
Oct 20 04:44:57.337 [notice] Bootstrapped 50% (loading_descriptors): Loading relay descriptors
Oct 20 04:44:57.592 [notice] The current consensus contains exit nodes. Tor can build exit and internal paths.
Oct 20 04:44:58.178 [notice] Bootstrapped 58% (loading_descriptors): Loading relay descriptors

It's that "The current consensus has no exit nodes." line that is out of place.

Child Tickets

TicketStatusOwnerSummaryComponent
#27448closedrouter_have_consensus_path() logs an incorrect "no exits in consensus"Core Tor/Tor

Change History (14)

comment:1 Changed 8 weeks ago by arma

It looks like compute_frac_paths_available() prints that line as a log_notice whenever it is called and count_usable_descriptors() returns "np" (number present) of 0.

Specifically, count_usable_descriptors() counts up how many of the exit relays in the consensus are "present" and how many are "usable". At the beginning, when we've gotten the consensus but no microdescriptors, many of the exit relays are "usable" but none of them are "present" yet. But! The logic inside count_usable_descriptors() uses the opposite meaning: it says to itself that many of the exit relays are "present" in the consensus but none of them are "usable" by this Tor yet (because we don't have a microdescriptor for them yet).

So the simple fix is that it needs to check if nu > 0, not np.

And the broader fix is that maybe we need better words for these two notions.

comment:2 Changed 8 weeks ago by arma

(I noticed this log line in Tor Browser's new "View Logs..." feature, which means other people will be noticing it and wondering what's up too.)

comment:3 Changed 8 weeks ago by arma

Status: newneeds_review

Bug went into Tor 0.2.6.2-alpha in commit 9b2d106.

My branch bug32165 resolves this bug.

teor would be a good reviewer, because #27236 and because #13814.

I based my branch on maint-0.4.2 since I don't think there's a need to go earlier than that -- it is just a "misleading log" problem so far as I can tell.

comment:4 Changed 8 weeks ago by arma

I added a second commit because there was a nearby log_debug() line that was printing the wrong parameters.

comment:5 Changed 8 weeks ago by arma

For renaming them, I would propose "present" -> "ready", and "usable" -> "listed". If we like these better names, I or somebody could go through and make a new commit changing them.

comment:6 Changed 8 weeks ago by dgoulet

Keywords: bootstrap added
Milestone: Tor: 0.4.2.x-final
Reviewer: ahf

comment:7 Changed 8 weeks ago by ahf

Status: needs_reviewneeds_revision

Needs revision for same reason as #32108

comment:8 Changed 8 weeks ago by arma

comment:9 Changed 8 weeks ago by teor

We've switched this check a few times, and it breaks different things each time.
So we need to be very careful about this change.
I'm not sure if it is correct.
Please check the history of changes to this line.

We also need to change variable names, log messages, and documentation, see:
https://trac.torproject.org/projects/tor/ticket/27448#comment:3

comment:10 in reply to:  9 ; Changed 8 weeks ago by arma

Replying to teor:

We've switched this check a few times, and it breaks different things each time.
So we need to be very careful about this change.
I'm not sure if it is correct.
Please check the history of changes to this line.

Ok, there have been three changes to this line:

The original commit, 9b2d106e4, which documented np and nu clearly (yay) but got their meanings reversed (oops).

Nick's fix in #14918, which went in as commit 8eb3d81e6: it clarified the function comment in count_usable_descriptors (yay), fixed np to nu as it should, but left the (still wrong) comments explaining np and nu in place (oops).

teor's update for #27236, which went in as commit 3ebbc1c84, which reintroduced the np vs nu bug.

I just checked num_present vs num_usable again, and I am confident that the fix in 94cb4f8 (which was the same fix as 8eb3d81e6) is right: num_usable is how many are in the consensus that we would use, and num_present is how many of those we actually have descriptors for right now. So on first boot, when we have no descriptors present, we must check nu, not np, or we will wrongly conclude that we have a consensus with no exits.

Now, #27236 speaks of internal-only onion service networks and chutney. But I don't see how commit 3ebbc1c8 does anything about those -- it essentially just reverts Nick's fix. I'm guessing it has something to do with the nearby commit 588c7767. It looks like that commit is trying to make sure that at least one relay in the consensus can exit to at least one port. Makes sense -- but it appears to accidentally also be checking that *we have a local descriptor* for such a relay. Can you help me understand what exactly we need to check for, in the "internal-only onion service networks and chutney" case? We shouldn't merge this new patch until we understand the chutney requirements better.

We also need to change variable names, log messages, and documentation, see:
https://trac.torproject.org/projects/tor/ticket/27448#comment:3

Yes, I agree. I'll give this a go if you like my proposed name changes above: "present" -> "ready" and "usable" -> "listed". (I figure going through and doing it for the wrong names isn't a good use of anybody's time.)

Thanks!

comment:11 in reply to:  10 Changed 8 weeks ago by teor

Replying to arma:

Replying to teor:

We've switched this check a few times, and it breaks different things each time.
So we need to be very careful about this change.
I'm not sure if it is correct.
Please check the history of changes to this line.

Ok, there have been three changes to this line:

The original commit, 9b2d106e4, which documented np and nu clearly (yay) but got their meanings reversed (oops).

Nick's fix in #14918, which went in as commit 8eb3d81e6: it clarified the function comment in count_usable_descriptors (yay), fixed np to nu as it should, but left the (still wrong) comments explaining np and nu in place (oops).

teor's update for #27236, which went in as commit 3ebbc1c84, which reintroduced the np vs nu bug.

I just checked num_present vs num_usable again, and I am confident that the fix in 94cb4f8 (which was the same fix as 8eb3d81e6) is right: num_usable is how many are in the consensus that we would use, and num_present is how many of those we actually have descriptors for right now. So on first boot, when we have no descriptors present, we must check nu, not np, or we will wrongly conclude that we have a consensus with no exits.

It's not just first boot, it's also "if tor is launched after most descriptors have expired".

Whatever change you make, we need all the chutney tests in CI to pass. And we also need to avoid breaking Tor Browser.

I think there were some weird complexities here last time. But let's have a go :-)

Now, #27236 speaks of internal-only onion service networks and chutney. But I don't see how commit 3ebbc1c8 does anything about those -- it essentially just reverts Nick's fix. I'm guessing it has something to do with the nearby commit 588c7767. It looks like that commit is trying to make sure that at least one relay in the consensus can exit to at least one port. Makes sense -- but it appears to accidentally also be checking that *we have a local descriptor* for such a relay. Can you help me understand what exactly we need to check for, in the "internal-only onion service networks and chutney" case? We shouldn't merge this new patch until we understand the chutney requirements better.

Chutney runs some networks without exits, to make sure we are testing onion service circuits. (And to speed up the tests, by removing redundant relays.) Chutney doesn't care much about the difference between the exit flag and exit policy, because those networks don't have any of either.

However, some tor code does care about the difference. Because it's trying to predict if any relays with exit policies will appear, once it downloads descriptors. Maybe it shouldn't care so much. Maybe it needs to care just enough to tell the user what is going on.

We also need to change variable names, log messages, and documentation, see:
https://trac.torproject.org/projects/tor/ticket/27448#comment:3

Yes, I agree. I'll give this a go if you like my proposed name changes above: "present" -> "ready" and "usable" -> "listed". (I figure going through and doing it for the wrong names isn't a good use of anybody's time.)

We want to avoid confusion. So let's call the variables what they *are*, not how we are using them. So we should end up using at least two names like:

  • n_exit_flag
  • n_exit_policy
  • n_exit_flag_and_policy

(I can't easily work out what present, ready, usable, or listed mean. And that's part of the problem here.)

comment:12 Changed 8 weeks ago by teor

Copying information from #27448, an older duplicate ticket:

router_have_consensus_path() logs an incorrect "no exits in consensus"

Instead, it should distinguish between:

  • exit flags, but no (usable) exit descriptors, and
  • no exit flags

We need to fix "The current consensus has no exit nodes" and "no exits in consensus, using mid".

If there are Exit flags in the consensus, we should say: "waiting for Exit descriptors"
If not, we should say: "no Exits in consensus"

comment:13 Changed 8 weeks ago by teor

One of your PRs fails chutney with:

ping6 ::1 and ping ::1 failed, skipping IPv6 flavors: bridges+ipv6-min ipv6-exit-min hs-v23-ipv6-md single-onion-v23-ipv6-md.
tor-stable not found, skipping mixed flavors: mixed+hs-v2.
SKIP: bridges+ipv6-min
SKIP: ipv6-exit-min
SKIP: hs-v23-ipv6-md
SKIP: single-onion-v23-ipv6-md
SKIP: mixed+hs-v2
PASS: basic-min
PASS: bridges-min
FAIL: hs-v2-min
FAIL: hs-v3-min
FAIL: single-onion-v23
Log and result files are available in ./test_network_log.
make: *** [test-network-all] Error 1
The command "if [[ "$CHUTNEY" != "" ]]; then make test-network-all; fi" exited with 2.

https://travis-ci.org/torproject/tor/jobs/601052393#L3680

Here are the logs:
https://travis-ci.org/torproject/tor/jobs/601052393#L4732

The issues seem to be:

test005c: (50, 'loading_descriptors', 'Loading relay descriptors')
test006h: (50, 'loading_descriptors', 'Loading relay descriptors')
Tor bootstrap failed, ignoring for now.
Launching chutney using Python 2.7.6
NOTE: Registering send-data-1
NOTE: Registering check-2
Verifying data transmission: (retrying for up to 60 seconds)
Connecting:
  HS to fjqckbajebsxujm6.onion:5858 (127.0.0.1:4747) via client localhost:9005
Transmitting Data:
NOTE: Status:
{'send-data-1': 'connected, sending socks handshake'}

comment:14 Changed 5 weeks ago by teor

Keywords: BugSmashFund 042-backport-maybe 041-backport-maybe 040-backport-maybe 035-backport-maybe added
Milestone: Tor: 0.4.2.x-finalTor: 0.4.3.x-final
Points: 1

This ticket is unlikely to make 0.4.2-rc, but we can backport it.

Note: See TracTickets for help on using tickets.