Opened 10 months ago

Closed 12 days ago

#21534 closed defect (fixed)

"Client asked me to extend back to the previous hop" in small networks

Reported by: teor Owned by: dgoulet
Priority: Very High Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: regression?, guard-selection, dirauth
Cc: Actual Points:
Parent ID: #21573 Points: 1
Reviewer: Sponsor:

Description

When chutney runs under tor's make test-network (bridges+hs flavour) on master, it sometimes logs a message:

Client asked me to extend back to the previous hop

This could be due to the new guard code, so I'm putting it in 0.3.0 until we know for sure. (I've not seen it in 0.2.9, but it's hard to know for sure, because it's intermittent.)

Child Tickets

Change History (12)

comment:1 Changed 10 months ago by nickm

If it's due to the guard code, the likeliest place is with respect to some problem in the entry_guard_restriction_t logic.

comment:2 Changed 10 months ago by teor

Parent ID: #21573

comment:3 Changed 10 months ago by nickm

Keywords: TorCoreTeam201703 added

comment:4 Changed 7 months ago by nickm

Milestone: Tor: 0.3.0.x-final → Tor: 0.3.2.x-final

comment:5 Changed 7 months ago by nickm

Keywords: TorCoreTeam201703 removed

comment:6 Changed 4 weeks ago by dgoulet

Keywords: dirauth added
Priority: Medium → Very High

I can still hit this on master. It seems to be caused only by an authority that can't pick a node for a circuit, and this shows up:

Nov 16 09:27:50.380 [info] router_choose_random_node(): We couldn't find any live, fast, stable, guard routers; falling back to list of all routers.

I've added more logging, and when they need a Guard, no node is considered a Guard by any of the authorities. Because tor then falls back to the full list of nodes, there is a chance of picking the same node twice during path selection. Below are logs I've added within node_is_unreliable(), called by router_choose_random_node() -> router_add_running_nodes_to_smartlist():

Nov 16 10:07:24.018 [warn] Node $B6813ACD5E30C9560CB8F3CAAE08EB1A9643FFE7~test002a at 127.0.0.1 is stable: 1, is fast: 1, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $B6813ACD5E30C9560CB8F3CAAE08EB1A9643FFE7~test002a at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] Node $24F57B943178DA7D1351F9566FDBD0620B2921CF~test000a at 127.0.0.1 is stable: 1, is fast: 1, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $24F57B943178DA7D1351F9566FDBD0620B2921CF~test000a at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] Node $A183E34C4F3465994DC8D69378A05F1B43141AF3~test003ba at 127.0.0.1 is stable: 1, is fast: 1, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $A183E34C4F3465994DC8D69378A05F1B43141AF3~test003ba at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] node $492A22ABAD8203EA2B6A10076F251AC50AB1EFE0~test001a at 127.0.0.1 (is_runnig: 0, is_valid: 1)
Nov 16 10:07:24.018 [warn] Node $01D04CBA14565AA7EFC4612F5B388B07802475AC~test004r at 127.0.0.1 is stable: 0, is fast: 0, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $01D04CBA14565AA7EFC4612F5B388B07802475AC~test004r at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] Node $70CAC7E209998F7B073F3C13950BDE2787231D18~test005r at 127.0.0.1 is stable: 0, is fast: 0, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $70CAC7E209998F7B073F3C13950BDE2787231D18~test005r at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] Node $2836767EF7120AF306B8A1AE87E364073334B247~test006r at 127.0.0.1 is stable: 0, is fast: 0, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $2836767EF7120AF306B8A1AE87E364073334B247~test006r at 127.0.0.1 is unreliable
Nov 16 10:07:24.018 [warn] Node $D1657CB2A8D479F2D5D617819326C49FBB6D1133~test007r at 127.0.0.1 is stable: 0, is fast: 0, is possible guard: 0. We needed: Uptime, Capacity, Guard
Nov 16 10:07:24.018 [warn] node $D1657CB2A8D479F2D5D617819326C49FBB6D1133~test007r at 127.0.0.1 is unreliable

You can see that is_possible_guard is 0 for every node when "Guard" was needed (that is, need_guard = 1). This, by the way, leads to these warnings on the other relays when it happens:

Nov 16 10:19:16.664 [warn] connection_edge_process_relay_cell (away from origin) failed.
Nov 16 10:19:16.664 [warn] circuit_receive_relay_cell (forward) failed. Closing.

That is something I've been seeing quite a bit recently on my test relay on the real network, so I think this might affect authorities outside a TestingTorNetwork.
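
For context, here is a minimal, standalone sketch of the kind of reliability filter the extra logging above instruments. The struct and helper names are simplified stand-ins for Tor's node_t and node_is_unreliable(), not the real code; the point is only to show why, when need_guard is set and no node in the consensus carries the Guard flag, every node is rejected and router_choose_random_node() falls back to the list of all routers:

    #include <stdio.h>

    /* Simplified stand-in for the consensus flags Tor tracks per node. */
    typedef struct {
      const char *nickname;
      int is_stable;         /* Stable flag */
      int is_fast;           /* Fast flag */
      int is_possible_guard; /* Guard flag */
    } fake_node_t;

    /* Mirrors the shape of node_is_unreliable(): a node is rejected if it
     * lacks any property the caller asked for.  If need_guard is set and no
     * node has the Guard flag, every node is rejected and the caller falls
     * back to the list of all routers. */
    static int
    fake_node_is_unreliable(const fake_node_t *node,
                            int need_uptime, int need_capacity, int need_guard)
    {
      if (need_uptime && !node->is_stable)
        return 1;
      if (need_capacity && !node->is_fast)
        return 1;
      if (need_guard && !node->is_possible_guard)
        return 1;
      return 0;
    }

    int
    main(void)
    {
      const fake_node_t nodes[] = {
        { "test002a", 1, 1, 0 },  /* stable and fast, but no Guard flag */
        { "test004r", 0, 0, 0 },
      };
      for (size_t i = 0; i < sizeof(nodes)/sizeof(nodes[0]); ++i) {
        printf("%s is unreliable: %d\n", nodes[i].nickname,
               fake_node_is_unreliable(&nodes[i], 1, 1, 1));
      }
      return 0;
    }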

comment:7 Changed 4 weeks ago by dgoulet

It appears that longclaw is seeing those warnings quite a bit. Info logs from October 25th show that it happens a lot, which confirms that this occurs on the real network as well.

Fortunately, things like the authorities' reachability tests don't require a Guard, so at least this random circuit breakage shouldn't cause them to fail to vote Running for some relays.

comment:8 Changed 4 weeks ago by teor

Authorities do not use guards for anything.

And chutney uses TestingTorNetwork, which turns off the IPv4 /16 restriction that normally stops us from choosing a guard and middle that are the same relay. When it is turned off, we still need to avoid choosing exactly the same relay for two hops in the same circuit.

I'm not sure why this is happening on both the live network and the test network, because the /16 restriction should prevent any node from choosing the same relay twice in a path.

Does the /16 restriction work when we're using an IPv6 address to extend to the guard?

I think we should add an unconditional same-identity restriction, and bug-log (including the IP addresses and fingerprints in the path) whenever it is triggered while the /16 restriction is on. We should also check whether IPv6 guards trigger it, and whether anything else does.
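
A minimal sketch of what that unconditional same-identity restriction could look like; the path representation and names below are hypothetical stand-ins, not Tor's actual node or extend-info API:

    #include <stdio.h>
    #include <string.h>

    #define ID_DIGEST_LEN 20
    #define MAX_SKETCH_HOPS 8

    /* Hypothetical, simplified view of a partially built path: one relay
     * identity digest per hop chosen so far. */
    typedef struct {
      int n_hops;
      char identities[MAX_SKETCH_HOPS][ID_DIGEST_LEN];
    } sketch_path_t;

    /* Reject any candidate whose identity already appears in the path,
     * regardless of whether the /16 (EnforceDistinctSubnets) rule applies.
     * If this ever fires while the /16 rule is on, that would be worth a
     * bug-level log including the addresses and fingerprints in the path. */
    int
    sketch_path_contains_id(const sketch_path_t *path, const char *candidate_id)
    {
      for (int i = 0; i < path->n_hops; ++i) {
        if (!memcmp(path->identities[i], candidate_id, ID_DIGEST_LEN))
          return 1;
      }
      return 0;
    }

    int
    main(void)
    {
      sketch_path_t path = { 1, { "guard-identity-0123" } };
      printf("same id rejected: %d\n",
             sketch_path_contains_id(&path, "guard-identity-0123"));
      return 0;
    }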

comment:9 in reply to:  8 Changed 4 weeks ago by dgoulet

Replying to teor:

Authorities do not use guards for anything.

Well, they do, for a couple of reasons I can find in the logs. First, self reachability testing: it goes through a 3-hop circuit and thus requires a Guard (consider_testing_reachability()).

Second, preemptive client hidden service circuits (needs_hs_client_circuits()). These happen quite a bit while CBT is learning (#24228).

So what I mean here is that we should definitely investigate why authorities (so far they are the only ones I can see hitting this issue) don't consider any node a Guard, which makes them fall back to the entire routerset. Actually, it is a bit worse than that because of:

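    /* Fall back: drop the Uptime/Capacity/Guard (and preferred-address)
     * requirements and retry the node choice. */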
    flags &= ~ (CRN_NEED_UPTIME|CRN_NEED_CAPACITY|CRN_NEED_GUARD|
                CRN_PREF_ADDR);
    choice = router_choose_random_node(
                     excludedsmartlist, excludedset, flags);

BUT, as it turns out, this is definitely not the problem behind this ticket...

comment:10 Changed 4 weeks ago by dgoulet

Owner: set to dgoulet
Status: new → accepted

Pending more information.

comment:11 Changed 4 weeks ago by teor

Authorities don't consider any node a Guard, because they (effectively) set UseEntryGuards 0.
That's deliberate, so they can bootstrap in a network with no Guards.

They use a first hop on multi-hop circuits, but as you say, it's not required to have the Guard flag.
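
For illustration, the effective behaviour described above could be summarized as below; this is a standalone sketch with made-up types, not Tor's actual guard-selection code:

    #include <stdio.h>

    /* Standalone sketch, not Tor's real code: directory authorities behave
     * as if UseEntryGuards were 0, so they can bootstrap a network that
     * does not yet have any Guard-flagged relays; clients and relays
     * honour the configured option. */
    typedef struct {
      int is_authority;    /* stand-in for Tor's authdir_mode() check */
      int UseEntryGuards;  /* the torrc option */
    } sketch_options_t;

    int
    sketch_should_use_entry_guards(const sketch_options_t *options)
    {
      if (options->is_authority)
        return 0;  /* effectively UseEntryGuards 0 */
      return options->UseEntryGuards;
    }

    int
    main(void)
    {
      const sketch_options_t dirauth = { 1, 1 };
      const sketch_options_t client  = { 0, 1 };
      printf("authority uses guards: %d, client uses guards: %d\n",
             sketch_should_use_entry_guards(&dirauth),
             sketch_should_use_entry_guards(&client));
      return 0;
    }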

comment:12 Changed 12 days ago by dgoulet

Resolution: fixed
Status: accepted → closed

Merged with TROVE-2017-012 in #24333.
