Opened 10 months ago

Closed 2 months ago

#28027 closed defect (fixed)

Tor keeps opening circuits while waiting for bridge descriptors

Reported by: dgoulet Owned by: neel
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.3.0.1-alpha
Severity: Normal Keywords: tor-hs, regression, tor-guard, 034-backport, 035-backport, postfreeze-ok, 040-deferred-20190220, 033-backport-unreached, reviewer-was-teor-20190422
Cc: neel, catalyst Actual Points:
Parent ID: #29875 Points:
Reviewer: Sponsor:

Description

From someone in #tor-dev (Tovok7), they migrated an HS from 029 to 034 and some feature broke.

Tor, configured with an HS, starts fine, bootstraps and all is good.

Then, through the control port, setconf UseBridge=1 Bridge=... created these logs:

2018-10-12 16:18:54.440 I/TorPlugin: Enabling network, using bridges
2018-10-12 16:18:54.456 I/TorPlugin: NOTICE Switching to guard context "bridges" (was using "default") 
2018-10-12 16:18:54.608 I/TorPlugin: NOTICE Delaying directory fetches: No running bridges 
2018-10-12 16:18:55.031 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed]. 
2018-10-12 16:18:55.032 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed]. 
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $02069A3C5362476936B62BA6F5ACC41ABD573A9B
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $5A2D2F4158D0453E00C7C176978D3F41D69C45DB
2018-10-12 16:18:55.616 I/TorPlugin: OR connection LAUNCHED $B31A7DAD9AACEDDB9915A16617BB8F06BA429D6B
2018-10-12 16:18:55.642 I/TorPlugin: WARN Hidden service [scrubbed] exceeded launch limit with 13 intro points in the last 13 seconds. Intro circuit launches are limited to 10 per 300 seconds. 
2018-10-12 16:18:55.642 I/TorPlugin: WARN Service configured in [EPHEMERAL]: 
2018-10-12 16:18:55.642 I/TorPlugin: WARN   Intro point 0 at [scrubbed]: no circuit 
2018-10-12 16:18:55.643 I/TorPlugin: WARN   Intro point 1 at [scrubbed]: no circuit 
2018-10-12 16:18:55.643 I/TorPlugin: WARN   Intro point 2 at [scrubbed]: no circuit 
2018-10-12 16:18:55.643 I/TorPlugin: WARN   Intro point 3 at [scrubbed]: no circuit 
2018-10-12 16:18:55.643 I/TorPlugin: WARN   Intro point 4 at [scrubbed]: no circuit 
2018-10-12 16:18:56.652 I/TorPlugin: OR connection CONNECTED $02069A3C5362476936B62BA6F5ACC41ABD573A9B
2018-10-12 16:18:57.013 I/TorPlugin: NOTICE new bridge descriptor 'pointingRespighi' (fresh): [scrubbed] 
2018-10-12 16:18:57.014 I/TorPlugin: NOTICE Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/2 of our primary entry guards (total microdescriptors: 6457/6457). 
2018-10-12 16:18:57.080 I/TorPlugin: OR connection CONNECTED $5A2D2F4158D0453E00C7C176978D3F41D69C45DB
2018-10-12 16:18:57.184 I/TorPlugin: OR connection CONNECTED $B31A7DAD9AACEDDB9915A16617BB8F06BA429D6B
2018-10-12 16:18:57.586 I/TorPlugin: NOTICE new bridge descriptor 'AlliumGermanicus' (fresh): [scrubbed]
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE Bridge 'nonLinearGeometry' has both an IPv4 and an IPv6 address.  Will prefer using its IPv4 address ([scrubbed]) based on the configured Bridge address. 
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE new bridge descriptor 'nonLinearGeometry' (fresh): [scrubbed]
2018-10-12 16:18:57.641 I/TorPlugin: NOTICE We now have enough directory information to build circuits. 
2018-10-12 16:19:00.201 I/TorPlugin: NOTICE Tor has successfully opened a circuit. Looks like client functionality is working. 

It appears that the HS tried to open intro points even though tor didn't have bridge descriptors (guard state got switched).

The HS subsystem is safeguarded by this check (for circuit events):

  if (router_have_consensus_path() == CONSENSUS_PATH_UNKNOWN ||
      !have_completed_a_circuit()) {
    return;
  }

In other words, if we can't open circuits, tor will never proceed with HS service circuits.

The main theory, discussed with armadev, can be deduced with the three first log line:

2018-10-12 16:18:54.456 I/TorPlugin: NOTICE Switching to guard context "bridges" (was using "default") 
2018-10-12 16:18:54.608 I/TorPlugin: NOTICE Delaying directory fetches: No running bridges 
2018-10-12 16:18:55.031 I/TorPlugin: WARN Error launching circuit to node [scrubbed] for service [scrubbed].

First line: Guard context switched to bridges. All is good.

Second line: router_have_minimum_dir_info() is called from, actually wherever... It is used quite often in many places including our mainloop. The point is that within that function, we do look at should_delay_dir_fetches() which is the one creating that notice. However, because 1 was returned, we never went to update_router_have_minimum_dir_info() which would have mark that we can't complete circuits (with note_that_we_maybe_cant_complete_circuits()).

Third line: Circuit launch failure.

Once the guard context was switched, all circuits were marked as unusable (normal) so the HS service has to rebuild all its intro points but the have_completed_a_circuit() was still returning true.

Whatever is the cause, there is a clear issue that when we switch guard context, we should _always_ stop circuit creation until the guard state is usable.

Child Tickets

Change History (18)

comment:1 Changed 10 months ago by arma

For context, I think tovok7 is from briar.

comment:2 Changed 10 months ago by dgoulet

Keywords: tor-guard added

comment:3 Changed 9 months ago by neel

Cc: neel added
Owner: set to neel
Status: newassigned

comment:4 Changed 9 months ago by neel

For this patch, I am thinking about the following:

  • If UseBridges was originally set to 0, but then set to 1 later on, keep a flag in feature/client/entrynodes.c
  • Make should_delay_dir_fetches() check for this proposed flag if UseBridges is 1 to fetch the bridge descriptors, and if both are set, return 0
  • In options_act(), initialize the proposed flag to 0 if we don't have old_options
  • In options_act(), if we have old_options, and old_options->UseBridges and options->UseBridges differ, set this flag to 1

comment:5 Changed 9 months ago by catalyst

Cc: catalyst added

comment:6 Changed 9 months ago by dgoulet

Hmmmm I think the problem lies in the fact that "tor" thinks it can complete a circuit but in fact it can't because it has no running bridges as in it just switched to using one and is waiting for it to be usable.

That being said, I think the avenue to resolving this is rather to signal tor that we can NOT complete circuits when our Guard state just changed.

To do that, simply call note_that_we_maybe_cant_complete_circuits(). Quick look at this, probably it should be done around this in config.c:

    if (transition_affects_guards) {
      if (guards_update_all()) {
        abandon_circuits = 1;
      }
    }

The tricky part will be to confirm that tor does at some point realizes it can complete circuits once the bridge is usable and thus the HS subsystem can start building circuits.

comment:7 Changed 9 months ago by neel

Thanks for the clarification.

Also, if a bridge is usable, is it okay to call note_that_we_completed_a_circuit() in learned_bridge_descriptor() if we get a bridge descriptor?

comment:8 Changed 9 months ago by neel

I have a PR here: https://github.com/torproject/tor/pull/558

I do have one concern, since note_that_we_maybe_cant_complete_circuits() is called in learned_bridge_descriptor(), I am worried that my patch may have race conditions. This claim may be hypothetical as I did not experience any from my tests.

Also, this patch works from bridge to non-bridge, and non-bridge to bridge with a onion service.

comment:9 Changed 9 months ago by teor

Keywords: 033-backport 034-backport 035-backport added
Milestone: Tor: 0.3.5.x-finalTor: 0.4.0.x-final
Status: assignedneeds_review
Version: Tor: 0.3.4.8Tor: 0.3.0.1-alpha

comment:10 Changed 9 months ago by teor

Reviewer: teor
Status: needs_reviewneeds_revision

Do we need the extra call to note_that_we_completed_a_circuit()?

The existing call happens when tor completes a multi-hop circuit:
https://github.com/torproject/tor/blob/586c3a7c902dfb6fdf52390ecb2a08d6fceef77a/src/core/or/circuitbuild.c#L1055

So calling note_that_we_completed_a_circuit() when we make a one-hop circuit to a bridge might not be a good idea.

comment:11 Changed 9 months ago by neel

Status: needs_revisionneeds_review

Removed that line and pushed it. Same PR.

comment:12 in reply to:  8 Changed 9 months ago by teor

Status: needs_reviewneeds_information

Hi,

I just need to check that you've tested this case with the latest code:

The tricky part will be to confirm that tor does at some point realizes it can complete circuits once the bridge is usable and thus the HS subsystem can start building circuits.

Replying to neel:

Also, this patch works from bridge to non-bridge, and non-bridge to bridge with a onion service.

When you've tested this case, please put the ticket into merge_ready?

comment:13 Changed 7 months ago by nickm

Keywords: postfreeze-ok added

Mark some tickets as postfreeze-ok, to indicate that I think they are okay to accept in 0.4.0 post-freeze. Does not indicate that they are all necessary to do postfreeze.

comment:14 Changed 6 months ago by nickm

Keywords: 040-deferred-20190220 added
Milestone: Tor: 0.4.0.x-finalTor: unspecified

Deferring 51 tickets from 0.4.0.x-final. Tagging them with 040-deferred-20190220 for visibility. These are the tickets that did not get 040-must, 040-can, or tor-ci.

comment:15 Changed 6 months ago by teor

Keywords: 033-backport removed

These open, non-merge_ready tickets can not get backported to 0.3.3, because 0.3.3 is now unsupported.

comment:16 Changed 6 months ago by teor

Keywords: 033-backport-unreached added

Hmm, I guess they should still get 033-backport-unreached

comment:17 Changed 4 months ago by teor

Keywords: reviewer-was-teor-20190422 added
Reviewer: teor

If these tickets go back in to needs_review, and I am on leave, they will need another reviewer.

comment:18 Changed 2 months ago by teor

Parent ID: #29875
Resolution: fixed
Status: needs_informationclosed

We think this is fixed by #29875 in Tor 0.4.1.2-alpha and later. Once the fix has been tested in a few alphas, we will backport it to 0.3.5 and 0.4.0.

Note: See TracTickets for help on using tickets.