Opened 3 months ago

Last modified 2 weeks ago

#31561 needs_information defect

hs-v3: Service can keep unused intro points in its list

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, hv-v3
Cc: mikeperry Actual Points: 0.1
Parent ID: #30200 Points: 0.2
Reviewer: asn, mikeperry Sponsor: Sponsor27-must

Description

Tor always selects an extra number of intro points in addition to the configured HiddenServiceNumIntroductionPoints.

It launches all of them and the first NumIntro... to finish are used (considering #31548 is resolved).

Once the circuit of the remaining intro opens, we notice that we have too many and then re-purpose the extra ones.

However, I've noticed that sometimes establishing an intro circuit timeouts during build, basically expiring due to our CBT policy. In that case, the circuit is simply close but the intro point remains in the service descriptor list.

This is bad because of #31548, this means an intro point can end up in the descriptor even though the service never established any circuits to it...

We simply need to callback into the HS subsystem when we are expiring an HS circuit so appropriate actions can be taken such as in this case, removing the IP from the list.

Child Tickets

Change History (21)

comment:1 Changed 3 months ago by dgoulet

Actual Points: 0.1

PR: https://github.com/torproject/tor/pull/1275
Branch: ticket31561_042_01

No backport since #31548 mitigates this issue.

comment:2 Changed 3 months ago by dgoulet

Status: assignedneeds_review

comment:3 Changed 2 months ago by asn

Status: needs_reviewneeds_revision

Looks reasonable, but I left a few questions and suggestions in the PR!

comment:4 Changed 2 months ago by dgoulet

Status: needs_revisionneeds_review

Back in needs_review. Answered the PR. One bit I did not change, let me know if you agree or not.

comment:5 Changed 2 months ago by asn

Status: needs_reviewneeds_revision

A few more corrections are needed I believe.

Also, would it be possible to write a small unittest for hs_service_circuit_timed_out() just to make sure that the codeflow works properly?

comment:6 Changed 2 months ago by dgoulet

Status: needs_revisionneeds_review

All fixed. Pushed unit test in commit e55235f72f. Sorry about that, I had one but it was stuck in the stash :(.

comment:7 Changed 2 months ago by asn

Status: needs_reviewneeds_revision

Seems like the circ timeout unittest failed in clang. I also added a minor issue as a comment.
After this we are all good for merge.

comment:8 Changed 2 months ago by nickm

Keywords: 042-should added

Mark some needs_revision tickets as 042-should

comment:9 Changed 6 weeks ago by dgoulet

Status: needs_revisionneeds_review

comment:10 Changed 5 weeks ago by asn

Status: needs_reviewneeds_revision

Great! Just one small discussion item remaining about the vanguard thing. Marking as needs_revision so that you reply/address the topic.

comment:11 Changed 5 weeks ago by dgoulet

Status: needs_revisionneeds_review

Fixed.

comment:12 Changed 5 weeks ago by asn

Status: needs_reviewmerge_ready

LGTM! Thanks!

comment:13 Changed 5 weeks ago by nickm

I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?

comment:14 in reply to:  13 ; Changed 5 weeks ago by asn

Cc: mikeperry added
Status: merge_readyneeds_revision

Replying to nickm:

I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?

Hmm, there is indeed a bug here! Thanks for catching that, and sorry for missing that.

CIRCUIT_PURPOSE_HS_VANGUARDS is the purpose that Tor uses to pre-built vanguard circuits before they are used by the vanguard subsystem for specific HS-related purposes (they can be used for HS client or for HS service activities).

However, while the circuit is in this vanguards purpose, it shouldn't have any duties as an HS circuit (i.e. be an active intro circuit) and hence we shouldn't call hs_service_circuit_timed_out() on it.

Vanguards circuits are taken into account in the circuit_purpose_is_hidden_service() function because it is used by the vanguard subsystem to take decision about vanguards; but for our use case it's safe to ignore vanguard circuits.

So takeaways and plans for the patch:
a) The circuit_purpose_is_hidden_service() should remain as is (behavior-wise) because it's used (correctly) by the vanguard subsystem. But IMO the name is a bit misleading because IIUC vanguard circuits don't have active hidden service duties, but they can be repurposed to HS circs in the future. Perhaps the function should have a vanguard-centric name, instead of an onion service-centric name.
b) Our new function circuit_purpose_is_hs_service() should not take into account vanguard circuits since we only care about active onion service circs. Perhaps circuit_purpose_is_hidden_service() can use our new function to avoid code dedup.
c) Let's ask Mike if he agrees with the above.

Last edited 5 weeks ago by asn (previous) (diff)

comment:15 in reply to:  14 Changed 5 weeks ago by mikeperry

Replying to asn:

Replying to nickm:

I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?

Hmm, there is indeed a bug here! Thanks for catching that, and sorry for missing that.

CIRCUIT_PURPOSE_HS_VANGUARDS is the purpose that Tor uses to pre-built vanguard circuits before they are used by the vanguard subsystem for specific HS-related purposes (they can be used for HS client or for HS service activities).

However, while the circuit is in this vanguards purpose, it shouldn't have any duties as an HS circuit (i.e. be an active intro circuit) and hence we shouldn't call hs_service_circuit_timed_out() on it.

This is a good reason to have vanguard purpose circuits count as neither client *nor* service HS circs. Maybe they should not even count as HS circs at all, for function sanity.

Vanguards circuits are taken into account in the circuit_purpose_is_hidden_service() function because it is used by the vanguard subsystem to take decision about vanguards; but for our use case it's safe to ignore vanguard circuits.

I think for every HS case it is safe to ignore vanguards purpose circuits. They are pre-built only and not used yet.. This might be a good reason to have every place where vanguards calls circuit_purpose_is_hidden_service() *also* check explicitly for its vanguard purpose...

So takeaways and plans for the patch:
a) The circuit_purpose_is_hidden_service() should remain as is (behavior-wise) because it's used (correctly) by the vanguard subsystem. But IMO the name is a bit misleading because IIUC vanguard circuits don't have active hidden service duties, but they can be repurposed to HS circs in the future. Perhaps the function should have a vanguard-centric name, instead of an onion service-centric name.

If we decide to leave this as including vanguards, they definitely should not count as hs client circuits (which the branch currently does). There should just be an extra check for them in this function.

b) Our new function circuit_purpose_is_hs_service() should not take into account vanguard circuits since we only care about active onion service circs. Perhaps circuit_purpose_is_hidden_service() can use our new function to avoid code dedup.
c) Let's ask Mike if he agrees with the above.

Additionally, circuits can time out in circuit_build_times_handle_completed_hop(), where they can get marked as measurement circuits via circuit_change_purpose(). However, the fix for #29034 cause them to get removed from the hs circuit map whenever we change purposes. Is this enough to keep them out of the HS descriptor's intropoint list as well? If so, then I guess that part is handled, at least.

comment:16 Changed 4 weeks ago by dgoulet

Reviewer: asnasn, mikeperry
Status: needs_revisionneeds_review

Thanks Mike!!!

I'm passing this review on to you since asn is away and you have a good grasp on this problem.

I basically removed the HS_VANGUARD purpose from the new functions and put it back in the is_hidden_service() like it is today.

New PR with branch rebased/squashed on latest 042: https://github.com/torproject/tor/pull/1411
Branch: ticket31561_042_02

Last edited 4 weeks ago by dgoulet (previous) (diff)

comment:17 Changed 4 weeks ago by dgoulet

Status: needs_reviewneeds_information

As per discussion on IRC, Mike spotted that circuit_build_times_handle_completed_hop() also can timeout a circuit.

Which means, it would go through circuit_change_purpose() and the intro point would get retried instead of ditched.

In other words, we have two different behaviors with this patch with regards to the intro circuit:

  1. Time out: intro point is ditched, a new one is picked later.
  2. Closes/Re-purposed: intro point is set as "not established" and then will be retried later.

The change in behavior with this ticket is (1) which makes the service ditch an intro point on timeout issues instead of retrying it later.

I'll delay this one until asn comes back for more discussion.

comment:18 Changed 4 weeks ago by mikeperry

A related question to comment:17: Why do we try hard to hold on to specific intropoints at all? If their circuit collapses, fails, or stops carrying data, why do we try to build a new circuit to the old intropoint as opposed to just picking a whole new intropoint?

comment:19 in reply to:  17 Changed 2 weeks ago by asn

Replying to dgoulet:

As per discussion on IRC, Mike spotted that circuit_build_times_handle_completed_hop() also can timeout a circuit.

Which means, it would go through circuit_change_purpose() and the intro point would get retried instead of ditched.

In other words, we have two different behaviors with this patch with regards to the intro circuit:

  1. Time out: intro point is ditched, a new one is picked later.
  2. Closes/Re-purposed: intro point is set as "not established" and then will be retried later.

I am a bit confused by the above because circuit_build_times_handle_completed_hop() is also about "time out" and above you say that we retry instead of ditching, but (1) above says the opposite. What is the case?

Maybe you want to distinguish between timeout while building a circuit, and timeout on a completed circuit? Or maybe I'm a bit confuse.

The change in behavior with this ticket is (1) which makes the service ditch an intro point on timeout issues instead of retrying it later.

Hmm, I don't mind this behavior that much as long as we document it somewhere and we are OK with it. It does seem a bit arbitrary to me, because if we want to keep our intro points, then we should do it in this case too. Would it be hard to do?

comment:20 in reply to:  18 Changed 2 weeks ago by asn

Replying to mikeperry:

A related question to comment:17: Why do we try hard to hold on to specific intropoints at all? If their circuit collapses, fails, or stops carrying data, why do we try to build a new circuit to the old intropoint as opposed to just picking a whole new intropoint?

Some history here at #8239.

comment:21 Changed 2 weeks ago by dgoulet

Keywords: 042-should removed
Milestone: Tor: 0.4.2.x-finalTor: unspecified

After discussing on IRC with asn, we decided to put this on hold until we get a consensus on the retry vs ditch behavior.

I've just written a tor-dev@ thread that tries to summarize it all but it needs input from mikeperry on the "ditch argument". Until then, this ticket goes into dormancy.

https://lists.torproject.org/pipermail/tor-dev/2019-October/014073.html

Note: See TracTickets for help on using tickets.