Tor always selects an extra number of intro points in addition to the configured HiddenServiceNumIntroductionPoints.
It launches all of them and the first NumIntro... to finish are used (considering #31548 (moved) is resolved).
Once the circuit of the remaining intro opens, we notice that we have too many and then re-purpose the extra ones.
However, I've noticed that sometimes establishing an intro circuit timeouts during build, basically expiring due to our CBT policy. In that case, the circuit is simply close but the intro point remains in the service descriptor list.
This is bad because of #31548 (moved), this means an intro point can end up in the descriptor even though the service never established any circuits to it...
We simply need to callback into the HS subsystem when we are expiring an HS circuit so appropriate actions can be taken such as in this case, removing the IP from the list.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?
I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?
Hmm, there is indeed a bug here! Thanks for catching that, and sorry for missing that.
CIRCUIT_PURPOSE_HS_VANGUARDS is the purpose that Tor uses to pre-built vanguard circuits before they are used by the vanguard subsystem for specific HS-related purposes (they can be used for HS client or for HS service activities).
However, while the circuit is in this vanguards purpose, it shouldn't have any duties as an HS circuit (i.e. be an active intro circuit) and hence we shouldn't call hs_service_circuit_timed_out() on it.
Vanguards circuits are taken into account in the circuit_purpose_is_hidden_service() function because it is used by the vanguard subsystem to take decision about vanguards; but for our use case it's safe to ignore vanguard circuits.
So takeaways and plans for the patch:
a) The circuit_purpose_is_hidden_service() should remain as is (behavior-wise) because it's used (correctly) by the vanguard subsystem. But IMO the name is a bit misleading because IIUC vanguard circuits don't have active hidden service duties, but they can be repurposed to HS circs in the future. Perhaps the function should have a vanguard-centric name, instead of an onion service-centric name.
b) Our new function circuit_purpose_is_hs_service() should not take into account vanguard circuits since we only care about active onion service circs. Perhaps circuit_purpose_is_hidden_service() can use our new function to avoid code dedup.
c) Let's ask Mike if he agrees with the above.
Trac: Cc: N/Ato mikeperry Status: merge_ready to needs_revision
I'm concerned that this patch makes purpose == CIRCUIT_PURPOSE_HS_VANGUARDS count as both a client purpose and a service purpose. Won't that result in us sometimes calling hs_service_circuit_timed_out on a client circuit?
Hmm, there is indeed a bug here! Thanks for catching that, and sorry for missing that.
CIRCUIT_PURPOSE_HS_VANGUARDS is the purpose that Tor uses to pre-built vanguard circuits before they are used by the vanguard subsystem for specific HS-related purposes (they can be used for HS client or for HS service activities).
However, while the circuit is in this vanguards purpose, it shouldn't have any duties as an HS circuit (i.e. be an active intro circuit) and hence we shouldn't call hs_service_circuit_timed_out() on it.
This is a good reason to have vanguard purpose circuits count as neither client nor service HS circs. Maybe they should not even count as HS circs at all, for function sanity.
Vanguards circuits are taken into account in the circuit_purpose_is_hidden_service() function because it is used by the vanguard subsystem to take decision about vanguards; but for our use case it's safe to ignore vanguard circuits.
I think for every HS case it is safe to ignore vanguards purpose circuits. They are pre-built only and not used yet.. This might be a good reason to have every place where vanguards calls circuit_purpose_is_hidden_service() also check explicitly for its vanguard purpose...
So takeaways and plans for the patch:
a) The circuit_purpose_is_hidden_service() should remain as is (behavior-wise) because it's used (correctly) by the vanguard subsystem. But IMO the name is a bit misleading because IIUC vanguard circuits don't have active hidden service duties, but they can be repurposed to HS circs in the future. Perhaps the function should have a vanguard-centric name, instead of an onion service-centric name.
If we decide to leave this as including vanguards, they definitely should not count as hs client circuits (which the branch currently does). There should just be an extra check for them in this function.
b) Our new function circuit_purpose_is_hs_service() should not take into account vanguard circuits since we only care about active onion service circs. Perhaps circuit_purpose_is_hidden_service() can use our new function to avoid code dedup.
c) Let's ask Mike if he agrees with the above.
Additionally, circuits can time out in circuit_build_times_handle_completed_hop(), where they can get marked as measurement circuits via circuit_change_purpose(). However, the fix for #29034 (moved) cause them to get removed from the hs circuit map whenever we change purposes. Is this enough to keep them out of the HS descriptor's intropoint list as well? If so, then I guess that part is handled, at least.
A related question to comment:17: Why do we try hard to hold on to specific intropoints at all? If their circuit collapses, fails, or stops carrying data, why do we try to build a new circuit to the old intropoint as opposed to just picking a whole new intropoint?
As per discussion on IRC, Mike spotted that circuit_build_times_handle_completed_hop() also can timeout a circuit.
Which means, it would go through circuit_change_purpose() and the intro point would get retried instead of ditched.
In other words, we have two different behaviors with this patch with regards to the intro circuit:
Time out: intro point is ditched, a new one is picked later.
Closes/Re-purposed: intro point is set as "not established" and then will be retried later.
I am a bit confused by the above because circuit_build_times_handle_completed_hop() is also about "time out" and above you say that we retry instead of ditching, but (1) above says the opposite. What is the case?
Maybe you want to distinguish between timeout while building a circuit, and timeout on a completed circuit? Or maybe I'm a bit confuse.
The change in behavior with this ticket is (1) which makes the service ditch an intro point on timeout issues instead of retrying it later.
Hmm, I don't mind this behavior that much as long as we document it somewhere and we are OK with it. It does seem a bit arbitrary to me, because if we want to keep our intro points, then we should do it in this case too. Would it be hard to do?
A related question to comment:17: Why do we try hard to hold on to specific intropoints at all? If their circuit collapses, fails, or stops carrying data, why do we try to build a new circuit to the old intropoint as opposed to just picking a whole new intropoint?
After discussing on IRC with asn, we decided to put this on hold until we get a consensus on the retry vs ditch behavior.
I've just written a tor-dev@ thread that tries to summarize it all but it needs input from mikeperry on the "ditch argument". Until then, this ticket goes into dormancy.
#32020 (moved): Added the code to remove a circuit from the HS circuitmap on close, free and repurpose.
#32094 (moved): Made it that we only use the HS circuitmap when launching introduction points so the launch_intro_point_circuits() actually notices now that the IP that timed out has not more circuit and thus a retry is done up to MAX_INTRO_POINT_CIRCUIT_RETRIES = 3.
Combining both, this means that cleanup_intro_points() will remove the IP after 3 fail retries and thus clean it up properly the service list.
This is bad because of #31548 (moved), this means an intro point can end up in the descriptor even though the service never established any circuits to it...
This is not the case now since we use the HS circuitmap to learn if the circuit is established or not.
We are done here. Resolving as "fixed" even though this ticket resulted in no actual patches applied to master.
Trac: Status: needs_revision to closed Resolution: N/Ato fixed