Opened 6 months ago

Last modified 3 days ago

#29034 merge_ready defect

circuit: Cleanup an HS circuit when it is being re-purposed

Reported by: dgoulet Owned by: mikeperry
Priority: High Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor Version: Tor: 0.3.2.1-alpha
Severity: Normal Keywords: tor-hs-reachability, 029-backport-maybe, 034-backport-maybe, 035-backport, 040-backport, postfreeze-ok, network-team-roadmap-july
Cc: asn, mikeperry Actual Points:
Parent ID: Points: 3
Reviewer: asn Sponsor: Sponsor27-must

Description

Mike found out that when an IP/RP circuit fails to build in the right amount of time (for instance through circuit_expire_building()), it is re-purposed to become a measurement circuit.

The issue is that those HS circuits are set in the HS circuitmap and have an hs_ident or rend_data set to them that should really not linger in the circuit object if the circuit is not an HS one anymore.

Offenders: circuit_build_times_mark_circ_as_measurement_only() and pathbias_send_usable_probe().

Solution:

circuit_change_purpose() is probably the right place to make a callback within the HS subsystem specific to cleaning up a circuit for a purpose change. I think we need a new function that specifically does that and not use hs_circ_cleanup() since it won't remove the ident.

Lingering circuits in the HS circuitmap is bad and this bug could probably explain some of the issues we had with clients unable to establish connections because the IP auth key wouldn't match the one in the circuit ident.

I strongly believe this should be backported up to 0.3.5 at the very least.

Child Tickets

Change History (46)

comment:1 Changed 6 months ago by nickm

Keywords: postfreeze-ok added

comment:2 Changed 6 months ago by mikeperry

Keywords: tor-hs-reachability added; tor-hs removed

comment:3 Changed 5 months ago by nickm

Keywords: 040-must added

Marking tickets as 040-must based on triage with dgoulet.

comment:4 Changed 5 months ago by dgoulet

Owner: set to dgoulet
Status: newassigned

comment:5 Changed 4 months ago by teor

Owner: changed from dgoulet to mikeperry

dgoulet is on leave, assigning to Mike, because asn already has 3 tickets in 040-must.

comment:6 Changed 4 months ago by teor

Owner: mikeperry deleted

Un-assigning 040-must bugs from Mike, because he's overloaded.
We'll work out what to do with these tickets at the meeting.

comment:7 Changed 4 months ago by asn

Points: 3
Sponsor: Sponsor27-must

comment:8 Changed 4 months ago by nickm

Keywords: 040-must removed

comment:9 Changed 4 months ago by teor

Keywords: 040-backport added

This is not a release blocker, we can do it later and backport

comment:10 Changed 4 months ago by teor

Milestone: Tor: 0.4.0.x-finalTor: 0.4.1.x-final

comment:11 Changed 4 months ago by teor

Keywords: 029-backport-maybe 034-backport-maybe added

comment:12 Changed 4 months ago by pili

Parent ID: #29995

comment:13 Changed 3 months ago by mikeperry

Status: assignedneeds_review

Ok I did the simplest narrow fix, based on what dgoulet said in the description:
https://github.com/torproject/tor/pull/931

Someone who is deeply familiar with both v2 and v3 state should review this. It's simple, and I think it's ok, but I don't know both v2 and v3 code bases well enough to really be sure.

comment:14 Changed 3 months ago by teor

Owner: set to mikeperry
Status: needs_reviewassigned
Version: Tor: 0.3.2.1-alpha

comment:15 Changed 3 months ago by teor

Status: assignedneeds_review

comment:16 Changed 3 months ago by asn

Reviewer: asn

comment:17 Changed 3 months ago by asn

Status: needs_reviewneeds_revision

LGTM!

I have a slight revision in https://github.com/torproject/tor/pull/969, because I opted for not calling the new functions hs_circ_free() since *_free() functions have a very specific meaning in Tor (they free and NULL their input). Instead I hijacked hs_circ_cleanup() which seamed a better target.

If you like, let's merge_ready.

comment:18 Changed 3 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword for tickets in the network team roadmap.

comment:19 Changed 3 months ago by mikeperry

Status: needs_revisionmerge_ready

Ok, dgoulet said it should be a different function, but I figure you guys can hash that out. I'm fine with asn's https://github.com/torproject/tor/pull/969.

comment:20 Changed 3 months ago by asn

Cc: dgoulet added

Hey David, what needs to happen here?

comment:21 Changed 3 months ago by dgoulet

Status: merge_readyneeds_revision

Commented on PR. The approach is not correct I believe if we are serious about the v2 and v3 code separation.

comment:22 Changed 3 months ago by mikeperry

<mikeperry> dgoulet: what do you think about what I did earlier in that case? (https://github.com/torproject/tor/pull/931/commits/bee7979e87d51735d2d5f746d69761d89d7dd928)
<dgoulet> mikeperry: yeah something like that but it needs to call hs_circ_cleanup() for v3 only and the v2 stuff needs to ideally not be done there... so I think the place it needs to go is hs_common.c and then call the v2 or v3 specialized calls

Hrmm, I am not sure how to differentiate the v2 vs v3 calls+data. What is the convention for that?

comment:23 Changed 2 months ago by mikeperry

Parent ID: #29995#28634

This is needed for #28634+#28780 for 0.4.1.

comment:24 in reply to:  22 Changed 2 months ago by dgoulet

Replying to mikeperry:

Hrmm, I am not sure how to differentiate the v2 vs v3 calls+data. What is the convention for that?

Within the circuit, v2 has a rend_data and v3 has hs_ident. One circuit can _not_ have both. If none exist also, it is not an HS circuit.

comment:25 Changed 2 months ago by asn

further clarification by dgoulet:

11:27 <+dgoulet> asn: anyway, my main point about #29043 is that if the cleanup function we need needs to be v2 and v3 , we should do one high level in hs_common.h and             then specialized in hs/ and rend/ ... but simply not use hs_circuit.c for v2

comment:26 Changed 2 months ago by dgoulet

Owner: changed from mikeperry to dgoulet
Status: needs_revisionassigned

Taking over as part of s27 since we believe #28970 and #27471 will get fixed with this.

comment:27 Changed 2 months ago by nickm

Keywords: 041-should added

comment:28 Changed 2 months ago by mikeperry

dgoulet - So far we have not experienced any obvious reliability issues due to this bug while testing #28634 (which uses #28780, which changes client-side INTRO and REND circ purposes to PADDING circs).

I think this means that the hs_circuitmap reliability issues do not apply to client-side intros?

My guess is it is because clients only seem to use this map for the 128bit rend cookie, and not any permanent mappings that could be invalid later. In which case, this is "just" a client-side memory leak of this state with #28780, and not a reliability problem?

Last edited 2 months ago by mikeperry (previous) (diff)

comment:29 Changed 8 weeks ago by dgoulet

Status: assignedneeds_review

New version. It is the same from the initial branches but with a v2/v3 separation:

Branch: ticket29034_041_01
PR: https://github.com/torproject/tor/pull/1048

comment:30 Changed 8 weeks ago by asn

Status: needs_reviewmerge_ready

LGTM!

comment:31 Changed 8 weeks ago by asn

Should this get backported due to its significance to bugs #28970 and #27471 ?
This way we learn faster if those bugs get fixed?

comment:32 Changed 8 weeks ago by dgoulet

Ah yes I think so. Here is an 035 branch:

ticket29034_035_01
https://github.com/torproject/tor/pull/1053

comment:33 Changed 7 weeks ago by teor

Reminder: please merge the 0.3.5 branch into master, so the backport merge forward is clean.

comment:34 Changed 7 weeks ago by dgoulet

Cc: dgoulet removed
Status: merge_readyneeds_revision

Need unit tests here per nickm request.

comment:35 Changed 7 weeks ago by dgoulet

Status: needs_revisionneeds_review

Unit test pushed. CI is running on the 035 branch only.

comment:36 Changed 7 weeks ago by asn

Unittest LGTM, but has this been tested on the real network as well, to see if it works as intended? IMO, it should be tested before we merge.

comment:37 Changed 7 weeks ago by asn

Status: needs_reviewmerge_ready

OK let's get this merged. We still don't know if it helps against #28970 and #27471 but the unittests show that it will help with the circpad issues.

comment:38 Changed 7 weeks ago by nickm

Keywords: 041-should removed
Milestone: Tor: 0.4.1.x-finalTor: 0.4.0.x-final

Okay, merged! Marking for possible backport, but before we do so, let's confirm that backporting would actually help something.

comment:39 Changed 7 weeks ago by nickm

Parent ID: #28634

comment:40 Changed 6 weeks ago by nickm

We should not backport this; instead we should backport c525135dac354892a45ad3d2f6de9450d393f09f combined with the changes file for this branch.

(Edited to add: See #30773 for rationale)

Last edited 6 weeks ago by nickm (previous) (diff)

comment:41 Changed 6 weeks ago by nickm

Owner: changed from dgoulet to mikeperry
Status: merge_readyassigned

comment:42 Changed 6 weeks ago by nickm

Status: assignedneeds_revision

comment:43 Changed 6 weeks ago by mikeperry

Status: needs_revisionneeds_review

Here is the new backport branch for 0.3.5+: https://github.com/torproject/tor/pull/1077

comment:44 Changed 6 weeks ago by nickm

Status: needs_reviewmerge_ready

comment:45 Changed 6 weeks ago by nickm

Removing 034-backport from all open tickets: 034 has reached EOL.

comment:46 Changed 3 days ago by gaba

Keywords: network-team-roadmap-july added; network-team-roadmap-2019-Q1Q2 removed
Note: See TracTickets for help on using tickets.