Mike found out that when an IP/RP circuit fails to build in the right amount of time (for instance through circuit_expire_building()), it is re-purposed to become a measurement circuit.
The issue is that those HS circuits are set in the HS circuitmap and have an hs_ident or rend_data set to them that should really not linger in the circuit object if the circuit is not an HS one anymore.
Offenders: circuit_build_times_mark_circ_as_measurement_only() and pathbias_send_usable_probe().
Solution:
circuit_change_purpose() is probably the right place to make a callback within the HS subsystem specific to cleaning up a circuit for a purpose change. I think we need a new function that specifically does that and not use hs_circ_cleanup() since it won't remove the ident.
Lingering circuits in the HS circuitmap is bad and this bug could probably explain some of the issues we had with clients unable to establish connections because the IP auth key wouldn't match the one in the circuit ident.
I strongly believe this should be backported up to 0.3.5 at the very least.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
Someone who is deeply familiar with both v2 and v3 state should review this. It's simple, and I think it's ok, but I don't know both v2 and v3 code bases well enough to really be sure.
I have a slight revision in https://github.com/torproject/tor/pull/969, because I opted for not calling the new functions hs_circ_free() since *_free() functions have a very specific meaning in Tor (they free and NULL their input). Instead I hijacked hs_circ_cleanup() which seamed a better target.
dgoulet: what do you think about what I did earlier in that case? (https://github.com/torproject/tor/pull/931/commits/bee7979e87d51735d2d5f746d69761d89d7dd928)
mikeperry: yeah something like that but it needs to call hs_circ_cleanup() for v3 only and the v2 stuff needs to ideally not be done there... so I think the place it needs to go is hs_common.c and then call the v2 or v3 specialized calls
Hrmm, I am not sure how to differentiate the v2 vs v3 calls+data. What is the convention for that?
11:27 <+dgoulet> asn: anyway, my main point about #29043 is that if the cleanup function we need needs to be v2 and v3 , we should do one high level in hs_common.h and then specialized in hs/ and rend/ ... but simply not use hs_circuit.c for v2
dgoulet - So far we have not experienced any obvious reliability issues due to this bug while testing #28634 (moved) (which uses #28780 (moved), which changes client-side INTRO and REND circ purposes to PADDING circs).
I think this means that the hs_circuitmap reliability issues do not apply to client-side intros?
My guess is it is because clients only seem to use this map for the 128bit rend cookie, and not any permanent mappings that could be invalid later. In which case, this is "just" a client-side memory leak of this state with #28780 (moved), and not a reliability problem?
OK let's get this merged. We still don't know if it helps against #28970 (moved) and #27471 (moved) but the unittests show that it will help with the circpad issues.