Opened 5 months ago

Closed 3 months ago

Last modified 6 weeks ago

#32020 closed defect (fixed)

hsv3: Client do not report failing circuit back into HS subsystem

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.4.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, tor-client
Cc: Actual Points: 1
Parent ID: #30200 Points: 1
Reviewer: asn Sponsor: Sponsor27-must

Description

This is a subtask of the bigger larger problem in #25882.

A v2 client does report intro point failures within circuit_about_to_free() but not v3.

Actually, any HS circuit client side is not looked at. The hs_circ_cleanup() was intended for this as the entry point in the HS subsystem but only the service uses it.

Intro circuit failure needs to be noted in the failure cache (hs_cache_client_intro_state_note()).

Rendezvous circuit need to be also somehow handled. If the RP circuit keeps closing on us, we might want to stop trying maybe?

Same goes for HSDir circuit, if they close, client needs to be notified and launch a refetch.

Child Tickets

Change History (15)

comment:1 Changed 5 months ago by neel

In the origin_circuit_t struct, is struct hs_ident_circuit_t *hs_ident; to version 3 what rend_data_t *rend_data; is to version 2?

comment:2 Changed 4 months ago by dgoulet

Owner: set to dgoulet
Status: newassigned

comment:3 Changed 4 months ago by dgoulet

Related is #26806 which mentions that possibly because the HSv3 client is not noticing the introduction timeout (as in the ACK never came back), we resend onto that same intro point. Good or bad?

comment:4 Changed 4 months ago by dgoulet

Actual Points: 1
Cc: asn removed
Reviewer: asn
Status: assignedneeds_review

Branch: ticket32020_043_01
PR: https://github.com/torproject/tor/pull/1490

@asn: The initial commits are mostly moving and refactoring code on how we handle the circuit cleanup. It is not that simple so let me know if the approach is sensible. But overall, there needs to be nuanced cleanup between close/free/repurpose.

Also, out of this work, I found #32349 in v2.

comment:5 in reply to:  3 ; Changed 4 months ago by asn

Replying to dgoulet:

Related is #26806 which mentions that possibly because the HSv3 client is not noticing the introduction timeout (as in the ACK never came back), we resend onto that same intro point. Good or bad?

Hmm, questions and answers:

1) Why doesn't the ACK or NACK come to the client? Is it because the intro point never sent it (why?)? Or because we timeout before receiving it? Or just general Tor network SNAFU?

2) If the above happens, why would the client decide to resend on the same intro point and same circuit? Is this an explicit decision?

3) Regarding "Good or bad?" I would say it's bad-ish because if the NACK never came back, I would prefer to retry a different intro point since that one might be suffering networking issues, or being overloaded, or downright maliciously DoSing the service.

PS: #26806 mentions "rendezvous circuits" in the title, but I think it should be intro circuits

Last edited 4 months ago by asn (previous) (diff)

comment:6 in reply to:  5 Changed 4 months ago by dgoulet

Replying to asn:

Replying to dgoulet:

Related is #26806 which mentions that possibly because the HSv3 client is not noticing the introduction timeout (as in the ACK never came back), we resend onto that same intro point. Good or bad?

Hmm, questions and answers:

1) Why doesn't the ACK or NACK come to the client? Is it because the intro point never sent it (why?)? Or because we timeout before receiving it? Or just general Tor network SNAFU?

SNAFU is probably the answer. Circuit collapsing, timing out, etc...

2) If the above happens, why would the client decide to resend on the same intro point and same circuit? Is this an explicit decision?

It doesn't in theory. Depending on the SNAFU (see patch I did), we either flag the intro point in the failure cache (see patch I did) or we go on with our lives maybe retrying a new one.

3) Regarding "Good or bad?" I would say it's bad-ish because if the NACK never came back, I would prefer to retry a different intro point since that one might be suffering networking issues, or being overloaded, or downright maliciously DoSing the service.

Yes, in theory, that is what is suppose to happen. The patch I did would fix this that is note down the intro point in the failure cache.

PS: #26806 mentions "rendezvous circuits" in the title, but I think it should be intro circuits

Yes.

comment:7 Changed 4 months ago by asn

Status: needs_reviewneeds_revision

Great branch! Review submitted!

comment:8 in reply to:  7 Changed 4 months ago by dgoulet

Status: needs_revisionneeds_review

Replying to asn:

Great branch! Review submitted!

I replied. Would be great to get your feedback on my answers before I jump in and provide a new version of the branch. Thanks!!

comment:9 Changed 4 months ago by asn

Status: needs_reviewneeds_revision

Replied!

comment:10 Changed 3 months ago by dgoulet

Status: needs_revisionneeds_review

As discussed with asn on IRC, new PR/branch since much have changed from asn's review.

Branch: ticket32020_043_02
PR: https://github.com/torproject/tor/pull/1554

comment:11 Changed 3 months ago by asn

Status: needs_reviewneeds_revision

Looks really good. Just a bit more of nitpicking and we are done.

comment:12 Changed 3 months ago by dgoulet

Status: needs_revisionneeds_review

Done. Hope this works out :)

comment:13 Changed 3 months ago by asn

OK. Please find latest PR here: https://github.com/torproject/tor/pull/1573

It squashed the latest branch, removed the reverted commits, fixed a typo (wrods -> words), and improved a bit the changes file (to clarify that the reachability delta is positive).

I made a PR just to make sure that CI passes. I will merge when it finishes.

comment:14 Changed 3 months ago by asn

Resolution: fixed
Status: needs_reviewclosed

Merged! Thanks!

comment:15 Changed 6 weeks ago by arma

see #32847 for a new bug that might have been introduced by this fix.

Note: See TracTickets for help on using tickets.