#23762 closed defect (fixed)

hs-v3: Client request with missing dirinfo will always timeout

Reported by: dgoulet Owned by: dgoulet
Priority: High Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs, prop224, tor-client, review-group-24
Cc: Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description

When the SOCKS request is handled in connection_ap_handle_onion(), if we are missing dirinfo (including missing a live consensus), the connection state is put in "AP_CONN_STATE_CIRCUIT_WAIT".

Then it is retried every second through connection_ap_handshake_attach_circuit() which does a gazillion things but among those it tries to open the IP/RP circuits. Of course, we can't get an IP because we have no descriptor so circuit_get_open_circ_or_launch(), which tries to get that IP circuit, won't be able so that function will get an intro point from the descriptor (that doesn't exists because we can't even fetch it) but won't work.

It then triggers a descriptor fetch and then puts the connection in AP_CONN_STATE_RENDDESC_WAIT. Now, because we don't look at the returned code, we don't know what really happened and if tor still doesn't have enough dirinfo to proceed, the connection is still put in "renddesc wait" making it NOT a pending connection anymore thus never retried after that.

The only way for a connection to get out of that "renddesc wait" state is either timing out (which is what happens for v3) or the descriptor arrives and then there is a callback in the HS client subsystem to handle that desc.

The possible solutions are I believe:

  1. Either we keep the connection in "circuit wait" so it gets retried regularly.
  1. We create a new "AP CONN" state that is "waiting for dirinfo" and then when we get the live consensus or/and minimum dirinfo, we callback the HS subsystem and look for SOCKS conn in that state.

This is not a problem v2 HS suffers because it only care about a "reasonably live consensus".

Child Tickets

Change History (9)

comment:1 Changed 18 months ago by dgoulet

Reviewer: asn
Status: newneeds_review

After discussing with asn, I went with a third option which was is to let the connection state in "renddesc wait" and add a callback to the HS client subsystem if the directory information changes.

This turns out to be tricky but here is an attempt. I believe it is not perfect but it gives us a baseline to discuss:

See branch: bug23762_032_01.

First commit is a refactor splitting our refetch function into validation and fetching. Nothing to do with the fix per-se, we can easily just remove it but originally it was part of the fix and I kept it because I think it is nicer.

comment:2 Changed 18 months ago by nickm

Owner: set to dgoulet
Status: needs_reviewassigned

setting owner

comment:3 Changed 18 months ago by nickm

Status: assignedneeds_review

comment:4 Changed 17 months ago by nickm

Keywords: review-group-24 added

review-group-24 is now open.

comment:5 Changed 17 months ago by nickm

Refactor looks okay to me. Asn, what do you think of this change?

comment:6 Changed 17 months ago by asn

Refactor commit looks okay. I'd like to think more about the next commit, but today I'd like to focus on microdescriptor tickets. Will be back here.

comment:7 Changed 17 months ago by asn

Please check my branch bug23762_032_01. It includes a changes file and some doc improvements.

I'm sad we don't have a unittest for this functionality. I wonder if it's worth delaying further... since it does not seem straightforward to write one.

comment:8 Changed 17 months ago by dgoulet

Reviewer: asndgoulet
Status: needs_reviewmerge_ready

Changes lgtm;

Tests were failing and leaking. See branch with a fixup commit for that: bug23762_032_02

Agree on the testing but it ain't easy to test this in unit test which requires _many_ thing to interact. We kind of have to bet on our integration test and user base for testing :S. However, that code path is hit in unit test for sure at least.

comment:9 Changed 17 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

squashed and merged into maint-0.3.2 and forward. Let's hope it's good!

Note: See TracTickets for help on using tickets.