Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#15937 closed defect (fixed)

Clients fail on the 7th rapid SOCKS request to the same HS

Reported by: teor Owned by: dgoulet
Priority: Low Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs
Cc: Actual Points:
Parent ID: Points: small/medium
Reviewer: Sponsor: SponsorR-must

Description

If a tor client gets 7 SOCKS connection requests to a hidden service with an uncached descriptor in rapid succession, it launches a HSDir request for each connection. It doesn't wait for the circuit to any of the HSDirs to be built, or wait for a response from any of the HSDirs.

After 6 connections, it fails on the 7th attempted fetch, because it has tried to (rapidly) fetch the descriptor 6 times, and hasn't got it yet. It then fails each of the outstanding SOCKS requests 1-7.

It can do this all in the same second, before any circuits have a chance to be built.

The client then backs off after the descriptor fetch failure, the circuit build attempts succeed, and then the 8th and subsequent requests succeed.

This behaviour is unlikely to be triggered by HTML-based hidden services. There is typiclly 1 connection with an uncached descriptor to load the initial HTML page, then further connections with a cached descriptor load any page resources.

Observed in 0.2.7.0-dev.

tor logs attached for 3 HSDirs, 1 client, and 1 HS in a chutney network hs-100-clients (branch with these changes in chutney upcoming in #15936)

Child Tickets

Attachments (1)

hsdir-multi-request-bug.tgz (93.9 KB) - added by teor 4 years ago.
Logs of HSDirs, Client, and HS in hs-100-clients chutney network

Download all attachments as: .zip

Change History (27)

Changed 4 years ago by teor

Attachment: hsdir-multi-request-bug.tgz added

Logs of HSDirs, Client, and HS in hs-100-clients chutney network

comment:1 Changed 4 years ago by dgoulet

Keywords: tor-hs added
Milestone: Tor: 0.2.8.x-finalTor: 0.2.7.x-final

comment:2 Changed 4 years ago by teor

On second thoughts, there is one scenario where TorBrowser could trigger this behavior on a system with a fast CPU and slow network link:

  1. Bookmark the same onion site 7 times in a bookmarks folder
  2. Wait until the cached HS entry expires
  3. Click "Open All In Tabs"
  4. If TorBrowser opens the tabs fast enough, it could trigger tor to launch 6 requests then fail on the 7th, while still waiting for the other 6 to connect

comment:3 Changed 4 years ago by teor

This issue has also been logged as #16501, in the following scenario:

 I have a small threaded program that connects to the tor socks port and issues a GET request on hidden services.

I noticed that if it makes 7 concurrent calls to one hidden service, then all of the requests will fail immediately.

#16501 also has further log details and the same diagnosis I provided above. It's been closed as a duplicate.

comment:4 Changed 4 years ago by arma

It sounds like the Tor client isn't properly checking to see if there is already an hsdesc fetch in progress.

(If there is, the right behavior is to put this stream into whatever state it is that pending streams go into such that when the hsdesc finishes, it notifies all streams in that state so they can move forward.)

comment:5 Changed 4 years ago by teor

dgoulet was working on this, but I'm not sure where he got up to.

I can confirm it's still an issue in the latest git sources, as I triggered this bug using src/test/test-network.sh --connections 10 --flavour hs --sleep 30
as I was testing the chutney performance measurement changes in #14175.

comment:6 Changed 4 years ago by dgoulet

Keywords: TorCoreTeam201508 added
Owner: set to dgoulet
Status: newassigned

comment:7 Changed 3 years ago by nickm

Keywords: TorCoreTeam201509 added; TorCoreTeam201508 removed

comment:8 Changed 3 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:9 Changed 3 years ago by nickm

Keywords: SponsorR removed
Sponsor: SponsorR

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

comment:10 Changed 3 years ago by dgoulet

Keywords: TorCoreTeam201509 removed
Points: small/medium

comment:11 Changed 3 years ago by nickm

Priority: MediumLow

comment:12 Changed 3 years ago by dgoulet

Severity: Normal
Status: assignedneeds_information

Hrm ok it's a fun puzzle but I think I figured it out and the solution could be simple.

In rend_client_refetch_v2_renddesc(), at the bottom if the fetch request fails by not finding a new HSDir because all 6 have been queried already, we call rend_client_desc_trynow() which closes all RENDDESC_WAIT connections for a .onion thus losing all previous 6.

I can't figure out why we do that if we can't find an HSDir... so can we simply remove this?:

  if (ret <= 0) {
    /* Close pending connections on error or if no hsdir can be found. */
    rend_client_desc_trynow(rend_query->onion_address);
  }

When the fetch succeeds (in connection_dir_client_reached_eof()), we already call that function to move to the next stage for HS connection so I propose we remove the above.

comment:13 Changed 3 years ago by cypherpunks

Resolution: not a bug
Status: needs_informationclosed

This is not a bug.

comment:14 in reply to:  13 Changed 3 years ago by dgoulet

Resolution: not a bug
Status: closedreopened

Replying to cypherpunks:

This is not a bug.

Cool cool... please explain why. Reopening...

comment:15 Changed 3 years ago by cypherpunks

Resolution: not a bug
Status: reopenedclosed

comment:16 Changed 3 years ago by dgoulet

Resolution: not a bug
Status: closedreopened

comment:17 in reply to:  12 ; Changed 3 years ago by teor

Replying to dgoulet:

Hrm ok it's a fun puzzle but I think I figured it out and the solution could be simple.

In rend_client_refetch_v2_renddesc(), at the bottom if the fetch request fails by not finding a new HSDir because all 6 have been queried already, we call rend_client_desc_trynow() which closes all RENDDESC_WAIT connections for a .onion thus losing all previous 6.

I can't figure out why we do that if we can't find an HSDir... so can we simply remove this?:

  if (ret <= 0) {
    /* Close pending connections on error or if no hsdir can be found. */
    rend_client_desc_trynow(rend_query->onion_address);
  }

When the fetch succeeds (in connection_dir_client_reached_eof()), we already call that function to move to the next stage for HS connection so I propose we remove the above.

Makes sense to me, but do we limit the number of pending connections to one per HSDir?
And do we retry when the connections actually fail?

(I don't know this code very well. I am concerned that we could end up asking a HSDir multiple times, or end up stalling if the network is slow when we first try all 6, but speeds up later.)

comment:18 in reply to:  17 Changed 3 years ago by dgoulet

Replying to teor:

Replying to dgoulet:

Hrm ok it's a fun puzzle but I think I figured it out and the solution could be simple.

In rend_client_refetch_v2_renddesc(), at the bottom if the fetch request fails by not finding a new HSDir because all 6 have been queried already, we call rend_client_desc_trynow() which closes all RENDDESC_WAIT connections for a .onion thus losing all previous 6.

I can't figure out why we do that if we can't find an HSDir... so can we simply remove this?:

  if (ret <= 0) {
    /* Close pending connections on error or if no hsdir can be found. */
    rend_client_desc_trynow(rend_query->onion_address);
  }

When the fetch succeeds (in connection_dir_client_reached_eof()), we already call that function to move to the next stage for HS connection so I propose we remove the above.

Makes sense to me, but do we limit the number of pending connections to one per HSDir?

One socks connection can query all 6 HSDirs.

And do we retry when the connections actually fail?

There is a "cache" of last queried HSDir that is we don't query an HSDir more than once for the same request. Once the descriptor arrives or the SOCKS connection ends, the cache is purge for the requested onion.

(I don't know this code very well. I am concerned that we could end up asking a HSDir multiple times, or end up stalling if the network is slow when we first try all 6, but speeds up later.)

See last_hid_serv_requests in rendclient.c for more info.

comment:19 Changed 3 years ago by cypherpunks

This behaviour is unlikely to be triggered by HTML-based hidden services.

Unless you're performing an attack. Should further tickets like this one be filed to make other attacks easier?

comment:20 Changed 3 years ago by dgoulet

Status: reopenedneeds_revision

comment:21 Changed 3 years ago by dgoulet

Status: needs_revisionneeds_review

See branch bug15937_028_01.

Tested and the 7th connection no longer kill all the others. Easy test:

for i in `seq 1 7`; do
	torsocks wget http://someaddress.onion &
done

Note that once the descriptor arrives, the 5 other HSDir connections are _not_ terminated immediately. It doesn't cause a bug or anything, it's maybe an unnecessary load on the network to keep them finishing. I'll open a bug later about that.

comment:22 Changed 3 years ago by nickm

hm, that seems easy. Thanks also for the comment. Two questions:

  • Is there any reason that rend_client_fetch_v2_desc() could fail other than already having 6 directory connections? If so the check should turn into if ret < 0, right?
  • Yes, please open that other bug.
  • Do we really want to open a number of directory fetches that depends on how many client requests we got at once? That behavior seems kind of weird. If you agree, please open a bug for 0.2.9?

comment:23 in reply to:  22 Changed 3 years ago by dgoulet

Replying to nickm:

hm, that seems easy. Thanks also for the comment. Two questions:

  • Is there any reason that rend_client_fetch_v2_desc() could fail other than already having 6 directory connections? If so the check should turn into if ret < 0, right?

Yes it can fail for other reasons but since rend_client_refetch_v2_renddesc() is a void function, there is not much we can do if it does fail. The appropriate logs will be displayed though.

  • Yes, please open that other bug.

On my stack.

  • Do we really want to open a number of directory fetches that depends on how many client requests we got at once? That behavior seems kind of weird. If you agree, please open a bug for 0.2.9?

Indeed. We shouldn't do that much requests (unless some historical reasons made us do that). I think it should be only 2 or 3 (max half) in parallel and once we have the descriptor, close the pending ones. That's actually the behavior with introduction points.

I'll open a bug for 029 and if it's not what we want in the end, we'll comment there. Thx!

comment:24 Changed 3 years ago by nickm

Okay, makes sense. Merging this. Please close this ticket when you've opened the other ones?

comment:25 Changed 3 years ago by dgoulet

Resolution: fixed
Sponsor: SponsorRSponsorR-must
Status: needs_reviewclosed

Issues found through this bug are in #18564. Closing.

comment:26 Changed 3 years ago by teor

(This bug was likely introduced in commit 59f8dced in 0.2.7.1-alpha.)

Note: See TracTickets for help on using tickets.