Running Tor git master, and trying to connect to an onion address that I just made up (nothing has ever been there):
$ torify telnet qiu3onp7v7z25u5i.onion 80
telnet: Unable to connect to remote host: Connection timed out
And on the Tor log I see
Sep 25 23:54:17.776 [notice] Tried for 120 seconds to get a connection to qiu3onp7v7z25u5i:80. Giving up. (waiting for rendezvous desc)
That is, it took 120 seconds to fail.
Compare to when using Tor release-0.2.5:
$ torify telnet qiu3onp7v7z25u5i.onion 80
telnet: Unable to connect to remote host: No route to host
and the Tor log says
Sep 26 00:06:37.515 [notice] Closing stream for 'qiu3onp7v7z25u5i.onion': hidden service is unavailable (try again later).
In the Tor 0.2.5 case, I got my answer in 5-10 seconds: it tried each of the hsdirs, and when the last one said 404, it hung up on the stream. In the Tor master case, it knew the answer in 5-10 seconds, but it just let my stream sit there doing nothing until the timeout arrived.
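The fix amounts to a simple decision: once every responsible HSDir fetch has completed without yielding a descriptor, fail the waiting streams immediately rather than letting the 120-second timeout fire. Here is a minimal, self-contained sketch of that decision; the type and function names are hypothetical, not Tor's actual API.

```c
#include <stdbool.h>

/* Hypothetical sketch of the fast-fail decision. Once no fetch is
 * still pending and none succeeded, the SOCKS streams waiting on the
 * descriptor should be closed right away ("hidden service is
 * unavailable") instead of idling until the 120-second timeout. */
typedef enum { HSDIR_PENDING, HSDIR_404, HSDIR_ERROR, HSDIR_OK } hsdir_result_t;

/* Returns true when the waiting streams should be torn down now. */
bool
should_close_waiting_streams(const hsdir_result_t *results, int n_hsdirs)
{
  for (int i = 0; i < n_hsdirs; i++) {
    if (results[i] == HSDIR_OK)
      return false; /* Got the descriptor: attach the streams instead. */
    if (results[i] == HSDIR_PENDING)
      return false; /* A fetch is still in flight: keep waiting. */
  }
  /* Every responsible HSDir answered 404 or otherwise failed. */
  return true;
}
```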
Sep 26 00:07:40.912 [info] connection_dir_client_reached_eof(): Received rendezvous descriptor (size 0, status 404 ("Not found"))
Sep 26 00:07:40.912 [info] connection_dir_client_reached_eof(): Fetching v2 rendezvous descriptor failed: Retrying at another directory.
Sep 26 00:07:40.912 [debug] conn_close_if_marked(): Cleaning up connection (fd -1).
Sep 26 00:07:40.912 [debug] rend_client_refetch_v2_renddesc(): Fetching v2 rendezvous descriptor for service qiu3onp7v7z25u5i
Sep 26 00:07:40.912 [info] directory_get_from_hs_dir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 00:07:40.912 [info] directory_get_from_hs_dir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 00:07:40.912 [info] rend_client_refetch_v2_renddesc(): Could not pick one of the responsible hidden service directories to fetch descriptors, because we already tried them all unsuccessfully.
Sep 26 00:07:40.912 [notice] Closing stream for 'qiu3onp7v7z25u5i.onion': hidden service is unavailable (try again later).
Sep 26 00:07:40.912 [info] rend_client_note_connection_attempt_ended(): Connection attempt for qiu3onp7v7z25u5i has ended; cleaning up temporary state.
I noticed this bug because onionshare's behavior is to launch the transient onion service via the control port, and then try connecting to the onion address repeatedly until it works. In the old onionshare behavior (Tor 0.2.5), it tried a few times, with not much delay between tries, and then it was ready. In the new onionshare behavior (Tor 0.3.2), it has to wait the whole 120 seconds before it learns that its first try didn't work.
-  ret = rend_client_fetch_v2_desc(rend_query, NULL);
-  if (ret <= 0) {
-    /* Close pending connections on error or if no hsdir can be found. */
-    rend_client_desc_trynow(rend_query->onion_address);
-  }
+  rend_client_fetch_v2_desc(rend_query, NULL);
+  /* We don't need to look the error code because either on failure or
+   * success, the necessary steps to continue the HS connection will be
+   * triggered once the descriptor arrives or if all fetch failed. */
   return;
Where does the "if all fetch failed" logic kick in?
Notice that in git master now, rend_client_desc_trynow() has an
} else { /* 404, or fetch didn't get that far */
clause, which is never reached, because the function is only called from one place in handle_response_fetch_renddesc_v2(), where we have just successfully received the descriptor.
Sep 26 01:50:06.898 [info] hs_pick_hsdir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 01:50:06.898 [info] fetch_v3_desc(): Couldn't pick a v3 hsdir.
yet I wait the full 120 seconds until
Sep 26 01:51:56.882 [notice] Tried for 120 seconds to get a connection to 7ga6xlti5joarlmkuhjaifa47ukgcwz6tfndgax45ocyn4rixm632jie:80. Giving up. (waiting for rendezvous desc)
Please see branch bug23653 in my repo for a fix for the v2 and v3 case.
A bug was also found during this fix, since we were computing a blinded key using time(NULL) instead of consensus time which caused issues when purging the HSDir req cache.
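For context, the v3 blinded key is derived from the current time period number. A minimal sketch of the distinction follows; the constant and function name are assumptions for illustration (Tor's real code also applies a rotation offset):

```c
#include <stdint.h>
#include <time.h>

/* Illustrative sketch: the time period number should be computed from
 * an explicitly supplied time (e.g. the consensus valid-after time),
 * never from time(NULL), so that the client and the HSDir agree on
 * which blinded key is current. The period length here is an
 * assumption for this sketch. */
#define TIME_PERIOD_LENGTH_MINUTES 1440 /* one day */

static uint64_t
get_time_period_num(time_t now)
{
  uint64_t minutes = (uint64_t)now / 60;
  return minutes / TIME_PERIOD_LENGTH_MINUTES;
}
```

The bug class here: computing the period from time(NULL) while the rest of the code keys its state off consensus time lets the two disagree near a period boundary, which is the kind of mismatch that broke purging of the HSDir request cache.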
All commits LGTM. The comments below are for a single commit, e278d84df5294872:
The if (base_conn->marked_for_close) check is dead code, because connection_list_by_type_state() does NOT return connections that are marked for close.
I think this call should be moved outside of the for-each loop: purge_hid_serv_request(identity_pk);
Another thing I'm thinking about: I've always been annoyed that the v2 warning doesn't say why Tor had to give up on the service. Do you think we can propagate the error code we get from fetch_v3_desc() up to that function?
Also, I think we could emit this warning only once instead of once per SOCKS connection. Otherwise, we should include some identifier of the SOCKS request in the log if we really want one per connection. I often get five of these in a row because I had five SOCKS requests for five different email accounts going to the same .onion...
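One way to do the once-per-service notice, sketched with a hypothetical fixed-size seen-table (real Tor would more likely use its rate-limiting helpers or a strmap keyed on the onion address):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: remember which onion addresses we have already
 * warned about, so the "unavailable" notice fires once per service
 * rather than once per SOCKS connection. */
#define MAX_WARNED 64
#define ADDR_LEN 64

static char warned[MAX_WARNED][ADDR_LEN];
static int n_warned = 0;

/* Returns true only the first time we are asked about `onion`. */
bool
should_warn_for_service(const char *onion)
{
  for (int i = 0; i < n_warned; i++) {
    if (strcmp(warned[i], onion) == 0)
      return false; /* Already warned: stay quiet. */
  }
  if (n_warned < MAX_WARNED) {
    strncpy(warned[n_warned], onion, ADDR_LEN - 1);
    warned[n_warned][ADDR_LEN - 1] = '\0';
    n_warned++;
  }
  return true;
}
```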
The conns smartlist leaks memory; it needs to be freed with smartlist_free().
Trac: Cc: asn, dgoulet to asn Status: needs_review to needs_revision Reviewer: N/A to dgoulet
Ok some changes happened. I took over the branch as discussed and implemented a way to not trigger a new fetch request if we already have one pending for a given service.
So if we launch 7 SOCKS requests, the first one triggers the fetch, and the other 6 wait patiently until either the descriptor arrives or all fetches fail, at which point they are all closed.
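The dedup idea can be sketched like this. It is a self-contained illustration with hypothetical names (Tor's real implementation tracks pending directory requests per service identity key):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of "one fetch in flight per service": the first
 * SOCKS request launches a descriptor fetch, later requests for the
 * same onion address just wait on the existing one. */
#define MAX_SERVICES 32
#define ADDR_LEN 64

typedef struct {
  char service[ADDR_LEN];
  bool in_flight;
} fetch_slot_t;

static fetch_slot_t slots[MAX_SERVICES];

/* Returns true if a new fetch was launched, false if one was already
 * pending for this service (or the table is full, in this sketch). */
bool
maybe_launch_fetch(const char *service)
{
  fetch_slot_t *free_slot = NULL;
  for (int i = 0; i < MAX_SERVICES; i++) {
    if (slots[i].in_flight && strcmp(slots[i].service, service) == 0)
      return false; /* Fetch already pending: the stream just waits. */
    if (!slots[i].in_flight && !free_slot)
      free_slot = &slots[i];
  }
  if (!free_slot)
    return false;
  strncpy(free_slot->service, service, ADDR_LEN - 1);
  free_slot->service[ADDR_LEN - 1] = '\0';
  free_slot->in_flight = true;
  return true;
}

/* Called when the fetch finishes, on success or failure: clear the
 * slot so the application layer can trigger a retry. */
void
fetch_done(const char *service)
{
  for (int i = 0; i < MAX_SERVICES; i++) {
    if (slots[i].in_flight && strcmp(slots[i].service, service) == 0)
      slots[i].in_flight = false;
  }
}
```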
See branch: bug23653_032_01
This does NOT implement the above for v2, only v3 for now. If we are satisfied with this, we should fix v2 for an improved user experience.
Trac: Status: needs_revision to needs_review Reviewer: dgoulet to asn
I pushed my own bug23653_032_01 with an added fixup which does:
- Fixes a comment typo.
- Renames close_all_conn_wait_desc(), since it was not only doing that (it was also cleaning up the HSDir request cache).
- Makes it clean the HSDir request cache even if we don't kill any SOCKS connections, because you never know, and we always want to be able to retry if the application layer asks us to.
BTW, the branch will now also fix this for v2, and leave #15937 open. I'm fine with this. Maybe we can open a ticket about fixing #15937 properly for v2?
onionshare still uses v2 onions and I assume it still exhibits this bug. :( How close were we to fixing it before we sent the ticket into the dreaded 'unspecified'?