Running Tor git master, and trying to connect to an onion address that I just made up (nothing has ever been there):
$ torify telnet qiu3onp7v7z25u5i.onion 80
telnet: Unable to connect to remote host: Connection timed out
And on the Tor log I see
Sep 25 23:54:17.776 [notice] Tried for 120 seconds to get a connection to qiu3onp7v7z25u5i:80. Giving up. (waiting for rendezvous desc)
That is, it took 120 seconds to fail.
Compare to when using Tor release-0.2.5:
$ torify telnet qiu3onp7v7z25u5i.onion 80
telnet: Unable to connect to remote host: No route to host
and the Tor log says
Sep 26 00:06:37.515 [notice] Closing stream for 'qiu3onp7v7z25u5i.onion': hidden service is unavailable (try again later).
In the Tor 0.2.5 case, I got my answer in 5-10 seconds: it tried each of the hsdirs, and when the last one said 404, it hung up on the stream. In the Tor master case, it knew the answer in 5-10 seconds, but it just let my stream sit there doing nothing until the timeout arrived.
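The fix amounts to a simple decision: once every responsible HSDir fetch has completed without yielding a descriptor, fail the waiting streams immediately rather than letting the 120-second timeout fire. Here is a minimal, self-contained sketch of that decision; the type and function names are hypothetical, not Tor's actual API.

```c
#include <stdbool.h>

/* Hypothetical sketch of the fast-fail decision. Once no fetch is
 * still pending and none succeeded, the SOCKS streams waiting on the
 * descriptor should be closed right away ("hidden service is
 * unavailable") instead of idling until the 120-second timeout. */
typedef enum { HSDIR_PENDING, HSDIR_404, HSDIR_ERROR, HSDIR_OK } hsdir_result_t;

/* Returns true when the waiting streams should be torn down now. */
bool
should_close_waiting_streams(const hsdir_result_t *results, int n_hsdirs)
{
  for (int i = 0; i < n_hsdirs; i++) {
    if (results[i] == HSDIR_OK)
      return false; /* Got the descriptor: attach the streams instead. */
    if (results[i] == HSDIR_PENDING)
      return false; /* A fetch is still in flight: keep waiting. */
  }
  /* Every responsible HSDir answered 404 or otherwise failed. */
  return true;
}
```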
Sep 26 00:07:40.912 [info] connection_dir_client_reached_eof(): Received rendezvous descriptor (size 0, status 404 ("Not found"))
Sep 26 00:07:40.912 [info] connection_dir_client_reached_eof(): Fetching v2 rendezvous descriptor failed: Retrying at another directory.
Sep 26 00:07:40.912 [debug] conn_close_if_marked(): Cleaning up connection (fd -1).
Sep 26 00:07:40.912 [debug] rend_client_refetch_v2_renddesc(): Fetching v2 rendezvous descriptor for service qiu3onp7v7z25u5i
Sep 26 00:07:40.912 [info] directory_get_from_hs_dir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 00:07:40.912 [info] directory_get_from_hs_dir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 00:07:40.912 [info] rend_client_refetch_v2_renddesc(): Could not pick one of the responsible hidden service directories to fetch descriptors, because we already tried them all unsuccessfully.
Sep 26 00:07:40.912 [notice] Closing stream for 'qiu3onp7v7z25u5i.onion': hidden service is unavailable (try again later).
Sep 26 00:07:40.912 [info] rend_client_note_connection_attempt_ended(): Connection attempt for qiu3onp7v7z25u5i has ended; cleaning up temporary state.
I noticed this bug because onionshare's behavior is to launch the transient onion service via the control port, and then try connecting to the onion address repeatedly until it works. In the old onionshare behavior (Tor 0.2.5), it tried a few times, with not much delay between tries, and then it was ready. In the new onionshare behavior (Tor 0.3.2), it has to wait the whole 120 seconds before it learns that its first try didn't work.
-  ret = rend_client_fetch_v2_desc(rend_query, NULL);
-  if (ret <= 0) {
-    /* Close pending connections on error or if no hsdir can be found. */
-    rend_client_desc_trynow(rend_query->onion_address);
-  }
+  rend_client_fetch_v2_desc(rend_query, NULL);
+  /* We don't need to look the error code because either on failure or
+   * success, the necessary steps to continue the HS connection will be
+   * triggered once the descriptor arrives or if all fetch failed. */
   return;
Where does the "if all fetch failed" logic kick in?
Notice that in git master now, rend_client_desc_trynow() has an
} else { /* 404, or fetch didn't get that far */
clause, which is never reached, because the function is only called from one place in handle_response_fetch_renddesc_v2(), where we have just successfully received the descriptor.
Sep 26 01:50:06.898 [info] hs_pick_hsdir(): Could not pick one of the responsible hidden service directories, because we requested them all recently without success.
Sep 26 01:50:06.898 [info] fetch_v3_desc(): Couldn't pick a v3 hsdir.
yet I wait the full 120 seconds until
Sep 26 01:51:56.882 [notice] Tried for 120 seconds to get a connection to 7ga6xlti5joarlmkuhjaifa47ukgcwz6tfndgax45ocyn4rixm632jie:80. Giving up. (waiting for rendezvous desc)
Please see branch bug23653 in my repo for a fix for the v2 and v3 case.
A bug was also found during this fix, since we were computing a blinded key using time(NULL) instead of consensus time which caused issues when purging the HSDir req cache.
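For context, the v3 blinded key is derived from the current time period number. A minimal sketch of the distinction follows; the constant and function name are assumptions for illustration (Tor's real code also applies a rotation offset):

```c
#include <stdint.h>
#include <time.h>

/* Illustrative sketch: the time period number should be computed from
 * an explicitly supplied time (e.g. the consensus valid-after time),
 * never from time(NULL), so that the client and the HSDir agree on
 * which blinded key is current. The period length here is an
 * assumption for this sketch. */
#define TIME_PERIOD_LENGTH_MINUTES 1440 /* one day */

static uint64_t
get_time_period_num(time_t now)
{
  uint64_t minutes = (uint64_t)now / 60;
  return minutes / TIME_PERIOD_LENGTH_MINUTES;
}
```

The bug class here: computing the period from time(NULL) while the rest of the code keys its state off consensus time lets the two disagree near a period boundary, which is the kind of mismatch that broke purging of the HSDir request cache.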
All commits LGTM. The comments below are for a single commit, e278d84df5294872:
The if (base_conn->marked_for_close) check is dead code, because connection_list_by_type_state() does NOT return connections that are marked for close.
I think this call should be moved outside of the for-each loop: purge_hid_serv_request(identity_pk);
Another thing I'm thinking about: I've always been annoyed that the v2 warning doesn't say why Tor had to give up on the service. Do you think we can propagate the error code we get from fetch_v3_desc() up to that function?
Also, I think we could emit this warning only once instead of once per SOCKS connection. Otherwise, we should include some identifier of the SOCKS request in the log if we really want one per connection. I often get five of these in a row because I had five SOCKS requests for five different email accounts going to the same .onion...
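One way to do the once-per-service notice, sketched with a hypothetical fixed-size seen-table (real Tor would more likely use its rate-limiting helpers or a strmap keyed on the onion address):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: remember which onion addresses we have already
 * warned about, so the "unavailable" notice fires once per service
 * rather than once per SOCKS connection. */
#define MAX_WARNED 64
#define ADDR_LEN 64

static char warned[MAX_WARNED][ADDR_LEN];
static int n_warned = 0;

/* Returns true only the first time we are asked about `onion`. */
bool
should_warn_for_service(const char *onion)
{
  for (int i = 0; i < n_warned; i++) {
    if (strcmp(warned[i], onion) == 0)
      return false; /* Already warned: stay quiet. */
  }
  if (n_warned < MAX_WARNED) {
    strncpy(warned[n_warned], onion, ADDR_LEN - 1);
    warned[n_warned][ADDR_LEN - 1] = '\0';
    n_warned++;
  }
  return true;
}
```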
The conns smartlist leaks memory; it needs to be freed with smartlist_free().
Trac: Cc: asn, dgoulet to asn Status: needs_review to needs_revision Reviewer: N/A to dgoulet
Ok some changes happened. I took over the branch as discussed and implemented a way to not trigger a new fetch request if we already have one pending for a given service.
So if we launch 7 SOCKS requests, the first one triggers the fetch, and the other 6 wait patiently until either the descriptor arrives or all fetches fail, at which point they are all closed.
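The dedup idea can be sketched like this. It is a self-contained illustration with hypothetical names (Tor's real implementation tracks pending directory requests per service identity key):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of "one fetch in flight per service": the first
 * SOCKS request launches a descriptor fetch, later requests for the
 * same onion address just wait on the existing one. */
#define MAX_SERVICES 32
#define ADDR_LEN 64

typedef struct {
  char service[ADDR_LEN];
  bool in_flight;
} fetch_slot_t;

static fetch_slot_t slots[MAX_SERVICES];

/* Returns true if a new fetch was launched, false if one was already
 * pending for this service (or the table is full, in this sketch). */
bool
maybe_launch_fetch(const char *service)
{
  fetch_slot_t *free_slot = NULL;
  for (int i = 0; i < MAX_SERVICES; i++) {
    if (slots[i].in_flight && strcmp(slots[i].service, service) == 0)
      return false; /* Fetch already pending: the stream just waits. */
    if (!slots[i].in_flight && !free_slot)
      free_slot = &slots[i];
  }
  if (!free_slot)
    return false;
  strncpy(free_slot->service, service, ADDR_LEN - 1);
  free_slot->service[ADDR_LEN - 1] = '\0';
  free_slot->in_flight = true;
  return true;
}

/* Called when the fetch finishes, on success or failure: clear the
 * slot so the application layer can trigger a retry. */
void
fetch_done(const char *service)
{
  for (int i = 0; i < MAX_SERVICES; i++) {
    if (slots[i].in_flight && strcmp(slots[i].service, service) == 0)
      slots[i].in_flight = false;
  }
}
```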
See branch: bug23653_032_01
This does NOT implement the above for v2, only v3 for now. If we are satisfied with this, we should fix v2 for an improved user experience.
Trac: Status: needs_revision to needs_review Reviewer: dgoulet to asn
I pushed my own bug23653_032_01 with an added fixup which does:
- Fixes a comment typo.
- Renames close_all_conn_wait_desc(), since it was not only doing that (it was also cleaning up the HSDir request cache).
- Makes it clean the HSDir request cache even if we don't kill any SOCKS connections, because you never know, and we always want to be able to retry if the application layer asks us to.
BTW, the branch will now also fix this for v2, and leave #15937 open. I'm fine with this. Maybe we can open a ticket about fixing #15937 properly for v2?
onionshare still uses v2 onions and I assume it still exhibits this bug. :( How close were we to fixing it before we sent the ticket into the dreaded 'unspecified'?