Opened 9 years ago

Closed 8 years ago

Last modified 7 years ago

#2401 closed defect (not a bug)

rend_cache_store(): We already have a v2 descriptor for service xxxxx

Reported by: prof7bit Owned by: rransom
Priority: High Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Keywords: tor-hs
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

This is happening with the current 0.2.1.29 and the current alpha.

When a hidden service is restarted the old intro points are not valid anymore. Tor seems to notice this and will fetch a new descriptor when it has run out of intro points but then after doing this expensive operation it simply discards it with the excuse that there is already *some* descriptor there. The expensive operation of fetching has been done for nothing.

The next time it will again try the old descriptor of which it already *knows* that there are no valid intro points and fetch a new one (possibly better one) again, but only to discard it without even looking at it, again and again and again.

I have seen this going on for quite a long time sometimes. Sometimes I have been unable to connect to a hidden service of which I *knew* that was running for many hours and I'm sure if it had not discarded the newly downloaded descriptor every time it would have been able to connect eventually.

Something is still wrong with this whole caching logic. I am making heavy use of hidden services with TorChat and sometimes it is really painful watching the logs and watching it not being able to connect to a service that *IS* running for hours and not being able to do anything about it.

rend_cache_store() should store the descriptor and replace the old one in all cases. Either it is the exact same, then it has no effect at all or it is different (then it is in all likelihood a better one than the one it had before because it was fetched later than the cached one)

Alternatively there could be made a possibility to manually clear the cache via the control protocol, either the complete cache or for an individual service.

below is a log from the current stable version:

bernd@t40:~\ $ tail -f .torchat/Tor/tor.log | grep wa5w
Jan 17 14:36:10.388 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:36:16.904 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:36:16.904 [info] circuit_get_open_circ_or_launch(): No intro points for 'wa5wauj22un2au6o': re-fetching service descriptor.
Jan 17 14:36:16.904 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:36:18.682 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:38:16.832 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:38:27.712 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:38:27.712 [info] circuit_get_open_circ_or_launch(): No intro points for 'wa5wauj22un2au6o': re-fetching service descriptor.
Jan 17 14:38:27.712 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:38:29.226 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:40:27.460 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:40:31.442 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:40:31.442 [info] circuit_get_open_circ_or_launch(): No intro points for 'wa5wauj22un2au6o': re-fetching service descriptor.
Jan 17 14:40:31.442 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:40:32.682 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:42:31.484 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:42:41.493 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:42:41.493 [info] circuit_get_open_circ_or_launch(): No intro points for 'wa5wauj22un2au6o': re-fetching service descriptor.
Jan 17 14:42:41.493 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:42:42.191 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:44:41.605 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:44:54.441 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:44:54.441 [info] circuit_get_open_circ_or_launch(): No intro points for 'wa5wauj22un2au6o': re-fetching service descriptor.
Jan 17 14:44:54.441 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:44:55.229 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:46:54.864 [notice] Tried for 120 seconds to get a connection to wa5wauj22un2au6o:11009. Giving up. (waiting for rendezvous desc)
Jan 17 14:47:03.401 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'wa5wauj22un2au6o'
Jan 17 14:47:03.401 [info] connection_ap_handshake_rewrite_and_attach(): Stale descriptor wa5wauj22un2au6o. Re-fetching.
Jan 17 14:47:03.418 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'wa5wauj22un2au6o' with descriptor ID 'z3zqz265onwi4jdsvqip5xclgglnjk4g', auth type 0, and descriptor cookie '[none]' to hidden service directory 'Slipstream' on port 9030.
Jan 17 14:47:03.418 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "wa5wauj22un2au6o"
Jan 17 14:47:05.364 [info] rend_cache_store(): We already have a v2 descriptor for service wa5wauj22un2au6o.
Jan 17 14:47:11.039 [info] circuit_get_open_circ_or_launch(): Chose '$714AF90AFF324373D6F6A4393084D781C4219FFD' as intro point for 'wa5wauj22un2au6o'.
Jan 17 14:47:43.529 [info] circuit_get_open_circ_or_launch(): Chose '$CAC54282E75D1459B6174BE696303035929894CD' as intro point for 'wa5wauj22un2au6o'.
Jan 17 14:47:49.841 [info] circuit_get_open_circ_or_launch(): Chose '$52AEEA15F22B17F6DC760FB4F0DAC7F89DF7934D' as intro point for 'wa5wauj22un2au6o'.


(btw 0.2.1.29 is missing from the dropdown list below)

Child Tickets

Change History (10)

comment:1 Changed 9 years ago by prof7bit

I did not actually look deep inside the code yet, it is the huge time differene between refetch and store that makes me believe that it is indeed doing something very expensive here like fetching a new descriptor. But maybe I am also completely misinterpreting the logs.

comment:2 Changed 9 years ago by rransom

Component: - Select a componentTor Client
Milestone: Tor: 0.2.1.x-final

comment:3 Changed 9 years ago by prof7bit

Below is an example where it took far longer than 15 minutes, actually it never succeeded although I know that the service existed.

Although it was older than 15 minutes it did not try any of the introduction points of the newly fetched descriptor or even tried to compare whether it is still the same old descriptor or a different one. It just throws it away after fetching it, it won't ever update its cache, whenever there is *any* v2 descriptor in the cache, no matter how old it is.

The following comment is in rendcommon.c at the top of the function rend_cache_store():

 998  * If we are acting as client due to the published flag and have any v2
 999  * descriptor with the same ID, reject this one in order to not get
1000  * confused with having both versions for the same service.

and the following is in the code:

1043   /* Do we have a v2 descriptor and fetched this descriptor as a client? */
1044   tor_snprintf(key, sizeof(key), "2%s", query);
1045   if (!published && strmap_get_lc(rend_cache, key)) {
1046     log_info(LD_REND, "We already have a v2 descriptor for service %s.",
1047              safe_str_client(query));
1048     rend_service_descriptor_free(parsed);
1049     return -1;
1050   }

To me this looks plain wrong. It wont ever be able to connect to this hidden service ever again once it has *any* descriptor for this service in its cache, no matter how old it is. I can vaguely remember a time in the past when hidden services worked flawlessly but at the moment it is really difficult, almost impossible sometimes.You won't notice the problem if you fire up Tor and connect to some random hidden service website with the browser but when you are using an application like torchat which will desperately try to establish a long lived connection to each of the buddies on its list from which it got a ping (and thus *knows* that it's service must be online) the problem becomes obvious.

It all seems to be related to v2 descriptors. I have seen this situation resolve itself sometimes as soon as a new v0 descriptor appears on the scene from somewhere, then it will immediately connect but the whole v2 mechanism seems completely broken.

Jan 22 14:45:41.013 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'gtvsxxxxxxxxxxxx'
Jan 22 14:45:41.013 [info] connection_ap_handshake_rewrite_and_attach(): Stale descriptor gtvsxxxxxxxxxxxx. Re-fetching.
Jan 22 14:45:41.043 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'gtvsxxxxxxxxxxxx' with descriptor ID '25zu6xfciiexu46ejtdgl36a47dd62fv', auth type 0, and descriptor cookie '[none]' to hidden service directory 'infogtor' on port 9030.
Jan 22 14:45:41.043 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 14:45:44.475 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 14:46:07.872 [info] circuit_get_open_circ_or_launch(): Chose '$9C1F4DFBCBB38C8DA3CB1EAEE1B564BA83462E7B' as intro point for 'gtvsxxxxxxxxxxxx'.
Jan 22 14:46:19.839 [info] rend_client_remove_intro_point(): 2 options left for "gtvsxxxxxxxxxxxx".
Jan 22 14:46:19.839 [info] rend_client_introduction_acked(): Got nack for "gtvsxxxxxxxxxxxx" from $9C1F4DFBCBB38C8DA3CB1EAEE1B564BA83462E7B. Re-extending circ 2401, this time to $F3C71028D177715BB93F6D0E3440E24F1832A234.
Jan 22 14:46:22.140 [info] rend_client_remove_intro_point(): 1 options left for "gtvsxxxxxxxxxxxx".
Jan 22 14:46:22.141 [info] rend_client_introduction_acked(): Got nack for "gtvsxxxxxxxxxxxx" from $F3C71028D177715BB93F6D0E3440E24F1832A234. Re-extending circ 2401, this time to $0748226B219A3C7CFD7217A5B552BCE8599CC9C4.
Jan 22 14:46:22.457 [info] circuit_get_open_circ_or_launch(): Chose '$0748226B219A3C7CFD7217A5B552BCE8599CC9C4' as intro point for 'gtvsxxxxxxxxxxxx'.
Jan 22 14:46:27.640 [info] rend_client_remove_intro_point(): No more intro points remain for "gtvsxxxxxxxxxxxx". Re-fetching descriptor.
Jan 22 14:46:27.640 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 14:46:36.692 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 14:48:07.996 [notice] Tried for 120 seconds to get a connection to gtvsxxxxxxxxxxxx:11009. Giving up. (waiting for rendezvous desc)
Jan 22 14:54:46.009 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'gtvsxxxxxxxxxxxx'
Jan 22 14:54:46.010 [info] circuit_get_open_circ_or_launch(): No intro points for 'gtvsxxxxxxxxxxxx': re-fetching service descriptor.
Jan 22 14:54:46.010 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 14:54:51.787 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 14:56:46.364 [notice] Tried for 120 seconds to get a connection to gtvsxxxxxxxxxxxx:11009. Giving up. (waiting for rendezvous desc)
Jan 22 15:02:18.385 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'gtvsxxxxxxxxxxxx'
Jan 22 15:02:18.385 [info] connection_ap_handshake_rewrite_and_attach(): Stale descriptor gtvsxxxxxxxxxxxx. Re-fetching.
Jan 22 15:02:18.408 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'gtvsxxxxxxxxxxxx' with descriptor ID '25zu6xfciiexu46ejtdgl36a47dd62fv', auth type 0, and descriptor cookie '[none]' to hidden service directory 'SNOWOUTSIDE' on port 9030.
Jan 22 15:02:18.408 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 15:02:20.437 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 15:02:38.010 [info] rend_cache_store_v2_desc_as_client(): We already have this service descriptor gtvsxxxxxxxxxxxx.
Jan 22 15:04:18.197 [notice] Tried for 120 seconds to get a connection to gtvsxxxxxxxxxxxx:11009. Giving up. (waiting for rendezvous desc)
Jan 22 15:10:36.209 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'gtvsxxxxxxxxxxxx'
Jan 22 15:10:36.210 [info] circuit_get_open_circ_or_launch(): No intro points for 'gtvsxxxxxxxxxxxx': re-fetching service descriptor.
Jan 22 15:10:36.210 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 15:10:38.559 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 15:12:36.280 [notice] Tried for 120 seconds to get a connection to gtvsxxxxxxxxxxxx:11009. Giving up. (waiting for rendezvous desc)
Jan 22 15:18:38.293 [info] connection_ap_handshake_rewrite_and_attach(): Got a hidden service request for ID 'gtvsxxxxxxxxxxxx'
Jan 22 15:18:38.293 [info] connection_ap_handshake_rewrite_and_attach(): Stale descriptor gtvsxxxxxxxxxxxx. Re-fetching.
Jan 22 15:18:38.320 [info] directory_get_from_hs_dir(): Sending fetch request for v2 descriptor for service 'gtvsxxxxxxxxxxxx' with descriptor ID '25zu6xfciiexu46ejtdgl36a47dd62fv', auth type 0, and descriptor cookie '[none]' to hidden service directory 'infogtor' on port 9030.
Jan 22 15:18:38.324 [info] rend_client_refetch_renddesc(): Fetching rendezvous descriptor for service "gtvsxxxxxxxxxxxx"
Jan 22 15:18:40.954 [info] rend_cache_store(): We already have a v2 descriptor for service gtvsxxxxxxxxxxxx.
Jan 22 15:18:50.249 [info] rend_cache_store_v2_desc_as_client(): We already have this service descriptor gtvsxxxxxxxxxxxx.
Jan 22 15:20:38.344 [notice] Tried for 120 seconds to get a connection to gtvsxxxxxxxxxxxx:11009. Giving up. (waiting for rendezvous desc)

comment:4 Changed 9 years ago by arma

Component: Tor ClientTor hidden services

comment:5 Changed 8 years ago by rransom

Owner: set to rransom
Status: newassigned

comment:6 Changed 8 years ago by nickm

Milestone: Tor: 0.2.1.x-finalTor: 0.2.2.x-final
Priority: normalmajor

comment:7 Changed 8 years ago by arma

I wonder if this bug still applies to 0.2.2, given that we dropped the v0 hidden service descriptors.

If it does, it would be worth fixing, and Robert (with his recent messings on #3000) is just the guy for it.

comment:8 in reply to:  7 Changed 8 years ago by rransom

Resolution: not a bug
Status: assignedclosed

Replying to arma:

I wonder if this bug still applies to 0.2.2, given that we dropped the v0 hidden service descriptors.

The log message prof7bit is complaining about can't occur in 0.2.2.x (even though the code to emit it was left in), because the 0.2.2.x branch doesn't fetch v0 HS descriptors. It is also not a bug.

The “We already have a v2 descriptor” message can only occur when Tor has just fetched a v0 descriptor; it will happily replace a v2 descriptor with another v2 descriptor, or replace a v0 descriptor with another v0 descriptor. The client code shouldn't replace a v2 descriptor with a v0 descriptor or vice versa, for at least two reasons:

  • If the client replaces a v2 descriptor with a v0 descriptor, and later retrieves another copy of the same v2 descriptor it had before, it won't notice that that's the same descriptor, it'll try all of the introduction points in that descriptor again and fail, and then it'll go looking for another descriptor.
  • If the client replaces a v0 descriptor with a v2 descriptor at the wrong time, it will trigger #1930 and fail to connect to the HS outright.

The only actual bug shown in prof7bit's reports here is the following:

Jan 22 15:02:38.010 [info] rend_cache_store_v2_desc_as_client(): We already have this service descriptor gtvsxxxxxxxxxxxx.

This is also not a bug in the client code or hidden service server code. It is a bug in the directory authorities' assignment of the HSDir flag: the set of relays with the HSDir flag needs to remain fairly stable in order for the HS directory system to work, but instead the set of HSDir relays changes significantly in nearly every new consensus. In this case, the client fetched a descriptor from an HSDir relay which had an old descriptor because it was using a different consensus from the one that the hidden service server had. #2649 is one of the tickets opened for this issue.

comment:9 Changed 7 years ago by nickm

Keywords: tor-hs added

comment:10 Changed 7 years ago by nickm

Component: Tor Hidden ServicesTor
Note: See TracTickets for help on using tickets.