Hidden services fail to load if you have a stale descriptor
Using git head.
In connection_ap_handshake_rewrite_and_attach(), we check rend_cache_lookup_entry(). If >0 (meaning we have a rend descriptor), we then evaluate:
/** How long after we receive a hidden service descriptor do we consider
 * it valid? */
#define NUM_SECONDS_BEFORE_HS_REFETCH (60*15)
if (now - entry->received < NUM_SECONDS_BEFORE_HS_REFETCH) {
and if it's stale (we got it more than 15 minutes ago), we call
log_info(LD_REND, "Stale descriptor %s. Re-fetching.",
safe_str(conn->rend_data->onion_address));
rend_client_refetch_v2_renddesc(conn->rend_data);
However, in rend_client_refetch_v2_renddesc() we don't care whether it's stale. Towards the top of the function we call:
if (rend_cache_lookup_entry(rend_query->onion_address, -1, &e) > 0) {
  log_info(LD_REND, "We would fetch a v2 rendezvous descriptor, but we "
                    "already have that descriptor here. Not fetching.");
  return;
}
So now that we don't try to fetch v0 rend descriptors, that means that Tor simply times out on all socks requests to hidden services for which we have a stale descriptor.
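To make the interaction easier to see, here is a tiny stand-alone model of the two checks. It is only a sketch: every model_* name is made up for illustration and is not Tor code, and it assumes that the no-descriptor case also goes through the refetch function.

```c
#include <time.h>

/* Value from the ticket: a descriptor is stale 15 minutes after we got it. */
#define NUM_SECONDS_BEFORE_HS_REFETCH (60*15)

/* Hypothetical stand-in for one cached rend descriptor. */
typedef struct {
  int present;      /* 1 if the cache holds a descriptor for this service */
  time_t received;  /* when we received it */
} model_cache_entry_t;

typedef enum {
  MODEL_ATTACHED,           /* fresh descriptor, stream can proceed */
  MODEL_WAITING_FOR_FETCH,  /* a fetch was launched */
  MODEL_STUCK               /* no fetch launched, nothing will ever arrive */
} model_outcome_t;

/* Models the cache lookup: >0 means "we have a descriptor",
 * with no regard to how old it is. */
static int
model_cache_lookup(const model_cache_entry_t *cache)
{
  return cache->present ? 1 : 0;
}

/* Models the refetch function: returns 1 if a fetch was launched.
 * Any cached descriptor, stale or not, makes it return early. */
static int
model_refetch_v2_renddesc(const model_cache_entry_t *cache)
{
  if (model_cache_lookup(cache) > 0)
    return 0; /* "already have that descriptor here. Not fetching." */
  return 1;
}

/* Models the staleness check in the handshake function. */
static model_outcome_t
model_rewrite_and_attach(const model_cache_entry_t *cache, time_t now)
{
  if (model_cache_lookup(cache) > 0 &&
      now - cache->received < NUM_SECONDS_BEFORE_HS_REFETCH)
    return MODEL_ATTACHED;
  /* Either stale or missing: ask for a (re)fetch.
   * (Assumption: the missing case takes the same path.) */
  return model_refetch_v2_renddesc(cache) ? MODEL_WAITING_FOR_FETCH
                                          : MODEL_STUCK;
}
```

In this model, an entry received 16 minutes ago yields MODEL_STUCK: the caller thinks it asked for a refetch, but the refetch function declined because the cache lookup still succeeds.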
Presumably this is a bug on 0.2.1.x and 0.2.0.x too.
My original rend desc design was "if you have a fresh one, use it. If you have one but it's stale, fetch a new one; then whether you get a new one or no, use the one you have." We seem to have clobbered that design in 0.2.0.20-rc (r13540) when we added support for v0 and v2 rend descs. In any case, refetching after 15 minutes was cheap back in the days of Tor 0.0.9, but is expensive now.
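For reference, the original design described above could be sketched roughly like this. The design_* names are hypothetical and this is not a proposed patch, just the decision table written out as code.

```c
#include <time.h>

/* Value from the ticket: a descriptor is stale 15 minutes after we got it. */
#define NUM_SECONDS_BEFORE_HS_REFETCH (60*15)

typedef enum {
  DESIGN_USE_CACHED,           /* fresh: use the descriptor we have */
  DESIGN_FETCH_THEN_USE_BEST,  /* stale: try a fetch, then use whatever
                                * we have, new or old */
  DESIGN_FETCH_AND_WAIT        /* nothing cached: must wait for a fetch */
} design_action_t;

/* Decide what to do for a request, given whether we have a descriptor
 * and when we received it. */
static design_action_t
design_decide(int have_desc, time_t received, time_t now)
{
  if (!have_desc)
    return DESIGN_FETCH_AND_WAIT;
  if (now - received < NUM_SECONDS_BEFORE_HS_REFETCH)
    return DESIGN_USE_CACHED;
  /* Stale, but still usable as a fallback if the fetch fails. */
  return DESIGN_FETCH_THEN_USE_BEST;
}
```

The point of the middle case is that a stale descriptor never leaves the request with nothing to use, which is exactly what the current code loses.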
So the two extremes we have are:
- connection_ap_handshake_rewrite_and_attach() which says "it's stale after 15 minutes; try to fetch a new one if it's stale" and
- /** Time period for which a v2 descriptor will be valid. */
  #define REND_TIME_PERIOD_V2_DESC_VALIDITY (24*60*60)
Do we want something in between?
(Thanks to neoeinstein for tracking down the bug.)
[Automatically added by flyspray2trac: Operating System: All]