Opened 10 years ago

Closed 9 years ago

Last modified 5 years ago

#997 closed defect (fixed)

Hidden services fail to load if you have a stale descriptor

Reported by: arma Owned by: karsten
Priority: High Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: 0.2.1.15-rc
Severity: Keywords:
Cc: arma, nickm, neoeinstein, karsten, Sebastian Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by arma)

Using git head.

In connection_ap_handshake_rewrite_and_attach(), we check
rend_cache_lookup_entry(). If >0 (meaning we have a rend descriptor),
we then evaluate:

/ How long after we receive a hidden service descriptor do we consider

  • it valid? */

#define NUM_SECONDS_BEFORE_HS_REFETCH (60*15)

if (now - entry->received < NUM_SECONDS_BEFORE_HS_REFETCH) {

and if it's stale (we got it more than 15 minutes ago), we call

log_info(LD_REND, "Stale descriptor %s. Re-fetching.",

safe_str(conn->rend_data->onion_address));

rend_client_refetch_v2_renddesc(conn->rend_data);

However, in rend_client_refetch_v2_renddesc() we don't care whether
it's stale. Towards the top of the function we call:

if (rend_cache_lookup_entry(rend_query->onion_address, -1, &e) > 0) {

log_info(LD_REND, "We would fetch a v2 rendezvous descriptor, but we "

"already have that descriptor here. Not fetching.");

return;

}

So now that we don't try to fetch v0 rend descriptors, that means that
Tor simply times out on all socks requests to hidden services for which
we have a stale descriptor.

Presumably this is a bug on 0.2.1.x and 0.2.0.x too.

My original rend desc design was "if you have a fresh one, use it. If
you have one but it's stale, fetch a new one; then whether you get a new
one or no, use the one you have." We seem to have clobbered that design
in 0.2.0.20-rc (r13540) when we added support for v0 and v2 rend descs.
In any case, refetching after 15 minutes was cheap back in the days of
Tor 0.0.9, but is expensive now.

So the two extremes we have are:
1) connection_ap_handshake_rewrite_and_attach() which says "it's stale
after 15 minutes; try to fetch a new one if it's stale" and
2) / Time period for which a v2 descriptor will be valid. */
#define REND_TIME_PERIOD_V2_DESC_VALIDITY (24*60*60)

Do we want something in between?

(Thanks to neoeinstein for tracking down the bug.)

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (14)

comment:1 Changed 10 years ago by arma

Marcus Griep suggested on or-dev that we change
rend_client_refetch_v2_renddesc() so it tries to use the rend desc
if it has one:

--- a/src/or/rendclient.c
+++ b/src/or/rendclient.c
@@ -495,6 +495,7 @@ rend_client_refetch_v2_renddesc(const rend_data_t *rend_quer
y)

if (rend_cache_lookup_entry(rend_query->onion_address, -1, &e) > 0) {

log_info(LD_REND, "We would fetch a v2 rendezvous descriptor, but we "

"already have that descriptor here. Not fetching.");

+ rend_client_desc_trynow(rend_query->onion_address);

return;

}
log_debug(LD_REND, "Fetching v2 rendezvous descriptor for service %s",

But I tracked things further, and the call in question is happening from
connection_ap_handshake_rewrite_and_attach() when that code gets triggered.

So if we want to go that route, we should simply remove the "if it's stale
then try to refetch" code from connection_ap_handshake_rewrite_and_attach().

comment:2 Changed 10 years ago by arma

Specifically, the better version of Marcus's patch is:

diff --git a/src/or/connection_edge.c b/src/or/connection_edge.c
index 6207ad3..fe25c25 100644
--- a/src/or/connection_edge.c
+++ b/src/or/connection_edge.c
@@ -1678,22 +1678,12 @@ connection_ap_handshake_rewrite_and_attach(edge_connecti
on_t *conn,
                safe_str(conn->rend_data->onion_address));
       rend_client_refetch_v2_renddesc(conn->rend_data);
     } else { /* r > 0 */
-/** How long after we receive a hidden service descriptor do we consider
- * it valid? */
-#define NUM_SECONDS_BEFORE_HS_REFETCH (60*15)
-      if (now - entry->received < NUM_SECONDS_BEFORE_HS_REFETCH) {
-        conn->_base.state = AP_CONN_STATE_CIRCUIT_WAIT;
-        log_info(LD_REND, "Descriptor is here and fresh enough. Great.");
-        if (connection_ap_handshake_attach_circuit(conn) < 0) {
-          if (!conn->_base.marked_for_close)
-            connection_mark_unattached_ap(conn, END_STREAM_REASON_CANT_ATTACH);
-          return -1;
-        }
-      } else {
-        conn->_base.state = AP_CONN_STATE_RENDDESC_WAIT;
-        log_info(LD_REND, "Stale descriptor %s. Re-fetching.",
-                 safe_str(conn->rend_data->onion_address));
-        rend_client_refetch_v2_renddesc(conn->rend_data);
+      conn->_base.state = AP_CONN_STATE_CIRCUIT_WAIT;
+      log_info(LD_REND, "Descriptor is here. Great.");
+      if (connection_ap_handshake_attach_circuit(conn) < 0) {
+        if (!conn->_base.marked_for_close)
+          connection_mark_unattached_ap(conn, END_STREAM_REASON_CANT_ATTACH);
+        return -1;
       }
     }
     return 0;
Last edited 5 years ago by arma (previous) (diff)

comment:3 Changed 10 years ago by karsten

The second patch looks good to me. We shouldn't really care whether a
descriptor is more than 15 minutes in our cache or not. There is a separate
logic that ensures that we don't ask a v2 hidden service directory more
than once within 15 minutes. So, I think it's fine to apply this patch to
master (see 6a32beb in my public branch bug997-hidservfetch).

Roger had two valid points in #tor-dev:

a) As a client we might want to expire descriptors earlier than 24 hours
after receiving them. A descriptor that is 23:59 hours old is rather
unlikely to contain valid introduction points. However, making a descriptor
fetch is just as expensive as trying an introduction point. It's hard to
tell when would be a better time 15min < x < 24h when fetching a new
descriptor beats trying the remaining introduction points. Depends on how
stable introduction points are.

b) We should make sure that the code to fetch a new descriptor when all
introduction points have failed works. From
rend_client_remove_intro_point() it looks like it should work. Whenever we
don't have a descriptor or no more introduction points remain, we start
fetching a new descriptor. Of course, if that means fetching the same
descriptor from a directory more than once within 15 minutes, we don't do
that. If we have tried all directories before within the past 15 minutes,
we give up.

What is the plan wrt maint-0.2.1 and maint-0.2.0? The current behavior
there is that if we have a stale descriptor, a v0 descriptor is re-fetched,
but a v2 descriptor is not (which is why we didn't find this bug earlier).
With more services running on v2 in the future we should change this. We
could use the same logic of screwing the 15-minutes freshness period and
fetching a new descriptor only when the one we got has no working
introduction points in it anymore. I'll backport manually (that code has
lots of changes in it from dropping v0 descriptors) when we think this is a
good idea.

comment:4 Changed 10 years ago by arma

It looks to me like rend_client_remove_intro_point() doesn't do what we want.
It removes the intro point, and if no intro points remain, it triggers a
fetch for the descriptor.

But isn't rend_client_refetch_v2_renddesc() going to call
rend_cache_lookup_entry() which will return the descriptor and then cancel
the fetch?

One not-so-bad kludge would be to return "no i don't have it" in
rend_cache_lookup_entry() if the entry has no intro points. That would
make me wonder about fetching a hidserv desc that explicitly has no intro
points; then we in effect refuse to cache it.

Another not-so-bad kludge would be to actually delete the entry if
rend_client_remove_intro_point() determines there are no intro points
remaining. I think that's the better one.

comment:5 Changed 10 years ago by arma

Ung. I just looked through all the rend_cache code. While option #2 above is
the cleaner one in theory, in practice it totally isn't. So I recommend option
#1 for 0.2.0 and 0.2.1:

diff --git a/src/or/rendcommon.c b/src/or/rendcommon.c
index c813c49..df7195e 100644
--- a/src/or/rendcommon.c
+++ b/src/or/rendcommon.c
@@ -911,6 +911,11 @@ rend_cache_lookup_entry(const char *query, int version, ren
   }
   if (!*e)
     return 0;
+  tor_assert((*e)->parsed && (*e)->parsed->intro_nodes);
+  /* XXX022 hack for now, to return "not found" if there are no intro
+   * points remaining. See bug 997. */
+  if (smartlist_len((*e)->parsed->intro_nodes) == 0)
+    return 0;
   return 1;
 }

Then we can put that into 0.2.2 for now too, and decide later if we want
to fix it better.

Last edited 5 years ago by arma (previous) (diff)

comment:6 Changed 10 years ago by karsten

Hmm, looks like rend_client_remove_intro_point() really doesn't do what we
want. The proposed option #1 fixes that, right. I also changed the comment
for rend_client_remove_intro_point() to actually say what it's doing.

As for descriptors with no introduction points, we might refuse to cache
them. But that isn't that bad. We will try the other hidden service
directories and see if they have a descriptor with introduction points in
it. As a client we want to reach a hidden service and not stop after
looking at the first descriptor that says it's unreachable. Maybe that is
the one that was not updated, but the newer descriptors contain valid
introduction points.

See ca8708a in my public branch bug997-hidservfetch.

comment:7 Changed 10 years ago by karsten

In retrospect, I'm wondering if backporting the unchanged patch to 0.2.1
and 0.2.0 was such a good idea. There is no limit for downloading the same
v0 descriptor over and over again.

Consider the following situation: A hidden service publishing only v0
descriptors publishes a descriptor with 0 introduction points. The 0.2.1
and 0.2.0 clients download it, store it, see that it has no valid
introduction points, and start downloading it again. They don't realize
anymore that the descriptor is less than 15 minutes in their cache. Should
we change this?

This is different for 0.2.2 as clients only download v2 descriptors and
there is a limit for downloading the same descriptor only once from any
given directory. So, master should be fine.

comment:8 Changed 10 years ago by karsten

We agreed that the backport from 0.2.2 might break things in 0.2.1 and
0.2.0 with version 0 and 2 descriptors being fetched in parallel.

This is the new (less invasive) approach for 0.2.1 and 0.2.0:

13:48:35 < armadev> another option is to change 0.2.1.x so that in the
check in rend_client_refetch_v2_renddesc(), it also checks if the desc it
has is fresher than 15 minutes, in which case only then does it quit.

See the two commits f266ecb and 20883f5 in branch bug997-maint-0.2.1 in my
public repository at git://git.torproject.org/~karsten/git/tor.

comment:9 Changed 10 years ago by karsten

This error happened with git master:

02:36:19 < armadev> Jun 22 20:07:21.571 [warn] rend_client_send_introduction(): Bug: Internal error:

could not find intro key.

02:36:19 < armadev> Jun 22 20:07:21.572 [warn] Rendezvous attempt failed. Closing.
02:37:00 < armadev> more specifically,
02:37:01 < armadev> Jun 22 20:07:21.571 [info] connection_ap_handshake_attach_circuit(): found open
02:37:01 < armadev> intro circ 59068 (rend 59056); sending introduction. (stream 0 sec old)
02:37:01 < armadev> Jun 22 20:07:21.571 [warn] rend_client_send_introduction(): Bug: Internal error:
02:37:01 < armadev> could not find intro key.
02:37:21 < armadev> i assume this is related to our bug 997 "fixes"

comment:10 Changed 9 years ago by nickm

Milestone: 0.2.1.x-finalTor: 0.2.2.x-final

comment:11 Changed 9 years ago by arma

Description: modified (diff)
Owner: set to karsten
Priority: minormajor
Status: newassigned

Triage: it looks like we committed some code for this, into various branches, without writing on the bug report what we did.

We should look into this, and figure out the current bug status.

Marking as blocking 0.2.2.x so we be sure to get a better handle on it.

comment:12 Changed 9 years ago by Sebastian

Hm. It looks like we merged all the patches for 0.2.2.x. I wonder if there actually is something missing still? I think we would be fine closing this.

comment:13 Changed 9 years ago by arma

Resolution: Nonefixed
Status: assignedclosed

I looked over the various patches, and the ones that look useful look like they're part of the codebase now. So I'm going to guess that we fixed all the bugs at the time, and never closed the bug report because by the time we became convinced the bugs were fixed, we'd forgotten about the bug report.

So I'm going to close. If people experience new hidden service bugs, they should make a new trac entry.

comment:14 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.