Opened 11 years ago

Closed 2 years ago

Last modified 2 years ago

#847 closed defect (worksforme)

Extreme values in downloading directory information

Reported by: karsten
Owned by: nickm
Priority: Low
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.1.2.19
Severity: Normal
Keywords: performance bootstrap tor-client
Cc: karsten, nickm, arma, Sebastian
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by nickm)

Sometimes it takes a really long time until Tor has downloaded enough
directory information to build circuits. In an experiment with 1,220
attempts to start new Tor instances, the following times were measured for
downloading directory information (in seconds):

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
1.732   5.016    7.602   35.680  17.210   681.300

The highest 50 values are:

141.560 151.013 151.880 164.672 169.010 184.269 184.280 187.733 207.903
222.899 238.430 248.529 250.513 252.959 265.249 269.104 281.840 295.144
307.421 307.719 307.863 307.940 308.403 308.438 308.672 308.730 308.741
309.053 309.293 309.357 309.839 310.033 310.259 310.275 311.930 313.203
313.385 313.597 315.260 317.916 318.327 325.420 362.017 367.175 452.797
487.972 489.534 537.029 615.759 681.341

A look at the log files reveals that Tor attempts to fetch network
statuses or authority certificates from a single directory server that
happens to be unavailable. The request times out after 120 seconds, and
another attempt is made, possibly to the same directory server. During the
measurements, dannenberg appeared to be offline, so attempts to that
authority timed out:

egrep "Bootstrap|86.59.21.38|216.224.124.114|213.73.91.31|141.13.4.202|128.31.0.34|194.109.206.212" log
Nov 01 08:42:02.390 [info] Bootstrapped 0%: Starting.
Nov 01 08:42:02.390 [info] connection_ap_make_link(): Making internal direct tunnel to 194.109.206.212:443 ...
Nov 01 08:42:02.391 [notice] Bootstrapped 5%: Connecting to directory server.
Nov 01 08:42:02.393 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:42:02.393 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:42:02.413 [notice] Bootstrapped 10%: Finishing handshake with directory server.
Nov 01 08:42:02.603 [info] command_process_versions_cell(): Negotiated version 2 with 194.109.206.212; sending NETINFO.
Nov 01 08:42:02.604 [notice] Bootstrapped 15%: Establishing an encrypted directory connection.
Nov 01 08:42:02.605 [info] command_process_netinfo_cell(): Got good NETINFO cell from 194.109.206.212; OR connection is now open, using protocol version 2
Nov 01 08:42:02.693 [notice] Bootstrapped 20%: Asking for networkstatus consensus.
Nov 01 08:42:02.790 [info] connection_dir_client_reached_eof(): Received authority certificates (size 9804) from server '194.109.206.212:443'
Nov 01 08:42:35.033 [info] command_process_versions_cell(): Negotiated version 2 with 213.73.91.31; sending NETINFO.
Nov 01 08:44:02.857 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit)
Nov 01 08:44:02.857 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:44:02.858 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:46:02.317 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit)
Nov 01 08:47:07.565 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:47:07.565 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:47:35.669 [info] run_connection_housekeeping(): Expiring non-open OR connection to fd 16 (213.73.91.31:443).
Nov 01 08:47:35.669 [info] connection_ap_fail_onehop(): Closing onehop stream to '$7BE683E65D48141321C5ED92F075C55364AC7123/213.73.91.31' because the OR conn just failed.
Nov 01 08:53:13.929 [info] connection_ap_make_link(): Making internal direct tunnel to 216.224.124.114:9090 ...
Nov 01 08:53:13.930 [info] directory_send_command(): Downloading consensus from 216.224.124.114:9090 using /tor/status-vote/current/consensus.z
Nov 01 08:53:15.788 [info] command_process_versions_cell(): Negotiated version 2 with 216.224.124.114; sending NETINFO.
Nov 01 08:53:15.788 [info] command_process_netinfo_cell(): Got good NETINFO cell from 216.224.124.114; OR connection is now open, using protocol version 2
Nov 01 08:53:16.677 [notice] Bootstrapped 25%: Loading networkstatus consensus.
Nov 01 08:53:21.322 [info] connection_dir_client_reached_eof(): Received consensus directory (size 256583) from server '216.224.124.114:9090'
Nov 01 08:53:21.345 [notice] Bootstrapped 45%: Asking for relay descriptors.
Nov 01 08:53:21.742 [notice] Bootstrapped 50%: Loading relay descriptors.
Nov 01 08:53:22.105 [notice] Bootstrapped 59%: Loading relay descriptors.
Nov 01 08:53:22.652 [notice] Bootstrapped 69%: Loading relay descriptors.
Nov 01 08:53:23.092 [notice] Bootstrapped 80%: Connecting to the Tor network.
Nov 01 08:53:24.408 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Nov 01 08:53:25.394 [notice] Bootstrapped 100%: Done.
Nov 01 08:53:59.577 [info] connection_ap_make_link(): Making internal anonymized tunnel to 141.13.4.202:443 ...
Nov 01 09:08:24.117 [info] run_connection_housekeeping(): Expiring non-used OR connection to fd 15 (194.109.206.212:443) [Not in clique mode].

Possible fixes could be shorter timeouts or parallel requests. However,
these fixes have side effects on network load. Maybe there is also a way to
detect and exclude unavailable directories more quickly (these requests are
tunneled directory requests over one-hop circuits).
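A back-of-the-envelope model shows why one unreachable authority inflates bootstrap times and why shorter timeouts or parallel requests would help. This C sketch assumes (a simplification, not Tor's actual selection logic) that each attempt picks one of `n_total` authorities uniformly and independently, that `n_dead` of them never answer, and that each attempt to a dead one costs the full timeout before the next one starts. Both function names are hypothetical.

```c
#include <assert.h>

/* Simplified model (assumption, not Tor's code): expected extra delay in
 * seconds caused by dead authorities when requests are made one at a time.
 * Each attempt hits a dead authority with probability p = n_dead / n_total;
 * the number of failed attempts before the first success is geometric with
 * mean p / (1 - p), and each failure costs `timeout` seconds. */
double expected_serial_delay(int n_total, int n_dead, double timeout)
{
    assert(n_total > 0 && n_dead >= 0 && n_dead < n_total);
    double p = (double)n_dead / n_total;
    return timeout * p / (1.0 - p);
}

/* Probability that k parallel requests to k distinct, uniformly chosen
 * authorities all land on dead ones: C(n_dead, k) / C(n_total, k),
 * computed as the product of (n_dead - i) / (n_total - i) for i < k. */
double all_dead_probability(int n_total, int n_dead, int k)
{
    assert(n_total > 0 && k > 0 && k <= n_total);
    assert(n_dead >= 0 && n_dead <= n_total);
    double prob = 1.0;
    for (int i = 0; i < k; i++) {
        if (n_dead - i <= 0)
            return 0.0;  /* fewer dead authorities than parallel requests */
        prob *= (double)(n_dead - i) / (n_total - i);
    }
    return prob;
}
```

Treating the six addresses in the `egrep` pattern above as the authority set, with dannenberg down, `expected_serial_delay(6, 1, 120.0)` is 24 seconds on average, and 6 seconds with a 30-second timeout; two parallel requests to distinct authorities can never both hit a single dead one (`all_dead_probability(6, 1, 2)` is 0). The log above shows three consecutive attempts to the same address, a pattern this independent-choice model does not capture.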

[Automatically added by flyspray2trac: Operating System: All]


Change History (17)

comment:1 Changed 11 years ago by nickm

Hm. Perhaps our usual methods for noticing that directories aren't working and excluding them don't work
when we're using one-hop tunnels. This needs more investigation.

comment:2 Changed 11 years ago by karsten

r17189 should be part of the solution. Tunneled directory requests and client-side
introduction circuits should time out after 30 seconds, which they didn't. This
should at least mitigate the problem.

comment:3 Changed 11 years ago by karsten

Mystery resolved. The bad values are simply the result of directory
downloads that take a very long time. Tor waits for a directory download to
either finish or fail before starting another one, and that next attempt
might be directed to the same slow authority. The problem occurs when
downloading the consensus or authority certificates (but is more serious
for the consensus, which is much larger: 256,583 bytes versus 9,804 bytes
for the certificates in the log above).

This shows up in the log files as large gaps between two consecutive log
statements (each pair below comes from a different test run):

Nov 15 00:34:35.667 [info] _connection_free(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf.
Nov 15 00:40:11.906 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
--
Nov 14 23:06:07.497 [notice] Bootstrapped 25%: Loading networkstatus consensus.
Nov 14 23:12:59.818 [info] run_connection_housekeeping(): Expiring wedged directory conn (fd -1, purpose 14)
--
Nov 14 05:18:50.407 [notice] Bootstrapped 25%: Loading networkstatus consensus.
Nov 14 05:25:01.475 [info] connection_edge_process_relay_cell(): -1: end cell (closed normally) for stream 28178. Removing stream.
--
Nov 15 09:45:03.075 [notice] We're missing a certificate from authority with signing key 8C4CA02AA1AA71FCC56F9F4A46E09A4719DD75C3: launching request.
Nov 15 09:46:04.294 [notice] We're missing a certificate from authority with signing key D6D2325E1511B23A825DBE1CFD3DF9285AAE4DEB: launching request.
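Gaps like these can be found mechanically. The following C sketch (function names are hypothetical, not part of Tor) parses the `Mon DD HH:MM:SS.mmm` timestamp at the start of each log line and reports consecutive timestamped lines that are more than a given number of seconds apart. It assumes all lines come from one run within a single month.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert the "Mon DD HH:MM:SS.mmm" prefix of a Tor
 * log line into seconds counted from day 0 of the month.  The month name is
 * read but ignored, so gaps across a month boundary are not handled.
 * Returns -1 for lines that do not start with a timestamp (e.g. "--"). */
int parse_log_time(const char *line, double *seconds)
{
    char month[4];
    int day, hour, minute;
    double sec;
    if (sscanf(line, "%3s %d %d:%d:%lf", month, &day, &hour, &minute, &sec) != 5)
        return -1;
    *seconds = ((day * 24.0 + hour) * 60.0 + minute) * 60.0 + sec;
    return 0;
}

/* Print every pair of consecutive timestamped lines whose timestamps differ
 * by more than `threshold` seconds, and return how many such gaps exist. */
size_t report_gaps(const char *const *lines, size_t n, double threshold)
{
    assert(threshold >= 0.0);
    size_t gaps = 0;
    const char *prev_line = NULL;
    double prev = 0.0, now;
    for (size_t i = 0; i < n; i++) {
        if (parse_log_time(lines[i], &now) != 0)
            continue;                   /* skip lines without a timestamp */
        if (prev_line != NULL && now - prev > threshold) {
            printf("%.3f s gap:\n  %s\n  %s\n", now - prev, prev_line, lines[i]);
            gaps++;
        }
        prev_line = lines[i];
        prev = now;
    }
    return gaps;
}
```

Passing each quoted pair separately with a 60-second threshold flags all four of them; the `--` separators are skipped because they carry no timestamp.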

comment:4 Changed 11 years ago by Sebastian

The following patch is from mortalcombat.

--- connection.c	Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c	Fri Nov 21 10:02:40 2008
@@ -1552,7 +1552,16 @@
   if (authdir_mode(get_options()) && priority>1)
     return 0; /* there's always room to answer v2 if we're an auth dir */
 
-  if (!connection_is_rate_limited(conn))
+  tor_assert(conn->type == CONN_TYPE_DIR);
+  /* connection_is_rate_limited() keeps bytes moved between the buffers
+   * of linked conns from being charged twice against our buckets.
+   * But this call is an exception: only local conns (which ones, and
+   * whose are they?) should be exempt from rate limiting for directory
+   * operations. Tunneled directory conns should be limited.
+   * Otherwise the answered bytes just consume the server's memory and
+   * the client's time when there is not enough bandwidth.
+   */
+  if (!connection_is_rate_limited(conn) && !conn->linked)
     return 0; /* local conns don't get limited */
 
   if (smaller_bucket < (int)attempt)

Last edited 2 years ago by arma

comment:5 Changed 11 years ago by arma

This patch is also from mortalkombat. I'm worried that it might return 1
for authdirs in the !old_dir_conn case, but I will have to review it
once I've slept to be sure.

--- connection.c	Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c	Fri Nov 21 12:06:38 2008
@@ -1549,16 +1549,21 @@
 {
   int smaller_bucket = global_write_bucket < global_relayed_write_bucket ?
                        global_write_bucket : global_relayed_write_bucket;
-  if (authdir_mode(get_options()) && priority>1)
+  int old_dir_conn = !conn->linked;
+  tor_assert(conn->type == CONN_TYPE_DIR);
+  /* Tunneled conns are too slow for low-bandwidth authorities: check the bucket. */
+  if (authdir_mode(get_options()) && priority > 1 && old_dir_conn)
     return 0; /* there's always room to answer v2 if we're an auth dir */
 
-  if (!connection_is_rate_limited(conn))
+  /* Tunneled conns are too slow with limited bandwidth; better to send a 503. */
+  if (!connection_is_rate_limited(conn) && old_dir_conn)
     return 0; /* local conns don't get limited */
 
   if (smaller_bucket < (int)attempt)
     return 1; /* not enough space no matter the priority */
 
-  if (write_buckets_empty_last_second)
+  /* Tunneled conns are part of Tor's traffic and are already limited. */
+  if (write_buckets_empty_last_second && old_dir_conn)
     return 1; /* we're already hitting our limits, no more please */
 
   if (priority == 1) { /* old-style v1 query */

Last edited 2 years ago by arma
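arma's worry can be checked by hand. The function below is a hypothetical, simplified restatement of the new checks in this patch as a pure function of its inputs (the name and parameters are ours, not Tor's). It stops where the hunk is cut off, at `if (priority == 1)`, and assumes, as the first patch's comment suggests, that linked conns are not rate limited.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical restatement of the checks added by the second patch.
 * Returns 1 ("write bucket is low, refuse"), 0 ("room to answer"), or -1
 * when the decision falls through to the priority-based checks that the
 * hunk does not show. */
int v2_write_low_prefix(int is_authdir, int priority, int linked,
                        int rate_limited, int smaller_bucket,
                        size_t attempt, int buckets_empty_last_second)
{
    int old_dir_conn = !linked;
    assert(attempt <= (size_t)INT_MAX);  /* keep the (int) cast meaningful */
    if (is_authdir && priority > 1 && old_dir_conn)
        return 0;   /* authdirs always answer v2 over direct conns */
    if (!rate_limited && old_dir_conn)
        return 0;   /* local, non-tunneled conns are never limited */
    if (smaller_bucket < (int)attempt)
        return 1;   /* not enough room, regardless of priority */
    if (buckets_empty_last_second && old_dir_conn)
        return 1;   /* already hitting our limits */
    return -1;      /* continue with the priority checks */
}
```

For an authority answering a tunneled v2 request, `v2_write_low_prefix(1, 2, 1, 0, 100, 200, 0)` returns 1: `old_dir_conn` is false, so the authdir exemption is skipped and the too-small bucket refuses the request. The same request over a direct conn returns 0.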

comment:6 Changed 11 years ago by rovv

Limiting service from a small number of authdirs can make them
a convenient target for a denial-of-service attack
(with the proposed patch, new clients with an empty document cache are the potential victims).

However, current versions already limit some documents, namely certificates
(while others, such as the consensus, are unlimited), and an attack might
be directed against all recipients, both mirrors and clients.
For authdirs a better rule may be: slow but shared among everyone, rather than fast but
only for the attacker.
There are two ways (usable separately or together) to solve the problem of congestion
while receiving the consensus from authdirs:
1) Encourage the operators of congested authdirs to increase the ratio
BandwidthBurst / BandwidthRate (with a BandwidthBurst that is actually reachable in practice);
2) Have clients reduce the allowable idle time when fetching data from a directory,
and, once that timer fires, avoid reusing that directory while fetching.

(The remaining two parts of the patch aim to apply the available global write
bucket on mirrors to all documents, or to certificates from authdirs, with the
intent of distributing documents through tunneled dir conns.)
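Idea 2 could look roughly like this C sketch. The `dir_server` struct, the function names, and the fall-back rule are all hypothetical illustrations, not Tor's directory-selection code: a fetch is abandoned once it has been idle longer than a limit, the directory is stamped with the time of the stall, and later fetches skip it for a cooldown period.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-directory bookkeeping; Tor's real structures differ. */
typedef struct {
    const char *address;
    time_t last_stall;      /* 0 if this directory has never stalled */
} dir_server;

/* Returns 1 if a fetch that last received data at `last_progress` should be
 * abandoned at time `now`, i.e. it has been idle longer than `idle_limit`. */
int fetch_is_stalled(time_t last_progress, time_t now, double idle_limit)
{
    return difftime(now, last_progress) > idle_limit;
}

/* Pick the first directory, scanning from index `start`, that has not
 * stalled within the last `cooldown` seconds.  If every directory is still
 * cooling down, fall back to the one that stalled longest ago instead of
 * giving up entirely. */
size_t pick_directory(const dir_server *dirs, size_t n, size_t start,
                      time_t now, double cooldown)
{
    assert(n > 0 && start < n);
    size_t oldest = start;
    for (size_t k = 0; k < n; k++) {
        size_t i = (start + k) % n;
        if (dirs[i].last_stall == 0 ||
            difftime(now, dirs[i].last_stall) > cooldown)
            return i;
        if (dirs[i].last_stall < dirs[oldest].last_stall)
            oldest = i;
    }
    return oldest;
}
```

On a stall, the caller would set `dirs[i].last_stall = now` and call `pick_directory()` again. Falling back to the least recently stalled directory, rather than refusing to fetch, is one way to avoid failing outright when every directory looks bad.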

comment:7 Changed 11 years ago by arma

Another component to fixing this is that authorities with low bandwidth should set
MaxAdvertisedBandwidth 10 KB
so they attract less traffic.

But the other fixes above should be considered too.

comment:8 Changed 11 years ago by nickm

The basis of the patches above seems like a good idea, though we should probably refactor some to make it cleaner.
(The big thing is that connection_is_rate_limited() probably wants to allow unlimited reads from a linked connection,
but limit writes to it so as not to clog our ram. This might mean that we pull the linked_conn check higher, or such.)

Roger, I think you're right that the second version can return 1 for authdirs.

I think that for 0.2.1.x we should:

  • Fix the possibility of any linked connection buffers getting enormous.
  • Fix the interaction of linked_conn and connection_is_rate_limited.
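The first item, together with the idea of allowing unlimited reads from a linked connection while limiting writes into it, might be sketched as below. The function name and the cap are hypothetical; the point is only that the writer is throttled by how much its peer has not yet consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical write throttle for a linked (in-process) connection: how
 * many of the `want` bytes may be written when the peer already holds
 * `peer_pending` unread bytes and at most `cap` bytes may be queued.
 * Reads are not limited here; only writes are, so a slow consumer cannot
 * make the buffer grow without bound. */
size_t linked_write_allowance(size_t want, size_t peer_pending, size_t cap)
{
    assert(cap > 0);
    if (peer_pending >= cap)
        return 0;                   /* peer is backed up: wait */
    size_t room = cap - peer_pending;
    return want < room ? want : room;
}
```

With a cap of 4096 bytes, a writer whose peer already holds 1000 unread bytes may add at most 3096 more.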

Idea 2 above (reduce timeout on fetches and treat a timeout as making a directory less desirable) is a fine one,
but it seems tricky to get right, and should maybe wait for 0.2.2.x and get a proposal. The case I'm worried
about is the one in which a user has a really slow connection such that they can't get _any_ consensus under
the default timeout, and so they fail entirely.
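One way to address that concern is to grow the timeout after each consecutive failure, so a short first timeout skips unreachable directories quickly while a client on a very slow link eventually gets enough time. This is a hypothetical C sketch, not Tor's behavior; the 30-second base in the example echoes comment:2 and the 600-second cap is arbitrary.

```c
#include <assert.h>

/* Hypothetical back-off for fetch timeouts: start from `base` seconds and
 * double after each consecutive timeout, never exceeding `max`.  Doubling
 * only while t <= max / 2 avoids unsigned overflow. */
unsigned fetch_timeout(unsigned base, unsigned consecutive_timeouts,
                       unsigned max)
{
    assert(base > 0 && base <= max);
    unsigned t = base;
    for (unsigned i = 0; i < consecutive_timeouts && t < max; i++)
        t = (t > max / 2) ? max : t * 2;
    return t;
}
```

`fetch_timeout(30, 2, 600)` is 120 seconds, and after enough failures the timeout stays at 600.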

comment:9 Changed 10 years ago by Sebastian

Does Mike's work affect this in any way?

comment:10 Changed 9 years ago by nickm

Description: modified
Milestone: changed from 0.2.1.x-final to Tor: 0.2.1.x-final

comment:11 Changed 9 years ago by nickm

Milestone: changed from Tor: 0.2.1.x-final to Tor: 0.2.2.x-final
Version: changed from 0.2.1.5-alpha to Tor: 0.1.2.19

comment:12 Changed 9 years ago by arma

Triage: nobody's getting bitten by this any more now than then, so it can survive until after 0.2.2.x gets out.

comment:13 Changed 9 years ago by nickm

Milestone: changed from Tor: 0.2.2.x-final to Tor: unspecified

comment:14 Changed 8 years ago by arma

Keywords: performance bootstrap added

comment:15 Changed 7 years ago by nickm

Keywords: tor-client added

comment:16 Changed 7 years ago by nickm

Component: changed from Tor Client to Tor

comment:17 Changed 2 years ago by nickm

Cc: changed from karsten,nickm,arma,Sebastian to karsten, nickm, arma, Sebastian
Resolution: changed from None to worksforme
Severity: set to Normal
Status: changed from assigned to closed

I'm thinking that the analysis in this ticket is no longer relevant; we've refactored download timing a lot.
