Opened 11 years ago

Last modified 7 years ago

#891 closed defect (Fixed)

Uncertainty and redundant connections, if not actual router descriptor.

Reported by: rovv Owned by: nickm
Priority: Low Milestone: 0.2.1.x-final
Component: Core Tor/Tor Version: 0.2.1.8-alpha
Severity: Keywords:
Cc: nickm, rovv Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Reported version: r17628

Relay does not always know the latest information on all relays of the network.
(And that's OK by itself)
If it serves as a cache, then delays to updating of information
is minimal.
Otherwise, they may be substantial, until the end update
because no builds of origin circuits. In a situation
use not up-to-date information within 48 hours
until removal from routerlist.

But uses non relevant information to create new
connections to replace the existing, can lead to creation of a
permanently redundant connection as well as uncertainty in the choice of
long-term connections between relays.

r17628, creates the conditions under which used non relevant information
and relays with dynamic ip-addresses, plus the use of v1 protocol
("certificates up-front") can lead to difficult detected problems, described
above.

To have to submit why this happens, we must take into account of clients,
which sends cell-extend. Those are also unable to synchronously download up-to-date
information. (and that's OK by itself)

(Here must be a description as can occur within minutes or hours

a continuous series of changes of conns, or simply one redudant)

(This is all really very simple to describe, but requires too many words

if no graphics, at least to my mind)

(If needed, I will still try to do it)

(Recorded and reported upon request by forest.)

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Attachments (1)

bug891-2.patch (19.9 KB) - added by nickm 11 years ago.
Second version of nickm's big patch for this bug

Download all attachments as: .zip

Change History (25)

comment:1 Changed 11 years ago by rovv

Here version of reported fix:

--- circuitbuild.c Sun Dec 21 08:37:02 2008
+++ circuitbuild.fix.c Sun Dec 21 10:15:42 2008
@@ -362,7 +362,8 @@

*state_out = "too old. Launching a new one.";
*launch_out = 1;
return 0;

  • } else if (tor_addr_compare(&n_conn->_base.addr, target_addr, CMP_EXACT) &&

+ /* main: for incoming conn and nonfresh descriptor for it router */
+ } else if (tor_addr_compare(&n_conn->real_addr, target_addr, CMP_EXACT) &&

! n_conn->is_canonical) {

*state_out = "is not from a canonical address. Launching a new one.";
*launch_out = 1;

--- connection_or.c Sun Dec 21 08:37:06 2008
+++ connection_or.fix.c Sun Dec 21 10:06:46 2008
@@ -397,10 +397,11 @@

conn->_base.port = port;
tor_addr_copy(&conn->_base.addr, addr);
tor_addr_copy(&conn->real_addr, addr);

+ tor_free(conn->_base.address);
+ /* cosmetic part: for nonfresh routerinfo_t, r->address can be nonactual */
+ conn->_base.address = tor_dup_addr(addr);

if (r) {

/* XXXX proposal 118 will make this more complex. */

  • if (tor_addr_eq_ipv4h(&conn->_base.addr, r->addr))
  • conn->is_canonical = 1;

if (!started_here) {

/* Override the addr/port, so our log messages will make sense.

  • This is dangerous, since if we ever try looking up a conn by

@@ -416,8 +417,6 @@

conn->_base.port = r->or_port;

}
conn->nickname = tor_strdup(r->nickname);

  • tor_free(conn->_base.address);
  • conn->_base.address = tor_strdup(r->address);

} else {

const char *n;
/* If we're an authoritative directory server, we may know a

@@ -431,8 +430,6 @@

base16_encode(conn->nickname+1, HEX_DIGEST_LEN+1,

conn->identity_digest, DIGEST_LEN);

}

  • tor_free(conn->_base.address);
  • conn->_base.address = tor_dup_addr(addr);

}

}


@@ -910,6 +907,11 @@

conn->_base.state = OR_CONN_STATE_OPEN;
control_event_or_conn_status(conn, OR_CONN_EVENT_CONNECTED, 0);


+ /* main: set canonical for sure connected. prevents uncertainty */
+ routerinfo_t *r = router_get_by_digest(conn->identity_digest);
+ if (r && tor_addr_eq_ipv4h(&conn->real_addr, r->addr))
+ conn->is_canonical = 1;
+

if (started_here) {

rep_hist_note_connect_succeeded(conn->identity_digest, now);
if (entry_guard_register_connect_status(conn->identity_digest,

comment:2 Changed 11 years ago by nickm

So the main changes here are:

A) when picking a connection for a circuit, look at the real_addr field rather than the addr field to

decide whether the connection matches the address we got in the

B) move the check for whether the address matches the one in the routerinfo from

connection_or_init_conn_from_address() to connection_or_set_state_open()

Part A) seems completely correct. But for part B, I'm not sure where the bug was. It seems to me that
in svn as it is today, we check the address against the routerinfo in connection_or_init_conn_from_address(),
which is invoked by connection_or_connect() and by connection_tls_finish_handshake(). So I think that if
we originate the connection, we will check the routerinfo address from connection_or_connect(),
and if we do not originate the connection, we will check it from connection_tls_finish_handshake().

I'm applying part A now, but waiting on part B until I understand the issue. What am I missing?

comment:3 Changed 11 years ago by nickm

[Also, when writing patches, remember to run configure with the "--enable-gcc-warnings" switch. It turns on
more warnings that Tor usually gets built with.]

comment:4 Changed 11 years ago by rovv

Scenario by forest:
Suppose there are two relays Rc and Rd, to simplify take that Rc available
always on a static address, while Rd is a dynamic address.
Suppose there is a subset of clients A and B, knowing different addresses for Rd, i.e.
subset of A the lack of updates descriptor, while subset of B knows the new address of
Rd. Rc knows only the old address of Rd.

For simplicity condition is that the treatment of these subsets in turn.
The client from subset of B creates circ through Rc to Rd, creating and opening
the connection conn_1, which is not canonical.
Following it, client from subset of A is requesting extend of circ to old
address of Rc through Rd. It creates to nowhere connection: conn_2, which is canonical even
(without real connection exist).
After run_connection_housekeeping() connection conn_1 will appear as is_bad_for_new_circs.
(because connection_or_get_by_identity_digest() returns conn_2,

and this is normal logic: if (best->is_canonical && !conn->is_canonical)
continue; /* A canonical connection is best. */ -- It's can not be else)

conn_2 closed on time-out. Remains conn_1 through which is not possible to build
new circs.
And further to the cycle of requests from clients (subset of B) creates and open conn_n,
after requests from clients (subset of A) creates conn_n+1 (canonical) and marks
conn_n (is_bad_for_new_circs).

And so on.

This greatly simplified scenario, of course. But if I am right with it analysis, then
such leapfrog terminated only when Rc generally knows nothing about Rd or
after receives an updated descriptor, or if clients receive descriptors
strictly synchronously (it excludes subset of A).

All this can be avoided by establishing a connection as if the canonical address
in practice, still belongs a statement from router descriptor.
The most convenient place for this: connection_or_set_state_open(),
changing place of checking (detects canonical) the address matches
for initiator of connection.
The responsible party in any case have access to conn
with identity_digest only after the successful handshake
(when connection surely exist).

comment:5 Changed 11 years ago by rovv

[Link protocol between Rc and Rd is v1 ("certificates up-front"). Rc is r17628]

comment:6 Changed 11 years ago by rovv

Wait, but Part B produce heaps of connections to nowhere
by request of clients from Subset A. That is worst.

comment:7 Changed 11 years ago by rovv

okay, it's not a bug (at least not fixable), it's a feature. Please close the task.
btw, how can I edit my old posts?

comment:8 Changed 11 years ago by rovv

--- circuitbuild.c Tue Dec 23 17:41:20 2008
+++ circuitbuild.fix2.c Tue Dec 23 17:44:18 2008
@@ -363,7 +363,7 @@

*launch_out = 1;
return 0;

} else if (tor_addr_compare(&n_conn->real_addr, target_addr, CMP_EXACT)

  • && ! n_conn->is_canonical) {

+ && ! n_conn->is_canonical && n_conn->link_proto >= 2) {

*state_out = "is not from a canonical address. Launching a new one.";
*launch_out = 1;
return 0;

comment:9 Changed 11 years ago by nickm

Ah, I think I get it now.

Router A changes its address from A_old to A_new.
Router B still has a routerdesc with A_old in it.
Client C tells B, "Extend to router A, at A_new!"
Router A is using a version of Tor with the old link protocol, so it never exchanges NETINFO cells with router B.

So B never decides that it has a canonical connection to A, because it can't see that A_new is good in the
routerinfo (it has an old one), and it can't see that A_new is good from the NETINFO cell (it never got one).

Worse, when B later tries to connect to A_old, connection_or_connect calls connection_or_init_conn_from_address,
which will set is_canonical early before the connection even finishes. That will make the older, non-canonical
connection get deprecated in preference to the new connection that never actually finishes.

I think that the small patch (the one marked 6:51 pm above) is part of a solution, but not the whole thing: as
"part B" of the original patch implies, it's probably wrong to call a connection canonical before it has completed.

Also, I think we need to be smarter about how we set is_bad_for_new_circuits.

comment:10 Changed 11 years ago by nickm

I'm working on a patch now that I'd like your feedback on; I'll post it when it's done.

comment:11 Changed 11 years ago by nickm

I just uploaded a big patch as "bug891-1.patch". It's more ambitious than the patches above, but I think it makes the
code cleaner. It:

1) merges connection_or_get_by_identity_digest() and connection_good_enough_for_extend so that if there is a

good enough connection, it will be returned.

2) decouples connection_or_get_by_identity_digest() from the is_bad_for_new_circs logic in main.c.

What do people think?

Changed 11 years ago by nickm

Attachment: bug891-2.patch added

Second version of nickm's big patch for this bug

comment:12 Changed 11 years ago by nickm

Actually, I just uploaded a bug891-2.patch that is better: it doesn't segfault. :)

comment:13 Changed 11 years ago by rovv

What about TLS_HANDSHAKE_TIMEOUT? Or there are some other place for checks
of time-out or_conns in code will be? (Sorry if I am something missing)

Yet, thinking about a new connection after TIME_BEFORE_OR_CONN_IS_TOO_OLD,
if we do not originate the connection then it can be marks as is_bad_for_new_circs
while connection_or_group_set_badness() called before a first cell_create arrived.
(clocks of parties can slowly differ, if somebody not used ntp.

it's should not be much however, without ntp even)

Is it possible? just wonder about transitional processes fot that case.
Acctually, then that's behavior for current stable version also.

comment:14 Changed 11 years ago by rovv

test

comment:15 Changed 11 years ago by rovv

passed

comment:16 Changed 11 years ago by rovv

Please delete my account!

comment:17 Changed 11 years ago by nickm

Why?

comment:18 Changed 11 years ago by nickm

(I ask because I do not have an admin way to remove an account. There's other stuff I could try, but
unless I know what you need, I do not know if it will be good enough.)

comment:19 Changed 11 years ago by nickm

(I shut down your account as hard as I could. Ask weasel if you need it removed more thoroughly for some reason.)

So I think the case described above is one where two hosts, A and B, disagree about how long a connection has been
open, because of clock drift of some kind. Host A then marks the circuit as is_bad_for_new_circs because it thinks
it is too old and opens a new connection. In between when B receives a NETINFO cell on the new connection, and when B
receives a create cell on the new connection, the new connection will seem worse to B than the old one, and so B will
mark it as is_bad_for_new_circs in the second or third loop of connection_or_group_set_badness().

Hm. Tricky.

comment:20 Changed 11 years ago by nickm

Actually, I think it's only the third loop ("Pass 3") that we need to worry about.

comment:21 Changed 11 years ago by nickm

Okay, I think the solution is to give new connections a grace period in which they don't get closed simply because
there is a "better" connection, if the only reason that connection is "better" is that it has circuits.

comment:22 Changed 11 years ago by nickm

The last case is fixed in 18303.

comment:23 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:24 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.