Opened 11 years ago

Closed 6 years ago

#872 closed defect (fixed)

Endless loop of requests if broken resolves.

Reported by: rovv Owned by:
Priority: Low Milestone: Tor: unspecified
Component: Core Tor/Tor Version: 0.2.0.31
Severity: Keywords: tor-client
Cc: nickm, arma Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

Client to ignore all DNS answers with local addresses by default (v 0.2.0.32).
However, it lead connection_edge_process_end_not_open() to
endless loop of connections attempt (connection_ap_detach_retriable())
via the same exit, if relay cell (END_STREAM_REASON_EXITPOLICY)
contains local addresses.
Attached some ideas of fix.

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Attachments (1)

relay.diff (1.7 KB) - added by rovv 11 years ago.
Q, and ideas only.

Download all attachments as: .zip

Change History (17)

Changed 11 years ago by rovv

Attachment: relay.diff added

Q, and ideas only.

comment:1 Changed 11 years ago by rovv

Sadly, patch for patch. Tor-level's endless loop of relay_begin<->relay_end
possibly as bad as just caching local address.
Feel free to reinspect all my proposed patches from past and go to ignore from future.

comment:2 Changed 11 years ago by nickm

Hm. We could detach it and increment num_socks_retries, I think. Does num_socks_retries not work to prevent
infinite loops on stream reattachment?

comment:3 Changed 11 years ago by rovv

Re-attempt connections to the same circ will go without stopping, but
not infinite. Perhaps the word 'endless' used incorrectly to describe
situation.

Limiting of infinite reuse of the same circuit still works, but
until such time (10 minutes by default) will be selected the one and the same
circuit with a high degree of probability.
Consider the three scenario to RELAY_COMMAND_BEGIN containing the hostname:

1) Exit node (A) answer with RELAY_COMMAND_END(END_STREAM_REASON_EXITPOLICY)

and in addition reported (optional data) that resolved of hostname to
ip address 11.22.33.44
Client rewrite socks_request->address to an IP address which it learned,
and while (connection_ap_detach_retriable()) will select the same circ
with the same node (A) only if the uncertainty (for different reasons)
the available data on the exit policy node A. If again repeats
RELAY_COMMAND_END(END_STREAM_REASON_EXITPOLICY) then client mark node (A) as
non-exit (policies_set_router_exitpolicy_to_reject_all())
And avoid the recurrence of further queries to the node (A)
Maximum 2 request.

2) Exit node (B) answer with RELAY_COMMAND_END(END_STREAM_REASON_EXITPOLICY)

and in addition reported (optional data) that resolved of hostname to
ip address 192.168.03.04. If client remember (as before r17342)
and rewrite socks_request->address to this IP address.
And even after connection_ap_detach_retriable() will select the same
circ with node (B)
(if availability permitting rule in policy) everything goes
for the first (1) scenario. Or (more likely), if DNS record was erroneous
for hostname, the client does not find a suitable exit nodes, and complete
attempts to connect.

3) Exit node (C) answer with RELAY_COMMAND_END(END_STREAM_REASON_EXITPOLICY)

and in addition reported (optional data) that resolved of hostname to
ip address 192.168.33.44. If client do not remember (as after r17342)
this address, then socks_request->address still contains the hostname,
and in the choice of circs did not happen changes, so after
connection_ap_detach_retriable() will be selected the same circ.
Request to the same node (C) will be repeated again and again
if client believes the circ reusable.
For unintentional mistake (DNS record or operator's errors)
this is nothing more than a game of ping-pong for 10 minutes
without consequences.
For deliberate resolves to a local addresses this additional
tool in the attacks, the measurement of latency.

For the fix is suffice to declare that circuit is not reusable
(circ->_base.timestamp_dirty -= get_options()-> MaxCircuitDirtiness).
So measuring latency and simple ping-pong through END_STREAM_REASON_EXITPOLICY
would be almost impossible (time for building of new circs,plus a random
selection exit nodes), except for a very limited number of exit nodes and large
distortions of theirs bandwiths.

Request to resolve affected too.
For RELAY_COMMAND_RESOLVE and (it impossible for well-behaved nodes)
answer with RELAY_COMMAND_END(END_STREAM_REASON_EXITPOLICY):

In this case ping-pong is also guaranteed for the third (3) scenario of the above.

Perhaps using of num_socks_retries for limiting will be more reliable mechanism.
But if you just stop after a specified number of requests, remained
application-level and user-level, who may resume inquiries via usable
"ping-pong" circuit, although not so well for the attacker.

Offtask:
The same situiation with a application-level and user-level
remains in the processing RELAY_COMMAND_CONNECTED and RELAY_COMMAND_RESOLVED
contains local addresses
(there is only connection_mark_unattached_ap() happened and the user is left

for a second request with the same circ for the same answer).

P.S. I feel that described all with very horrible language,
but better than it had been unable to.

comment:4 Changed 11 years ago by rovv

Little corrections: AP connection will be detached from "ping-pong" circuit
after SocksTimeout. But circuit itself will be still re-usable.
Well, 2 minutes by default not much for attacks, but erroneous situations
and user-level actions still happens.

comment:5 Changed 11 years ago by rovv

Oh, the same behavior ("ping-pong") for legal END_STREAM_REASON_EXITPOLICY
from nonexisten hidden service (after fix of bug 840)
Offtask:
yet, policies_set_router_exitpolicy_to_reject_all() marks innocent router
as non-exit by mistake, for this case.

comment:6 Changed 11 years ago by rovv

Here's how it should check first. Exactly so arranged in a similar
checks as the code now works.

--- relay.c Thu Nov 20 22:21:32 2008
+++ relay.asis.c Fri Jan 2 13:11:50 2009
@@ -730,6 +730,7 @@

(void) layer_hint; /* unused */


if (rh->length > 0 && edge_reason_is_retriable(reason) &&

+ !connection_edge_is_rendezvous_stream(conn) && /* avoid retry if rend */

conn->_base.type == CONN_TYPE_AP) {

log_info(LD_APP,"Address '%s' refused due to '%s'. Considering retrying.",

safe_str(conn->socks_request->address),

@@ -752,10 +753,15 @@

else

ttl = -1;


  • if (!(get_options()->ClientDNSRejectInternalAddresses &&
  • is_internal_IP(addr, 0)))
  • client_dns_set_addressmap(conn->socks_request->address, addr,
  • conn->chosen_exit_name, ttl);

+ if (get_options()->ClientDNSRejectInternalAddresses &&
+ is_internal_IP(addr, 0)) {
+ log_info(LD_APP,"Address '%s' resolved to internal. Closing,",
+ safe_str(conn->socks_request->address));
+ connection_mark_unattached_ap(conn, END_STREAM_REASON_TORPROTOCOL);
+ return 0;
+ }
+ client_dns_set_addressmap(conn->socks_request->address, addr,
+ conn->chosen_exit_name, ttl);

}
/* check if he *ought* to have allowed it */
if (exitrouter &&

User must learn to send 'NEWNYM' finally, if all of a sudden
it whatever want to successfully complete its request.

comment:7 Changed 11 years ago by arma

Another option is to treat an exit relay that gives you an internal IP
address as confused, and not ask him to exit any streams for a while. That
is, set his exit policy (from the client's perspective) to reject *:*. Then
when we detach the stream, we won't retry him.

Of course, if the user types in "localhost", he will then switch off a half
dozen exit relays, until a few hours from now when he gets a new directory
consensus and a new set of exit policy beliefs. And if he keeps hitting
reload, he could conceivably refuse to use any exit relays for those few
hours. Hm.

The idea of just marking the current circuit as old also has problems when
the address actually does map to an internal IP address. In this case we keep
cancelling circuits until we have no more circuits, and then we frantically
build circuits and cancel them until SocksTimeOut arrives, which could be a
lot of circuits.

So in conclusion, I think both Nick's suggestion of client_dns_incr_failures()
plus rovv's suggestion of obsoleting the circuit is a good plan.

comment:8 Changed 11 years ago by arma

Of course, there's the question of consistency. When we get a 'connected'
cell that has an internal IP address in it, we just close the stream right
then. And that's a different behavior from what I proposed above.

How often does it happen in practice that we get an internal IP address
for a legitimate public site? Most exit relays should catch that in their
dns bootstrapping checks, if they have something misconfigured and are
returning internal IP addresses for many sites. (Nick, do they actually
check for this?)

comment:9 Changed 11 years ago by nickm

Roger: There's no exit-side check for internal IP addresses specifically. What we test for is whether a significant
number of random hostnames all resolve to the same IP. This should be adequate to catch a lot of brokenness, but
not all of it.

comment:10 Changed 11 years ago by nickm

14:09 * nickm contemplates bug 872.
14:11 < armadev> the easy fix is to kill the stream when you get an internal ip

address back and you don't like internal ip addresses.

14:11 < armadev> that's a bit rude for the user though, if it's just a flaky

relay

14:11 < armadev> if we want to handle that case, we should

dns_resolve_incr_failure it and discard the circuit.

14:12 < armadev> and we should do that for the 'connected' case too, rather

than the current 'kill the stream' that we do there.

14:12 < nickm> I don't like discarding the circuit whenever somebody says

localhost

14:12 < armadev> well, it's just 3 circuits. then we give up.
14:12 < nickm> Nor do I like the idea of a website forcing you to discard a

circuit by saying <img src="http://localhost/evil.png">

14:13 < armadev> they can already force this by asking for a site that isn't

reachable

14:13 < armadev> or isn't resolvable, i believe
14:15 < nickm> Long-term, given lots of time and taking no heed of complexity,

here's what I'd say:

14:16 < nickm> If a resolve fails or gives you an internal IP, don't use that

exit to resolve that address again. If an exit has failed for
all (or just a lot of) addresses, then stop using the circuit.

14:16 < nickm> For now, closing the stream seems reasonable.
14:17 < armadev> right. i agree that our strategy of marking a circuit as "too

dirty for anything" when we really just mean "not this stream
again" is overkill.

14:18 < nickm> It does sound rather like a feature, though. Killing the stream

in this case could be fine.

14:18 < armadev> true. the only downside is that now a single flaky exit can

ruin your day

14:18 < armadev> whereas in the past (a while past), you just move on to a new

circuit

14:19 < armadev> like you do when the exit says "can't resolve"
14:19 < armadev> so i would argue for incr_failure and discard circuit. but

that could be in 0.2.2.x.

14:20 < nickm> If we're going to do the right thing in 0.2.2.x, let's do the

non-overkill right thing. ('we really just mean "not this
stream again"')

14:21 < armadev> do you mean stream or circuit?

[...]

14:53 < nickm> armadev: I was just quoting you. Probably I meant, "Not this

address on this exit again."

comment:11 Changed 11 years ago by nickm

Conservative fix based on above patch applied in r17924.

comment:12 Changed 10 years ago by nickm

Marking "the right thing" as for post-0.2.1.x.

comment:13 Changed 9 years ago by nickm

Description: modified (diff)
Milestone: post 0.2.1.xTor: unspecified

So I no longer remembered what exactly the "right thing" is supposed to be. Let me try to summarize:

  • When a stream fails, or a resolve fails or gives us a weird answer, we should maybe retry on a different exit. Right now we just close the stream.
  • We do not want to mark the whole circuit as invalid, or else any user who types "localhost" will cycle through circuits till they run out of retries. We don't want to mark the exit as invalid, or any user who types "localhost" will soon run out of exits.

The current close-the-stream behavior does not seem to be causing problems in practice AFAICT. So I'm going to move this to the "unspecified" milestone. We should revisit that decision if this someday turns out to cause problems.

comment:14 Changed 7 years ago by nickm

Keywords: tor-client added

comment:15 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:16 Changed 6 years ago by cypherpunks

Cc: nickm,armanickm, arma
Resolution: Nonefixed
Status: newclosed

Bug is fixed.
#10478 opened for address "right thing".

Note: See TracTickets for help on using tickets.