Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#6472 closed defect (fixed)

Assertion !connection_is_on_closeable_list

Reported by: torland Owned by:
Priority: High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version: Tor: 0.2.3.19-rc
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

TorLand2 (id: 945C BBA5 9980 8018 749D DC4E BB59 2168 F285 8C1B) aborted with this error:

11:13:47 [ERR] connection_free(): Bug: connection.c:575: connection_free: Assertion !connection_is_on_closeable_list(conn) failed; aborting.

│ 11:13:47 [WARN] connection_edge_end(): Bug: (Harmless.) Calling connection_edge_end (reason 7) on an already ended stream?

Child Tickets

Change History (14)

comment:1 in reply to:  description Changed 7 years ago by torland

Sorry wrong order:

11:13:47 [WARN] connection_edge_end(): Bug: (Harmless.) Calling connection_edge_end (reason 7) on an already ended stream?
11:13:47 [ERR] connection_free(): Bug: connection.c:575: connection_free: Assertion !connection_is_on_closeable_list(conn) failed; aborting.

comment:2 Changed 7 years ago by nickm

Any chance of a stack trace on this one?

comment:3 Changed 7 years ago by nickm

Reported in IRC: it could conceivably be from dns_found_answer() or connection_exit_connect.

If so (or even if not), I suggest two fixes and a workaround:

  • Every place that calls connection_free except for connection_unlink(), we need to audit to make it's not reachable on on an already-marked-for-close connection.
  • We should start with the two places above.
  • We should change the behavior of connection_free() for a connection on the closeable list to log a bug and return, rather than freeing the connection. Since it's on the closeable list, it should get freed soon anyway.)

The reporter suggests that we should call connection_dns_remove earlier (perhaps when we mark connections?), and that it should work differently. more info to follow

comment:4 Changed 7 years ago by cypherpunks

I'll leave it here

--- src/or/dns.c
+++ src/or/dns.mod.c
@@ -449,16 +449,17 @@
     if (resolve->pending_connections) {
       log_debug(LD_EXIT,
                 "Closing pending connections on timed-out DNS resolve!");
-      tor_fragile_assert();
       while (resolve->pending_connections) {
         pend = resolve->pending_connections;
         resolve->pending_connections = pend->next;
         /* Connections should only be pending if they have no socket. */
         tor_assert(!SOCKET_OK(pend->conn->_base.s));
         pendconn = pend->conn;
-        connection_edge_end(pendconn, END_STREAM_REASON_TIMEOUT);
-        circuit_detach_stream(circuit_get_by_edge_conn(pendconn), pendconn);
-        connection_free(TO_CONN(pendconn));
+        if (!pendconn->_base.marked_for_close) {
+          connection_edge_end(pendconn, END_STREAM_REASON_TIMEOUT);
+          circuit_detach_stream(circuit_get_by_edge_conn(pendconn), pendconn);
+          connection_free(TO_CONN(pendconn));
+        }
         tor_free(pend);
       }
     }
@@ -1085,6 +1086,13 @@
     pendconn = pend->conn; /* don't pass complex things to the
                               connection_mark_for_close macro */
     assert_connection_ok(TO_CONN(pendconn),time(NULL));
+    if (pendconn->_base.marked_for_close) {
+      /* prevent double-remove. */
+      pendconn->_base.state = EXIT_CONN_STATE_RESOLVEFAILED;
+      resolve->pending_connections = pend->next;
+      tor_free(pend);
+      continue;
+    }
     tor_addr_from_ipv4h(&pendconn->_base.addr, addr);
     pendconn->address_ttl = ttl;

comment:5 Changed 7 years ago by nickm

Status: newneeds_review

comment:6 Changed 7 years ago by nickm

First part looks good. Not sure about second part; would "continue and skip" also be adequate?

comment:7 Changed 7 years ago by cypherpunks

Yes, you no need any action if got answer for already closed connection. You can't to send any answer back. You can't establish requested connection. Connection was marked for close. The End.

You can reformat patch, it duplicates lines with removing action:

+      resolve->pending_connections = pend->next;
+      tor_free(pend);

comment:8 Changed 7 years ago by torland

My other node TorLand1 (id: D223 3999 0711 3A1F 216A AA64 997B C1D4 CFA8 E1AC) showed the same abort:

19:41:19 [WARN] connection_edge_end(): Bug: (Harmless.) Calling connection_edge_end (reason 7) on an already ended stream?
19:41:19 [ERR] connection_free(): Bug: connection.c:575: connection_free: Assertion !connection_is_on_closeable_list(conn) failed; aborting.

May this be triggered from remote?

I attached gdb to catch the next abort.

During package installation of gdb I noticed that my local dns resolver unbound did not work. Had to restart the daemon. I am wondering if this has something to do with this error?

comment:9 in reply to:  4 ; Changed 7 years ago by arma

Replying to cypherpunks:

  • connection_edge_end(pendconn, END_STREAM_REASON_TIMEOUT);
  • circuit_detach_stream(circuit_get_by_edge_conn(pendconn), pendconn);
  • connection_free(TO_CONN(pendconn));

+ if (!pendconn->_base.marked_for_close) {
+ connection_edge_end(pendconn, END_STREAM_REASON_TIMEOUT);
+ circuit_detach_stream(circuit_get_by_edge_conn(pendconn), pendconn);
+ connection_free(TO_CONN(pendconn));
+ }

This can happen in practice when dns_resolve_impl() gets a request for something that was already answered and cached, then the new request is cancelled (by an end cell), and the cached version expires while the request is pending. Then purge_expired_resolves() finds something on resolve->pending_connections and proceeds to free it, but it's on the closeable_list too.

Diagnosed by our irc person. We agree that this race condition seems hard to intentionally exploit in practice.

comment:10 in reply to:  9 Changed 7 years ago by arma

Replying to arma:

This can happen in practice when dns_resolve_impl() gets a request for something that was already answered and cached

Not already answered and cached. It has to sit in the pending queue for the 300 seconds, and then have another request appear during that 300 seconds, and have that other request get ended at just the right time.

So it sounds like Torland's "my unbound was broken at the time" hint is relevant.

Diagnosed by our irc person. We agree that this race condition seems hard to intentionally exploit in practice.

Still seems tough to exploit in practice.

comment:11 Changed 7 years ago by nickm

Put cypherpunks's patch in branch "bug6472"; if folks like the changes file, I'll merge.

comment:12 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Merged this; closing the ticket. Thanks, everyone!

comment:13 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:14 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.