Opened 3 months ago

Last modified 7 weeks ago

#30344 new defect

conn_read_callback is called on connections that are marked for closed

Reported by: robgjansen Owned by:
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version: Tor: 0.3.5.8
Severity: Normal Keywords: tor-conn, 035-backport, 041-deferred-20190530
Cc: iang Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

There is a bug causing busy loops in Libevent and infinite loops in the Shadow simulator. In my Shadow simulations, I have observed that conn_read_callback is getting called on a connection that is marked for close.

This is similar to #5263, and as described there, I believe it is a bug for conn_read_callback to be called on sockets that are marked for close.

The bug is problematic in Shadow when the marked connection also wants to flush, but attempting to write the outbuf inside conn_close_if_marked fails because the write would block (because the kernel write buffer is already full). Because it still wants to flush, the connection does not get closed, but the connection remains readable causing Libevent to continue looping on conn_read_callback until the socket buffer can actually write. This wastes CPU resources in Tor, and causes an infinite loop in Shadow because Shadow's discrete-event time does not advance during this loop.

I found the bug on 0.3.5.8, and it is probably present at least since then.

I think the conn should not be waiting for read events when it is marked. I'm not sure where in the code this assumption is first violated, but the following patch fixed the issue in my Shadow simulations:

diff --git a/src/core/mainloop/mainloop.c b/src/core/mainloop/mainloop.c
index 6e2b300..e82c77a 100644
--- a/src/core/mainloop/mainloop.c
+++ b/src/core/mainloop/mainloop.c
@@ -894,6 +894,21 @@ conn_read_callback(evutil_socket_t fd, short event, void *_conn)
   }
   assert_connection_ok(conn, time(NULL));
 
+  if (conn->marked_for_close && connection_is_reading(conn)) {
+      /* Libevent says we can read, but we are marked so we will never try
+       * to read again. We will try to close the connection below inside of
+       * close_closeable_connections(), but let's make sure not to cause
+       * Libevent to spin on conn_read_callback() while we wait for the
+       * socket to let us flush to it.*/
+      connection_stop_reading(conn);
+
+      /* Now, if we still have data to flush, then make sure Libevent tells
+       * us when the conn will allow us to write the bytes. */
+      if (connection_wants_to_flush(conn) && !connection_is_writing(conn)) {
+          connection_start_writing(conn);
+      }
+  }
+
   if (smartlist_len(closeable_connection_lst))
     close_closeable_connections();
 }

Child Tickets

Change History (4)

comment:1 Changed 3 months ago by robgjansen

I wonder if the better place to stop reading on marked connections is inside of connection_mark_for_close_internal_, which appears to be the only place (outside of testing code) where the conn->marked_for_close state variable is written (modified).

diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index f2a646c..e24d349 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -941,6 +941,12 @@ connection_mark_for_close_internal_, (connection_t *conn,
    * the number of seconds since last successful write, so
    * we get our whole 15 seconds */
   conn->timestamp_last_write_allowed = time(NULL);
+
+  /* We should never listen for read events on marked connections, because
+   * we never try to actually read from the connection again. */
+  if (connection_is_reading(conn)) {
+    connection_stop_reading(conn);
+  }
 }
 
 /** Find each connection that has hold_open_until_flushed set to

An issue with this approach is even though we disable read events as above, we might enable it somewhere else (which would be a bug).

To ensure that bug never occurs, we could add a check in connection_start_reading so that we return if the connection is marked.

diff --git a/src/core/mainloop/mainloop.c b/src/core/mainloop/mainloop.c
index e82c77a..7922156 100644
--- a/src/core/mainloop/mainloop.c
+++ b/src/core/mainloop/mainloop.c
@@ -641,6 +641,10 @@ connection_start_reading,(connection_t *conn))
     return;
   }
 
+  if (conn->marked_for_close) {
+      return;
+  }
+
   if (conn->linked) {
     conn->reading_from_linked_conn = 1;
     if (connection_should_read_from_linked_conn(conn))

Or maybe the logic should be added to connection_check_event?

(Note that my diffs are on a custom branch and may not apply cleanly.)

comment:2 Changed 2 months ago by dgoulet

Keywords: tor-conn 035-backport added
Milestone: Tor: 0.4.1.x-final

comment:3 Changed 7 weeks ago by nickm

Keywords: 041-deferred-20190530 added

Marking these tickets as deferred from 041.

comment:4 Changed 7 weeks ago by nickm

Milestone: Tor: 0.4.1.x-finalTor: 0.4.2.x-final
Note: See TracTickets for help on using tickets.