conn_read_callback is called on connections that are marked for close

changed milestone to %Tor: 0.3.5.x-final

Trac:
Child Ticket(s): #31958 (moved)

added 035-backport 041-deferred-20190530 042-should BugSmashFund component::core tor/tor consider-backport-after-0424 milestone::Tor: 0.3.5.x-final nickm-merge owner::asn priority::medium resolution::fixed reviewer::dgoulet severity::normal status::closed tor-conn type::defect version::tor 0.3.5.8 labels

I wonder if the better place to stop reading on marked connections is inside of connection_mark_for_close_internal_, which appears to be the only place (outside of testing code) where the conn->marked_for_close state variable is written (modified).

diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index f2a646c..e24d349 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -941,6 +941,12 @@ connection_mark_for_close_internal_, (connection_t *conn,
    * the number of seconds since last successful write, so
    * we get our whole 15 seconds */
   conn->timestamp_last_write_allowed = time(NULL);
+
+  /* We should never listen for read events on marked connections, because
+   * we never try to actually read from the connection again. */
+  if (connection_is_reading(conn)) {
+    connection_stop_reading(conn);
+  }
 }
 
 /** Find each connection that has hold_open_until_flushed set to

One issue with this approach is that even though we disable read events as above, some other code path might re-enable them later (which would be a bug).

To ensure that bug never occurs, we could add a check in connection_start_reading so that we return if the connection is marked.

diff --git a/src/core/mainloop/mainloop.c b/src/core/mainloop/mainloop.c
index e82c77a..7922156 100644
--- a/src/core/mainloop/mainloop.c
+++ b/src/core/mainloop/mainloop.c
@@ -641,6 +641,10 @@ connection_start_reading,(connection_t *conn))
     return;
   }
 
+  if (conn->marked_for_close) {
+    return;
+  }
+
   if (conn->linked) {
     conn->reading_from_linked_conn = 1;
     if (connection_should_read_from_linked_conn(conn))

Or maybe the logic should be added to connection_check_event?

(Note that my diffs are on a custom branch and may not apply cleanly.)
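To make the two proposals above easier to compare, here is a minimal standalone sketch of how they would interact. It is not Tor code: `toy_conn_t` and the `toy_*` functions are hypothetical stand-ins for `connection_t`, `connection_start_reading`, `connection_stop_reading`, and `connection_mark_for_close_internal_`, and they model only the two flags discussed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for connection_t; only the state discussed above. */
typedef struct toy_conn {
  bool marked_for_close; /* set once the connection is marked */
  bool reading;          /* true while a read event is registered */
} toy_conn_t;

/* Mirrors the second diff: never enable reading on a marked connection. */
static void
toy_start_reading(toy_conn_t *conn)
{
  if (conn->marked_for_close)
    return;
  conn->reading = true;
}

static void
toy_stop_reading(toy_conn_t *conn)
{
  conn->reading = false;
}

/* Mirrors the first diff: marking a connection also stops read events. */
static void
toy_mark_for_close(toy_conn_t *conn)
{
  conn->marked_for_close = true;
  if (conn->reading)
    toy_stop_reading(conn);
}

/* Walks through mark-then-restart and returns the final reading flag. */
int
toy_demo(void)
{
  toy_conn_t conn = { false, false };
  toy_start_reading(&conn);
  assert(conn.reading);
  toy_mark_for_close(&conn);   /* first guard: reading is stopped */
  assert(!conn.reading);
  toy_start_reading(&conn);    /* second guard: request is ignored */
  return conn.reading ? 1 : 0;
}
```

In this model the first guard handles the state at the moment of marking, and the second guard covers any later caller that tries to turn reading back on. Either one alone leaves a gap, which is the concern raised above.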

Trac:
Keywords: N/A deleted, tor-conn, 035-backport added
Milestone: N/A to Tor: 0.4.1.x-final

Marking these tickets as deferred from 041.

Trac:
Keywords: N/A deleted, 041-deferred-20190530 added

Trac:
Milestone: Tor: 0.4.1.x-final to Tor: 0.4.2.x-final

Trac:
Keywords: N/A deleted, 042-should added

Distributing 0.4.2 tickets between network team members.

Trac:
Status: new to assigned
Owner: N/A to asn

Trac:
Cc: iang to iang, pastly

Hello Rob, I took your patch from the top comment and slimmed it down a bit to https://github.com/torproject/tor/pull/1380

I removed the parts about the writing/flushing because I did not understand exactly what they were doing, and also because the patch from comment:1 did not seem to need them. What are we missing by not doing it?

Also, I did not take the approach from comment:1 because we don't want to add more logic to connection_mark_for_close() (see also the documentation change I made to this effect).

Let me know how you like this, and if it works for you please!

Trac:
Status: assigned to needs_information
Cc: iang, pastly to iang

Hello! Pastly has been running some Shadow experiments lately where he was experiencing the issue described in this ticket. He fixed it in a 3rd place (i.e., not my patch and not your patch). He is going to go back and test your patch to make sure it does indeed fix the problem.

Trac:
Cc: iang to iang, pastly

Thanks Rob and pastly. Let me know what you find and I will act accordingly.

asn's branch fixes the issue.

Two experiments:

asn's branch at 0efa7827e476f63c442ac0536aa874458449ef78. ~300 relay Shadow network with appropriate client load.
Same, but at de66bed604377db23cfa303b83e385ef59121a64 (the commit just before asn's fix).

The first has been running for 48 of 60 simulation minutes and is still going strong. The second hit the infinite loop just 6 simulated minutes in and died because a Shadow buffer overflowed as a result of this infinite loop in Tor. Hitting the infinite loop 5 or 6 minutes into the experiment is consistent with what I've been seeing. (The clients start generating traffic at the 5 minute mark.)

Changing ticket to (rolls dice) needs_review
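For readers wondering how a read callback on a marked connection can turn into an infinite loop, the following toy model shows one plausible mechanism. It assumes a level-triggered read event and a callback that returns early on marked connections without draining the socket. The thread does not spell out the exact mechanism, so treat this as an illustration only. All names (`spin_conn_t`, `spin_read_callback`, `spin_run_loop`) are hypothetical, and the loop is bounded by `max_iters` so the sketch terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy connection; not Tor's connection_t. */
typedef struct spin_conn {
  bool marked_for_close;
  bool reading;          /* true while a read event is registered */
  size_t pending_bytes;  /* unread bytes waiting on the socket */
} spin_conn_t;

/* Assumed behavior: a marked connection is never read from, so its
 * pending bytes are never drained. */
static void
spin_read_callback(spin_conn_t *conn)
{
  if (conn->marked_for_close)
    return;
  conn->pending_bytes = 0; /* an unmarked connection drains its input */
}

/* Simulates a level-triggered loop: the read event fires on every
 * iteration for as long as reading is enabled and data is pending.
 * Returns how many times the callback ran, capped at max_iters. */
static unsigned
spin_run_loop(spin_conn_t *conn, unsigned max_iters)
{
  unsigned calls = 0;
  for (unsigned i = 0; i < max_iters; ++i) {
    if (!conn->reading || conn->pending_bytes == 0)
      break; /* nothing to fire; a real loop would block here */
    spin_read_callback(conn);
    ++calls;
  }
  return calls;
}
```

In this model a marked connection that is still reading consumes every available iteration, while the same connection with reading disabled triggers no callbacks at all.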

Trac:
Status: needs_information to needs_review

Trac:
Reviewer: N/A to dgoulet

This LGTM as soon as the changes file is changed and CI passes, merge_ready it should be!

(Oh also, there is a comment from teor about a typo in a comment :).

Trac:
Keywords: N/A deleted, nickm-merge added

Pushed fixes to changes file and teor's comment. Marking as merge_ready.

Trac:
Status: needs_review to merge_ready

asn, this is still failing on the CI.

Do not prefix versions with 'tor-'. ('0.1.2', not 'tor-0.1.2'.)

Trac:
Status: merge_ready to needs_revision

ugh sorry about that. force pushed the right fixup!

Trac:
Status: needs_revision to merge_ready

Since this is marked for possible backport onto 0.3.5, I made a new bug30344_squashed_035 branch that rebases and squashes it. The new PR is at https://github.com/torproject/tor/pull/1405 .

I've merged it to master; marking it for backport.

Trac:
Keywords: N/A deleted, BugSmashFund added
Milestone: Tor: 0.4.2.x-final to Tor: 0.4.1.x-final

Trac:
Keywords: N/A deleted, consider-backport-after-0424 added

Merged to 0.3.5 and later.

Merged #32575 (moved), #31939 (moved), #31548 (moved), #30344 (moved), #30258 (moved), #28970 (moved), #31091 (moved), and #32108 (moved) together.

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed
Milestone: Tor: 0.4.1.x-final to Tor: 0.3.5.x-final

closed

mentioned in issue #31091 (moved)

mentioned in issue #31548 (moved)

mentioned in issue #31939 (moved)

mentioned in issue #31958 (moved)

mentioned in issue #32058 (moved)

mentioned in issue #32108 (moved)

mentioned in issue #32575 (moved)

moved to tpo/core/tor#30344 (closed)
