After somewhere between a few and 24 hours, Tor on my exit node crashes with the same log message. This has happened every day for the past 3.
Jun 4 08:14:47 tor Tor[24513]: connection_exit_connect(): Bug: newly connected conn had data waiting!
Jun 4 08:14:57 tor Tor[24513]: assert_connection_ok(): Bug: connection.c:4066: assert_connection_ok: Assertion (conn->type == CONN_TYPE_EXIT && conn->state == EXIT_CONN_STATE_RESOLVING) || connection_is_writing(conn) || conn->write_blocked_on_bw || (CONN_IS_EDGE(conn) && TO_EDGE_CONN(conn)->edge_blocked_on_circ) failed; aborting.
Jun 04 15:48:12.279 [notice] Tor v0.2.3.25 (git-17c24b3118224d65) running on OpenBSD. (5.3, amd64)
Jun 04 15:48:12.289 [notice] Initialized libevent version 1.4.14b-stable using method kqueue. Good.
Trac: Username: usuality
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Program received signal SIGPIPE, Broken pipe.0x00000bfba90b5b1a in sendto () at <stdin>:22 <stdin>: No such file or directory. in <stdin>Current language: auto; currently asm(gdb) where#0 0x00000bfba90b5b1a in sendto () at <stdin>:2#1 0x00000bfb9fbabcbd in sendto (fd=192, msg=0xbfba7bae020, len=219, flags=0, to=0x0, tolen=0) at /usr/src/lib/librthread/rthread_cancel.c:526#2 0x00000bf99b9826c0 in flush_chunk (s=192, buf=0xbfba8232000, chunk=0xbfba7bae000, sz=219, buf_flushlen=0xbfba6bb6630) at buffers.c:791#3 0x00000bf99b982589 in flush_buf (s=192, buf=0xbfba8232000, sz=219, buf_flushlen=0xbfba6bb6630) at buffers.c:877#4 0x00000bf99b9c3b87 in connection_handle_write_impl (conn=0xbfba6bb6600, force=0) at connection.c:3283#5 0x00000bf99b9c3d39 in connection_handle_write (conn=0xbfba6bb6600, force=0) at connection.c:3340#6 0x00000bf99b9114b8 in conn_write_callback (fd=192, events=4, _conn=0xbfba6bb6600) at main.c:738#7 0x00000bfba33f1a38 in event_base_loop (base=0xbfba6ca0c00, flags=Variable "flags" is not available.) at /usr/src/lib/libevent/event.c:402#8 0x00000bf99b913cae in do_main_loop () at main.c:1959#9 0x00000bf99b915341 in tor_main (argc=1, argv=0x7f7ffffc9000) at main.c:2652#10 0x00000bf99b90f8ab in main (argc=1, argv=0x7f7ffffc9000) at tor_main.c:30(gdb)
connection.c:4066 assert_connection_ok: Assertion (conn->type == CONN_TYPE_EXIT && conn->state == EXIT_CONN_STATE_RESOLVING) || connection_is_writing(conn) || conn->write_blocked_on_bw || (CONN_IS_EDGE(conn) && TO_EDGE_CONN(conn)->edge_blocked_on_circ) failed; aborting.Program received signal SIGABRT, Aborted.0x00000bfba90c074a in kill () at <stdin>:22 <stdin>: No such file or directory. in <stdin>(gdb) where #0 0x00000bfba90c074a in kill () at <stdin>:2#1 0x00000bfba91247ca in abort () at /usr/src/lib/libc/stdlib/abort.c:70#2 0x00000bf99b9c650c in assert_connection_ok (conn=0xbfba11eef00, now=0) at connection.c:4061#3 0x00000bf99b9bd720 in _connection_mark_for_close (conn=0xbfba11eef00, line=1172, file=0xbf99bb55f5b "relay.c") at connection.c:673#4 0x00000bf99b92efe5 in connection_edge_process_relay_cell (cell=0x7f7ffffc8870, circ=0xbfba42e7e00, conn=0xbfba11eef00, layer_hint=0x0) at relay.c:1172#5 0x00000bf99b92b647 in circuit_receive_relay_cell (cell=0x7f7ffffc8870, circ=0xbfba42e7e00, cell_direction=CELL_DIRECTION_OUT) at relay.c:192#6 0x00000bf99b9a493a in command_process_relay_cell (cell=0x7f7ffffc8870, conn=0xbfba2a15200) at command.c:576#7 0x00000bf99b9a378e in command_process_cell (cell=0x7f7ffffc8870, conn=0xbfba2a15200) at command.c:192#8 0x00000bf99b9d7551 in connection_or_process_cells_from_inbuf (conn=0xbfba2a15200) at connection_or.c:1882#9 0x00000bf99b9d3e32 in connection_or_process_inbuf (conn=0xbfba2a15200) at connection_or.c:395#10 0x00000bf99b9c5231 in connection_process_inbuf (conn=0xbfba2a15200, package_partial=1) at connection.c:3827#11 0x00000bf99b9c2b3a in connection_handle_read_impl (conn=0xbfba2a15200) at connection.c:2706#12 0x00000bf99b9c2c48 in connection_handle_read (conn=0xbfba2a15200) at connection.c:2748#13 0x00000bf99b9112d0 in conn_read_callback (fd=130, event=2, _conn=0xbfba2a15200) at main.c:705#14 0x00000bfba33f1a38 in event_base_loop (base=0xbfba6ca0c00, flags=Variable "flags" is not available.) at /usr/src/lib/libevent/event.c:402#15 0x00000bf99b913cae in do_main_loop () at main.c:1959#16 0x00000bf99b915341 in tor_main (argc=1, argv=0x7f7ffffc9000) at main.c:2652#17 0x00000bf99b90f8ab in main (argc=1, argv=0x7f7ffffc9000) at tor_main.c:30(gdb)
I've written a small non-blocking connect() test program which I will attach, and made observations. On both Linux (kernel 3.9.1, x86-64) and OpenBSD 5.2 on x86-64, the output is as follows, which does not reproduce the presumed behavior if nickm's hypothesis on the cause is correct:
connect() gave EINPROGRESS
About to select()...
The socket has become writeable according to select() (returned 1)
Async connect() succeeded
Socket closed
Hmm; the non-immediate-return case code for handling when the connection completes is in connection_edge_finished_connecting(); the code is substantially similar to the latter part of connection_exit_connect() and this should be refactored to avoid the duplication, but probably not in maint-0.2.3. The only differences I see are
connection_edge_finished_connecting() calls rep_hist_note_exit_stream_opened(conn->port) before it puts the connection in EXIT_CONN_STATE_OPEN.
connection_edge_finished_connecting() has an extra connection_edge_process_inbuf(edge_conn, 1) before it returns, which connection_exit_connect() lacks.
Hm. So the best fix I can think of here right now then is to fix the known bug in connection_edge_finished_connecting(), and refactor that part of connection_assert_ok() into a new function that just logs a bug and enables writing. Not pretty, but neither are crash bugs. Andrea, does that sound plausible?
Oh duh; my fix would not have worked! connection_watch_events(READ_EVENT) turns off the write event. I have another patch to try; please see if "9017_v2.patch" works for you?