Opened 12 years ago

Last modified 7 years ago

#406 closed defect (Fixed)

connection.c:2391 assert_connection_ok failed (on bridged dir conn?) 0.1.2.12-rc

Reported by: seeess Owned by: nickm
Priority: Low Milestone: 0.1.2.20
Component: Core Tor/Tor Version: 0.1.0.18
Severity: Keywords:
Cc: seeess, nickm, arma Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

0.1.2.12-rc

connection.c:2391 assert_connection_ok: Assertion connection_is_writing(conn)
conn->wants_to_write

(conn->type == CONN_TYPE_DIR && TO_DIR_CONN(conn)->is_blocked_on_or_conn) failed; aborting.
Aborted

from the log
Mar 20 09:42:49.783 [info] connection_edge_process_relay_cell(): 524: end cell (closed normally) for stream 23964. Removing stream.
Mar 20 09:42:49.783 [info] connection_edge_process_relay_cell(): end cell (closed normally) dropped, unknown stream.
Mar 20 09:42:49.784 [info] connection_edge_process_relay_cell(): end cell (closed normally) dropped, unknown stream.
Mar 20 09:42:49.784 [info] connection_edge_process_relay_cell(): end cell (closed normally) dropped, unknown stream.

Mar 20 09:42:49.784 [err] connection.c:2391: assert_connection_ok: Assertion connection_is_writing(conn) conn->wants_to_write
(conn->type == CONN_TYPE_DIR && TO_DIR_CONN(conn)->is_blocked_on_or_conn) failed; aborting

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (20)

comment:1 Changed 12 years ago by seeess

ulimits weren't set right to get a core @#$^@#$^

comment:2 Changed 12 years ago by nickm

platform? configuration? frequency?

comment:3 Changed 12 years ago by seeess

I had this happen on two different OR's within a day of each other and hasn't happened again since.

running on gentoo (2.6.18-gentoo-r6 #17 SMP) tor compiled from source

torrc
SocksPort 9050# what port to open for local application connections
SocksListenAddress 127.0.0.1 # accept connections only from localhost
Log info file /var/log/tor/tor.log
DataDirectory /var/lib/tor
Nickname xxxxx
ORPort 80
DirPort 443 # what port to advertise for directory connections
ExitPolicy accept *:9030
ExitPolicy reject *:*
BandwidthRate 200 KB
BandwidthBurst 250 KB
NumCPUs 4

(also happened on a single core machine with a similar configuration)

comment:4 Changed 12 years ago by seeess

Caught the core woot

#0 0xffffe410 in kernel_vsyscall ()
#1 0xb7c23770 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7c24ef3 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x080670d7 in assert_connection_ok (conn=0x8d7d360, now=1177660924)

at connection.c:2367

#4 0x080948f0 in conn_read_callback (fd=470, event=2, _conn=0x8d7d360)

at main.c:420

#5 0xb7d31455 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#6 0xb7d316c5 in event_loop () from /usr/lib/libevent-1.1a.so.1
#7 0xb7d316e9 in event_dispatch () from /usr/lib/libevent-1.1a.so.1
#8 0x0809448a in tor_main (argc=1, argv=0xbf9b2a64) at main.c:1267
#9 0x080b7a12 in main (argc=Cannot access memory at address 0x463
) at tor_main.c:22

comment:5 Changed 12 years ago by seeess

#3 print *conn

$1 = {magic = 2575892462, type = 9 '\t', state = 6 '\006', purpose = 9 '\t',

wants_to_read = 0, wants_to_write = 0, hold_open_until_flushed = 0,
inbuf_reached_eof = 0, edge_has_sent_end = 0, or_is_obsolete = 0,
chosen_exit_optional = 0, s = 470, conn_array_index = 246,
read_event = 0x92924d0, write_event = 0x9383750, inbuf = 0xe922af8,
outbuf = 0x92ee388, outbuf_flushlen = 141, timestamp_lastread = 1177660923,
timestamp_lastwritten = 1177660923, timestamp_created = 1177660923,
addr = 2130706433, port = 0, marked_for_close = 0,
marked_for_close_file = 0x0, address = 0x8decbe8 "Tor network"}

comment:6 Changed 12 years ago by seeess

print *conn->outbuf

$2 = {magic = 2969563922,

mem = 0x8a1a384 "HTTP/1.0 200 OK\r\nDate: Fri, 27 Apr 2007 08:02:03 GMT\r\nContent-Type: application/octet_stream\r\nContent-Encoding: deflate\r\nPragma: no-cache\r\n\r\n
�B,ະ�\210�\005�A�H�\022\031m:\003\2038��5\235�U\nz\006��\"r���`\202\033�\"\177�q\003�!�Na��˵4�"...,
cur = 0x8a1a384 "HTTP/1.0 200 OK\r\nDate: Fri, 27 Apr 2007 08:02:03 GMT\r\nContent-Type: application/octet_stream\r\nContent-Encoding: deflate\r\nPragma: no-cache\r\n\r\n
�B,ະ�\210�\005�A�H�\022\031m:\003\2038��5\235�U\nz\006��\"r���`\202\033�\"\177�q\003�!�Na��˵4�"..., highwater = 141, len = 4096,
memsize = 4096, datalen = 141}

comment:7 Changed 12 years ago by nickm

Hmmm. So it's a directory connection, probably in the process of writing. address is set to "Tor network", so
it's a "bridged" directory conn.

So it looks like the blocked_on_or_conn field _should_ be set, but isn't....

hm.......

comment:8 Changed 12 years ago by erikbosman

I found a similar crash message in my logs

Aug 31 06:25:03.678 [notice] Tor 0.1.2.16 opening new log file.

Aug 31 08:14:26.915 [err] connection.c:2391: assert_connection_ok: Assertion connection_is_writing(conn)
conn->wants_to_write (conn->type == CONN_TYPE_DIR && TO_DIR_CONN(conn)->is_blocked_on_or_conn) failed; aborting.

My guess is that the crash is caused by a memory shortage since
the core dump is about 332M, with 384M available to the whole OS
(running under XEN.)

In used the 0.1.2.16-1 debian package (lenny) from noreply.org

I am unable to resolve any symbols with gdb from the core dump.
since the debug symbols for this package have been replaced by
symbols for a newer version. For what it's worth, here's
the backtrace:

(gdb) bt
#0 0xb7cf5176 in ?? ()
#1 0xb7e0dff4 in ?? ()
#2 0xb7cc96b0 in ?? ()
#3 0xbfe7fe68 in ?? ()
#4 0xb7cf6a51 in ?? ()
#5 0x00000006 in ?? ()
#6 0xbfe7fddc in ?? ()
#7 0x00000000 in ?? ()

Attempting to install the debug symbols also upgraded my
tor server. I have the old version in a backup VM, but its
probably hard to reproduce this bug anyway.

comment:9 Changed 12 years ago by weasel

Erik, I've put the old 0.1.2.16 packages for lenny up again on http://asteria.noreply.org/~weasel/tor/

comment:10 Changed 12 years ago by nickm

If you still have the core, could you dpkg -i the packages, and use gdb to make the stack trace contain symbols? That
would be really helpful.

comment:11 Changed 12 years ago by erikbosman

Here's the stack trace with symbols:

(gdb) bt
#0 0xb7cf5176 in raise () from /lib/libc.so.6
#1 0xb7cf6a51 in abort () from /lib/libc.so.6
#2 0x080671c7 in assert_connection_ok (conn=0x8733768, now=1188540866) at connection.c:2367
#3 0x08093e60 in conn_read_callback (fd=210, event=2, _conn=0x8733768) at main.c:420
#4 0xb7e17730 in event_base_loop () from /usr/lib/libevent-1.3b.so.1
#5 0xb7e17929 in event_loop () from /usr/lib/libevent-1.3b.so.1
#6 0xb7e1794e in event_dispatch () from /usr/lib/libevent-1.3b.so.1
#7 0x080939fd in tor_main (argc=1, argv=0xbfe80194) at main.c:1267
#8 0x080b6e42 in main (argc=Cannot access memory at address 0x3117
) at tor_main.c:22
#9 0xb7ce1030 in libc_start_main () from /lib/libc.so.6
#10 0x0804cdf1 in _start ()

(gdb) print *(connection_t*)(0x8733768)
$1 = {magic = 2575892462, type = 9 '\t', state = 6 '\006', purpose = 9 '\t',

wants_to_read = 0, wants_to_write = 0, hold_open_until_flushed = 0,
inbuf_reached_eof = 0, edge_has_sent_end = 0, or_is_obsolete = 0,
chosen_exit_optional = 0, s = 210, conn_array_index = 435,
read_event = 0x91c9418, write_event = 0x8995f80, inbuf = 0x980b938,
outbuf = 0x95f8100, outbuf_flushlen = 141, timestamp_lastread = 1188540858,
timestamp_lastwritten = 1188540856, timestamp_created = 1188540856,
addr = 2130706433, port = 0, marked_for_close = 0,
marked_for_close_file = 0x0, address = 0x146038d0 "Tor network"}

(gdb) print *((connection_t*)0x8733768)->outbuf
$3 = {magic = 2969563922,

mem = 0x10db3904 "HTTP/1.0 200 OK\r\nDate: Fri, 31 Aug 2007 06:14:18 GMT\r\nContent-Type: application/octet_stream\r\nContent-Encoding: deflate\r\nPragma: no-cache\r\n\r\n4568021802C1CA0A50E30D98D758C86B56605C59+881574FDD50A9EFA87"...,
cur = 0x10db3904 "HTTP/1.0 200 OK\r\nDate: Fri, 31 Aug 2007 06:14:18 GMT\r\nContent-Type: application/octet_stream\r\nContent-Encoding: deflate\r\nPragma: no-cache\r\n\r\n4568021802C1CA0A50E30D98D758C86B56605C59+881574FDD50A9EFA87"...,
highwater = 141, len = 4096, memsize = 4096, datalen = 141}

comment:12 Changed 12 years ago by nickm

Great, thanks!

This is an internal directory connection, again, writing a server response. If you still have the coore, can you
show the output of "print *(dir_connection_t*)(0x8733768)" ?

comment:13 Changed 12 years ago by erikbosman

(gdb) print *(dir_connection_t*)(0x8733768)
$1 = {_base = {magic = 2575892462, type = 9 '\t', state = 6 '\006',

purpose = 9 '\t', wants_to_read = 0, wants_to_write = 0,
hold_open_until_flushed = 0, inbuf_reached_eof = 0, edge_has_sent_end = 0,
or_is_obsolete = 0, chosen_exit_optional = 0, s = 210,
conn_array_index = 435, read_event = 0x91c9418, write_event = 0x8995f80,
inbuf = 0x980b938, outbuf = 0x95f8100, outbuf_flushlen = 141,
timestamp_lastread = 1188540858, timestamp_lastwritten = 1188540856,
timestamp_created = 1188540856, addr = 2130706433, port = 0,
marked_for_close = 0, marked_for_close_file = 0x0,
address = 0x146038d0 "Tor network"}, requested_resource = 0x0,

dirconn_direct = 0, is_blocked_on_or_conn = 0,
dir_spool_src = DIR_SPOOL_SERVER_BY_DIGEST, fingerprint_stack = 0x10ff27b0,
cached_dir = 0x0, cached_dir_offset = 0, zlib_state = 0x9108300,
rend_query = '\0' <repeats 16 times>,
identity_digest = '\0' <repeats 19 times>, bridge_conn = 0x0,
next_blocked_on_same_or_conn = 0x0}

comment:14 Changed 12 years ago by arma

Lefkada just experienced this on 0.1.2.18.

The 0.2.0.x tree has a new 'bridged conn' implementation, so I think
the best thing to do is to let this one fade away.

See also bug 419, which may be related. Or not.

comment:15 Changed 12 years ago by nickm

I'll take one more stab at fixing this before we call it a "just upgrade to 0.2.0.x" thing.

comment:16 Changed 12 years ago by nickm

See bug 419 for a related/remaining instance of this.

comment:17 Changed 12 years ago by nickm

(Once again, this is an 0.1.2.x-only bug.)

The key thing to notice is that bridge_conn is set to 0, but s is set. Since s is set on the bridged conn and the
address is set to "Tor network", that means that the connection has been through connection_exit_connect_dir(). But
bridge_conn is 0, which means that it has been subsequently cleared.

The culprit seems to be connection_dirserv_unlink_from_bridge(), called from connection_about_to_close_connection()
whenever a dir, edge, or OR coonnection involved with a bridge is about to close. It sets is_blocked_on_or_conn to
0, which winds up tripping the assertion.

comment:18 Changed 12 years ago by nickm

Quite likely fixed by 13469.

comment:19 Changed 12 years ago by nickm

flyspray2trac: bug closed.

comment:20 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.