obfsproxy assertion failure src/network.c:726: down->buffer == bev
My obfsproxy, as a server in external mode, just triggered
2012-02-10T11:31:17Z [error] assertion failure at src/network.c:726: down->buffer == bev
It was running under valgrind so I have no traceback or core.
- Show closed items
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Author
Happened a second time. I was running new code that asn apparently didn't add to this ticket.
Here's the new log line:
2012-02-10T12:12:30Z [warn] down->buffer: '0x621710' bev: '0x6213a0'
Please try branch
bug5072
inhttps://git.gitorious.org/obfsproxy/obfsproxy.git
.- Author
[warn] simple_server_listener_cb: conn: '0x60a34f0' conn->buffer: '0x656f6c0' [info] x.x.x.x:55981 (obfs2): trying to connect to 127.0.0.1:9009 [warn] pending_conn_cb: circ->downstream: '0x60a34f0' circ->downstream->buffer: '0x656f6c0' [info] EOF from 127.0.0.1:9009 [warn] downstream_read_cb: bev: '0x64a4100' down->buffer:'0x637dc40', down: '0x6786f80' [warn] bev == down->circuit->upstream->buffer: 0 bev == down->circuit->downstream->buffer: 1 down->buffer == down->circuit->upstream->buffer: 1 down->buffer == down->circuit->downstream->buffer: 0 [error] assertion failure at src/network.c:740: 0
- Author
Way earlier in the log I see
[warn] simple_server_listener_cb: conn: '0x666d200' conn->buffer: '0x64a4100' ... [warn] pending_conn_cb: circ->downstream: '0x666d200' circ->downstream->buffer: '0x64a4100'
whereas 0x637dc40 does not show up elsewhere in the log.
- Author
gcc sure isn't a fan of leaving these variables in place.
#0 0x00007ffdafe911b5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 pid = <value optimized out> selftid = <value optimized out> #1 0x00007ffdafe93fc0 in *__GI_abort () at abort.c:92 act = {__sigaction_handler = {sa_handler = 0xbae94a9b660dc8a4, sa_sigaction = 0xbae94a9b660dc8a4}, sa_mask = {__val = { 9521807005097767791, 14725201737993313616, 3719836312890456229, 11323701844346537692, 9696617352571939357, 140727554562824, 140727566413824, 16679215976743444622, 4294967295, 12635715657266764497, 1, 6354472, 0, 0, 1, 0}}, sa_flags = -1334102254, sa_restorer = 0x5d70663500000001} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x0000000000406298 in log_error_abort (format=<value optimized out>) at src/util.c:580 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fff3290f1d0, reg_save_area = 0x7fff3290f110}} #3 0x0000000000404363 in downstream_read_cb (bev=0x1fec380, arg=0x20459d0) at src/network.c:740 r = <value optimized out> __func__ = "downstream_read_cb" #4 0x00007ffdb057cba5 in bufferevent_readcb (fd=<value optimized out>, event=<value optimized out>, arg=0x1fec380) at bufferevent_sock.c:183 input = 0x1fec580 res = <value optimized out> howmuch = <value optimized out> readmax = <value optimized out> #5 0x00007ffdb057493c in event_process_active_single_queue (base=0x1feac40, flags=<value optimized out>) at event.c:1325 ev = 0x1fec390 #6 event_process_active (base=0x1feac40, flags=<value optimized out>) at event.c:1392 i = 0 #7 event_base_loop (base=0x1feac40, flags=<value optimized out>) at event.c:1589 n = 1 evsel = 0x7ffdb07a4160 tv = {tv_sec = 1, tv_usec = 140727564342821} tv_p = <value optimized out> res = <value optimized out> retval = <value optimized out> __func__ = "event_base_loop" #8 0x000000000040af79 in launch_external_proxy (begin=0x1) at src/external.c:78 configs = 0x1fdc010 end = 0x1 #9 0x0000000000402368 in obfs_main (argc=<value optimized out>, argv=0x30c1) at src/main.c:262 begin = 0x7fff3290f490 #10 0x00007ffdafe7dc8d in __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fff3290f468) at libc-start.c:228 result = <value optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 1202158516013877494, 4202624, 140734041748592, 0, 0, -1202309449877305098, -1200856901397001994}, mask_was_saved = 0}}, priv = {pad = { 0x0, 0x0, 0x40b0a0, 0x7fff3290f478}, data = {prev = 0x0, cleanup = 0x0, canceltype = 4239520}}} not_first_call = <value optimized out> #11 0x00000000004020a9 in _start () No symbol table info available.
- Author
The warns that went with that backtrace:
[warn] simple_server_listener_cb: conn: '0x1feac10' conn->buffer: '0x1febf60' [warn] pending_conn_cb: circ->downstream: '0x1feac10' circ->downstream->buffer: '0x1febf60' [warn] simple_server_listener_cb: conn: '0x1fec7d0' conn->buffer: '0x1fec9a0' [warn] pending_conn_cb: circ->downstream: '0x1fec7d0' circ->downstream->buffer: '0x1fec9a0' [warn] simple_server_listener_cb: conn: '0x2030140' conn->buffer: '0x202f910' [warn] pending_conn_cb: circ->downstream: '0x2030140' circ->downstream->buffer: '0x202f910' [warn] simple_server_listener_cb: conn: '0x20300b0' conn->buffer: '0x2030440' [warn] pending_conn_cb: circ->downstream: '0x20300b0' circ->downstream->buffer: '0x2030440' [warn] simple_server_listener_cb: conn: '0x2035aa0' conn->buffer: '0x2035af0' [warn] pending_conn_cb: circ->downstream: '0x2035aa0' circ->downstream->buffer: '0x2035af0' [warn] simple_server_listener_cb: conn: '0x2030810' conn->buffer: '0x2015500' [warn] pending_conn_cb: circ->downstream: '0x2030810' circ->downstream->buffer: '0x2015500' [warn] simple_server_listener_cb: conn: '0x1fffc20' conn->buffer: '0x1fff630' [warn] pending_conn_cb: circ->downstream: '0x1fffc20' circ->downstream->buffer: '0x1fff630' [warn] simple_server_listener_cb: conn: '0x2015050' conn->buffer: '0x2034f70' [warn] pending_conn_cb: circ->downstream: '0x2015050' circ->downstream->buffer: '0x2034f70' [warn] simple_server_listener_cb: conn: '0x1fff9f0' conn->buffer: '0x2045bd0' [warn] pending_conn_cb: circ->downstream: '0x1fff9f0' circ->downstream->buffer: '0x2045bd0' [warn] simple_server_listener_cb: conn: '0x2030060' conn->buffer: '0x2030810' [warn] pending_conn_cb: circ->downstream: '0x2030060' circ->downstream->buffer: '0x2030810' [warn] simple_server_listener_cb: conn: '0x2030120' conn->buffer: '0x1fec380' [warn] pending_conn_cb: circ->downstream: '0x2030120' circ->downstream->buffer: '0x1fec380' [warn] simple_server_listener_cb: conn: '0x2035a40' conn->buffer: '0x1ffc720' [warn] pending_conn_cb: circ->downstream: '0x2035a40' circ->downstream->buffer: '0x1ffc720' [warn] simple_server_listener_cb: conn: '0x2035aa0' conn->buffer: '0x20245f0' [warn] pending_conn_cb: circ->downstream: '0x2035aa0' circ->downstream->buffer: '0x20245f0' [warn] simple_server_listener_cb: conn: '0x20247f0' conn->buffer: '0x20304e0' [warn] pending_conn_cb: circ->downstream: '0x20247f0' circ->downstream->buffer: '0x20304e0' [warn] downstream_read_cb: bev: '0x1fec380' down->buffer:'0x2045660', down: '0x20459d0' [warn] bev == down->circuit->upstream->buffer: 0 bev == down->circuit->downstream->buffer: 1 down->buffer == down->circuit->upstream->buffer: 1 down->buffer == down->circuit->downstream->buffer: 0 [error] assertion failure at src/network.c:740: 0
This log:
[warn] downstream_read_cb: bev: '0x1fec380' down->buffer:'0x2045660', down: '0x20459d0' [warn] bev == down->circuit->upstream->buffer: 0 bev == down->circuit->downstream->buffer: 1 down->buffer == down->circuit->upstream->buffer: 1 down->buffer == down->circuit->downstream->buffer: 0
tells us that the
downstream_read_cb()
callback was triggered by the downstream bufferevent of a circuit (0x1fec380
), but the callback argument was the upstream connection (0x20459d0
). The callback argument ofdownstream_read_cb()
must be the downstream connection, and that's why we assert out.Looking at the whole log, we see that
0x1fec380
is the downstream bufferevent of a connection established some lines above:[warn] simple_server_listener_cb: conn: '0x2030120' conn->buffer: '0x1fec380' [warn] pending_conn_cb: circ->downstream: '0x2030120' circ->downstream->buffer: '0x1fec380'
I pushed another commit to my
bug5072
branch which might or might not help debug this issue. I'm also CCing nickm in case he has any good ideas.Trac:
Priority: normal to majorI wonder how:
if (down->buffer != bev) { conn_free(down); return; }
in the top of
downstream_read_cb()
, would act as an anti-crash temporary workaround.conn_free(down);
might make it too much undefined behavior, maybe just returning and suffering the memory leak could also work.Messy, but it might do it for the weekend.
- Author
Replying to asn:
I wonder how:
if (down->buffer != bev) { conn_free(down); return; } }}} in the top of `downstream_read_cb()`, would act as an anti-crash temporary workaround. `conn_free(down);` might make it too much undefined behavior, maybe just returning and suffering the memory leak could also work. Messy, but it might do it for the weekend.
I'm running this patch, and it crashes less. Specifically, it just encountered an area where it would have asserted, and it didn't.
{{{ [debug] error_cb for 127.0.0.1:9009: what=0x0011 errno=11 [info] EOF from 127.0.0.1:9009 [debug] error_or_eof for 127.0.0.1:9009 [debug] 127.0.0.1:9009: downstream_read_cb, 586 bytes available [warn] downstream_read_cb: bev: '0xa1ef40' down->buffer:'0xa1f380', down: '0xa1f6f0' [warn] bev == down->circuit->upstream->buffer: 0 bev == down->circuit->downstream->buffer: 1 down->buffer == down->circuit->upstream->buffer: 1 down->buffer == down->circuit->downstream->buffer: 0
[debug] Closing connection with 127.0.0.1:9009; 1 remaining [debug] Closing connection with x.x.x.x:10819; 0 remaining
- Author
[debug] 127.0.0.1:9009: upstream_read_cb, 586 bytes available [debug] obfs2_send: transmitting 586 bytes. [debug] obfs2_crypt_and_transmit: Processed 586 bytes. [debug] y.y.y.y:43721: transmitted 586 bytes [debug] error_cb for 127.0.0.1:9009: what=0x0011 errno=11 [info] EOF from 127.0.0.1:9009 [debug] error_or_eof for 127.0.0.1:9009 [debug] Closing connection with 127.0.0.1:9009; 5 remaining [debug] Closing connection with z.z.z.z:4284; 4 remaining [debug] 127.0.0.1:9009: downstream_read_cb, 586 bytes available [warn] downstream_read_cb: bev: '0xa1ef40' down->buffer:'0xa1f380', down: '0xa1f6f0' [warn] bev == down->circuit->upstream->buffer: 0 bev == down->circuit->downstream->buffer: 1 down->buffer == down->circuit->upstream->buffer: 1 down->buffer == down->circuit->downstream->buffer: 0 [debug] Closing connection with 127.0.0.1:9009; 3 remaining [debug] Closing connection with x.x.x.x:10824; 2 remaining [debug] error_cb for y.y.y.y:43721: what=0x0011 errno=11 [info] EOF from y.y.y.y:43721 [debug] error_or_eof for y.y.y.y:43721 [debug] Closing connection with 127.0.0.1:9009; 1 remaining [debug] Closing connection with y.y.y.y:43721; 0 remaining
Just pushed a rebased and extended bug5072_v2 to my public repository; let me know if it asserts anything different, and if so what the stack trace is?
(For more info: my hackish code tries to track whether every conn is upstream, downstream, or unknown, and tries to make sure that they get added to circuits in the right way, and that downstream_read_cb is only ever set on downstream conns associated with the right bufferevent.)
[warn] simple_server_listener_cb: conn: '0x8fd060' conn->buffer: '0x8fe310' [info] [manually scrubbed] (obfs2): trying to connect to 127.0.0.1:9001 [warn] pending_conn_cb: circ->downstream: '0x8fd060' circ->downstream->buffer: '0x8fe310'
Reports from user that my tor with obfsproxy printing the warnings above relays traffic as it should.
- Author
Running nickm's bug5072_v2 branch:
[debug] x.x.x.x:44138: transmitted 586 bytes [debug] x.x.x.x:44138: downstream_read_cb, 586 bytes available [debug] obfs2_recv: Processing 586 bytes application data [debug] obfs2_crypt_and_transmit: Processed 586 bytes. [debug] x.x.x.x:44138: forwarded 586 bytes [debug] x.x.x.x:44138: downstream_read_cb, 586 bytes available [debug] obfs2_recv: Processing 586 bytes application data [debug] obfs2_crypt_and_transmit: Processed 586 bytes. [debug] x.x.x.x:44138: forwarded 586 bytes [debug] error_cb for 127.0.0.1:9009: what=0x0011 errno=104 [info] EOF from 127.0.0.1:9009 [debug] error_or_eof for 127.0.0.1:9009 [error] assertion failure at src/network.c:168: conn->is_downstream == 1
#0 0x00007fceebbf91b5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 pid = <value optimized out> selftid = <value optimized out> #1 0x00007fceebbfbfc0 in *__GI_abort () at abort.c:92 act = {__sigaction_handler = {sa_handler = 0xcd6a90064b1faad1, sa_sigaction = 0xcd6a90064b1faad1}, sa_mask = {__val = { 9949445123934572641, 9715088607761979730, 140526695169636, 4707197592648237900, 7161402270843880775, 140526695013128, 140526706864128, 1, 4294967295, 140526698516032, 1, 6362904, 0, 0, 1, 0}}, sa_flags = -330189038, sa_restorer = 0x1} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x0000000000406ead in log_error_abort ( format=0x40d0f8 "assertion failure at %s:%d: %s") at src/util.c:580 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffdab552a0, reg_save_area = 0x7fffdab551e0}} #3 0x00000000004038d9 in bufferevent_wrap_setcb (bev=<value optimized out>, r=<value optimized out>, w=<value optimized out>, e=<value optimized out>, arg=0x7fffdab54d70) at src/network.c:168 No locals. #4 0x00007fceec2e4bce in bufferevent_readcb (fd=0, event=28080, arg=0x1696bb0) at bufferevent_sock.c:192 input = 0x1696db0 res = 0 howmuch = <value optimized out> readmax = <value optimized out> #5 0x00007fceec2dc93c in event_process_active_single_queue (base=0x1692c40, flags=<value optimized out>) at event.c:1325 ev = 0x1696bc0 #6 event_process_active (base=0x1692c40, flags=<value optimized out>) at event.c:1392 i = 0 #7 event_base_loop (base=0x1692c40, flags=<value optimized out>) at event.c:1589 n = 1 evsel = 0x7fceec50c160 tv = {tv_sec = 1, tv_usec = 140526704793125} tv_p = <value optimized out> res = <value optimized out> retval = <value optimized out> __func__ = "event_base_loop" #8 0x000000000040ca79 in launch_external_proxy (begin=0x1) at src/external.c:78 configs = 0x1684010 end = 0x1 #9 0x00000000004024a8 in obfs_main (argc=<value optimized out>, argv=0x7278) at src/main.c:262 begin = 0x7fffdab55570 #10 0x00000000004022b4 in main (argc=7, argv=0x7fffdab55558) at src/obfs_main.c:23
fucking gcc:
(gdb) print conn $1 = <value optimized out>
- Author
Ok, here's a better one:
debug] error_cb for 127.0.0.1:9009: what=0x0011 errno=11 [info] EOF from 127.0.0.1:9009 [debug] error_or_eof for 127.0.0.1:9009 [error] assertion failure at src/network.c:168: conn->is_downstream == 1
#0 0x00007fd4341a11b5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 pid = <value optimized out> selftid = <value optimized out> #1 0x00007fd4341a3fc0 in *__GI_abort () at abort.c:92 act = {__sigaction_handler = {sa_handler = 0x7fd434873fb0, sa_sigaction = 0x7fd434873fb0}, sa_mask = {__val = { 140736224283776, 140549391074760, 140549391080016, 9946658577485458587, 15627131712425739986, 140549383740168, 140549395591168, 140549395653936, 4294967295, 140549391089376, 1, 6373952, 0, 0, 1, 0}}, sa_flags = 883701522, sa_restorer = 0x7fff00000001} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x00000000004087d5 in log_error_abort ( format=0x40f5f0 "assertion failure at %s:%d: %s") at src/util.c:580 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffb4a7cd20, reg_save_area = 0x7fffb4a7cc60}} #3 0x0000000000403ed8 in bufferevent_wrap_setcb (bev=0x10c29c0, r=0x405400 <downstream_read_cb>, w=0x40486d <conn_free_on_flush>, e=0x4059e4 <flush_error_cb>, arg=0x10c30a0) at src/network.c:168 conn = 0x10c30a0 #4 0x00000000004058d1 in error_or_eof (conn=0x10c30a0) at src/network.c:836 circ = 0x10c3120 bev_err = 0x10c2d30 bev_flush = 0x10c29c0 __func__ = "error_or_eof" #5 0x00000000004059e2 in error_cb (bev=0x10c2d30, what=17, arg=0x10c30a0) at src/network.c:864 conn = 0x10c30a0 errcode = 11 __func__ = "error_cb" #6 0x00007fd43488cbce in bufferevent_readcb (fd=0, event=12080, arg=0x10c2d30) at bufferevent_sock.c:192 input = 0x10c2f30 res = 0 howmuch = <value optimized out> readmax = <value optimized out> #7 0x00007fd43488493c in event_process_active_single_queue (base=0x10c0c40, flags=<value optimized out>) at event.c:1325 ev = 0x10c2d40 #8 event_process_active (base=0x10c0c40, flags=<value optimized out>) at event.c:1392 i = 0 #9 event_base_loop (base=0x10c0c40, flags=<value optimized out>) at event.c:1589 n = 1 evsel = 0x7fd434ab4160 tv = {tv_sec = 140736224284880, tv_usec = 140549393520165} tv_p = <value optimized out> res = <value optimized out> retval = <value optimized out> __func__ = "event_base_loop" #10 0x000000000040eb67 in launch_external_proxy (begin=0x7fffb4a7d110) at src/external.c:78 configs = 0x10b2010 end = 0x7fffb4a7d110 #11 0x0000000000402a18 in obfs_main (argc=7, argv=0x7fffb4a7d0d8) at src/main.c:262 begin = 0x7fffb4a7d0f0 #12 0x0000000000402424 in main (argc=7, argv=0x7fffb4a7d0d8) at src/obfs_main.c:23 No locals.
(gdb) print *conn $2 = {conn_magic = 3225731140, cfg = 0x10b20e0, peername = 0x10c30e0 "127.0.0.1:9009", circuit = 0x10c3120, buffer = 0x10c2d30, mode = LSN_SIMPLE_SERVER, is_downstream = 0}
- Author
Hit it again, looks very similar:
(gdb) print *conn $1 = {conn_magic = 3225731140, cfg = 0x24d80e0, peername = 0x24e8700 "127.0.0.1:9009", circuit = 0x24e8720, buffer = 0x24e8390, mode = LSN_SIMPLE_SERVER, is_downstream = 0}
Please try
bug5072_v2
inhttps://git.gitorious.org/obfsproxy/obfsproxy.git
.It's basically https://gitorious.org/obfsproxy/obfsproxy/commit/a7e263494eaddae7523615a2b7efca51b0cc2d49?format=patch on top of Nick's branch.
It seems like
error_or_eof
might have been a bit sloppy when re-assigning bufferevent callbacks.If
bug5072_v2
stops the crashing, check out branchbug5072_v3
. It stops the annoying "Oh no you didn't" messages.It simply doesn't reassign the read callback on
error_or_eof()
. That's the behavior of the olderror_or_eof()
, before #3613 (closed).The bug that
bug5072_v2
andbug5072_v3
are dodging is:static void error_or_eof(conn_t *conn) { ... bev_flush = (conn == circ->upstream) ? circ->downstream->buffer : circ->upstream->buffer; ... bufferevent_setcb(bev_flush, bev_flush->readcb, conn_free_on_flush, flush_error_cb, conn); }
in the case where
bev_flush->readcb == downstream_read_cb
.- Author
asn's bug5072_v2 does indeed stop the crashing.
I get quite a few of the "Oh not you didn't" messages -- so far 44 since starting my obfsproxy a few hours ago.
I look forward to nickm merging this fix. :)
- Author
Trac:
Status: new to needs_review Two things:
This bug probably also applies to the client-side, since
error_or_eof()
is also called by the client code. The next bundles should probably havebug5072_v3
applied, since AFAIK it seems to work.Nick, do you see a reason for setting the read callback on a bufferevent that just has to flush its data and close?
I don't see how the bug works. If bev_flush was had is_downstream==0, how was bev_flush->readcb == downstream_read_cb in the first place?
- Author
Replying to nickm:
Just pushed a rebased and extended bug5072_v2 to my public repository; let me know if it asserts anything different, and if so what the stack trace is?
(gdb) up #3 0x0000000000403ed8 in bufferevent_wrap_setcb (bev=0x1e1aea0, r=0x405400 <downstream_read_cb>, w=0x40486d <conn_free_on_flush>, e=0x4059e4 <flush_error_cb>, arg=0x1e1b580) at src/network.c:168 168 obfs_assert(conn->is_downstream == 1); (gdb) print *conn->circuit $1 = {upstream = 0x1e1b580, downstream = 0x1e1bff0, socks_state = 0x0, is_open = 1, is_flushing = 1} (gdb) print *conn $2 = {conn_magic = 3225731140, cfg = 0x1dfe0e0, peername = 0x1e1b5c0 "127.0.0.1:9009", circuit = 0x1e1b600, buffer = 0x1e1b210, mode = LSN_SIMPLE_SERVER, is_downstream = 0}
Aha! When we were setting the flush callback on bev_flush, we were passing the wrong conn_t as the argument.
See branch bug5072_solution in my public obfsproxy repository.
Merged bug5072_solution as tested by marlowe and armadev.
Trac:
Resolution: N/A to fixed
Status: needs_review to closed- Trac closed
closed