Opened 8 years ago

Closed 8 years ago

#5156 closed defect (fixed)

assertion failure at src/network.c:931: circ

Reported by: arma Owned by: asn
Priority: High Milestone:
Component: Archived/Obfsproxy Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

My Tor client launched obfsproxy in managed mode, configured to use the 14 obfsproxy bridges that are in the bundle. My network went down, and it looks like obfsproxy bit it when the network returned.

2012-02-16 21:56:13 [warn] Connection error: No route to host
2012-02-16 21:56:13 [warn] Connection error: No route to host
2012-02-16 21:56:14 [info] [scrubbed]: socks: trying to connect to [scrubbed]:36609
2012-02-16 21:56:14 [info] [scrubbed]: socks: trying to connect to [scrubbed]:42401
2012-02-16 21:56:16 [warn] Connection error: No route to host
2012-02-16 21:56:19 [warn] Connection error: No route to host
2012-02-16 21:57:09 [info] [scrubbed]: socks: trying to connect to [scrubbed]:53999
2012-02-16 21:57:09 [error] assertion failure at src/network.c:931: circ

I'm not sure quite what version of obfsproxy it was, since it has no versions and no releases. It was built Feb 13 (Monday morning) at 07:30 EST, so it's presumably missing what we've committed since then.

#0  0x00007fa14925a475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        pid = <optimized out>
        selftid = <optimized out>
#1  0x00007fa14925d6f0 in *__GI_abort () at abort.c:92
        act = {__sigaction_handler = {sa_handler = 0x7fff6f4278ec, 
            sa_sigaction = 0x7fff6f4278ec}, sa_mask = {__val = {0, 
              140330701343215, 140330693892389, 210380881921, 17689675, 
              140330693487520, 140330705588224, 17689660, 4294967295, 
              17689660, 1, 6361672, 0, 32, 17690208, 0}}, 
          sa_flags = 1237092162, sa_restorer = 0x1}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000000000406cf9 in log_error_abort (format=<optimized out>)
    at src/util.c:585
        ap = {{gp_offset = 32, fp_offset = 48, 
            overflow_arg_area = 0x7fff6f427750, 
            reg_save_area = 0x7fff6f427690}}
#3  0x0000000000404a68 in pending_socks_cb (bev=0x10dee60, what=32, 
    arg=0x10e0df0) at src/network.c:931
        down = 0x10e0df0
        circ = 0x0
        up = <optimized out>
        socks = <optimized out>
        __func__ = "pending_socks_cb"
#4  0x00007fa1499904c3 in bufferevent_socket_connect ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#5  0x00007fa14999058f in ?? () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#6  0x00007fa1499a7129 in evdns_getaddrinfo () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#7  0x00007fa149997276 in evutil_getaddrinfo_async ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#8  0x00007fa1499906c8 in bufferevent_socket_connect_hostname ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#9  0x0000000000404558 in open_outbound_hostname (port=53999, 
    addr=0x10dec3c "109.105.109.163", af=2, conn=0x10d7060)
    at src/network.c:609
        base = <optimized out>
        buf = 0x10dee60
        newconn = 0x10e0df0
#10 socks_read_cb (bev=<optimized out>, arg=0x10d7060) at src/network.c:654
        af = 2
        r = <optimized out>
        port = 53999
        addr = 0x10dec3c "109.105.109.163"
        status = <optimized out>
        conn = 0x10d7060
        socks_ret = <optimized out>
        __func__ = "socks_read_cb"
#11 0x00007fa14998fe1d in ?? () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#12 0x00007fa14998597c in event_base_loop () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#13 0x00000000004032c4 in launch_managed_proxy () at src/managed.c:696
        r = -1
        proxy = 0x10c40d0
#14 0x00000000004028dc in obfs_main (argc=<optimized out>, 
    argv=<optimized out>) at src/main.c:293
        begin = 0x7fff6f427cc0
#15 0x00007fa149246ead in __libc_start_main (main=<optimized out>, 
    argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff6f427c98)
    at libc-start.c:228
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 3345274397001419417, 
                4203448, 140735060016288, 0, 0, -3345589772015203687, 
                -3373654480587683175}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x40bc90, 0x7fff6f427ca8}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 4242576}}}
        not_first_call = <optimized out>
#16 0x00000000004023e1 in _start ()
No symbol table info available.
(gdb) print circ
$3 = (circuit_t *) 0x0
(gdb) print *down
$2 = {cfg = 0x10d78d0, peername = 0x10df200 "109.105.109.163:53999", 
  circuit = 0x0, buffer = 0x10dee60, mode = LSN_SOCKS_CLIENT}

Child Tickets

Change History (22)

comment:1 in reply to:  description Changed 8 years ago by rransom

Replying to arma:

2012-02-16 21:57:09 [error] assertion failure at src/network.c:931: circ

I'm not sure quite what version of obfsproxy it was, since it has no versions and no releases. It was built Feb 13 (Monday morning) at 07:30 EST, so it's presumably missing what we've committed since then.

I couldn't find a revision in Git with obfs_assert(circ); on line 931.

comment:2 Changed 8 years ago by arma

Ok, it happened again. This time running git commit 459f5b82bd82a.

#0  0x00007fb5fa0b8475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        pid = <optimized out>
        selftid = <optimized out>
#1  0x00007fb5fa0bb6f0 in *__GI_abort () at abort.c:92
        act = {__sigaction_handler = {sa_handler = 0x7fffebfa8f8c, 
            sa_sigaction = 0x7fffebfa8f8c}, sa_mask = {__val = {0, 
              140419568544239, 140419561093413, 210380881921, 22641449, 
              140419560688544, 140419572789248, 22641436, 4294967295, 
              22641436, 1, 6361672, 0, 32, 22641712, 0}}, 
          sa_flags = -90020030, sa_restorer = 0x1}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000000000406dd9 in log_error_abort (format=<optimized out>)
    at src/util.c:589
        ap = {{gp_offset = 32, fp_offset = 48, 
            overflow_arg_area = 0x7fffebfa8df0, 
            reg_save_area = 0x7fffebfa8d30}}
#3  0x0000000000404aa9 in pending_socks_cb (bev=0x1597c30, what=32, 
    arg=0x1597fa0) at src/network.c:945
        down = 0x1597fa0
        circ = 0x0
        up = <optimized out>
        socks = <optimized out>
        __func__ = "pending_socks_cb"
#4  0x00007fb5fa7ee4c3 in bufferevent_socket_connect ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#5  0x00007fb5fa7ee58f in ?? () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#6  0x00007fb5fa805129 in evdns_getaddrinfo () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#7  0x00007fb5fa7f5276 in evutil_getaddrinfo_async ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#8  0x00007fb5fa7ee6c8 in bufferevent_socket_connect_hostname ()
   from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#9  0x00000000004045c8 in open_outbound_hostname (port=32982, 
    addr=0x1597b1c "188.40.20.215", af=2, conn=0x1597030) at src/network.c:614
        base = <optimized out>
        buf = 0x1597c30
        newconn = 0x1597fa0
#10 socks_read_cb (bev=<optimized out>, arg=0x1597030) at src/network.c:659
        af = 2
        r = <optimized out>
        port = 32982
        addr = 0x1597b1c "188.40.20.215"
        status = <optimized out>
        conn = 0x1597030
        socks_ret = <optimized out>
        __func__ = "socks_read_cb"
#11 0x00007fb5fa7ede1d in ?? () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#12 0x00007fb5fa7e397c in event_base_loop () from /usr/lib/libevent-2.0.so.5
No symbol table info available.
#13 0x00000000004032f4 in launch_managed_proxy () at src/managed.c:696
        r = -1
        proxy = 0x15800d0
#14 0x00000000004028f8 in obfs_main (argc=<optimized out>, 
    argv=<optimized out>) at src/main.c:299
        begin = 0x7fffebfa9360
#15 0x00007fb5fa0a4ead in __libc_start_main (main=<optimized out>, 
    argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffebfa9338)
    at libc-start.c:228
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -4408461484354319446, 
                4203448, 140737152455488, 0, 0, 4408505436013185962, 
                4447880465000549290}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x40be40, 0x7fffebfa9348}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 4243008}}}
        not_first_call = <optimized out>
#16 0x00000000004023e1 in _start ()
No symbol table info available.
(gdb) print *down
$3 = {cfg = 0x1593870, peername = 0x1597fd0 "188.40.20.215:32982", 
  circuit = 0x0, buffer = 0x1597c30, mode = LSN_SOCKS_CLIENT}

It similarly happened when I opened my laptop after it was asleep for some hours.

comment:3 Changed 8 years ago by nickm

Ah, it looks like there's an ordering bug here. bufferevent_connect_hostname is allowed to complete immediately while it is called, but the function calling it assumes that it's okay to set up the ->circuit pointer after the open_outbound_hostname call returns.

So the right fix here seems to be setting the circuit earlier, I guess?

comment:4 in reply to:  3 ; Changed 8 years ago by asn

Replying to nickm:

Ah, it looks like there's an ordering bug here. bufferevent_connect_hostname is allowed to complete immediately while it is called, but the function calling it assumes that it's okay to set up the ->circuit pointer after the open_outbound_hostname call returns.

So the right fix here seems to be setting the circuit earlier, I guess?

So I see, I didn't know that bufferevent callbacks are triggered immediately. I thought they were triggered on the next libevent loop; seems like this is the behavior of deferred callbacks, though.

So, does simple_server_listener_cb() and simple_client_listener_cb() have the same problem? open_outbound() seems to do bufferevent_setcb and bufferevent_socket_connect before setting up the circuit.

I'm wondering, do we have any more instances of this bug?

I'll do some more research, try to reproduce it, and then try to make a patch.

comment:5 in reply to:  4 ; Changed 8 years ago by nickm

Replying to asn:

Replying to nickm:

Ah, it looks like there's an ordering bug here. bufferevent_connect_hostname is allowed to complete immediately while it is called, but the function calling it assumes that it's okay to set up the ->circuit pointer after the open_outbound_hostname call returns.

So the right fix here seems to be setting the circuit earlier, I guess?

So I see, I didn't know that bufferevent callbacks are triggered immediately. I thought they were triggered on the next libevent loop; seems like this is the behavior of deferred callbacks, though.

Most callbacks are; some aren't. Connect/resolve callbacks, especially failing ones, can happen immediately. (Might be nice to get this fixed in libevent 2.1)

So, does simple_server_listener_cb() and simple_client_listener_cb() have the same problem? open_outbound() seems to do bufferevent_setcb and bufferevent_socket_connect before setting up the circuit.

It might on some BSDs. The resolve in bufferevent_connect_hostname seems to be the main issue with this case, but ISTR that on some BSDs, connect() can succeed/fail immediately.

comment:6 in reply to:  5 Changed 8 years ago by arma

Replying to nickm:

ISTR that on some BSDs, connect() can succeed/fail immediately.

In this case (on Linux) I don't have a network so connect() fails immediately. I think.

comment:7 Changed 8 years ago by arma

Priority: normalmajor

This bug cores my obfsproxy basically every time my laptop sleeps. So I bet it is impacting 'real' users too.

It's fine to leave the ticket open while researching whether we have other instances of the bug. But can we fix the known instance in the meantime?

comment:8 Changed 8 years ago by asn

I'd like some help:
https://gitweb.torproject.org/obfsproxy.git/blob/HEAD:/src/network.c#l613

It seems like that if the arguments to
bufferevent_socket_connect_hostname() are well-formed, the function
will always return 0. This means that if the DNS lookup fail, it will
go into pending_socks_cb() (since the callbacks were set before the
connect_hostname() call), then into error_or_eof(), then it will
free the whole circuit, and then it will continue happily to line623
which is bufferevent_enable(buf, EV_READ|EV_WRITE);. that causes a
segfault. I'm encountering this problem while trying to solve
#5156. We currently don't have this problem because we assert out in
pending_socks_cb().

So the question is, how should we avoid continuing if
bufferevent_socket_connect_hostname() fails immediately? Or maybe
this is not the correct question to ask?

Also, is this a bug of libevent? Or does the callback approach
prohibit evutil_getaddrinfo_async() from knowing whether it should
return an error?

I uploaded a branch of my current progress in 'bug5156_tmp'.

comment:9 in reply to:  8 Changed 8 years ago by asn

Trying to answer my above question; would moving the bufferevent_enable() and bufferevent_disable() commands before the bufferevent_socket_conect_hostname() be a sufficient countermeasure?

comment:10 in reply to:  8 Changed 8 years ago by nickm

Replying to asn:

I'd like some help:
https://gitweb.torproject.org/obfsproxy.git/blob/HEAD:/src/network.c#l613

It seems like that if the arguments to
bufferevent_socket_connect_hostname() are well-formed, the function
will always return 0.

I believe that's right.

This means that if the DNS lookup fail, it will
go into pending_socks_cb() (since the callbacks were set before the
connect_hostname() call), then into error_or_eof(), then it will
free the whole circuit, and then it will continue happily to line623
which is bufferevent_enable(buf, EV_READ|EV_WRITE);. that causes a
segfault. I'm encountering this problem while trying to solve
#5156. We currently don't have this problem because we assert out in
pending_socks_cb().

So the question is, how should we avoid continuing if
bufferevent_socket_connect_hostname() fails immediately? Or maybe
this is not the correct question to ask?

Also, is this a bug of libevent?

It confuses enough people that it should probably change, but it's not actually a bug: it's working, and working as documented.

Or does the callback approach

prohibit evutil_getaddrinfo_async() from knowing whether it should
return an error?

Generally the callback model means that we launch an operation, and if we launch it successfully, we say "yay, success!" even if the operation itself will fail.

I uploaded a branch of my current progress in 'bug5156_tmp'.

Replying to asn:

Trying to answer my above question; would moving the bufferevent_enable() and bufferevent_disable() commands before the bufferevent_socket_conect_hostname() be a sufficient countermeasure?

Should work!

comment:11 Changed 8 years ago by asn

Status: newneeds_review

Aha!
Please see bug5156 in https://git.gitorious.org/obfsproxy/obfsproxy.git.

comment:12 Changed 8 years ago by nickm

It looks like there's a lot of other stuff in that branch?

comment:13 Changed 8 years ago by nickm

Hm. It's kinda hard to review commits like 232c16e56d3d6ee93e06332b71c91e5c60ce2df0, since they seem to move code _and_ change the code as it's moved. (Or am I wrong? Is it only moving code?)

comment:14 in reply to:  13 Changed 8 years ago by asn

Replying to nickm:

Hm. It's kinda hard to review commits like 232c16e56d3d6ee93e06332b71c91e5c60ce2df0, since they seem to move code _and_ change the code as it's moved. (Or am I wrong? Is it only moving code?)

Maybe bug5156_take2 is cleaner?

comment:15 Changed 8 years ago by nickm

The strategy looks sound; it's worth trying out.

Only...what frees the bufferevent if it fails before the callbacks are set?

comment:16 in reply to:  15 Changed 8 years ago by asn

Replying to nickm:

The strategy looks sound; it's worth trying out.

Only...what frees the bufferevent if it fails before the callbacks are set?

Hm, which case are you thinking of?

Generally, create_other_side_of_conn() should create the bufferevent and attach it to the newconn conn_t. This means that conn_free() will also free the bufferevent.

Then, after circuit_create() is called, newconn and conn are bound in a circuit, which means that conn_freeing one is also freeing the other.

Which case did I miss?

comment:17 Changed 8 years ago by nickm

I'm asking, if the idea is to set the callbacks _after_ bufferevent_connect_hostname(), how do we detect and respond to an immediate failure in bufferevent_connect_hostname()?

Other than that, I'd say "test it and merge it!" (Maybe ask roger to test it too: he seems to be pretty good at reproducing the original bug.)

comment:18 in reply to:  17 ; Changed 8 years ago by asn

Replying to nickm:

I'm asking, if the idea is to set the callbacks _after_ bufferevent_connect_hostname(), how do we detect and respond to an immediate failure in bufferevent_connect_hostname()?

Hm, I think the basic idea is to have a circuit already created when bufferevent_connect_hostname() is called. I don't see the idea of setting callbacks _after_ bufferevent_connect_hostname() in the code:
https://gitorious.org/obfsproxy/obfsproxy/blobs/bug5156_take2/src/network.c#line649

Specifically, the idea is to have nothing after bufferevent_socket_connect_hostname(), because of comment:8.

Am I misunderstanding you?

Other than that, I'd say "test it and merge it!" (Maybe ask roger to test it too: he seems to be pretty good at reproducing the original bug.)

I tested it. The bug is indeed quite easily reproducible, just kill your network interface, fire up obfsproxy obfs2 socks 127.0.0.1:5555 and point a socks client to it (say have ClientTransportPlugin obfs2 socks4 127.0.0.1:5555 and Bridge obfs2 8.8.8.8:1051 in your torrc.)

Still, it would be great if Roger tested it too, before merging.

comment:19 in reply to:  18 Changed 8 years ago by arma

Replying to asn:

I tested it. The bug is indeed quite easily reproducible, just kill your network interface, fire up obfsproxy obfs2 socks 127.0.0.1:5555 and point a socks client to it (say have ClientTransportPlugin obfs2 socks4 127.0.0.1:5555 and Bridge obfs2 8.8.8.8:1051 in your torrc.)

Still, it would be great if Roger tested it too, before merging.

Don't wait up for me -- if you could reproduce it, and you fixed it, and now you can't reproduce it, that sounds like it's fixed to me. I can always reopen the ticket if I encounter it again.

comment:20 Changed 8 years ago by asn

I shall merge this in 24 hours.

comment:21 Changed 8 years ago by nickm_mobile

Imo there's no need to wait

comment:22 in reply to:  21 Changed 8 years ago by asn

Resolution: fixed
Status: needs_reviewclosed

Replying to nickm_mobile:

Imo there's no need to wait

Merged.

Note: See TracTickets for help on using tickets.