Opened 11 years ago

Last modified 7 years ago

#929 closed defect (Fixed)

FAILURE to event_del (ev=0x9376e48) at event.c:806

Reported by: xiando Owned by:
Priority: Very High Milestone: 0.2.1.x-final
Component: Core Tor/Tor Version: 0.2.1.12-alpha
Severity: Keywords:
Cc: xiando, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Feb 21 08:50:09.922 [notice] Tor v0.2.1.12-alpha-dev (r18582).

Crash on failure with event_del (event.c 806) involvement.

# gdb /usr/bin/tor /var/lib/tor/data/core.25599
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /usr/lib/libevent-1.4.so.2...done.
Loaded symbols for /usr/lib/libevent-1.4.so.2
Reading symbols from /usr/lib/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/usr/bin/tor --runasdaemon 1'.
Program terminated with signal 11, Segmentation fault.
[New process 25599]
[New process 25612]
#0 event_del (ev=0x9376e48) at event.c:806
806 event.c: No such file or directory.

in event.c

(gdb) bt
#0 event_del (ev=0x9376e48) at event.c:806
#1 0x080e2741 in request_finished (req=0x9376e10, head=<value optimized out>) at eventdns.c:616
#2 0xb7ea53ce in event_base_loop (base=0x81430a0, flags=0) at event.c:387
#3 0xb7ea55b4 in event_loop (flags=0) at event.c:463
#4 0x080a9d75 in do_main_loop () at main.c:1435
#5 0x080aa00f in tor_main (argc=3, argv=0xbfe03114) at main.c:2060
#6 0x080e650e in main (argc=Cannot access memory at address 0x1b
) at tor_main.c:30
(gdb)

[Automatically added by flyspray2trac: Operating System: Other Linux]

Child Tickets

Change History (15)

comment:1 Changed 11 years ago by xiando

perhaps it is meaningfull to know that

a) there was nothing at which was even remotely interesting in the logs.
b) I still have the core file lying around.
c) I think it had been running for quite a while when it crashed (a day or two, perhaps)

comment:2 Changed 11 years ago by xiando

libevent 1.4.9

comment:3 Changed 11 years ago by nickm

Hang on to the core and the binary; they might be useful. Can you tell me the contents of ev in event_del?
(Just "print *ev" should do it.)

comment:4 Changed 11 years ago by nickm

[This is a server, yes?]

comment:5 Changed 11 years ago by nickm

I'm mildly suspicious of the way that evdns_request_timeout_callback deletes req->timeout_event, then sometimes calls
request_finished, which will delete it again.

comment:6 Changed 11 years ago by mwenge

i think I have the same problem. It's a result of clearing memory with 0x0F I think:

#0 event_del (ev=0x855fd1c) at event.c:807
#1 0x080e7eb5 in nameserver_probe_callback (result=66, type=0 '\0', count=0, ttl=0, addresses=0x0, arg=0x84c36d0) at eventdns.c:507
#2 0x080e5bc0 in reply_callback (req=0x83a7168, ttl=0, err=1, reply=0x855fc44) at eventdns.c:724
#3 0x080e7645 in reply_handle (req=0x83a7168, flags=<value optimized out>, ttl=4294967295, reply=0xbfec4f8c) at eventdns.c:788
#4 0x080ea87a in nameserver_ready_callback (fd=21, events=<value optimized out>, arg=0x855fc40) at eventdns.c:999
#5 0xb7f8aa22 in event_base_loop (base=0x8146a40, flags=0) at event.c:387
#6 0xb7f8ad69 in event_loop (flags=0) at event.c:463
#7 0x080ac5e0 in do_main_loop () at main.c:1435
#8 0x080ac7f7 in tor_main (argc=1, argv=0xbfec5174) at main.c:2069
#9 0x080eab42 in main (argc=Cannot access memory at address 0x1
) at tor_main.c:30
^done

[19:18] <mwenge> looks like 'if (ev->ev_base == NULL)' and the like don't work anymore because tor is

clearing memory with memset((x), 0xF0, sizeof(*(x))

[19:19] <mwenge> nickm: i think that was your commit a short while ago.
[19:21] * mwenge thinks libevent svn must have changed too, checks
[19:25] <mwenge> no, svn libevent is still checking ev_base for NULL

comment:7 Changed 11 years ago by nickm

The CLEAR isn't the bug, but the old CLEAR usage was covering up this bug. We need to make it so the timeout
event gets deleted iff it was not previously deleted in these cases.

comment:8 Changed 11 years ago by xiando

Feb 22 12:53:33.527 [notice] Tor 0.2.1.12-alpha-dev (r18673) opening log file.
Feb 22 12:53:34.676 [notice] Parsing GEOIP file.
Feb 22 12:53:37.217 [notice] Your Tor server's identity key fingerprint is 'godzilla FF88 4504 6DB4 49DA 989B 025A EA1D 10E1 9E70 9166'
Feb 22 12:53:37.421 [notice] This version of Tor (0.2.1.12-alpha-dev) is newer than any recommended version, according to the directory authorities. Recommended versions are: 0.2.0.33,0.2.0.34,0.2.1.11-alpha,0.2.1.12-alpha
Feb 22 12:53:41.370 [notice] We now have enough directory information to build circuits.
Feb 22 12:53:41.370 [notice] Bootstrapped 80%: Connecting to the Tor network.
Feb 22 12:53:52.451 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Feb 22 12:53:53.698 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Feb 22 12:53:54.571 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Feb 22 12:53:56.030 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Feb 22 12:53:56.030 [notice] Bootstrapped 100%: Done.
Feb 22 12:55:03.536 [notice] Performing bandwidth self-test...done.
tor[1181]: segfault at f0f0f0f0 ip b7f6f92f sp bfdcf0e0 error 5 in libevent-1.4.so.2.1.2[b7f6a000+15000]

# gdb /usr/bin/tor /var/lib/tor/data/core.1181
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /usr/lib/libevent-1.4.so.2...done.
Loaded symbols for /usr/lib/libevent-1.4.so.2
Reading symbols from /usr/lib/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/usr/bin/tor --runasdaemon 1'.
Program terminated with signal 11, Segmentation fault.
[New process 1181]
[New process 1191]
#0 event_del (ev=0x8d297f8) at event.c:806
806 event.c: No such file or directory.

in event.c

(gdb) bt
#0 event_del (ev=0x8d297f8) at event.c:806
#1 0x080e2841 in request_finished (req=0x8d297c0, head=<value optimized out>) at eventdns.c:616
#2 0xb7f703ce in event_base_loop (base=0x81430a0, flags=0) at event.c:387
#3 0xb7f705b4 in event_loop (flags=0) at event.c:463
#4 0x080a9da5 in do_main_loop () at main.c:1435
#5 0x080aa03f in tor_main (argc=3, argv=0xbfdcf2e4) at main.c:2060
#6 0x080e660e in main (argc=Cannot access memory at address 0x1b
) at tor_main.c:30
(gdb) ev
Undefined command: "ev". Try "help".
(gdb) print "ev"
You can't do that without a process to debug.
(gdb) "print *ev"
Undefined command: "". Try "help".
(gdb) print *ev
$1 = {ev_next = {tqe_next = 0xf0f0f0f0, tqe_prev = 0xf0f0f0f0}, ev_active_next = {tqe_next = 0xf0f0f0f0, tqe_prev = 0xf0f0f0f0}, ev_signal_next = {

tqe_next = 0xf0f0f0f0, tqe_prev = 0xf0f0f0f0}, min_heap_idx = 4042322160, ev_base = 0xf0f0f0f0, ev_fd = -252645136, ev_events = -3856,

ev_ncalls = -3856, ev_pncalls = 0xf0f0f0f0, ev_timeout = {tv_sec = -252645136, tv_usec = -252645136}, ev_pri = -252645136, ev_callback = 0xf0f0f0f0,
ev_arg = 0xf0f0f0f0, ev_res = -252645136, ev_flags = -252645136}

(gdb)

comment:9 Changed 11 years ago by nickm

I've added some code in r18682 to try to track this down. If the cause is what I think it is, it should give warnings
now instead of crashing. If somebody would tell me when those warnings occur, that'd be great.

comment:10 Changed 11 years ago by nickm

Feb 23 19:29:52.982 [warn] eventdns: Duplicate timeout event_del from line 539: first call was at 616.
Feb 23 19:29:57.865 [warn] eventdns: Duplicate timeout event_del from line 648: first call was at 2019.
Feb 23 19:29:57.865 [warn] eventdns: Duplicate timeout event_del from line 648: first call was at 2019.
Feb 23 19:30:02.870 [warn] eventdns: Duplicate timeout event_del from line 648: first call was at 2019.
Feb 23 19:30:17.869 [warn] eventdns: Duplicate timeout event_del from line 648: first call was at 2019.
Feb 23 19:30:22.971 [warn] eventdns: Duplicate timeout event_del from line 539: first call was at 616.

comment:11 Changed 11 years ago by nickm

Okay; so the 616/539 one happens when we decide a nameserver is up (nameserver_up), and then later the probe we had
sent it fails. Nameserver_probe_failed() gets called from only one place (nameserver_probe_callback), but nameserver_up
can be called from a few.

Similarly with 2019/648, the first call is in evdns_request_timeout_callback (called when a request times out), and
the second is in request_finished(), which evdns_request_timeout_callback calls.

I've fixed the second case in svn, and need to think a little about the best way to fix the first. Maybe nameserver_up
should cancel the in-progress probe if any?

comment:12 Changed 11 years ago by nickm

I've got a patch for the last case of this one, but it requires fixes in the underlying evdns infrastructure, so
I'm running it by Niels first.

comment:13 Changed 10 years ago by nickm

Okay; every case of this should be fixed and/or worked around now, in Tor and in Libevent. (See bug 957 for
the actual full-on solution.)

comment:14 Changed 10 years ago by nickm

flyspray2trac: bug closed.

comment:15 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.