Opened 11 years ago

Last modified 7 years ago

#897 closed defect (Implemented)

crash when compiled with libevent 1.3e

Reported by: keb Owned by:
Priority: Low Milestone:
Component: Core Tor/Tor Version: 0.2.1.8-alpha
Severity: Keywords:
Cc: keb, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

the Tor process exits without a core dump or message in the log
if i compile it with libevent 1.3e
but works fine if i compile with libevent 1.4.8

software: tor-0.2.1.9-alpha
OS: xubuntu linux 8.10
hardware: Celeron 1.7GHz, 512MB RAM
configure line:

./configure --disable-iphone --enable-instrument-downloads --disable-largefile --enable-geoip-stats --enable-debug

log file:
Dec 27 21:41:24.804 [notice] Tor 0.2.1.9-alpha (r17777) opening log file.
Dec 27 21:41:24.852 [notice] Parsing GEOIP file.
Dec 27 21:41:25.843 [notice] Your Tor server's identity key fingerprint is 'cyblings01 A309 1FF0 545C 24A9 EA0D 87F5 FB7E 4098 DED0 C4FF'
Dec 27 21:41:36.432 [notice] We now have enough directory information to build circuits.
Dec 27 21:41:36.432 [notice] Bootstrapped 80%: Connecting to the Tor network.
Dec 27 21:41:36.610 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Dec 27 21:41:37.198 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Dec 27 21:41:37.433 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Dec 27 21:41:38.988 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Dec 27 21:41:38.988 [notice] Bootstrapped 100%: Done.

i can attach a debug log and torrc file also if needed.

[Automatically added by flyspray2trac: Operating System: Other Linux]

Child Tickets

Attachments (3)

bt.1 (5.1 KB) - added by keb 11 years ago.
trace of second run
tor-0.2.1.10a-libevent13e-20090106-1.log (145.4 KB) - added by keb 11 years ago.
tor 0.2.0.10-alpha, libevent 1.3e
tor-0.2.1.10a-libevent148-20090106-2.log (22.7 KB) - added by keb 11 years ago.
tor 0.2.0.10-alpha, libevent 1.4.8

Download all attachments as: .zip

Change History (18)

comment:1 Changed 11 years ago by nickm

Interesting! I don't think a debug log is what we need, but a stack trace would probably help a lot. Check out

https://wiki.torproject.org/noreply/TheOnionRouter/TorFAQ#ReportBug

for instructions on generating a stack trace if you don't know how.

comment:2 Changed 11 years ago by keb

ok i found a core file and made a trace (see attachment 1), but only after i ran it again.
here is the notice-level log of that second run.

Dec 27 22:21:36.679 [notice] Tor 0.2.1.9-alpha (r17777) opening log file.
Dec 27 22:21:36.695 [notice] Parsing GEOIP file.
Dec 27 22:21:37.896 [notice] Your Tor server's identity key fingerprint is 'cyblings01 A309 1FF0 545C 24A9 EA0D 87F5 FB7E 4098 DED0 C4FF'
Dec 27 22:21:45.296 [notice] We now have enough directory information to build circuits.
Dec 27 22:21:45.296 [notice] Bootstrapped 80%: Connecting to the Tor network.
Dec 27 22:21:45.414 [notice] Bootstrapped 85%: Finishing handshake with first hop.

Changed 11 years ago by keb

Attachment: bt.1 added

trace of second run

comment:3 Changed 11 years ago by nickm

Posting here, because it's short:

(gdb) bt
#0 0xb7fa6430 in kernel_vsyscall ()
#1 0xb7c64880 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7c66248 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7ca210d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7cab116 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7cac865 in malloc () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7df0e8e in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#7 0xb7df150f in CRYPTO_malloc () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#8 0xb7e1f63d in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#9 0xb7e1f7d1 in bn_expand2 () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#10 0xb7e22262 in BN_mul () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#11 0xb7e2a078 in BN_mod_mul_montgomery () from /usr/lib/i686/cmov/libcrypto.so.
0.9.8
#12 0xb7e1e2a4 in BN_mod_exp_mont () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#13 0xb7e3f92b in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#14 0xb7e4087e in RSA_public_decrypt () from /usr/lib/i686/cmov/libcrypto.so.0.9
.8
#15 0xb7e416cd in RSA_verify () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#16 0xb7e69837 in EVP_VerifyFinal () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#17 0xb7e751eb in ASN1_item_verify () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#18 0xb7e9b40e in X509_verify () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#19 0x0810765b in tor_tls_verify (severity=6, tls=0x92b0b10, identity_key=0xbfba612c) at tortls.c:1297
#20 0x0807ebd7 in connection_tls_finish_handshake (conn=0x928c4c8) at connection_or.c:954
#21 0x0807faf9 in connection_or_tls_renegotiated_cb (tls=0x92b0b10, _conn=0x928c4c8) at connection_or.c:822
#22 0x08107bf8 in tor_tls_read (tls=0x92b0b10, cp=0x8fdd4f8 "", len=8704) at tortls.c:958
#23 0x0805031c in read_to_buf_tls (tls=0x92b0b10, at_most=8704, buf=0x90e1830) at buffers.c:627
#24 0x080741ff in connection_handle_read (conn=0x928c4c8) at connection.c:2087
#25 0x080af678 in conn_read_callback (fd=15, event=2, _conn=0x928c4c8) at main.c:459
#26 0xb7f4b5dc in event_base_loop () from /usr/lib/libevent-1.3e.so.1
#27 0xb7f4b6d9 in event_loop () from /usr/lib/libevent-1.3e.so.1
#28 0x080af1a9 in do_main_loop () at main.c:1438
#29 0x080af465 in tor_main (argc=3, argv=0xbfba6444) at main.c:2064
#30 0x080ee662 in main (argc=Cannot access memory at address 0x33de

Looks like malloc() is calling abort() [from an assertion failure?] inside OpenSSL while it's trying to verify
a certificate. That might happen if the malloc heap has become corrupt, I guess. This looks like it might be
tricky to investigate. Roger--have we seen something like this before?

comment:4 Changed 11 years ago by nickm

So all I can think to do now is to ask questions at random and hope we get lucky and something tells us the answer.

Does the crash occurs at the same place in the code according to GDB every time?

Did the crash also occur with Tor 0.2.1.8-alpha and libevent 1.3e?

If you disable the EPOLL libevent backend (by seting the environment variable EVENT_NOEPOLL), does the crash still
happen?

comment:5 Changed 11 years ago by keb

I could not find another core file after trying 10 times - the process just disappeared. It gets to a different place each time in the startup sequence according in the notice log file.

There were no crashes with 0.2.1.8-alpha and libevent 1.3e .

Setting EVENT_NOEPOLL=1 and running it made no difference in crashing (i noticed at startup that it was using method "poll" though).

Finally, i made a copy of the data directory and ran it from console and got an immediate backtrace, which doesnt look as informative as the other one:

Dec 28 23:26:39.817 [notice] Tor v0.2.1.9-alpha (r17777). This is experimental software. Do not rely on it for strong anonymity. (Running on Linux i686)
Dec 28 23:26:39.820 [notice] Initialized libevent version 1.3e using method epoll. Good.
Dec 28 23:26:39.820 [notice] Opening OR listener on 0.0.0.0:9001
Dec 28 23:26:39.820 [notice] Opening Directory listener on 0.0.0.0:9030
Dec 28 23:26:39.820 [warn] Fixing permissions on directory /var/lib/tor.1
* glibc detected * /usr/local/bin/tor: corrupted double-linked list: 0x09fb8120 *
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7cb23f4]
/lib/tls/i686/cmov/libc.so.6[0xb7cb5472]
/lib/tls/i686/cmov/libc.so.6(libc_malloc+0x95)[0xb7cb6865]
/usr/local/bin/tor[0x80f5931]
/usr/local/bin/tor[0x80f6461]
/usr/local/bin/tor[0x80f1d2f]
/usr/local/bin/tor[0x80f20fd]
/usr/local/bin/tor[0x80630ee]
/usr/local/bin/tor[0x806a035]
/usr/local/bin/tor[0x806a5e9]
/usr/local/bin/tor[0x806bb6b]
/usr/local/bin/tor[0x806bfae]
/usr/local/bin/tor[0x80ab42d]
/usr/local/bin/tor[0x80af3dc]
/usr/local/bin/tor[0x80ee662]
/lib/tls/i686/cmov/libc.so.6(
libc_start_main+0xe5)[0xb7c59685]
/usr/local/bin/tor[0x804d771]
======= Memory map: ========
08048000-08142000 r-xp 00000000 09:03 712795 /usr/local/bin/tor
08142000-08143000 r--p 000f9000 09:03 712795 /usr/local/bin/tor
08143000-08147000 rw-p 000fa000 09:03 712795 /usr/local/bin/tor
08147000-08148000 rw-p 08147000 00:00 0
09fb3000-09ff5000 rw-p 09fb3000 00:00 0 [heap]
b7af1000-b7afe000 r-xp 00000000 09:03 868370 /lib/libgcc_s.so.1
b7afe000-b7aff000 r--p 0000c000 09:03 868370 /lib/libgcc_s.so.1
b7aff000-b7b00000 rw-p 0000d000 09:03 868370 /lib/libgcc_s.so.1
b7b00000-b7b21000 rw-p b7b00000 00:00 0
b7b21000-b7c00000 ---p b7b21000 00:00 0
b7c0a000-b7c0c000 rw-p b7c0a000 00:00 0
b7c0c000-b7c1c000 r-xp 00000000 09:03 872033 /lib/tls/i686/cmov/libresolv-2.8.90.so
b7c1c000-b7c1d000 r--p 0000f000 09:03 872033 /lib/tls/i686/cmov/libresolv-2.8.90.so
b7c1d000-b7c1e000 rw-p 00010000 09:03 872033 /lib/tls/i686/cmov/libresolv-2.8.90.so
b7c1e000-b7c20000 rw-p b7c1e000 00:00 0
b7c20000-b7c27000 r-xp 00000000 09:03 872034 /lib/tls/i686/cmov/librt-2.8.90.so
b7c27000-b7c28000 r--p 00007000 09:03 872034 /lib/tls/i686/cmov/librt-2.8.90.so
b7c28000-b7c29000 rw-p 00008000 09:03 872034 /lib/tls/i686/cmov/librt-2.8.90.so
b7c29000-b7c3e000 r-xp 00000000 09:03 872024 /lib/tls/i686/cmov/libnsl-2.8.90.so
b7c3e000-b7c3f000 r--p 00014000 09:03 872024 /lib/tls/i686/cmov/libnsl-2.8.90.so
b7c3f000-b7c40000 rw-p 00015000 09:03 872024 /lib/tls/i686/cmov/libnsl-2.8.90.so
b7c40000-b7c43000 rw-p b7c40000 00:00 0
b7c43000-b7d9b000 r-xp 00000000 09:03 872018 /lib/tls/i686/cmov/libc-2.8.90.so
b7d9b000-b7d9d000 r--p 00158000 09:03 872018 /lib/tls/i686/cmov/libc-2.8.90.so
b7d9d000-b7d9e000 rw-p 0015a000 09:03 872018 /lib/tls/i686/cmov/libc-2.8.90.so
b7d9e000-b7da1000 rw-p b7d9e000 00:00 0
b7da1000-b7da3000 r-xp 00000000 09:03 872021 /lib/tls/i686/cmov/libdl-2.8.90.so
b7da3000-b7da4000 r--p 00001000 09:03 872021 /lib/tls/i686/cmov/libdl-2.8.90.so
b7da4000-b7da5000 rw-p 00002000 09:03 872021 /lib/tls/i686/cmov/libdl-2.8.90.so
b7da5000-b7dba000 r-xp 00000000 09:03 872032 /lib/tls/i686/cmov/libpthread-2.8.90.so
b7dba000-b7dbb000 r--p 00014000 09:03 872032 /lib/tls/i686/cmov/libpthread-2.8.90.so
b7dbb000-b7dbc000 rw-p 00015000 09:03 872032 /lib/tls/i686/cmov/libpthread-2.8.90.so
b7dbc000-b7dbe000 rw-p b7dbc000 00:00 0
b7dbe000-b7ef0000 r-xp 00000000 09:03 722537 /usr/lib/i686/cmov/libcrypto.so.0.9.8
b7ef0000-b7ef8000 r--p 00132000 09:03 722537 /usr/lib/i686/cmov/libcrypto.so.0.9.8
b7ef8000-b7f05000 rw-p 0013a000 09:03 722537 /usr/lib/i686/cmov/libcrypto.so.0.9.8
b7f05000-b7f09000 rw-p b7f05000 00:00 0
b7f09000-b7f4b000 r-xp 00000000 09:03 722656 /usr/lib/i686/cmov/libssl.so.0.9.8
b7f4b000-b7f4c000 r--p 00041000 09:03 722656 /usr/lib/i686/cmov/libssl.so.0.9.8
b7f4c000-b7f4f000 rw-p 00042000 09:03 722656 /usr/lib/i686/cmov/libssl.so.0.9.8
b7f4f000-b7f50000 rw-p b7f4f000 00:00 0
b7f50000-b7f63000 r-xp 00000000 09:03 647196 /usr/lib/libevent-1.3e.so.1.0.3
b7f63000-b7f65000 rw-p 00013000 09:03 647196 /usr/lib/libevent-1.3e.so.1.0.3
b7f65000-b7f66000 rw-p b7f65000 00:00 0
b7f66000-b7f7a000 r-xp 00000000 09:03 712354 /usr/lib/libz.so.1.2.3.3
b7f7a000-b7f7c000 rw-p 00013000 09:03 712354 /usr/lib/libz.so.1.2.3.3
b7f93000-b7f95000 rw-p b7f93000 00:00 0
b7f95000-b7faf000 r-xp 00000000 09:03 869966 /lib/ld-2.8.90.so
b7faf000-b7fb0000 r-xp b7faf000 00:00 0 [vdso]
b7fb0000-b7fb1000 r--p 0001a000 09:03 869966 /lib/ld-2.8.90.so
b7fb1000-b7fb2000 rw-p 0001b000 09:03 869966 /lib/ld-2.8.90.so
bf99d000-bf9b2000 rw-p bffeb000 00:00 0 [stack]
Aborted

comment:6 Changed 11 years ago by keb

i should add that in all the runs it never got to the
"Self-testing indicates your DirPort is reachable from the outside. Excellent."
stage.

comment:7 Changed 11 years ago by nickm

Hm.

So 1.3e fails, but 1.4.8 doesn't.
Tor 0.2.1.8-alpha fails, and so does 0.2.1.9-alpha.
The warning, '* glibc detected * /usr/local/bin/tor: corrupted double-linked list: 0x09fb8120 *' suggests that
something is corrupting the system heap, but it's hard to tell what.

Are you able to run under valgrind? It is sometimes good at detecting memory corruption situations. It will dump
a bunch of openssl warnings, so you need to run it with --error-limit=no to see what's happening.

Changed 11 years ago by keb

tor 0.2.0.10-alpha, libevent 1.3e

Changed 11 years ago by keb

tor 0.2.0.10-alpha, libevent 1.4.8

comment:8 Changed 11 years ago by keb

ok i attached logs of 2 runs under valgrind.
(there is a mistake in labelling, they are from tor 0.2.1.10-alpha not 0.2.0.10-alpha.)
the run when compiled with libevent 1.3e died without leaving a core, while the other with 1.4.8 didnt die.

should i try again with 0.2.1.8-alpha?

comment:9 Changed 11 years ago by nickm

oho. The 1.3e log has lots of entries like:
==24552== Invalid write of size 4
==24552== at 0x405B53F: event_set (in /usr/lib/libevent-1.3e.so.1.0.3)
==24552== by 0x80ACEE5: connection_add (main.c:132)
==24552== by 0x8073231: retry_listeners (connection.c:967)
==24552== by 0x807390F: retry_all_listeners (connection.c:1489)
==24552== by 0x8069EF3: options_act_reversible (config.c:1084)
==24552== by 0x806A698: set_options (config.c:800)
==24552== by 0x806BC1A: options_init_from_string (config.c:4092)
==24552== by 0x806C05D: options_init_from_torrc (config.c:3966)
==24552== by 0x80AB39C: tor_init (main.c:1809)
==24552== by 0x80AF34B: tor_main (main.c:2054)
==24552== by 0x80EEA51: main (tor_main.c:30)
==24552== Address 0x43c73b8 is 0 bytes after a block of size 72 alloc'd
==24552== at 0x4025D2E: malloc (vg_replace_malloc.c:207)
==24552== by 0x80F5C10: _tor_malloc (util.c:131)
==24552== by 0x80F7136: _tor_malloc_zero (util.c:157)
==24552== by 0x80ACEAD: connection_add (main.c:130)
==24552== by 0x8073231: retry_listeners (connection.c:967)
==24552== by 0x807390F: retry_all_listeners (connection.c:1489)
==24552== by 0x8069EF3: options_act_reversible (config.c:1084)
==24552== by 0x806A698: set_options (config.c:800)
==24552== by 0x806BC1A: options_init_from_string (config.c:4092)
==24552== by 0x806C05D: options_init_from_torrc (config.c:3966)
==24552== by 0x80AB39C: tor_init (main.c:1809)
==24552== by 0x80AF34B: tor_main (main.c:2054)

==24552== Invalid read of size 4
==24552== at 0x405BB82: event_add (in /usr/lib/libevent-1.3e.so.1.0.3)
==24552== by 0x80ABA37: connection_start_reading (main.c:306)
==24552== by 0x80732BF: retry_listeners (connection.c:978)
==24552== by 0x807390F: retry_all_listeners (connection.c:1489)
==24552== by 0x8069EF3: options_act_reversible (config.c:1084)
==24552== by 0x806A698: set_options (config.c:800)
==24552== by 0x806BC1A: options_init_from_string (config.c:4092)
==24552== by 0x806C05D: options_init_from_torrc (config.c:3966)
==24552== by 0x80AB39C: tor_init (main.c:1809)
==24552== by 0x80AF34B: tor_main (main.c:2054)
==24552== by 0x80EEA51: main (tor_main.c:30)
==24552== Address 0x43c73c0 is 8 bytes after a block of size 72 alloc'd
==24552== at 0x4025D2E: malloc (vg_replace_malloc.c:207)
==24552== by 0x80F5C10: _tor_malloc (util.c:131)
==24552== by 0x80F7136: _tor_malloc_zero (util.c:157)
==24552== by 0x80ACEAD: connection_add (main.c:130)
==24552== by 0x8073231: retry_listeners (connection.c:967)
==24552== by 0x807390F: retry_all_listeners (connection.c:1489)
==24552== by 0x8069EF3: options_act_reversible (config.c:1084)
==24552== by 0x806A698: set_options (config.c:800)
==24552== by 0x806BC1A: options_init_from_string (config.c:4092)
==24552== by 0x806C05D: options_init_from_torrc (config.c:3966)
==24552== by 0x80AB39C: tor_init (main.c:1809)
==24552== by 0x80AF34B: tor_main (main.c:2054)

comment:10 Changed 11 years ago by nickm

(The 1.4.8 log, by contrast, has no such entries.)

Are you sure that when you're linking against libevent 1.3e, you're also building with the libevent 1.3e headers?

comment:11 Changed 11 years ago by nickm

I ask because the "struct event" in libevent 1.3 is longer than the one in libevent 1.4 by about 12 bytes. The only
think Tor uses that field for is to allocate a struct event. But if it's allocating one that's 12 bytes too short
(because it's looking at the 1.4 headers) and linking against libevent 1.3, then libevent 1.3 will write to memory
off the end of the structure.

We should probably patch libevent to catch this; it's an annoying bug.

comment:12 Changed 11 years ago by keb

ok i uninstalled libevent 1.4.8 and then rebuilt Tor and now it works with the system libevent 1.3e

Maybe it is an error in the configure script - when i tell it to use the locally compiled copy --with-libevent-dir=/usr/local/lib it should be linking against the one in there. whenever i don't give it that configure option, it detects (system) and should use the headers that come with the libevent-dev ubuntu package. should i be specifying another configure switch for the includes also?

comment:13 Changed 11 years ago by keb

also, when i apt-get remove libevent-dev i successfully build tor against libevent-1.4.8
using just the --with-libevent-dir switch as above.
so i guess it's a case of my inconsistent system configuration.

comment:14 Changed 11 years ago by nickm

flyspray2trac: bug closed.
Wrote some detection code to catch the misconfiguration here, I hope.

comment:15 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.