Opened 11 years ago

Last modified 7 years ago

#656 closed defect (Deferred)

Tor server crash in SSL_free with DH crypto error in logs

Reported by: mikeperry
Owned by:
Priority: High
Milestone: 0.2.0.x-final
Component: Core Tor/Tor
Version: 0.2.0.23-rc
Severity:
Keywords:
Cc: mikeperry, arma, nickm
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Just got a crash in r14173. Have a warn in the log right before:

Apr 12 12:30:19.323 [warn] crypto error while generating DH key: BN lib (in Diffie-Hellman routines:GENERATE_KEY).

Here is the backtrace of the thread that caused the crash:

#0 0x4cef366c in EVP_CIPHER_CTX_cleanup () from /lib/libcrypto.so.6
#1 0x4cfc4f35 in ssl_clear_cipher_ctx () from /lib/libssl.so.6
#2 0x4cfc6ab5 in SSL_free () from /lib/libssl.so.6
#3 0x080f335c in tor_tls_free (tls=0x40df1ac0) at tortls.c:831
#4 0x0806d2eb in _connection_free (conn=0x46229f00) at connection.c:328
#5 0x080a363c in connection_unlink (conn=0x46229f00) at main.c:212
#6 0x080a390e in close_closeable_connections () at main.c:603
#7 0x4cfe4125 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#8 0x4cfe4349 in event_loop () from /usr/lib/libevent-1.1a.so.1
#9 0x080a5149 in do_main_loop () at main.c:1446
#10 0x080a52fb in tor_main (argc=3, argv=0x59c9d5c4) at main.c:1986
#11 0x080d9ee2 in main (argc=Cannot access memory at address 0x0

The cpuworker thread was in the process of spitting out another
(or perhaps just finished the original?) warn:

#0 0x4cffe402 in kernel_vsyscall ()
#1 0x4cdcbf7b in write () from /lib/libc.so.6
#2 0x4cd6d884 in _IO_new_file_write () from /lib/libc.so.6
#3 0x4cd6d545 in new_do_write () from /lib/libc.so.6
#4 0x4cd6d82f in _IO_new_do_write () from /lib/libc.so.6
#5 0x4cd6e006 in _IO_new_file_sync () from /lib/libc.so.6
#6 0x4cd62c3c in fflush () from /lib/libc.so.6
#7 0x080daaa7 in logv (severity=4, domain=2, funcname=0x0,
    format=0x8124004 "crypto error while %s: %s (in %s:%s)",
    ap=0x4aaa6eec "°<\022\baáõLÄßõL>ÀõLàV¾4\200") at log.c:295
#8 0x080dad0e in _log (severity=4, domain=2,
    format=0x8124004 "crypto error while %s: %s (in %s:%s)") at log.c:314
#9 0x080eb1a7 in crypto_log_errors (severity=4,
    doing=0x8123cb0 "generating DH key") at crypto.c:146
#10 0x080ec104 in crypto_dh_generate_public (dh=0x34be56e0) at crypto.c:1467
#11 0x080ec31b in crypto_dh_get_public (scrubbed) at crypto.c:1492
#12 0x080a9e7e in onion_skin_server_handshake (scrubbed)
    at onion.c:267

#13 0x08083a70 in cpuworker_main (data=0x4c2f2090) at cpuworker.c:284
#14 0x080e11ed in tor_pthread_helper_fn (_data=0x4c2f20a0) at compat.c:1482
#15 0x4ce5745b in start_thread () from /lib/libpthread.so.0
#16 0x4cddb24e in clone () from /lib/libc.so.6

Unfortunately the logs were at notice level. The node had just started, so not much else was present.
I'm rerunning it at info.

[Automatically added by flyspray2trac: Operating System: All]

Change History (14)

comment:1 Changed 11 years ago by nickm

Weird! Is this trunk? Also, exactly what version of OpenSSL are you using? (Please mention any vendor patches, etc)

comment:2 Changed 11 years ago by mikeperry

openssl-0.9.8b-8.3.el5_0.2 on CentOS 5.1 on tor's 0.2.0 branch (0.2.0.23). I have the complete core if you think it may help. It's 160M though..

comment:3 Changed 11 years ago by nickm

Interesting to note: the cpuworker thread is the one where DH keygen fails, but the crash is in the main thread,
while it frees an entirely different SSL object. I have no idea what's going on here, but I'll poke around the ssl source a little.

comment:4 Changed 11 years ago by mikeperry

Got what looks like another instance (from the backtrace anyway), right after startup. This time
it's with Tor version 0.2.0.24-rc (r14422).

(gdb) t a a bt

Thread 2 (process 29466):
#0 0x55c5f402 in kernel_vsyscall ()
#1 0x55abf138 in recv () from /lib/libpthread.so.0
#2 0x08083b7f in cpuworker_main (data=0x519e1d30) at cpuworker.c:255
#3 0x080e15ad in tor_pthread_helper_fn (_data=0x519e1d60) at compat.c:1482
#4 0x55ab845b in start_thread () from /lib/libpthread.so.0
#5 0x55a3c24e in clone () from /lib/libc.so.6

Thread 1 (process 29462):
#0 0x55b5466c in EVP_CIPHER_CTX_cleanup () from /lib/libcrypto.so.6
#1 0x55c25f35 in ssl_clear_cipher_ctx () from /lib/libssl.so.6
#2 0x55c27ab5 in SSL_free () from /lib/libssl.so.6
#3 0x080f375c in tor_tls_free (tls=0x4c569d80) at tortls.c:835
#4 0x0806d45b in _connection_free (conn=0x51a30000) at connection.c:328
#5 0x080a385c in connection_unlink (conn=0x51a30000) at main.c:212
#6 0x080a3b2e in close_closeable_connections () at main.c:603
#7 0x55c45125 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#8 0x55c45349 in event_loop () from /usr/lib/libevent-1.1a.so.1
#9 0x080a5369 in do_main_loop () at main.c:1446
#10 0x080a551b in tor_main (argc=3, argv=0x5f3629a4) at main.c:1988
#11 0x080da282 in main (argc=Cannot access memory at address 0x0
) at tor_main.c:29
(gdb)
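
The "another instance" judgment above rests on the two main-thread backtraces sharing the same call path even though every address differs between runs. A minimal sketch of that comparison follows. It assumes Python is available; frame_functions and same_call_path are hypothetical helper names (not part of Tor or gdb), and the sample frames are copied from the two crash reports in this ticket.

```python
import re

# Matches gdb frames such as:
#   #3 0x080f335c in tor_tls_free (tls=0x40df1ac0) at tortls.c:831
#   #0 0x4cef366c in EVP_CIPHER_CTX_cleanup () from /lib/libcrypto.so.6
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def frame_functions(backtrace):
    """Return the function names of a gdb backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def same_call_path(trace_a, trace_b, depth=7):
    """True if both traces share the same innermost `depth` function names."""
    a = frame_functions(trace_a)[:depth]
    b = frame_functions(trace_b)[:depth]
    return len(a) > 0 and a == b


# Innermost frames of the main thread from the original report (r14173).
FIRST_CRASH = """\
#0 0x4cef366c in EVP_CIPHER_CTX_cleanup () from /lib/libcrypto.so.6
#1 0x4cfc4f35 in ssl_clear_cipher_ctx () from /lib/libssl.so.6
#2 0x4cfc6ab5 in SSL_free () from /lib/libssl.so.6
#3 0x080f335c in tor_tls_free (tls=0x40df1ac0) at tortls.c:831
"""

# The same frames from the 0.2.0.24-rc (r14422) report.
SECOND_CRASH = """\
#0 0x55b5466c in EVP_CIPHER_CTX_cleanup () from /lib/libcrypto.so.6
#1 0x55c25f35 in ssl_clear_cipher_ctx () from /lib/libssl.so.6
#2 0x55c27ab5 in SSL_free () from /lib/libssl.so.6
#3 0x080f375c in tor_tls_free (tls=0x4c569d80) at tortls.c:835
"""

if __name__ == "__main__":
    print(frame_functions(FIRST_CRASH))
    print(same_call_path(FIRST_CRASH, SECOND_CRASH))
```

Comparing by function name only ignores addresses and source line numbers, which shift between builds (tortls.c:831 versus tortls.c:835 here). A matching call path suggests, but does not prove, a common cause.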

comment:5 Changed 11 years ago by mikeperry

Oh, also, there was no DH warn in the logs this time around.

comment:6 Changed 11 years ago by mikeperry

Getting this log message sequence pretty reliably before the crash. Also, if
I turn off my V3 authority status, the crash does not happen.

Apr 23 01:26:18.051 [debug] tor_tls_handshake(): Server sent back a single certificate; looks like a v2 handshake on 0x49ee72c0.
Apr 23 01:26:18.051 [debug] connection_tls_continue_handshake(): wanted read
Apr 23 01:26:18.051 [debug] conn_read_callback(): socket 1470 wants to read.
Apr 23 01:26:18.051 [info] TLS error while handshaking with [scrubbed]: malloc failure (in SSL routines:TLS1_CHANGE_CIPHER_STATE)
Apr 23 01:26:18.051 [info] connection_tls_continue_handshake(): tls error [misc error]. breaking connection.

comment:7 Changed 11 years ago by mikeperry

Hrmm, not always a 'malloc failure':

Apr 23 01:42:47.565 [info] TLS error while handshaking with [scrubbed]: compression library error (in SSL routines:TLS1_CHANGE_CIPHER_STATE)
Apr 23 01:42:47.565 [info] connection_tls_continue_handshake(): tls error [misc error]. breaking connection.
Apr 23 01:42:47.565 [debug] conn_close_if_marked(): Cleaning up connection (fd -1).
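
Both error variants reported so far ("malloc failure" and "compression library error") name the same OpenSSL function, TLS1_CHANGE_CIPHER_STATE. A rough sketch for tallying such lines across a larger info-level log follows. It assumes each log entry sits on a single unwrapped line; TLS_ERROR_RE and tally_tls_errors are hypothetical names, and the sample lines are the ones quoted in this ticket.

```python
import re
from collections import Counter

# Matches Tor info-level lines of the form quoted in this ticket:
#   TLS error while <doing> with <peer>: <reason> (in <library>:<function>)
TLS_ERROR_RE = re.compile(
    r"TLS error while (?P<doing>.+?) with (?P<peer>\S+): "
    r"(?P<reason>.+?) \(in (?P<library>[^:]+):(?P<function>[^)]+)\)"
)


def tally_tls_errors(log_lines):
    """Count TLS errors, keyed by (reason, OpenSSL function)."""
    counts = Counter()
    for line in log_lines:
        match = TLS_ERROR_RE.search(line)
        if match:
            counts[(match.group("reason"), match.group("function"))] += 1
    return counts


SAMPLE_LOG = [
    "Apr 23 01:26:18.051 [info] TLS error while handshaking with [scrubbed]: "
    "malloc failure (in SSL routines:TLS1_CHANGE_CIPHER_STATE)",
    "Apr 23 01:42:47.565 [info] TLS error while handshaking with [scrubbed]: "
    "compression library error (in SSL routines:TLS1_CHANGE_CIPHER_STATE)",
    # Not a TLS error line in the format above, so it is skipped.
    "Apr 23 01:42:47.565 [info] connection_tls_continue_handshake(): "
    "tls error [misc error]. breaking connection.",
]

if __name__ == "__main__":
    for (reason, function), count in tally_tls_errors(SAMPLE_LOG).items():
        print(f"{count}x {reason} in {function}")
```

Grouping by function as well as reason makes it easy to see whether differently worded errors keep pointing at the same place in the library.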

comment:8 Changed 11 years ago by mikeperry

Crash still happens on 0.9.8b vanilla, but I'm getting extremely strange stack traces for it, and no obvious loglines:

(gdb) t a a bt

Thread 2 (process 5582):
#0 0x4a8ad402 in kernel_vsyscall ()
#1 0x4a713138 in recv () from /lib/libpthread.so.0
#2 0x080838bf in cpuworker_main (data=0x8b25530) at cpuworker.c:255
#3 0x080e13bd in tor_pthread_helper_fn (_data=0x8a068b8) at compat.c:1482
#4 0x4a70c45b in start_thread () from /lib/libpthread.so.0
#5 0x4a69024e in clone () from /lib/libc.so.6

Thread 1 (process 4514):
#0 0x4a7a2a49 in sk_value () from /lib/libcrypto.so.6
#1 0x080efc60 in tor_tls_handshake (tls=0xd895190) at tortls.c:947
#2 0x0807a1e9 in connection_tls_continue_handshake (conn=0xd895008)
    at connection_or.c:627
#3 0x08070430 in connection_handle_read (conn=0xd895008) at connection.c:1950
#4 0x080a5518 in conn_read_callback (fd=24, event=2, _conn=0xd895008)
    at main.c:457

#5 0x4a893125 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#6 0x4a893349 in event_loop () from /usr/lib/libevent-1.1a.so.1
#7 0x080a50a9 in do_main_loop () at main.c:1446
#8 0x080a525b in tor_main (argc=3, argv=0x59c4da24) at main.c:1988
#9 0x080d9fc2 in main (argc=769, argv=0x2000) at tor_main.c:29

and

(gdb) t a a bt

Thread 2 (process 23195):
#0 0x5433e402 in kernel_vsyscall ()
#1 0x54198138 in recv () from /lib/libpthread.so.0
#2 0x080838bf in cpuworker_main (data=0x8c492d8) at cpuworker.c:255
#3 0x080e13bd in tor_pthread_helper_fn (_data=0x8a4bce0) at compat.c:1482
#4 0x5419145b in start_thread () from /lib/libpthread.so.0
#5 0x5411524e in clone () from /lib/libc.so.6

Thread 1 (process 23163):
#0 0x54244fbd in ?? ()
#1 0x080ef9fd in tor_tls_client_is_using_v2_ciphers (
    ssl=<value optimized out>, address=0x88bf0b8 "[scrubbed]") at tortls.c:648
#2 0x080f03c6 in tor_tls_server_info_callback (ssl=0x8975af8, type=8193,
    val=1) at tortls.c:706

#3 0x542f550b in ?? ()
#4 0x08975af8 in ?? ()
#5 0x00002001 in ?? ()
#6 0x00000001 in ?? ()
#7 0x00000000 in ?? ()

comment:9 Changed 11 years ago by mikeperry

Tried firewalling off incoming connections; it did not change anything.

comment:10 Changed 11 years ago by mikeperry

And now, after crashing every time I started it with the distribution's openssl
yesterday, it is not crashing at all today. Potentially a function of the number
of active authorities? Or perhaps just pure chaos.

comment:11 Changed 11 years ago by nickm

eep. I personally blame magical hedgehogs at the moment.

comment:12 Changed 11 years ago by arma

I'm going to close this bug, since it's been almost a month and we still
have no idea if it's hardware or what.

If it recurs, please reopen.

comment:13 Changed 11 years ago by arma

flyspray2trac: bug closed.
Who knows. Maybe it'll come back.

comment:14 Changed 7 years ago by nickm

Component: changed from Tor Relay to Tor