Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#18286 closed defect (fixed)

tor 0.2.8.1-alpha-dev - dumping core on test, tor binary dumps core as well

Reported by: yancm
Owned by: yawning
Priority: Very High
Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.8.1-alpha
Severity: Critical
Keywords: crash must-fix-before-028-rc
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

With NetBSD 6_1_Stable i386, OpenSSL 1.1.0-pre3-dev, Libevent 2.1.5-beta
tor 0.2.8.1-alpha-dev (latest git root sources as of 2016.02.08)

# pwd
/usr/local/src/tor
# gmake test
gmake all-am
gmake[1]: Entering directory '/usr/local/src/tor'
gmake[1]: Leaving directory '/usr/local/src/tor'
./src/test/test
Memory fault (core dumped)
Makefile:7219: recipe for target 'test' failed
gmake: *** [test] Error 139
#


I tried to compile with debug symbols, but I think it is not working:

# gdb src/test/test test.core
GNU gdb (GDB) 7.3.1
Reading symbols from /usr/local/src/tor/src/test/test...done.
[New process 1]
Cannot access memory at address 0xffffff55
(gdb) bt
#0 0x006852d5 in OBJ_cleanup ()
#1 0xbb8e3880 in ?? ()
#2 0xbbbe66e0 in ?? ()
#3 0xbb8e389a in ?? ()
#4 0xbbbff510 in ?? ()
#5 0xbbbf322b in ?? ()
#6 0x00000003 in ?? ()
#7 0xbfbfebf8 in ?? ()
#8 0x00000000 in ?? ()
(gdb)

Change History (28)

comment:1 Changed 2 years ago by teor

Keywords: crash added
Milestone: Tor: unspecified → Tor: 0.2.8.x-final
Priority: Medium → Very High
Severity: Normal → Critical

comment:2 Changed 2 years ago by Sebastian

Can you run a git bisect to find the faulty commit? I can't reproduce the issue.

comment:3 Changed 2 years ago by yancm

Is there a better way to compile debug symbols into the build so gdb will give better diagnostics?

If I back up and compile against the "pkgsrc" version of openssl, tor passes the tests, so I suspect I have something messed up in the way I am compiling against openssl 1.1.0-dev. Will continue to investigate. BTW, openssl 1.1.0-dev passes its self-tests fine...

comment:4 in reply to:  3 Changed 2 years ago by cypherpunks

Replying to yancm:

Is there a better way to compile debug symbols into the build so gdb will give better diagnostics?

When you configure the build, use ./configure CFLAGS="-O0 -ggdb". Also be sure to recompile any already built objects by running make clean before make.

comment:5 Changed 2 years ago by nickm

Status: new → needs_information

comment:6 Changed 2 years ago by yancm

Here's what I have found so far. I compiled tor with debug symbols and have experimented with executables for both test and tor in gdb. I tried to set a break at the main entry point, but any time it ran, it ended up crashed in the openssl code somewhere. I could not get a backtrace as the stack was somehow no longer available. I did manage to get a debug build of the openssl 1.1.0.dev.alpha3 (something like that anyway), but then openssl changed their configure and build scripts with 1.1.0.dev.alpha4 and I have not yet been able to get a new debug build of openssl...or any build really.

My next step is to turn in an openssl ticket on the build problems and will also ask how to preserve the stack on debug as apparently that is a flag in the config that I cannot figure out...

My apologies for moving slow on this one, day job and life etc. are limiting my time on this ATM...

comment:7 Changed 2 years ago by yancm

I can no longer build tor against OpenSSL 1.1.0.dev.alpha* or openssl-1.1.0-pre5-dev which is current as of this typing...

This ticket should probably be closed...

That said, I am still interested in building tor against the latest OpenSSL. I am not familiar enough with tor's OpenSSL interface to help with updating the code. It looks like the OpenSSL team is nearing some sort of release, since the versioning has moved from alpha to preview...

If someone on the tor team has time again to try to adjust the api, I suggest they test against just the 1.1.0 api only by configuring with the following flag "./config --api=1.1.0".

thank you...

comment:8 Changed 2 years ago by teor

Keywords: must-fix-before-028-rc added

It would be great if tor-0.2.8-stable is able to build with OpenSSL 1.1.0, particularly if they are released around the same time.

That said, we could fix OpenSSL 1.1.0 compatibility in a point release, after OpenSSL does their release.

Nick, what do you think?

comment:9 Changed 2 years ago by nickm

Huh. Yeah, it would be good to support OpenSSL 1.1.0. I thought we did! Let's give it a try before the release and try to fix it if we can.

(Also we should make sure we build with LibreSSL.)

comment:10 Changed 2 years ago by yawning

Owner: set to yawning
Status: needs_information → assigned

comment:11 Changed 2 years ago by yawning

Status: assigned → needs_review

Can't reproduce the test crash with OpenSSL 1.1.0-pre4 or master so I'm inclined to say "Upgrade to a more recent OpenSSL development pre-release snapshot".

This gets OpenSSL 1.1.0-pre4 and 1.1.0-pre5-dev (aka master) building, and the tests don't crash. I think the changes are low-risk enough that they can be taken for 0.2.8.x, though I did the work against master.

https://git.schwanenlied.me/yawning/tor/src/bug18286

comment:12 Changed 2 years ago by yancm

I am still unable to build tor (master) against OpenSSL (master).

I receive the following:
# gmake
gmake all-am
gmake[1]: Entering directory '/usr/local/src/tor'

CC src/common/crypto.o

src/common/crypto.c: In function 'crypto_early_init':
src/common/crypto.c:295:5: warning: implicit declaration of function 'ERR_load_crypto_strings'
src/common/crypto.c:296:5: warning: implicit declaration of function 'OpenSSL_add_all_algorithms'
src/common/crypto.c: In function 'crypto_thread_cleanup':
src/common/crypto.c:420:3: error: too many arguments to function 'ERR_remove_thread_state'
/usr/local/include/openssl/err.h:363:6: note: declared here
src/common/crypto.c: In function 'openssl_locking_cb_':
src/common/crypto.c:3082:14: error: 'CRYPTO_LOCK' undeclared (first use in this function)
src/common/crypto.c:3082:14: note: each undeclared identifier is reported only once for each function it appears in
src/common/crypto.c: At top level:
src/common/crypto.c:3139:43: error: expected ')' before '*' token
src/common/crypto.c: In function 'setup_openssl_threading':
src/common/crypto.c:3151:3: warning: implicit declaration of function 'CRYPTO_num_locks'
src/common/crypto.c:3156:3: warning: implicit declaration of function 'CRYPTO_set_locking_callback'
src/common/crypto.c:3157:3: warning: implicit declaration of function 'CRYPTO_THREADID_set_callback'
src/common/crypto.c:3157:32: error: 'tor_set_openssl_thread_id' undeclared (first use in this function)
src/common/crypto.c: In function 'crypto_global_cleanup':
src/common/crypto.c:3173:3: error: too many arguments to function 'ERR_remove_thread_state'
/usr/local/include/openssl/err.h:363:6: note: declared here
Makefile:3341: recipe for target 'src/common/crypto.o' failed
gmake[1]: *** [src/common/crypto.o] Error 1
gmake[1]: Leaving directory '/usr/local/src/tor'
Makefile:1930: recipe for target 'all' failed
gmake: *** [all] Error 2

When you built OpenSSL did you build 1.1.0 api explicitly? ("./config --api=1.1.0")

comment:13 in reply to:  12 Changed 2 years ago by yawning

Replying to yancm:

When you built OpenSSL did you build 1.1.0 api explicitly? ("./config --api=1.1.0")

No. I just did the minimum required to get it to build with a debug-enabled OpenSSL build. I don't see the point in requesting an API that is obviously still in flux (as in, more shit is broken in master than is in pre4). The things that are deprecated but still declared can wait.

As a side note, the line numbers in your output don't match what I have in my branch, and it's puking on things that I did fix in addition to things that I have that you don't, so part of this likely is human error.

That said, I did push a fixup commit that removes slightly more code (orthogonal to your failure case).

comment:14 Changed 2 years ago by yancm

I believe the point in building only against api 1.1.0 is that this removes earlier APIs that the developers know will be deprecated in 1.1.0; hence, if we can build against the 1.1.0 api, it is probably more stable than building against the amalgam of all APIs in master that are not yet deprecated.

I'm not sure I understand the "human error" part? I'm always willing to learn...

comment:15 Changed 2 years ago by nickm

The OpenSSL release notes say:

Deprecated interfaces can now be disabled at build time either relative to the latest release via the "no-deprecated" Configure argument, or via the "--api=1.1.0|1.0.0|0.9.8" option.

So that makes it sound to me like "--api=1.1.0" is talking about deprecated interfaces that will nevertheless be included in the 1.1.0 release.
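
The kind of build-time gating described above can be sketched in C as follows. This is only an illustration, not OpenSSL's actual headers: `DEMO_API_COMPAT`, `demo_init`, `demo_load_strings` and `demo_startup` are hypothetical names, with `DEMO_API_COMPAT` standing in for the compatibility level that an option such as "--api=1.1.0" would select.

```c
/* Hypothetical stand-in for the API level chosen at configure time;
 * 0x10100000L plays the role of "--api=1.1.0". */
#ifndef DEMO_API_COMPAT
#define DEMO_API_COMPAT 0x10100000L
#endif

/* Replacement interface: always declared. */
int demo_init(void) { return 1; }

#if DEMO_API_COMPAT < 0x10100000L
/* Deprecated interface: only visible when an older API level is requested. */
int demo_load_strings(void) { return demo_init(); }
#define DEMO_HAVE_DEPRECATED 1
#else
#define DEMO_HAVE_DEPRECATED 0
#endif

/* A caller that compiles at either API level. */
int demo_startup(void)
{
#if DEMO_HAVE_DEPRECATED
  return demo_load_strings();
#else
  return demo_init();
#endif
}
```

With the level set to 1.1.0, any caller that still uses `demo_load_strings()` directly would hit an "implicit declaration" warning, much like the ones in comment:12.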

comment:16 Changed 2 years ago by yancm

This makes sense. I was reading it as talking about deprecated interfaces that *might* be included in the 1.1.0 release and hence relying on them might be more of a moving target.

comment:17 Changed 2 years ago by nickm

I worry a tiny bit about the use of OPENSSL_VER, given that it doesn't seem to exist in older versions of OpenSSL, and that we already have OPENSSL_V_*() macros. But it doesn't seem to actually break anything for me. So, squashing and merging in 0.2.8.

comment:18 Changed 2 years ago by nickm

Resolution: fixed
Status: needs_review → closed

comment:19 Changed 2 years ago by nickm

I worry a tiny bit about the use of OPENSSL_VER, given that it doesn't seem to exist in older versions of OpenSSL, and that we already have OPENSSL_V_*() macros.

oh damn, WE define OPENSSL_VER(). Never mind.
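
For readers unfamiliar with this style of version check, here is a minimal sketch of a macro that packs a version into the 0xMNNFFPPS layout used by OPENSSL_VERSION_NUMBER. The names `DEMO_VER` and `demo_at_least` are made up for illustration; this is not claimed to be tor's actual definition of OPENSSL_VER().

```c
/* Pack a version into the 0xMNNFFPPS layout:
 * major (4 bits), minor (8), fix (8), patch (8), status (4). */
#define DEMO_VER(maj, min, fix, patch, status)                     \
  (((unsigned long)(maj) << 28) | ((unsigned long)(min) << 20) |   \
   ((unsigned long)(fix) << 12) | ((unsigned long)(patch) << 4) |  \
   (unsigned long)(status))

/* Hypothetical helper: nonzero when `have` is at least maj.min.fix,
 * regardless of patch level or status nibble. */
int demo_at_least(unsigned long have, int maj, int min, int fix)
{
  return have >= DEMO_VER(maj, min, fix, 0, 0);
}
```

In real code the same comparison is normally done in the preprocessor (`#if OPENSSL_VERSION_NUMBER >= ...`) so that calls to removed functions are never compiled at all.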

comment:20 Changed 2 years ago by yancm

Well, it compiles for me but crashes in openssl code as before... I did compile openssl with debug symbols, as well as tor.
openssl config : ./config --debug no-shared
[...]
1.1.0-pre5-dev passed all tests:
All tests successful.
Files=75, Tests=393, 269 wallclock secs ( 1.55 usr 0.21 sys + 219.72 cusr 41.79 csys = 263.27 CPU)
Result: PASS

in my tor directory...
tor config : ./configure CFLAGS="-O0 -ggdb" --with-libevent-dir=/usr/local --enable-static-openssl=1 --with-openssl-dir=/usr/local/ssl

started with gmake clean and build was complete...
then
# gmake test
gmake all-am
gmake[1]: Entering directory '/usr/local/src/tor'
gmake[1]: Leaving directory '/usr/local/src/tor'
./src/test/test
Memory fault (core dumped)
Makefile:7219: recipe for target 'test' failed
gmake: *** [test] Error 139
# gdb ./src/test/test test.core
GNU gdb (GDB) 7.3.1
Reading symbols from /usr/local/src/tor/src/test/test...done.
[New process 1]

warning: Corrupted shared library list

warning: Corrupted shared library list
Core was generated by `test'.
Program terminated with signal 11, Segmentation fault.
#0 0x00500a15 in dasync_rsa_priv_enc (flen=-1145047792, from=0xbfbfebf8 "", to=0xbbbf322b <Address 0xbbbf322b out of bounds>, rsa=0x3, padding=-1077941256) at engines/e_dasync.c:563
563 return RSA_PKCS1_OpenSSL()->rsa_priv_enc(flen, from, to, rsa, padding);
(gdb) bt
#0 0x00500a15 in dasync_rsa_priv_enc (flen=-1145047792, from=0xbfbfebf8 "", to=0xbbbf322b <Address 0xbbbf322b out of bounds>, rsa=0x3, padding=-1077941256) at engines/e_dasync.c:563
#1 0xbb8e290a in ?? ()
#2 0xbbbff510 in ?? ()
#3 0xbbbf322b in ?? ()
#4 0x00000003 in ?? ()
#5 0xbfbfebf8 in ?? ()
#6 0x00000000 in ?? ()
(gdb) print flen
$1 = -1145047792
(gdb) print *flen
$2 = 70676
(gdb) print from
$3 = (const unsigned char *) 0xbfbfebf8 ""
(gdb) print *from
$4 = 0 '\000'
(gdb) print to
$5 = (unsigned char *) 0xbbbf322b <Address 0xbbbf322b out of bounds>
(gdb) print *to
Cannot access memory at address 0xbbbf322b
(gdb) print rsa
$6 = (RSA *) 0x3
(gdb) print *rsa
Cannot access memory at address 0x3
(gdb) print padding
$7 = -1077941256
(gdb) print *padding
$8 = 0
(gdb) frame
#0 0x00500a15 in dasync_rsa_priv_enc (flen=-1145047792, from=0xbfbfebf8 "", to=0xbbbf322b <Address 0xbbbf322b out of bounds>, rsa=0x3, padding=-1077941256) at engines/e_dasync.c:563
563 return RSA_PKCS1_OpenSSL()->rsa_priv_enc(flen, from, to, rsa, padding);

I'm quite rusty on my gdb...some pointers would be helpful.

comment:21 Changed 2 years ago by yawning

This is crashing somewhere deep in OpenSSL, and the stack looks rather corrupted. Based on the information you provided, I can't tell if it's an OpenSSL issue or a tor issue. Since the changes I made to get 1.1.0 to work again (tests pass on Linux/amd64) weren't in this particular part of the code, I'd need to replicate the issue to really know more.

If I were to try to debug this, I'd figure out what test was crashing and single step through the code to see exactly when the stack gets trashed, but I don't have time to set up a NetBSD VM to try to replicate the issue at the moment.
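
One way to see the corruption in the comment:20 output: the decimal values gdb prints for flen and padding are the same 32-bit words that show up as addresses elsewhere in the backtrace (0xbbbff510 and 0xbfbfebf8). That is consistent with the "arguments" being leftover stack words rather than real parameters, though it does not say which code trashed the stack. A small C sketch of the conversion (the helper name `demo_as_signed32` is made up for illustration):

```c
#include <stdint.h>

/* Interpret a 32-bit word as a two's-complement signed integer, which is
 * how gdb displays an int argument on i386. */
int64_t demo_as_signed32(uint32_t word)
{
  if (word & 0x80000000u)
    return (int64_t)word - 4294967296LL;  /* subtract 2^32 */
  return (int64_t)word;
}
```

Feeding it 0xbbbff510 yields -1145047792 (the printed flen), and 0xbfbfebf8 yields -1077941256 (the printed padding).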

comment:22 Changed 2 years ago by nickm

For what it's worth, I just tried to reproduce this with maint-0.2.8 and with openssl master, and that worked okay for me on Linux.

comment:23 Changed 2 years ago by yancm

From what I can tell test is failing on the initial test?

The tor executable fails in the same manner... here is a debug session in tor...
clarity 127 # gdb src/or/tor
GNU gdb (GDB) 7.3.1
Reading symbols from /usr/local/src/tor/src/or/tor...done.
(gdb) break main
Breakpoint 1 at 0x6d248: file src/or/tor_main.c, line 29.
(gdb) start
Temporary breakpoint 2 at 0x6d248: file src/or/tor_main.c, line 29.
Starting program: /usr/local/src/tor/src/or/tor
Cannot access memory at address 0x3bc
Cannot access memory at address 0x3bc
Cannot access memory at address 0x3bc
Cannot access memory at address 0x3bc

Program received signal SIGSEGV, Segmentation fault.
0x002e4c15 in dasync_rsa_priv_enc (flen=-1145047792, from=0xbfbfec44 "", to=0xbbbf322b ",$\350\345\365\377\377\213D$\f;h\a", rsa=0x3, padding=-1077941180) at engines/e_dasync.c:563
563 return RSA_PKCS1_OpenSSL()->rsa_priv_enc(flen, from, to, rsa, padding);
(gdb)

Does this imply it's crashing on initialization?

comment:24 Changed 2 years ago by yawning

So I lied about not having time since this bothered me... With OpenSSL master, tor master, everything else from pkgsrc...

The only options I passed to OpenSSL were -d --prefix=/opt/openssl-git.
Tor was configured with: --with-openssl-dir=/opt/openssl-git/ --enable-gcc-warnings, the latter of which breaks the build (modify CFLAGS to add -Wno-char-subscripts for now).

  FAIL ../src/test/test_options.c:1106: assert(tdata)
  [validate__transproxy FAILED]

[snip]

1/682 TESTS FAILED. (31 skipped)
Makefile:7219: recipe for target 'test' failed
gmake: *** [test] Error 1
thermopylae$ uname -a
NetBSD thermopylae.schwanenlied.me.lan 7.0 NetBSD 7.0 (GENERIC.201509250726Z) amd64
thermopylae$ ./src/or/tor
Apr 05 22:54:43.847 [notice] Tor v0.2.9.0-alpha-dev (git-b46d126e647a0789) running on NetBSD with Libevent 2.0.21-stable, OpenSSL 1.1.0-pre5-dev and Zlib 1.2.3.
Apr 05 22:54:43.847 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning

[snip]

Apr 05 22:54:58.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Apr 05 22:54:58.000 [notice] Bootstrapped 100%: Done

¯\_(ツ)_/¯

comment:25 Changed 2 years ago by yancm

When you ran the tor configure script, what did the messages in the openssl section say?

On my system, NetBSD 6_Stable, I have to actively *hide* the system SSL libraries and includes in order for the tor configure script to find the right one. (I have turned in a bug report in the past on this one...)

Please do take a look at the configure output...
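
Besides reading the configure output, a mix-up between the headers and the library can be checked by comparing version numbers. The sketch below assumes both values use the 0xMNNFFPPS layout; the function name `demo_same_release` is hypothetical. In a real test program, header_ver would be OPENSSL_VERSION_NUMBER from the headers the compiler found, and runtime_ver would come from OpenSSL_version_num() (OpenSSL 1.1.0) or SSLeay() (earlier releases) in the library that was actually linked.

```c
/* Compare two version numbers in the 0xMNNFFPPS layout, ignoring the
 * patch and status fields. Returns 1 if major.minor.fix agree. */
int demo_same_release(unsigned long header_ver, unsigned long runtime_ver)
{
  const unsigned long mask = 0xfffff000UL;  /* keep M, NN and FF only */
  return (header_ver & mask) == (runtime_ver & mask);
}
```

A result of 0 would suggest the build picked up one OpenSSL's headers and another OpenSSL's library, which is the situation hiding the system copies is meant to avoid.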

comment:26 Changed 2 years ago by yawning

I renamed /usr/include/openssl to /usr/include/openssl-fuck along with /usr/lib/libcrypto.a and /usr/lib/libssl.a, and rebuilt my tor and got the same results, and the configure output looks fine, unless NetBSD's gcc has an odd idea of what -I and -L mean.

*shrug*

comment:27 Changed 2 years ago by nickm

I wonder if you have to try explicitly building static?

comment:28 in reply to:  27 Changed 2 years ago by yawning

Replying to nickm:

I wonder if you have to try explicitly building static?

I built a copy of 1.1.0-pre5-dev, and everything still works as expected. I guess I could go back to an older NetBSD since I used 7.0 instead of 6.0.x, but at this point I'm fairly certain it's not a tor issue, and I don't feel like debugging NetBSD and OpenSSL.
