Opened 7 weeks ago

Closed 3 weeks ago

#27948 closed defect (fixed)

Backtrace does not work on NetBSD

Reported by: wiz
Owned by: teor
Priority: High
Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.5.2-alpha
Severity: Normal
Keywords: fast-fix regression 035-backport 034-backport netbsd 033-backport 029-backport
Cc:
Actual Points:
Parent ID: #17808
Points:
Reviewer: ahf
Sponsor:

Description

I've run the self-tests on NetBSD-8.99.25/amd64, and I see two issues:

# TOTAL: 19
# PASS: 12
# SKIP: 5
# XFAIL: 0
# FAIL: 2
# XPASS: 0
# ERROR: 0

...
FAIL: src/test/test
===================
....
util/thread/conditionvar_timeout: [forking]

FAIL src/test/test_threads.c:285: assert(ti->n_timeouts OP_EQ 2): 1 vs 2Sep 10 14:30:54.789 [err] Error 16 destroying a mutex.

Sep 10 14:30:54.789 [err] tor_assertion_failed_(): Bug: src/common/compat_pthreads.c:172: tor_mutex_uninit: Assertion 0 failed; aborting. (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: Assertion 0 failed in tor_mutex_uninit at src/common/compat_pthreads.c:172. Stack trace: (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefe62985 <log_backtrace+0x4e> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefe7d021 <tor_assertion_failed_+0xa0> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefe80e3d <tor_mutex_uninit+0xa6> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefe693ff <tor_mutex_free_+0x2e> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefc92ee2 <test_threads_conditionvar+0xefa001f3> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefcedd21 <testcase_run_bare_+0xefa00051> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefcedecd <testcase_run_one+0x158> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefcee51c <tinytest_main+0x107> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
Sep 10 14:30:54.793 [err] Bug: 0xefe99621 <main+0x2d1> at ./src/test/test (on Tor 0.3.4.8 da95b91355248ad8)
[Lost connection!]

[conditionvar_timeout FAILED]

util/handle/basic: OK
...
FAIL: src/test/test_bt.sh
=========================

OK
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" assert 2>&1 |

Done "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

BAD

============================================================ T= 1536589911
Tor died: Caught signal 11
0x94a0aa3d <crash_handler+0x94a00043> at ./src/test/test-bt-cl
0x94a0a8cd <crash+0x45> at ./src/test/test-bt-cl
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" crash 2>&1 |

Done(1) "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

-158318
FAIL src/test/test_bt.sh (exit status: 1)

# gdb src/test/test test.core
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from src/test/test...done.
[New process 1]
[New process 5]
[New process 2]
Core was generated by `test'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007500d531eeca in _lwp_kill () from /usr/lib/libc.so.12
[Current thread is 1 (process 1)]
(gdb) bt
#0 0x00007500d531eeca in _lwp_kill () from /usr/lib/libc.so.12
#1 0x00007500d531eb57 in abort () at /usr/src/lib/libc/stdlib/abort.c:74
#2 0x00000000efe80e42 in tor_mutex_uninit (m=m@entry=0x7500d3e0b080) at src/common/compat_pthreads.c:172
#3 0x00000000efe693ff in tor_mutex_free_ (m=0x7500d3e0b080) at src/common/compat_threads.c:55
#4 0x00000000efc92ee2 in cv_testinfo_free (i=0x7500d3e09080) at src/test/test_threads.c:186
#5 test_threads_conditionvar (arg=<optimized out>) at src/test/test_threads.c:290
#6 0x00000000efcedd21 in testcase_run_bare_ (testcase=testcase@entry=0xf0263850 <thread_tests+80>) at src/ext/tinytest.c:106
#7 0x00000000efcedecd in testcase_run_forked_ (group=<optimized out>, testcase=0xf0263850 <thread_tests+80>) at src/ext/tinytest.c:190
#8 testcase_run_one (group=<optimized out>, testcase=0xf0263850 <thread_tests+80>) at src/ext/tinytest.c:248
#9 0x00000000efcee51c in tinytest_main (c=<optimized out>, v=<optimized out>, groups=<optimized out>) at src/ext/tinytest.c:435
#10 0x00000000efe99621 in main (c=1, v=0x7f7fffc611d8) at src/test/testing_common.c:319

Change History (16)

comment:1 Changed 6 weeks ago by nickm

Component: Core Tor → Core Tor/Tor
Keywords: regression 034-backport netbsd added
Milestone: Tor: 0.3.5.x-final
Priority: Medium → High

Are both of these issues reproducible?

Do they happen every time, or only sometimes?

Do they also occur with 0.3.5.2-alpha? The first one looks like #27073.

comment:2 Changed 6 weeks ago by nickm

Status: new → needs_information

While investigating this, I found #27990 in master. But I still think that this is probably a case of #27073. To make certain, though, it would be helpful to have answers for the questions above.

comment:3 Changed 5 weeks ago by wiz

Yes, it is quite repeatable for me in 0.3.4.8.

0.3.5.2 is better: only the test_bt.sh failure happens there:

FAIL: src/test/test_bt.sh
=========================

OK
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" assert 2>&1 |

Done "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

BAD

============================================================ T= 1539708038
Tor died: Caught signal 11
0x1b6e09fd6 <crash_handler+0x1b6e00043> at ./src/test/test-bt-cl
0x1b6e03b1d <crash+0x45> at ./src/test/test-bt-cl
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" crash 2>&1 |

Done(1) "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

-158318
FAIL src/test/test_bt.sh (exit status: 1)

comment:4 Changed 5 weeks ago by wiz

Fails the same way in 0.3.5.3-alpha:

FAIL: src/test/test_bt.sh
=========================

OK
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" assert 2>&1 |

Done "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

BAD

============================================================ T= 1539845377
Tor died: Caught signal 11
0x1de809fd6 <crash_handler+0x1de800043> at ./src/test/test-bt-cl
0x1de803b1d <crash+0x45> at ./src/test/test-bt-cl
[1] Abort trap "${builddir:-.}/src/test/test-bt-cl" crash 2>&1 |

Done(1) "${PYTHON:-python}" "${abs_top_srcdir:-.}/src/...

-158318
FAIL src/test/test_bt.sh (exit status: 1)

comment:5 Changed 5 weeks ago by teor

Keywords: 033-backport 029-backport added
Parent ID: #17808
Status: needs_information → new
Summary: tor-0.3.4.8: self test failures on NetBSD → Backtrace does not work

In #17808 we treated backtrace failures as expected on FreeBSD. We should do the same for NetBSD until we work out how to get a good backtrace on NetBSD.

comment:6 Changed 5 weeks ago by teor

Keywords: fast-fix added

comment:7 Changed 5 weeks ago by teor

Summary: Backtrace does not work → Backtrace does not work on NetBSD

comment:8 Changed 5 weeks ago by wiz

Here's the man page for the backtrace() function on NetBSD.

http://netbsd.gw.com/cgi-bin/man-cgi?backtrace++NetBSD-current

You'll need to link against libexecinfo.
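For reference, a minimal sketch of calling backtrace() directly is shown here. It is an illustration only: print_current_backtrace() and MAX_FRAMES are hypothetical names and are not part of Tor. The casts are there because the return and length types of backtrace() differ between libc implementations (int on glibc, size_t on NetBSD, as far as I know). On NetBSD this would presumably be built with something like `cc -g bt.c -lexecinfo`.

```c
/* Sketch only: print_current_backtrace() is a hypothetical helper,
 * not Tor code. On NetBSD, link with -lexecinfo (assumed build line:
 * cc -g bt.c -lexecinfo). */
#include <assert.h>
#include <execinfo.h>

#define MAX_FRAMES 16

/* Capture up to MAX_FRAMES return addresses from the current call stack
 * and write one symbolized line per frame to the file descriptor fd.
 * Returns the number of frames captured. */
int print_current_backtrace(int fd)
{
  void *frames[MAX_FRAMES];
  int n;

  assert(fd >= 0);
  /* Cast: the return type is int on some platforms and size_t on others. */
  n = (int) backtrace(frames, MAX_FRAMES);
  /* Writes directly to fd without allocating memory for the strings. */
  backtrace_symbols_fd(frames, n, fd);
  return n;
}
```

Calling print_current_backtrace(2) from a small test program writes the frames to stderr. If only one or two frames come back, or the symbol offsets look implausibly large, that would match the test_bt.sh output above.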

comment:9 Changed 4 weeks ago by teor

Keywords: 035-backport added
Owner: set to teor
Status: new → assigned
Version: Tor: 0.2.5.2-alpha

Update metadata

comment:10 in reply to:  8 Changed 4 weeks ago by teor

Replying to wiz:

> Here's the man page for the backtrace() function on NetBSD.
>
> http://netbsd.gw.com/cgi-bin/man-cgi?backtrace++NetBSD-current
>
> You'll need to link against libexecinfo.

Tor checks for libexecinfo on every platform, and links it if required (see #17151 in 0.2.7.4-rc).

Based on what we discovered in #17808, support for execinfo seems to vary by compiler and architecture on BSD-derived platforms.

So either Tor is calling backtrace() in an architecture-specific manner, or the compilers are not including debug info in the right format.

If you can work out how to fix this issue, we'd love some help in #17808.
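One way to narrow this down might be to take Tor out of the picture and call backtrace() from a plain signal handler with no architecture-specific handling at all. The sketch below does that. It is not Tor's crash handler, and every name in it (bt_signal_handler, install_bt_handler_for, last_handler_frame_count, BT_MAX_FRAMES) is hypothetical. The warm-up call before installing the handler is an assumption about how some libcs lazily load their unwinder.

```c
/* Sketch only: hypothetical helpers, not Tor's crash handler.
 * On NetBSD, link with -lexecinfo. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict -std modes */
#endif
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>

#define BT_MAX_FRAMES 32

/* Frame count seen by the most recent handler run, or -1 if it never ran. */
static volatile sig_atomic_t bt_handler_frames = -1;

/* Handler: capture the stack, dump it to stderr (fd 2), record the count. */
static void bt_signal_handler(int signo)
{
  void *frames[BT_MAX_FRAMES];
  int n = (int) backtrace(frames, BT_MAX_FRAMES);

  (void) signo;
  backtrace_symbols_fd(frames, n, 2);
  bt_handler_frames = n;
}

/* Install the handler for signo. Returns 0 on success, -1 on failure. */
int install_bt_handler_for(int signo)
{
  struct sigaction sa;
  void *warmup[1];

  assert(signo > 0);
  /* Call backtrace() once up front so that any lazy unwinder setup
   * happens outside signal context (assumption about the libc). */
  (void) backtrace(warmup, 1);

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = bt_signal_handler;
  sigemptyset(&sa.sa_mask);
  return sigaction(signo, &sa, NULL);
}

/* Accessor for the frame count recorded by the handler. */
int last_handler_frame_count(void)
{
  return (int) bt_handler_frames;
}
```

Installing this for SIGUSR1 and calling raise(SIGUSR1) from a few nested functions would show whether the platform's backtrace() can walk past the signal frame by itself. If this also stops after two frames on NetBSD, the problem is probably in libexecinfo or the compiler's unwind info rather than in how Tor calls it.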

comment:11 Changed 4 weeks ago by teor

Status: assigned → needs_review

In the meantime, I modified the test script to treat NetBSD, OpenBSD, and macOS failures as expected, like we did with FreeBSD in #18204.

See my branch bug27948-029 on https://github.com/teor2345/tor.git

0.2.9 pull request:
https://github.com/torproject/tor/pull/436

master pull request:
https://github.com/torproject/tor/pull/437

We can continue working on the underlying issue in #17808.
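To make the "expected failure" idea concrete, here is a rough sketch of the platform check written in C. The actual change is in the test script on the branch above, not in C, so this only illustrates the logic. The function names are hypothetical, and the list of system names (as reported by uname) is taken from the platforms mentioned in this comment.

```c
/* Sketch only: hypothetical helpers illustrating the platform check.
 * The real change was made in the test script, not in C. */
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Return 1 if sysname (as reported by uname) is one of the platforms
 * where a backtrace test failure is treated as expected, else 0. */
int backtrace_failure_is_expected(const char *sysname)
{
  /* "Darwin" is the uname system name for macOS. */
  static const char *const expected[] = {
    "FreeBSD", "NetBSD", "OpenBSD", "Darwin"
  };
  size_t i;

  assert(sysname != NULL);
  for (i = 0; i < sizeof(expected) / sizeof(expected[0]); ++i) {
    if (strcmp(sysname, expected[i]) == 0)
      return 1;
  }
  return 0;
}

/* Same check for the machine the program is running on.
 * Returns 0 if uname() itself fails. */
int backtrace_failure_is_expected_here(void)
{
  struct utsname u;

  if (uname(&u) != 0)
    return 0;
  return backtrace_failure_is_expected(u.sysname);
}
```

A test driver could use this to report a bad backtrace as an expected failure on the listed platforms and as a hard failure everywhere else, so the check becomes strict again on a platform just by removing it from the list once #17808 is resolved there.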

comment:12 Changed 3 weeks ago by dgoulet

Reviewer: ahf

comment:13 Changed 3 weeks ago by ahf

Status: needs_review → merge_ready

Looks fine.

comment:14 Changed 3 weeks ago by nickm

Resolution: user disappeared
Status: merge_ready → closed

Merged to maint-0.2.9 and forward.

comment:15 Changed 3 weeks ago by nickm

Resolution: user disappeared
Status: closed → reopened

comment:16 Changed 3 weeks ago by nickm

Resolution: fixed
Status: reopened → closed