I have a "backtrace" branch right now that an dump stack traces on assertion failures. It works on glibc/ELF, and on OSX. We should expand it to work on Windows too, and BSD if we can.
Other fixes to make before it's ready:
It should be able to log a stack trace too.
It should log the stack trace on an assertion.
There should be an option to tell it not to log to the stack_dumps file, perhaps.
Perhaps the logfile should be pid-controlled?
It should support Windows.
It should handle deadly signals (SEGV, etc) as well.
It should indicate to the user somehow (if it can) that stuff might be saved to the stack_dumps file.
It should have tests.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
Trac: Description: It's so easy to dump stack traces these days!
I have a "backtrace" branch right now that an dump stack traces on assertion failures. It works on glibc/ELF, and on OSX. We should expand it to work on Windows too, and BSD if we can.
Other fixes to make before it's ready:
It should be able to log a stack trace too.
It should log the stack trace on an assertion.
There should be an option to tell it not to log to the stack_dumps file, perhaps.
Perhaps the logfile should be pid-controlled?
It should support Windows.
It should handle deadly signals (SEGV, etc) as well.
It should indicate to the user somehow (if it can) that stuff might be saved to the stack_dumps file.
It should have tests.
to
It's so easy to dump stack traces these days!
I have a "backtrace" branch right now that an dump stack traces on assertion failures. It works on glibc/ELF, and on OSX. We should expand it to work on Windows too, and BSD if we can.
Other fixes to make before it's ready:
It should be able to log a stack trace too.
It should log the stack trace on an assertion.
There should be an option to tell it not to log to the stack_dumps file, perhaps.
Perhaps the logfile should be pid-controlled?
It should support Windows.
It should handle deadly signals (SEGV, etc) as well.
It should indicate to the user somehow (if it can) that stuff might be saved to the stack_dumps file.
Updated my branch. Fun stuff! I cut the gordian knot of "where to store traces on crash" by saying "just put them in the log fd and on stderr."
For FreeBSD (and others?) we should look at linking libexecinfo, which is supposed to provide the same information as the backtrace() functions in <execinfo.h> on Linux/OSX.
For windows, we will need to look into StackWalk64 for stack dumping. I believe that the function to catch segfaults and whatnot is SetUnhandledExceptionFilter, but I'm less sure about that.
Trac: Description: It's so easy to dump stack traces these days!
I have a "backtrace" branch right now that an dump stack traces on assertion failures. It works on glibc/ELF, and on OSX. We should expand it to work on Windows too, and BSD if we can.
Other fixes to make before it's ready:
It should be able to log a stack trace too.
It should log the stack trace on an assertion.
There should be an option to tell it not to log to the stack_dumps file, perhaps.
Perhaps the logfile should be pid-controlled?
It should support Windows.
It should handle deadly signals (SEGV, etc) as well.
It should indicate to the user somehow (if it can) that stuff might be saved to the stack_dumps file.
It should have tests.
to
It's so easy to dump stack traces these days!
I have a "backtrace" branch right now that an dump stack traces on assertion failures. It works on glibc/ELF, and on OSX. We should expand it to work on Windows too, and BSD if we can.
Other fixes to make before it's ready:
It should be able to log a stack trace too.
It should log the stack trace on an assertion.
There should be an option to tell it not to log to the stack_dumps file, perhaps.
Perhaps the logfile should be pid-controlled?
It should support Windows.
It should handle deadly signals (SEGV, etc) as well.
It should indicate to the user somehow (if it can) that stuff might be saved to the stack_dumps file.
Actually, I'm going to suggest we review and merge now, and leave Windows support for a bit later, since that's a whole new project, and apparently a big one, since the windows API is not trivial.
Leaving that in the history is probably bad for future bisection, no?
tor_log_err_sigsafe_helper() might be better named tor_log_err_sigsafe_write()
+ if (log_time_granularity >= 2000) {+ int g = log_time_granularity / 1000;
A comment for the magic numbers? You previously had 15 minutes. Does this mean that the
stack_dump file will have second granularity by default? Is that a problem?
+tor_log_get_sigsafe_err_fds(const int **out)+{+ *out = sigsafe_log_fds;+ return n_sigsafe_log_fds;
Can't we have a race here if tor_log_get_sigsafe_err_fds() is called and then tor_log_update_sigsafe_err_fds()? Or do the usage of the first mandate LOCK_LOGS? Or are we sure there's no way this can happen?
fd7aa1af commit message does not say anything about the found_real_stderr thing.
+ assert(n_sigsafe_log_fds >= 2);
Using assert() because tor_assert() will not be happy with an unconfigured crash handler, right? Might worth a comment.
+#ifdef PC_FROM_UCONTEXT+#if defined(__linux__)+ const int n = 1;+#elif defined(__darwin__) || defined(__APPLE__) || defined(__OpenBSD__) \+ || defined(__FreeBSD__)+ const int n = 2;+#endif
Maybe add an #else #error here ? The compiler should be unhappy because n will be uninitialized, but it might save a little bit of time for a future port.
+ puts("Argument should be \"assert\" or \"crash\"");
“or none”
+ return crash(x) + crash(x*2);
What's the point of testing the reader's knowledge about C evaluation order? :)
Running the test program with assert has the logging function in the trace:
Ok, here's a review. I hope I'm not wasting anyone's time.
Interesting how the new format_dec_number_sigsafe() and format_hex_number_sigsafe() are slightly different given they do almost the same thing.
Yeah; perhaps they should share more code.
{{{
+++ b/configure.ac
crt_externs.h \
execinfo.h \ grp.h \
}}}
Small whitespace issue.
Fixed
{{{
tor_assert(0 * 0 == 1);
}}}
Leaving that in the history is probably bad for future bisection, no?
That's what the later "fixup!" in the commit message for 62645b5e18a8994c9257078f571d4227037a894c is for. See the "--autosquash" argument for git rebase.
tor_log_err_sigsafe_helper() might be better named tor_log_err_sigsafe_write()
Renaming.
{{{
if (log_time_granularity >= 2000) {
int g = log_time_granularity / 1000;
}}}
A comment for the magic numbers? You previously had 15 minutes. Does this mean that the
stack_dump file will have second granularity by default? Is that a problem?
It means that the file will have the same granularity as the rest of the system logs.
{{{
+tor_log_get_sigsafe_err_fds(const int **out)
+{
*out = sigsafe_log_fds;
return n_sigsafe_log_fds;
}}}
Can't we have a race here if tor_log_get_sigsafe_err_fds() is called and then tor_log_update_sigsafe_err_fds()? Or do the usage of the first mandate LOCK_LOGS? Or are we sure there's no way this can happen?
Hm. During a signal handler, it's not safe to grab a lock, so we might need to tolerate that, or use some kind of atomic set/fetch.
fd7aa1af commit message does not say anything about the found_real_stderr thing.
Added in a squash! commit.
{{{
assert(n_sigsafe_log_fds >= 2);
}}}
Using assert() because tor_assert() will not be happy with an unconfigured crash handler, right? Might worth a comment.
Added a comment. The other reason is that tor_assert logs, and logging inside log functions is potentially a recursive nightmare.
{{{
+#ifdef PC_FROM_UCONTEXT
+#if defined(linux)
const int n = 1;
+#elif defined(darwin) || defined(APPLE) || defined(OpenBSD) \
|| defined(FreeBSD)
const int n = 2;
+#endif
}}}
Maybe add an #else #error here ? The compiler should be unhappy because n will be uninitialized, but it might save a little bit of time for a future port.
Added
{{{
puts("Argument should be "assert" or "crash"");
}}}
“or none”
Added
{{{
return crash(x) + crash(x*2);
}}}
What's the point of testing the reader's knowledge about C evaluation order? :)
Added a comment. The actual point is to prevent the compiler from adding a tail-call optimization during the first call to crash().
Running the test program with assert has the logging function in the trace:
{{{
$ ./test-bt-cl assert
Oct 11 22:55:39.705 [err] tor_assertion_failed_(): Bug: src/test/test_bt_cl.c:36: crash: Assertion 1 == 0 failed; aborting.
Oct 11 22:55:39.705 [err] Bug: Assertion 1 == 0 failed in crash at src/test/test_bt_cl.c:36. Stack trace:
Oct 11 22:55:39.705 [err] Bug: ./test-bt-cl(log_backtrace+0x35) [0x7f3236a5e095]
Oct 11 22:55:39.705 [err] Bug: ./test-bt-cl(tor_assertion_failed_+0x9f) [0x7f3236a68a8f]
Oct 11 22:55:39.705 [err] Bug: ./test-bt-cl(crash+0x79) [0x7f3236a5de69]
[…]
}}}
Is that something we want to keep?
I'd say, "yes".
For the other one:
{{{
$ ./test-bt-cl crash
============================================================ T=1381524943
Tor died: Caught signal 11
./test-bt-cl(+0x8fc7)[0x7f156871afc7]
./test-bt-cl(crash+0x48)[0x7f156871ae38]
./test-bt-cl(crash+0x48)[0x7f156871ae38]
./test-bt-cl(oh_what+0x25)[0x7f156871ae95]
}}}
There's no symbol for the first function and crash() is there twice. Is that expected?
This investigation turned up nothing helpful; it seems to be happening because of the "n = 1" in backtrace.c, but I am sure that in testing, n = 1" was required under some circumstances. I'm going to call this a harmless wart for now.
Looks okay to me, but (nitpick) I don't think '0' + digit is quite perfectly portable.
f6d8bc9389db72dc654560d0f86fd3edd3b14934
Looks fine
308ffbcfdd99424a1750b09f90721d9da38f761b
Why round to 15 minutes?
ff187fd1335968468f307d48157e5dcf4ad88b1f
Looks fine
82743d6e560bbbbcbd22e09c3ec26f54d0656b75
Does TOR_CHECK_LDFLAGS actually add it to LDFLAGS for us? what if it fails - shouldn't we set USE_BACKTRACE then?
62645b5e18a8994c9257078f571d4227037a894c
Yeah, good move there :)
ab2a00ac4378e85f67da6e681a7c8609e3153db1
Looks okay to me, but yeesh signal handlers can get hairy. I presume you've tested all this by actually invoking it from a signal handler?
fd7aa1af256b11cad9875fec875a570205aea0e0
Looks fine
c8ca168d0cfe58b8f86badbc17300f85adfedb31
Is not our code, and I sure hope Google didn't break it too badly. OpenBSD support seems to be missing and probably shouldn't be - and wouldn't be too hard to find out and add. Linux/sparc{,64} is rather more obscure but I happen to have it at hand, so we could add that too.
10411ee8c13bcf8de65a225a0cee5b0844fb5b9f
Looks fine
3591d2079dabc7450b1a973045e000212a90b67d
Looks fine
1d6af8099d9f3dafefdf75bb1411ec2bd669087f
Yay, unit tests! Shouldn't tor_log_err_sigsafe() actually get invoked from a signal handler at some point though?
e1e638a169dd0a34455057b3f7cc895defd1f3cf
Yay, more unit tests!
abdfd8367b326e918ee49b4f35aa899859c6fca7
Looks fine
74ac03f54b6d640e97330d4207cbb943337fa71d
Looks fine
8b926717e601578f7f83aee9c7384d2ffb44d985
Looks fine
85a520682f832314ffe5615ad2544558092bee52
Looks fine
e9f95a67ae4b35326293865034fe92e1df59ae43
Is an empty diff?
f0b9dd03abfd18a037f4ae0bbacecd34b35cad1a
Ouch! Am I reading that wrong, or did you just #error on any Linux that isn't one of the archs pc_from_ucontext.m4 supports? :(
4871da62cd814a5ca50f03924b3492e8d3b2f89a
Looks fine
511a8af458bbf56edaf7d1110dfe28c11274b7fb
Looks okay, but wouldn't it be better to just compile stuff like this with -O0?
Looks okay to me, but (nitpick) I don't think '0' + digit is quite perfectly portable.
Indeed it isn't, but it's one of the things Tor requires. (Specifically, we require that your character set contain ASCII as a subset.) We check for this in compat.h:
#if 'a'!=97 || 'z'!=122 || 'A'!=65 || ' '!=32#error "It seems that you encode characters in something other than ASCII."#endif
82743d6e560bbbbcbd22e09c3ec26f54d0656b75
Does TOR_CHECK_LDFLAGS actually add it to LDFLAGS for us? what if it fails - shouldn't we set USE_BACKTRACE then?
Yes, it does add it to LDFLAGS. If the check fails, then presumably the platform is one where the -rdynamic flag doesn't exist, and therefore isn't necessary.
We might need more sophisticated checks, but the ones I had were sufficient to make it work everywhere I tried.
ab2a00ac4378e85f67da6e681a7c8609e3153db1
Looks okay to me, but yeesh signal handlers can get hairy. I presume you've tested all this by actually invoking it from a signal handler?
Yes, though that's not sufficient. The failure mode when invoking non-signal-safe stuff from signal handlers is not "the code fails" but rather "if the signal arrives while exactly the wrong thing is happening, the signal handler code fails, or the other code fails once the signal handler is done" As such, the only real safe way to do this is to enumerate all the system and/or library functions you're calling and make sure they're all on the POSIX safe list.
The ones we call from tor_log_err_sigsafe() are: time(), write(), va_start(), va_ag(), va_end(), strlen(), tor_assert()... whoops. tor_assert() shouldn't be on that list.
Fixing in a fixup commit.
c8ca168d0cfe58b8f86badbc17300f85adfedb31
Is not our code, and I sure hope Google didn't break it too badly. OpenBSD support seems to be missing and probably shouldn't be - and wouldn't be too hard to find out and add. Linux/sparc{,64} is rather more obscure but I happen to have it at hand, so we could add that too.
Indeed. I'm assuming that this will be easier for us to fix up and submit upstream fixes for than it would be for us to build one of these from scratch. (For a while, I was trying the latter, and getting quite frustrated at the amount of research needed.)
They mention OpenBSD support in the comments and later in the file; supposedly it works?
1d6af8099d9f3dafefdf75bb1411ec2bd669087f
Yay, unit tests! Shouldn't tor_log_err_sigsafe() actually get invoked from a signal handler at some point though?
It does in the "./test/test-bt-cl crash" test, I believe?
e9f95a67ae4b35326293865034fe92e1df59ae43
Is an empty diff?
Yes; it's only to add another missing bit of commit log message to the commit it's squashing on to.
f0b9dd03abfd18a037f4ae0bbacecd34b35cad1a
Ouch! Am I reading that wrong, or did you just #error on any Linux that isn't one of the archs pc_from_ucontext.m4 supports? :(
I think I #errored in the case where PC_FROM_UCONTEXT is defined, but the platfrom isn't one of linux, darwin, apple, openbsd, freebsd. I think instead going for '1' as a sane default makes more sense. Doing that.
511a8af458bbf56edaf7d1110dfe28c11274b7fb
Looks okay, but wouldn't it be better to just compile stuff like this with -O0?
Maybe? It's nice to have confirmation that running with -O2 doesn't prevent the backtrace code from working.
Based on the above, I'm adding a few fixup commits, squashing as 'backtrace_squashed', and merging. Thanks!
Stack trace unreadable if FPO enabled in general or used optimization options (then FPO applied selectively by compiler depends function's code) for x86 platform. It's not problem for x86_64 because GCC generates unwind tables there by default.
If to enable unwind tables by "-fasynchronous-unwind-tables" then stack trace readable with any optimizations.
-fomit-frame-pointer Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines. On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage. Starting with GCC version 4.6, the default setting (when not optimizing for size) for 32-bit GNU/Linux x86 and 32-bit Darwin x86 targets has been changed to -fomit-frame-pointer. The default can be reverted to -fno-omit-frame-pointer by configuring GCC with the --enable-frame-pointer configure option. Enabled at levels -O, -O2, -O3, -Os.
log_backtrace() is thread unsafe. If threads going to assert for the same time then calling of backtrace() from every thread going to fill the same cb_buf array.
Closing this ticket again; new issues should ideally be children of #11046 (moved) (or comments on #11046 (moved) if you don't have time to open a new ticket). Thanks!
Trac: Resolution: N/Ato implemented Status: reopened to closed