My Tor node is only a few weeks old. The stable version kept crashing every 1-2 days, so I updated to 0.2.4.17-rc. Now it crashes every 3-5 days. The newest crash looked like this:
Since you're on Debian, look in /etc/default/tor for the 'ulimit -c unlimited' line, and uncomment it.
I did this already a few crashes ago. But I can't find anything even remotely similar to a coredump anywhere on my server.
I believe the core file will show up in /var/lib/tor/
That directory is completely empty. The DataDirectory (I moved my Tor installation) contains only the expected files, but no cores. I searched the whole disk for files with names like core or dump.
Is there a global option to disable core dumps regardless of Tor's options?
Does Tor maybe need special privileges for AppArmor to write cores?
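A few system-wide settings can suppress cores no matter what Tor's own config says. Here is a minimal sketch (Python, Linux only) that prints the ones worth checking first. The helper names `read_proc` and `core_dump_settings` are made up for illustration, and the rlimit it reports is that of the process running the script, so it is only meaningful when run as the same user and from the same init context as Tor.

```python
import resource


def read_proc(path):
    """Return the stripped contents of a /proc file, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def core_dump_settings():
    """Collect settings that commonly decide whether a core gets written."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {
        # 0 means no core is written; resource.RLIM_INFINITY means unlimited.
        "rlimit_core_soft": soft,
        "rlimit_core_hard": hard,
        # Where the kernel writes cores (a leading "|" pipes them to a helper).
        "core_pattern": read_proc("/proc/sys/kernel/core_pattern"),
        # 0 means a process that has changed its uid (as Tor does when it is
        # started as root and then switches user) is not dumped at all.
        "suid_dumpable": read_proc("/proc/sys/fs/suid_dumpable"),
    }


if __name__ == "__main__":
    for name, value in core_dump_settings().items():
        print(f"{name}: {value}")
```

If `core_pattern` is a relative name, the core lands in the crashing process's current working directory, which may not be where you are looking.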
Maybe DisableDebuggerAttachment 0 does the trick. At least I didn't have that option set before. Unfortunately changing it killed the running tor instance:
Sep 19 18:20:18.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Sep 19 18:20:18.000 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Sep 19 18:20:18.000 [notice] Read configuration file "/etc/tor/torrc".
Sep 19 18:20:18.000 [warn] Failed to parse/validate config: While Tor is running, disabling DisableDebuggerAttachment is not allowed.
Sep 19 18:20:18.000 [err] Reading config failed--see warnings above. For usage, try -h.
Sep 19 18:20:18.000 [warn] Restart failed (config error?). Exiting.
Well, it took almost five days this time. The good news is: now I have a core file. The bad news: Tor crashed without the failed assertion from last time.
Backtrace from gdb:
Core was generated by `/usr/bin/tor --defaults-torrc /usr/share/tor/tor-service-defaults-torrc --hush'.
Program terminated with signal 6, Aborted.
#0 0x00007f822b8ab1e5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f822b8ab1e5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f822b8ae398 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f822b8e67cb in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f822b8f0a26 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f822b8f17a3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f822d0b003f in circuitmux_detach_circuit (cmux=0x7f8235060760, circ=0x7f823354d160) at ../src/or/circuitmux.c:1061
#6 0x00007f822d0a84db in circuit_set_circid_chan_helper (circ=circ@entry=0x7f823354d160, direction=direction@entry=1, id=id@entry=0, chan=chan@entry=0x0) at ../src/or/circuitlist.c:142
#7 0x00007f822d0a8ab3 in circuit_set_p_circid_chan (circ=circ@entry=0x7f823354d160, id=id@entry=0, chan=chan@entry=0x0) at ../src/or/circuitlist.c:217
#8 0x00007f822d0bc893 in command_process_destroy_cell (chan=<optimized out>, cell=<optimized out>) at ../src/or/command.c:506
#9 command_process_cell (chan=<optimized out>, cell=0x7fff497c5870) at ../src/or/command.c:153
#10 0x00007f822d09da1b in channel_tls_handle_cell (cell=cell@entry=0x7fff497c5870, conn=conn@entry=0x7f82341fccb0) at ../src/or/channeltls.c:923
#11 0x00007f822d0ddc57 in connection_or_process_cells_from_inbuf (conn=0x7f82341fccb0) at ../src/or/connection_or.c:1972
#12 0x00007f822d0e0ef2 in connection_or_process_inbuf (conn=conn@entry=0x7f82341fccb0) at ../src/or/connection_or.c:483
#13 0x00007f822d0cc4c5 in connection_process_inbuf (conn=conn@entry=0x7f82341fccb0, package_partial=package_partial@entry=1) at ../src/or/connection.c:4001
#14 0x00007f822d0d265d in connection_handle_read_impl (conn=0x7f82341fccb0) at ../src/or/connection.c:2839
#15 connection_handle_read (conn=conn@entry=0x7f82341fccb0) at ../src/or/connection.c:2880
#16 0x00007f822d02e061 in conn_read_callback (fd=<optimized out>, event=<optimized out>, _conn=0x7f82341fccb0) at ../src/or/main.c:718
#17 0x00007f822c6a8ccc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#18 0x00007f822d02e9f5 in do_main_loop () at ../src/or/main.c:1992
#19 0x00007f822d0301de in tor_main (argc=4, argv=0x7fff497c5fb8) at ../src/or/main.c:2708
#20 0x00007f822b897995 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x00007f822d02aabb in _start ()
Is this a different story?
Do you need further information (which gdb commands)?
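For reference, the extra gdb output most often asked for in cases like this is `bt full` (locals for every frame) plus the registers. The sketch below builds a non-interactive gdb command line for that. The core path is a placeholder, the helper name `gdb_batch_command` is invented for this example, and the binary path is taken from the "Core was generated by" line. Since `bt full` prints local variables, it is worth reading the output before posting it anywhere public.

```python
import shlex


def gdb_batch_command(binary, core, gdb_commands):
    """Build an argv list that runs the given gdb commands against a core file."""
    argv = ["gdb", "-batch"]
    for cmd in gdb_commands:
        # Each -ex argument is executed in order, as if typed at the prompt.
        argv += ["-ex", cmd]
    argv += [binary, core]
    return argv


if __name__ == "__main__":
    argv = gdb_batch_command(
        "/usr/bin/tor",
        "/path/to/core",  # placeholder: wherever the core file ended up
        ["bt full", "info registers", "thread apply all bt"],
    )
    # Print a copy-pasteable shell command instead of running gdb directly.
    print(shlex.join(argv))
```

Redirecting the printed command's output to a file gives something that can be attached to the ticket instead of the core itself.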
Hmmm. It's not entirely clear to me if that crash is happening at the call to cmux->policy->free_circ_data or in that function but screwing up the stack. That pointer looks okay, and the parameters look possibly okay [1]. The only thing in ewma_free_circ_data() that depends on anything else to not crash is tor_free(), so perhaps this could be a heap corruption bug.
Running it under valgrind is probably very helpful; it would also be nice to have a copy of the core dump, tor binary and torrc.
[1] But the circuit id in the hash entry (2147613735) doesn't match the one in circ (2147556202), and the magic number in circ indicates it's an or_circuit_t. This is probably the cmux for that circuit's reverse direction, but it'd be nice to see the circuit as an or_circuit_t to verify that p_circ_id is 2147613735.
Agreed; running it under valgrind would maybe get some useful results.
Is there any more debugging/logging code that would help figure out what's going on?
I think valgrind would be the most useful thing. Also, it'd be nice to see how this binary was built and if anything changes if built with -O0; since that line doesn't look like there's any very clear possible cause, and the circuitmux.c in the Debian repo matches ours, I'd be curious to see if the actual crash was elsewhere and the optimizer has confused things. Since so far the exact site of the crash seems nondeterministic, though, we may not see this again if the reporter re-runs with a different build.
Also:
Where can I find whatever extra patches Debian is adding?
to get the debug symbols. The reason I compiled Tor by hand in the first place was to make use of the optimized openssl-library. I didn't know of the botnet attacking Tor at the time and wanted to run 50Mbps on one Tor instance if possible.
The stock Debian versions of Tor and openssl crashed the same though.
Would I kill this server if I attached a core that big?
Your core file likely includes sensitive keys (including your long-term relay identity key), as well as maybe sensitive client traffic. You should keep it secret.
Core was generated by `/usr/bin/tor --defaults-torrc /usr/share/tor/tor-service-defaults-torrc --hush'.
Program terminated with signal 11, Segmentation fault.
#0 circuit_get_by_rend_token_and_purpose (purpose=purpose@entry=3 '\003', token=token@entry=0x7fff48e60aa0 "hS(i\252\272\026\267M\366\303<퉥\215iE\247\207", len=20) at ../src/or/circuitlist.c:1141
1141 if (! circ->marked_for_close &&
(gdb) bt
#0 circuit_get_by_rend_token_and_purpose (purpose=purpose@entry=3 '\003', token=token@entry=0x7fff48e60aa0 "hS(i\252\272\026\267M\366\303<퉥\215iE\247\207", len=20) at ../src/or/circuitlist.c:1141
#1 0x00007fa9d5325c75 in circuit_get_rendezvous ( cookie=cookie@entry=0x7fff48e60aa0 "hS(i\252\272\026\267M\366\303<퉥\215iE\247\207") at ../src/or/circuitlist.c:1155
#2 0x00007fa9d52cf1f3 in rend_mid_establish_rendezvous (circ=0x7fa9dbf37890, request=request@entry=0x7fff48e60aa0 "hS(i\252\272\026\267M\366\303<퉥\215iE\247\207", request_len=request_len@entry=20) at ../src/or/rendmid.c:238
#3 0x00007fa9d52ce87c in rend_process_relay_cell (circ=circ@entry=0x7fa9dbf37890, layer_hint=layer_hint@entry=0x0, command=33, length=20, payload=payload@entry=0x7fff48e60aa0 "hS(i\252\272\026\267M\366\303<퉥\215iE\247\207") at ../src/or/rendcommon.c:1440
#4 0x00007fa9d52c5e7a in connection_edge_process_relay_cell (cell=cell@entry=0x7fff48e60a90, circ=circ@entry=0x7fa9dbf37890, conn=conn@entry=0x0, layer_hint=layer_hint@entry=0x0) at ../src/or/relay.c:1578
#5 0x00007fa9d52c7a71 in circuit_receive_relay_cell (cell=cell@entry=0x7fff48e60a90, circ=circ@entry=0x7fa9dbf37890, cell_direction=cell_direction@entry=CELL_DIRECTION_OUT) at ../src/or/relay.c:212
#6 0x00007fa9d533770c in command_process_relay_cell (chan=0x7fa9de927520, cell=0x7fff48e60a90) at ../src/or/command.c:465
#7 command_process_cell (chan=0x7fa9de927520, cell=0x7fff48e60a90) at ../src/or/command.c:149
#8 0x00007fa9d5318a1b in channel_tls_handle_cell (cell=cell@entry=0x7fff48e60a90, conn=conn@entry=0x7fa9dcf0a4d0) at ../src/or/channeltls.c:923
#9 0x00007fa9d5358c57 in connection_or_process_cells_from_inbuf (conn=0x7fa9dcf0a4d0) at ../src/or/connection_or.c:1972
#10 0x00007fa9d535bef2 in connection_or_process_inbuf (conn=conn@entry=0x7fa9dcf0a4d0) at ../src/or/connection_or.c:483
#11 0x00007fa9d53474c5 in connection_process_inbuf (conn=conn@entry=0x7fa9dcf0a4d0, package_partial=package_partial@entry=1) at ../src/or/connection.c:4001
#12 0x00007fa9d534d65d in connection_handle_read_impl (conn=0x7fa9dcf0a4d0) at ../src/or/connection.c:2839
#13 connection_handle_read (conn=conn@entry=0x7fa9dcf0a4d0) at ../src/or/connection.c:2880
#14 0x00007fa9d52a9061 in conn_read_callback (fd=<optimized out>, event=<optimized out>, _conn=0x7fa9dcf0a4d0) at ../src/or/main.c:718
#15 0x00007fa9d4923ccc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
#16 0x00007fa9d52a99f5 in do_main_loop () at ../src/or/main.c:1992
#17 0x00007fa9d52ab1de in tor_main (argc=4, argv=0x7fff48e611d8) at ../src/or/main.c:2708
#18 0x00007fa9d3b12995 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x00007fa9d52a5abb in _start ()
If anyone can give me pointers on how to use valgrind, I will try that.
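A rough starting point, offered as a sketch and not as the project's official recipe: run Tor under valgrind's memcheck with the same arguments as in the "Core was generated by" line, and send the report to a log file. The log path, the `--num-callers` value, and the helper name `valgrind_command` are arbitrary choices for illustration. `--RunAsDaemon 0` is added on the assumption that you want Tor to stay in the foreground. Expect Tor to run much slower under valgrind, so a busy relay may need its bandwidth limits lowered.

```python
import shlex


def valgrind_command(tor_argv, log_file):
    """Prefix a Tor command line with memcheck options useful for heap bugs."""
    return [
        "valgrind",
        "--tool=memcheck",         # the default tool; reports invalid reads, writes and frees
        "--track-origins=yes",     # report where uninitialised values came from
        "--num-callers=30",        # deeper stack traces than the default
        f"--log-file={log_file}",  # valgrind expands %p to the process id
    ] + list(tor_argv)


if __name__ == "__main__":
    tor_argv = [
        "/usr/bin/tor",
        "--defaults-torrc", "/usr/share/tor/tor-service-defaults-torrc",
        "--RunAsDaemon", "0",  # assumption: keep Tor in the foreground
    ]
    # Print a copy-pasteable shell command instead of launching it here.
    print(shlex.join(valgrind_command(tor_argv, "/tmp/tor-valgrind.%p.log")))
```

Any "Invalid write" or "Invalid free" entries in the log that appear before the crash are the interesting part, since they would point at the code that corrupted the heap and not just at the place where the corruption was noticed.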
(I first thought that it was some alternate Tor client implementation that was tickling a bug, but now that I see three different places that it seems to have died at, I no longer think that.)