I'm seeing this error in 0.3.5.2-alpha which crashes tor soon after starting and was seeing it earlier in 0.3.5.1-alpha.
My build (on Debian 64bit) identifies as Tor 0.3.5.2-alpha (git-1a85ac8c2de77f40) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.0.1t, Zlib 1.2.8, Liblzma N/A, and Libzstd 1.3.2.
Do let me know if you'd like to see some of the config context.
INTERNAL ERROR: Raw assertion failed at src/lib/smartlist_core/smartlist_core.c::191191: : idx>=0./tor(dump_stack_symbols_to_error_fds+0x31)[0x55b21eaab571]./tor(tor_raw_assertion_failed_msg_+0x86)[0x55b21eaabbf6]./tor(smartlist_del+0xbf)[0x55b21eaaa36f]./tor(retry_all_listeners+0x441)[0x55b21ea37911]./tor(+0x533db)[0x55b21e90e3db]./tor(+0x5bed1)[0x55b21e916ed1]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x22c05)[0x7f5bd5b5dc05]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7)[0x7f5bd5b5e537]./tor(do_main_loop+0x203)[0x55b21e913843]./tor(tor_run_main+0x11f5)[0x55b21e915cc5]./tor(tor_main+0x3a)[0x55b21e90e07a]./tor(main+0x19)[0x55b21e90dbd9]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5bd4aa1b45]./tor(+0x52c29)[0x55b21e90dc29]./tor(dump_stack_symbols_to_error_fds+0x31)[0x55b21eaab571]./tor(tor_raw_assertion_failed_msg_+0x86)[0x55b21eaabbf6]./tor(smartlist_del+0xbf)[0x55b21eaaa36f]./tor(retry_all_listeners+0x441)[0x55b21ea37911]./tor(+0x533db)[0x55b21e90e3db]./tor(+0x5bed1)[0x55b21e916ed1]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x22c05)[0x7f5bd5b5dc05]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7)[0x7f5bd5b5e537]./tor(do_main_loop+0x203)[0x55b21e913843]./tor(tor_run_main+0x11f5)[0x55b21e915cc5]./tor(tor_main+0x3a)[0x55b21e90e07a]./tor(main+0x19)[0x55b21e90dbd9]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5bd4aa1b45]./tor(+0x52c29)[0x55b21e90dc29]
Trac: Username: jchevali
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
I'm not opening up a port for DNS, though my ORPort settings in torrc look somewhat peculiar, like so:
ORPort 9999ORPort [1234:5678::1]:9999
If I comment out the second line in torrc and start tor 0.3.5.2-alpha, it does not crash.
My configuration is supported up to 0.3.4.8 however. Though perhaps that's the reason why after a very long time this bridge hasn't still got the guard flag. I had thought this config would earn me an IPv6-certified flag but that never happened either.
However, I suspect multithreading issue here - from the stack trace it seems that retry_all_listeners is called multiple times, before the first invocation was able to finish.
INTERNAL ERROR: Raw assertion failed at src/lib/smartlist_core/smartlist_core.c:191: idx>=0./tor(dump_stack_symbols_to_error_fds+0x31)[0x55ed593789f1]./tor(tor_raw_assertion_failed_msg_+0x86)[0x55ed59379076]./tor(smartlist_del+0xbf)[0x55ed593777ef]./tor(retry_all_listeners+0x459)[0x55ed591e7939]./tor(+0x6354b)[0x55ed591eb54b]./tor(+0x68231)[0x55ed591f0231]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x22c05)[0x7f24a506ac05]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7)[0x7f24a506b537]./tor(do_main_loop+0x90)[0x55ed591efc50]./tor(tor_run_main+0x1373)[0x55ed591ddd53]./tor(tor_main+0x3a)[0x55ed591db1aa]./tor(main+0x19)[0x55ed591dad09]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f24a3faeb45]./tor(+0x52d59)[0x55ed591dad59]
If you can't reproduce, don't give up, post here your torrc. You may include any settings you like, I only need to have in there the two ORPort entries. I'll replace the IPv6 in square brackets with my own and fire it up. If it won't show the bug, it means some other setting in my torrc was a factor in this bug. If it does show the bug to me but when you ran it it didn't, it means that some other OS configuration of mine could have been a factor in this bug, maybe some shared libraries version. We could start narrowing it down.
Then I did docker build . in the debian folder of my tor-docker-images repo (linked above). Docker container is now built.
Shelled into the container by running docker run -it --entrypoint /bin/bash 9da6b7fb8b7b and verified that it has IPv6 network interface:
tor@128312c91195:/src/tor$ ip -6 addr1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1 inet6 ::1/128 scope host valid_lft forever preferred_lft forever37: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP inet6 2001:db8:1::242:ac11:2/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::42:acff:fe11:2/64 scope link valid_lft forever preferred_lft forever
Now let's run tor with above torrc entries:
tor@128312c91195:/src/tor$ printf "ORPort 9999\nORPort [2001:db8:1::242:ac11:2]:9999\n" | tor -f -Oct 01 18:02:01.027 [notice] Tor 0.3.5.2-alpha-dev (git-43211c3a0c4b56d4) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1t, Zlib 1.2.8, Liblzma N/A, and Libzstd N/A.Oct 01 18:02:01.028 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warningOct 01 18:02:01.028 [notice] This version is not a stable Tor release. Expect more bugs than usual.Oct 01 18:02:01.031 [notice] Your ContactInfo config option is not set. Please consider setting it, so we can contact you if your server is misconfigured or something else goes wrong.Oct 01 18:02:01.032 [notice] Based on detected system memory, MaxMemInQueues is set to 1499 MB. You can override this by setting MaxMemInQueues by hand.Oct 01 18:02:01.032 [notice] By default, Tor does not run as an exit relay. If you want to be an exit relay, set ExitRelay to 1. To suppress this message in the future, set ExitRelay to 0.Oct 01 18:02:01.032 [notice] Opening Socks listener on 127.0.0.1:9050Oct 01 18:02:01.032 [notice] Opened Socks listener on 127.0.0.1:9050Oct 01 18:02:01.033 [notice] Opening OR listener on 0.0.0.0:9999Oct 01 18:02:01.033 [notice] Opened OR listener on 0.0.0.0:9999Oct 01 18:02:01.033 [notice] Opening OR listener on [2001:db8:1::242:ac11:2]:9999Oct 01 18:02:01.033 [notice] Opened OR listener on [2001:db8:1::242:ac11:2]:9999
It does not crash and proceeds to download network information from directory server.
What if this is actually happening inside "retry_listener_ports", but we can't tell from the stack trace, because the function got inlined?
In particular, I'm looking at how retry_listener_ports can call SMARTLIST_DEL on a few different things. Is it possible that it is calling SMARTLIST_DEL_CURRENT(old_conns, conn); twice for the same value of conn? That would explain how the index could become negative.
If it helps, the last time I tested that (8 days ago), I got the impression that the crash started to happen with a delay of 45 seconds following the start of the server, while before the patch it used to crash pretty much instantly.
Anyway I should be starting another round of testing pretty soon.
I've set the log level to info and I'm comparing a successful run with a failed run.
For both runs, there's two entries logged by routerlist_remove_old_routers thirteen seconds after launch. The two entries are separated by a pageful of other entries, but they seem to have the same log time (as do those other entries); or at least they appear to have the same time up to the second since there's no millisecond resolution (there is but it's 000).
One minute after that, there's a further entry of routerlist_remove_old_routers in either log. The successful run continues logging that every minute seemingly, while also doing other stuff. The bad run crashes right after this last entry, with the following message:
INTERNAL ERROR: Raw assertion failed at src/lib/smartlist_core/smartlist_core.c:191: idx>=0./tor(dump_stack_symbols_to_error_fds+0x31)[0x561de403a571]./tor(tor_raw_assertion_failed_msg_+0x86)[0x561de403abf6]./tor(smartlist_del+0xbf)[0x561de403936f]./tor(retry_all_listeners+0x459)[0x561de3fc6939]./tor(+0x533db)[0x561de3e9d3db]./tor(+0x5bed1)[0x561de3ea5ed1]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x22c05)[0x7f4acb72ac05]/usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7)[0x7f4acb72b537]./tor(do_main_loop+0x203)[0x561de3ea2843]./tor(tor_run_main+0x11f5)[0x561de3ea4cc5]./tor(tor_main+0x3a)[0x561de3e9d07a]./tor(main+0x19)[0x561de3e9cbd9]/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f4aca66eb45]./tor(+0x52c29)[0x561de3e9cc29]
Here's the gist of my torrc besides the two entries discussed:
Just to confirm: you're not seeing Closed no-longer-configured ... in the logs, do you?
retry_all_listeners is called from two places in the codebase: 1) options_act_reversible (when config is (re-)loaded and 2) retry_listener_callback (every minute). I suspect a race condition here, which would explain why it's timing sensitive and doesn't happen every time. Above patch should show us if that's the case. Look for new stacktraces and log entries that mention retry_all_listeners.
Okay. I think for your usercase it should be fixed by my patch for #27928 (moved).
The problem is the following. In retry_listener_ports there are two smartlist_del calls that remove conn (loop variable in outer loop) from old_conns smartlist. We may remove it in the inner loop (if we determine that rebinding may be needed (this logic wasn't exactly right - see #27928 (moved))) or in the outer loop (if we found exact match to one of the wanted ports). Or both, which happened in this case. The code may determine that rebinding may be needed, remove the conn in the inner loop, keep iterating the inner loop, find exact match and then try to remove it in outer loop, which makes tor explode.
With the new line added to connection.c, tor is holding up without crashing.
It did print the following to the log 16 seconds after launch though,
Oct 02 16:47:23.000 [info] pathbias_should_count(): Bug: Circuit 4 is now being counted despite being ignored in the past. Purpose is Measuring circuit timeout, path state is new (on Tor 0.3.5.2-alpha-dev fe6365055bd4ec4e)Oct 02 16:47:23.000 [info] pathbias_count_build_success(): Bug: Succeeded circuit is in strange path state new. Circuit is a Measuring circuit timeout currently open. (on Tor 0.3.5.2-alpha-dev fe6365055bd4ec4e)
And one minute later,
Oct 02 16:48:23.000 [info] pathbias_should_count(): Bug: Circuit 47 is now being counted despite being ignored in the past. Purpose is Measuring circuit timeout, path state is new (on Tor 0.3.5.2-alpha-dev fe6365055bd4ec4e)
#27928 (moved) is ready for merging, and (mostly) addresses the root problem here.
However pull requests !369 and !382 are purely experimental and I do not propose them to get into upstream. I will try making some further patch soonish.
I've tested dev at commit de86a70 which is where they were last merged. For reference, connection.c's md5sum at this point is 8a928954d14790ed8f2a9d494d14ea58 (please verify).
It falls down as follows:
[info] routerlist_remove_old_routers(): We have 6432 live routers and 15253 old router descriptors.[notice] Opening OR listener on 0.0.0.0:9999[warn] Could not bind to 0.0.0.0:9999: Address already in use. Is Tor already running?[notice] Opening OR listener on 0.0.0.0:9999[warn] Could not bind to 0.0.0.0:9999: Address already in use. Is Tor already running?[err] tor_assertion_failed_(): Bug: src/core/mainloop/connection.c:2901: retry_all_listeners: Assertion new_conn failed; aborting.[err] Bug: Assertion new_conn failed in retry_all_listeners at src/core/mainloop/connection.c:2901. Stack trace:[err] Bug: ./tor(log_backtrace_impl+0x47) [0x55ce6e0bd117][err] Bug: ./tor(tor_assertion_failed_+0x93) [0x55ce6e0b87a3][err] Bug: ./tor(retry_all_listeners+0x6e9) [0x55ce6df2be99][err] Bug: ./tor(+0x6383b) [0x55ce6df2f83b][err] Bug: ./tor(+0x68621) [0x55ce6df34621][err] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(+0x22c05) [0x7fb7a7442c05][err] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6(event_base_loop+0x5a7) [0x7fb7a7443537][err] Bug: ./tor(do_main_loop+0x91) [0x55ce6df34081][err] Bug: ./tor(tor_run_main+0x1373) [0x55ce6df22023][err] Bug: ./tor(tor_main+0x3a) [0x55ce6df1f47a][err] Bug: ./tor(main+0x19) [0x55ce6df1efd9][err] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fb7a6386b45][err] Bug: ./tor(+0x53029) [0x55ce6df1f029]
If I uncomment in torrc the second ORPort line containing the IPv6 it does not fall down.
Trac: Username: jchevali Resolution: fixed toN/A Status: closed to reopened
Alternatively, the line added to connection.c at PR383 (which turns this connection.c's md5sum into 78fa6d6ac9acb8d324b0fc0826343c46 - please verify) can also keep tor up and running, even with the dual ORPort clause.
I noticed that as this bug was closed PR383 was not merged. Was that intentional?
I'm sorry, perhaps you were meant to merge that for #27928 (moved). Maybe I should have waited till you merged that onto the dev line to test. If so, feel free to close once more. My apologies.