alif, if you could answer any of the questions above, that would help us diagnose and fix this bug. I have some guesses below, but they're just guesses.
Some ideas, based on looking at the code: there are two ways I think this could happen. Either we reach cpuworker_queue_work() without having called cpu_init(), or we do call cpu_init() but somehow fail to create a threadpool there. I don't think it can be the second case, since that would have produced a nonfatal assertion failure from threadpool_new().
We call cpu_init() in two cases: when our settings change, the transition affects workers, and we have become a server; or when we start as a server in main.c.
I think that the check in the first cpu_init() case might be wrong: if we start as a client, and then transition to a bridge (not a public server), I don't think we will trigger options_transition_affects_workers().
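To make the hypothesis above concrete, here is a toy C model of the two cpu_init() call sites next to the consensus-diff path that queues CPU work. It is a sketch under stated assumptions, not Tor code: every struct and function name is hypothetical, "directory server mode" is simplified to "DirPort or ORPort set", and "the transition affects workers" is reduced to "server mode changed". Under those assumptions, a DirPort-only node never gets a threadpool, whether the DirPort comes from torrc at startup or from a later reconfiguration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the initialization order discussed above.
 * Hypothetical names throughout; this is not Tor's real code. */

typedef struct {
  bool has_orport;   /* ORPort configured */
  bool has_dirport;  /* DirPort configured */
} model_options_t;

typedef struct {
  bool threadpool_ready;  /* stands in for cpuworker.c's threadpool pointer */
} model_state_t;

/* Simplification: "server mode" means an ORPort is configured. */
static bool model_server_mode(const model_options_t *o) {
  return o->has_orport;
}

/* Simplification: we serve directory info if a DirPort or ORPort is set. */
static bool model_dir_server_mode(const model_options_t *o) {
  return o->has_dirport || o->has_orport;
}

/* Call site 1: startup in main.c, only when we start as a server. */
static void model_startup(const model_options_t *o, model_state_t *s) {
  if (model_server_mode(o))
    s->threadpool_ready = true;  /* cpu_init() */
}

/* Call site 2: options changed in a way that "affects workers" (assumed
 * here to mean server mode flipped) and we are now a server. */
static void model_reconfigure(const model_options_t *old_o,
                              const model_options_t *new_o,
                              model_state_t *s) {
  bool affects_workers = model_server_mode(old_o) != model_server_mode(new_o);
  if (affects_workers && model_server_mode(new_o))
    s->threadpool_ready = true;  /* cpu_init() */
}

/* The consensus diff manager runs in directory-server mode and queues CPU
 * work. Returns false where real Tor would fail tor_assert(threadpool)
 * in cpuworker_queue_work() and abort. */
static bool model_consdiff_rescan(const model_options_t *o,
                                  const model_state_t *s) {
  if (!model_dir_server_mode(o))
    return true;  /* no diffs to compute, nothing queued */
  return s->threadpool_ready;
}
```

In this model, a relay (ORPort set) passes both checks, while a client that starts with, or is reconfigured to, DirPort-only reaches the consensus-diff path with no pool.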
Well, I'm no longer able to reproduce this, nickm! Sorry.
It persisted for a couple of days after I updated Tor to 0.3.1.7 using a deb from the project's repository, until I had to reboot for a different reason.
Now I'm back to "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)", which is preventing me from building a circuit via obfs3, even though I'm able to do so in Tor Browser via obfs4. But that's a different issue.
Anyway, my torrc at the time of the errors is the following (I had disabled bridges to try to debug and to make the report less complicated). I removed commented lines for clarity and redacted secrets:
Log notice file /var/log/tor/notices.log
ControlPort 9051
HashedControlPassword 16:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
PortForwarding 1
Address redacted.example.com
Nickname XXXXXXXX
ContactInfo XXXXXXXXelsewhereXXXXXX
DirPort 9030 # what port to advertise for directory connections
DirPortFrontPage /usr/share/doc/tor/tor-exit-notice.html
ExitPolicy reject *:* # no exits allowed
HiddenServiceStatistics 1
UseBridges 0
UpdateBridgesFromAuthority 1
ClientTransportPlugin obfs2,obfs3,ScrambleSuit exec /usr/bin/obfsproxy managed
#Some bridge definitions go here; obfs3 and plain
Also, here's my /etc/apparmor.d/abstractions/tor, since I had modified it to be able to run obfsproxy on Ubuntu 14.04:
<begin /etc/apparmor.d/abstractions/tor>
Now, trying to solve my connectivity problem, I installed obfs4proxy from the Xenial repository and copied the obfs4 bridge definitions from Tor Browser's torrc to /var/lib/tor, but nothing changed. I still got "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)"
But after I copied the "cached-x" files from Tor Browser's Data directory to my system and restarted the Tor service, the assertion failure occurred again:
Oct 03 00:54:11.000 [notice] Tor 0.3.1.7 (git-5fa14939bca67c23) opening log file.
Oct 03 00:54:11.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Oct 03 00:54:11.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Oct 03 00:54:11.000 [warn] Could not open "/usr/share/doc/tor/tor-exit-notice.html": Permission denied
Oct 03 00:54:11.000 [warn] DirPortFrontPage file '/usr/share/doc/tor/tor-exit-notice.html' not found. Continuing anyway.
Oct 03 00:54:11.000 [notice] Bootstrapped 0%: Starting
Oct 03 00:54:12.000 [notice] Starting with guard context "bridges"
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXXXXXX at XX.XX.XXX.XX
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXXXXXXXXXX at XX.XXX.XX.XX
………
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXX at XX.XX.XX.XX
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXX at XXX.XX.XX.XX
Oct 03 00:54:12.000 [notice] Delaying directory fetches: Pluggable transport proxies still configuring
Oct 03 00:54:12.000 [notice] Opening Socks listener on /var/run/tor/socks
Oct 03 00:54:12.000 [notice] Opening Control listener on /var/run/tor/control
Oct 03 00:54:13.000 [err] tor_assertion_failed_(): Bug: ../src/or/cpuworker.c:499: cpuworker_queue_work: Assertion threadpool failed; aborting. (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: Assertion threadpool failed in cpuworker_queue_work at ../src/or/cpuworker.c:499. Stack trace: (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(log_backtrace+0x42) [0x55fb088902b2] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(tor_assertion_failed_+0x94) [0x55fb088a8904] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(cpuworker_queue_work+0x65) [0x55fb0884c395] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(consdiffmgr_rescan+0x9a7) [0x55fb0883f037] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(+0x4ec7d) [0x55fb08776c7d] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x754) [0x7fa5e1eecf24] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(do_main_loop+0x24d) [0x55fb08777a4d] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(tor_main+0x1c35) [0x55fb0877b215] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(main+0x19) [0x55fb087733c9] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fa5e1099f45] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: /usr/bin/tor(+0x4b41b) [0x55fb0877341b] (on Tor 0.3.1.7 )
Commenting out #DirPort 9030 solves it. Re-enabling it reproduces that assertion failure.
I now have a working Tor service that is able to go all the way to Bootstrapped 100%: Done.
Please note that I haven't tested commenting out DirPort in my original configuration, that is, before I introduced obfs4, the new bridge definitions, and the data files copied from Tor Browser.
Also note that when the assertion failure disappeared and I was left with "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)" in comment:10, I had DirPort 9030 enabled!
(please reopen if this bug occurs in any version released after today.)
It still occurs if server_mode() is false but dir_server_mode() is true. With 0.3.1.9, it doesn't seem to matter whether that configuration is set in torrc at startup or is the result of a reconfiguration.
(Use case for this configuration: http://github.com/rustybird/corridor calls SETCONF DirPort="127.0.0.1:9030 NoAdvertise" to ensure the client continues to refresh the consensus even when dormant.)
Trac:
Username: rustybird
Resolution: fixed to N/A
Status: closed to reopened
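For readers unfamiliar with the control protocol: the SETCONF call in the use case above is a single line sent on an already-authenticated control connection, with the value in double quotes because it contains a space. The hypothetical helper below (`build_setconf()` is an illustrative name, not part of Tor or corridor) only formats that line, escaping backslashes and double quotes as the control-spec QuotedString rules require; it does not open a socket or send AUTHENTICATE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds one control-port command line:  SETCONF key="value"\r\n
 * Backslashes and double quotes inside the value are escaped so the
 * value forms a valid QuotedString. Returns the number of bytes written
 * (excluding the NUL), or -1 if buf is too small. */
static int build_setconf(char *buf, size_t buflen,
                         const char *key, const char *value) {
  int w = snprintf(buf, buflen, "SETCONF %s=\"", key);
  if (w < 0 || (size_t)w >= buflen)
    return -1;
  size_t n = (size_t)w;
  for (const char *p = value; *p != '\0'; ++p) {
    if (*p == '"' || *p == '\\') {
      if (n + 1 >= buflen)
        return -1;
      buf[n++] = '\\';  /* escape the next character */
    }
    if (n + 1 >= buflen)
      return -1;
    buf[n++] = *p;
  }
  if (n + 3 >= buflen)  /* need room for '"', CR, LF and NUL */
    return -1;
  buf[n++] = '"';
  buf[n++] = '\r';
  buf[n++] = '\n';
  buf[n] = '\0';
  return (int)n;
}
```

For example, `build_setconf(buf, sizeof buf, "DirPort", "127.0.0.1:9030 NoAdvertise")` yields the same command corridor sends, terminated by CRLF.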
Running a directory mirror will cause a lot of unnecessary load and disk usage, particularly with newer tor versions. You'll generate a whole bunch of compressed diffs that you'll never serve.
Also, if you want a consensus with IPv6 addresses on a client, use UseMicrodescriptors 0.
If you don't care about descriptors, and want to save bandwidth, use FetchServerDescriptors 0. You might find some bugs using this option, it's not well-tested.
You can set SOCKSPort 0 if you're not using it. It might add a bit of security.
Running maint-0.3.2, I start my Tor client with fetchuselessdescriptors 1 dirport 9030, and on startup I get this stacktrace and abort:
Jan 06 04:25:28.000 [notice] Bootstrapped 85%: Finishing handshake with first hop
Jan 06 04:25:29.000 [err] tor_assertion_failed_(): Bug: src/or/cpuworker.c:499: cpuworker_queue_work: Assertion threadpool failed; aborting. (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: Assertion threadpool failed in cpuworker_queue_work at src/or/cpuworker.c:499. Stack trace: (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(log_backtrace+0x42) [0x55f592aa5922] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(tor_assertion_failed_+0x8c) [0x55f592ac071c] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(cpuworker_queue_work+0x6f) [0x55f592a4bb1f] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(consdiffmgr_rescan+0x82f) [0x55f592a3e44f] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(+0x51aaf) [0x55f592973aaf] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x7fc) [0x7fbd389a03dc] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(do_main_loop+0x244) [0x55f5929747c4] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(tor_main+0x1c25) [0x55f592978005] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(main+0x19) [0x55f59296ff29] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fbd3794ab45] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Jan 06 04:25:29.000 [err] Bug: src/or/tor(+0x4df79) [0x55f59296ff79] (on Tor 0.3.2.8-rc-dev 5f2c7a85671ee514)
Aborted
Looks like the consensus diff manager wants to use the threadpool, but I'm not a relay so nothing set it up.
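One way to close the gap described above would be to initialize the worker pool whenever the node will queue CPU work, not only in server mode, and to have the queueing function fail softly instead of aborting. This is a hypothetical sketch (`node_modes_t`, `maybe_cpu_init()` and `queue_work()` are made-up names) and not necessarily what the bug23693 branch does. Note that the later patch test in this ticket shows that creating the pool in DirPort-only mode runs into a second assertion, in dup_onion_keys(), so pool initialization alone would not be enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of one possible fix; not the merged patch. */

typedef struct {
  bool server_mode;      /* what server_mode(options) would return */
  bool dir_server_mode;  /* what dir_server_mode(options) would return */
} node_modes_t;

typedef struct {
  int n_threads;
} model_pool_t;  /* stand-in for Tor's threadpool type */

static model_pool_t pool_storage;
static model_pool_t *model_threadpool = NULL;  /* NULL until initialized */

/* Relays queue onionskin work; directory caches queue consensus-diff
 * work. Either role needs the pool. */
static bool needs_cpuworkers(const node_modes_t *m) {
  return m->server_mode || m->dir_server_mode;
}

/* Idempotent: safe to call at startup and again after every reconfigure. */
static void maybe_cpu_init(const node_modes_t *m) {
  if (needs_cpuworkers(m) && model_threadpool == NULL) {
    pool_storage.n_threads = 1;
    model_threadpool = &pool_storage;
  }
}

/* Defensive queueing: report an error instead of aborting when the pool
 * was never created. */
static int queue_work(void) {
  if (model_threadpool == NULL)
    return -1;
  /* ...hand the job to a worker thread here... */
  return 0;
}
```

Calling `maybe_cpu_init()` from both the startup path and the reconfigure path means a DirPort added later via SETCONF would also get a pool.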
To be clear, this is repeatable. I just did it again now, with Tor master:
Apr 09 21:59:53.749 [err] Bug: Assertion threadpool failed in cpuworker_queue_work at src/or/cpuworker.c:510. Stack trace: (on Tor 0.3.4.0-alpha-dev 21c81348a39dd235)
Apr 09 21:59:53.749 [err] Bug: src/or/tor(log_backtrace+0x42) [0x5649ed2260b2] (on Tor 0.3.4.0-alpha-dev 21c81348a39dd235)
Apr 09 21:59:53.749 [err] Bug: src/or/tor(tor_assertion_failed_+0x8c) [0x5649ed24140c] (on Tor 0.3.4.0-alpha-dev 21c81348a39dd235)
Apr 09 21:59:53.749 [err] Bug: src/or/tor(cpuworker_queue_work+0x6f) [0x5649ed1c9cbf] (on Tor 0.3.4.0-alpha-dev 21c81348a39dd235)
Apr 09 21:59:53.749 [err] Bug: src/or/tor(consdiffmgr_rescan+0x839) [0x5649ed1bc169] (on Tor 0.3.4.0-alpha-dev 21c81348a39dd235)
[...]
The most useful information here would be the Tor version and your configuration (the torrc file).
Tor version as reported by apt-cache show tor: 0.3.2.10-1~bionic+1
My torrc: https://pastebin.com/raw/CWTMmwHc
Hmm, I took the torrc from comment:28 to test the patch. The original assert seems to be fixed, but now it crashes in a different place:
Apr 17 14:01:00.000 [notice] Bootstrapped 0%: Starting
Apr 17 14:01:00.000 [notice] Starting with guard context "default"
Apr 17 14:01:00.000 [err] tor_assertion_failed_(): Bug: src/or/router.c:142: dup_onion_keys: Assertion onionkey failed; aborting. (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: Assertion onionkey failed in dup_onion_keys at src/or/router.c:142. Stack trace: (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(log_backtrace+0x43) [0x557795fffab3] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(tor_assertion_failed_+0x8d) [0x557796018add] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(dup_onion_keys+0x10f) [0x557795f22a9f] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(server_onion_keys_new+0x41) [0x557795ef2f91] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(+0x1283b7) [0x557795fb93b7] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(threadpool_new+0x18b) [0x55779601f91b] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(cpu_init+0xad) [0x557795fb97dd] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(do_main_loop+0x15d) [0x557795ee0d2d] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(tor_main+0xe25) [0x557795ee3b25] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(main+0x19) [0x557795edc729] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7faf9e096a87] (on Tor 0.3.1.10-dev 386f8016b7373bec)
Apr 17 14:01:00.000 [err] Bug: ./src/or/tor(_start+0x2a) [0x557795edc77a] (on Tor 0.3.1.10-dev 386f8016b7373bec)
I guess that when we are not in server mode, Tor won't create the onionkey in init_keys()... I wonder if we should try to fix these situations with patches like the one from comment:29, or we should just disallow having a DirPort without an ORPort and abort if such a configuration is seen. IIUC, we are planning to eventually deprecate DirPort anyhow and just use BEGIN_DIR, right?
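The first option above, patching around the missing key, could look roughly like this: the per-thread worker state copies the onion key only if one was created, and otherwise records that the worker cannot answer CREATE cells, instead of asserting as dup_onion_keys() does in the trace. This is a hypothetical sketch; `fake_key_t`, `worker_state_t`, `current_onion_key` and `worker_state_new()` stand in for Tor's real key types and globals, which are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Tor's key and worker-state types. */
typedef struct {
  unsigned char bytes[32];
} fake_key_t;

typedef struct {
  fake_key_t *onion_key;   /* private copy; NULL when we have no key */
  bool can_answer_create;  /* false: refuse onionskin (CREATE) jobs */
} worker_state_t;

/* What init_keys() would have produced; stays NULL outside server mode. */
static fake_key_t *current_onion_key = NULL;

/* Per-thread state constructor, as pool creation would call it. Unlike an
 * assert-on-missing-key design, it tolerates a missing key, since
 * consensus-diff jobs never need one. Returns NULL only on allocation
 * failure. */
static worker_state_t *worker_state_new(void) {
  worker_state_t *ws = calloc(1, sizeof(*ws));
  if (ws == NULL)
    return NULL;
  if (current_onion_key != NULL) {
    ws->onion_key = malloc(sizeof(*ws->onion_key));
    if (ws->onion_key == NULL) {
      free(ws);
      return NULL;
    }
    *ws->onion_key = *current_onion_key;  /* copy, don't share */
    ws->can_answer_create = true;
  }
  return ws;
}

static void worker_state_free(worker_state_t *ws) {
  if (ws == NULL)
    return;
  free(ws->onion_key);
  free(ws);
}
```

The trade-off is that every onionskin job then has to check `can_answer_create`, which is the kind of extra DirPort-only code path the comment above weighs against simply rejecting the configuration.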
Maybe we will deprecate DirPort some time in the future. Maybe we won't; there are bootstrapping and diagnostic issues.
But here are some questions we can answer right now:
Do we support DirPort without ORPort?
If we do, when was the last Tor release that it actually worked?
Why don't we have any tests for DirPort only operation?
As far as I can tell, people who get this bug seem to be setting DirPort as a workaround.
They don't actually want to serve descriptors, they just want them available locally.
If we can't find a use case that involves serving descriptors, I think we should:
fix the hibernation options so they allow people to download descriptors every hour if that's what they need, then
deprecate DirPort-only operation
But I'm not sure if we can remove features as a backport, so we are stuck with fixing crashes like this (or saying "don't do that").
now: we fix the bugs in this feature in 0.3.1 and later
in 0.3.4 or 0.3.5: we decide if we want to support DirPort-only and write tests for it, or if we want to deprecate it
One use case for DirPort-only operation is a local directory mirror for large deployments: clients can be pointed at it with the FallbackDir torrc option, which takes load off relays and authorities. But we could just tell people to use ORPort 12345 PublishServerDescriptor 0 as a workaround.
I've updated bug23693_031_redux with a commit that actually works. I'm fine merging it into 0.3.1 and forward; we can open a separate ticket to test or disable the feature.
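The other option raised earlier in the thread, rejecting a DirPort without an ORPort when options are validated, could be sketched as a check whose error message points users at the workaround above. This is hypothetical: `opts_t` and `validate_dirport_requires_orport()` are illustrative names, not part of Tor's options validation, and the thread leaves open whether DirPort-only operation will be deprecated at all. The message uses PublishServerDescriptor, which is how that option is spelled in the Tor manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of parsed torrc options. */
typedef struct {
  bool orport_set;
  bool dirport_set;
} opts_t;

/* Returns 0 if the combination is allowed (msg becomes ""), or -1 with an
 * explanation in msg. Sketch of the proposal discussed in this thread. */
static int validate_dirport_requires_orport(const opts_t *o,
                                            char *msg, size_t msglen) {
  if (o->dirport_set && !o->orport_set) {
    snprintf(msg, msglen,
             "DirPort is set but ORPort is not. For a private directory "
             "mirror, set an ORPort with PublishServerDescriptor 0 instead.");
    return -1;
  }
  if (msglen > 0)
    msg[0] = '\0';
  return 0;
}
```

Failing at startup with a clear message would replace a later tor_assert() crash with an actionable error, at the cost of breaking setups like corridor's that rely on DirPort-only operation.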
that 0.3.3.5-rc1 and 0.3.4.0-alpha are available in the PPA, presumably with the fix for this bug merged. However, when I update with apt, I am still stuck on version 0.3.2.10, which is unfortunately unusable.
Or is there a particular step I need to take to update to the pre-release versions?