Opened 3 months ago

Last modified 2 days ago

#23693 reopened defect

0.3.1.7: Assertion threadpool failed in cpuworker_queue_work

Reported by: alif
Owned by: nickm
Priority: Medium
Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.1.7
Severity: Normal
Keywords: 029-backport, 030-backport, 031-backport, review-group-24
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by arma)

On Ubuntu 14.04 I installed Tor version 0.3.1.7 (git-5fa14939bca67c23).

Upon starting tor as a service, it soon crashes. The following are the log entries:

Sep 29 02:26:03.000 [notice] Tor 0.3.1.7 (git-5fa14939bca67c23) opening log file.
Sep 29 02:26:03.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Sep 29 02:26:03.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Sep 29 02:26:03.000 [warn] Could not open "/usr/share/doc/tor/tor-exit-notice.html": Permission denied
Sep 29 02:26:03.000 [warn] DirPortFrontPage file '/usr/share/doc/tor/tor-exit-notice.html' not found. Continuing anyway.
Sep 29 02:26:03.000 [notice] Bootstrapped 0%: Starting
Sep 29 02:26:04.000 [notice] Starting with guard context "default"
Sep 29 02:26:04.000 [notice] Opening Socks listener on /var/run/tor/socks
Sep 29 02:26:04.000 [notice] Opening Control listener on /var/run/tor/control
Sep 29 02:26:04.000 [notice] Bootstrapped 5%: Connecting to directory server
Sep 29 02:26:04.000 [notice] Bootstrapped 10%: Finishing handshake with directory server
Sep 29 02:26:04.000 [notice] Bootstrapped 15%: Establishing an encrypted directory connection
Sep 29 02:26:05.000 [notice] Bootstrapped 20%: Asking for networkstatus consensus
Sep 29 02:26:05.000 [notice] Bootstrapped 25%: Loading networkstatus consensus
Sep 29 02:26:08.000 [err] tor_assertion_failed_(): Bug: ../src/or/cpuworker.c:499: cpuworker_queue_work: Assertion threadpool failed; aborting. (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug: Assertion threadpool failed in cpuworker_queue_work at ../src/or/cpuworker.c:499. Stack trace: (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(log_backtrace+0x42) [0x5624134a32b2] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(tor_assertion_failed_+0x94) [0x5624134bb904] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(cpuworker_queue_work+0x65) [0x56241345f395] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(consdiffmgr_add_consensus+0x2f3) [0x562413450fe3] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(networkstatus_set_current_consensus+0x9f1) [0x562413395971] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(connection_dir_reached_eof+0xc09) [0x5624134678d9] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(+0x105e6b) [0x562413440e6b] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(+0x4e921) [0x562413389921] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x754) [0x7eff0e3a9f24] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(do_main_loop+0x24d) [0x56241338aa4d] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(tor_main+0x1c35) [0x56241338e215] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(main+0x19) [0x5624133863c9] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7eff0d556f45] (on Tor 0.3.1.7 )
Sep 29 02:26:08.000 [err] Bug:     /usr/bin/tor(+0x4b41b) [0x56241338641b] (on Tor 0.3.1.7 )

Change History (18)

comment:1 Changed 3 months ago by arma

Component: - Select a component → Core Tor/Tor
Milestone: Tor: 0.3.2.x-final

comment:2 Changed 3 months ago by arma

Description: modified (diff)

comment:3 Changed 3 months ago by arma

Is this repeatable?

comment:4 Changed 3 months ago by arma

Can you paste your torrc file? It looks like you modified it from the original.

Also, is this the Tor deb? Or did you install Tor some other way?

comment:5 Changed 3 months ago by arma

Summary: 0.3.1.7 daemon fails → 0.3.1.7: Assertion threadpool failed in cpuworker_queue_work

comment:6 Changed 2 months ago by nickm

alif, if you could answer any of the questions above, that would help us diagnose and fix this bug. I have some guesses below, but they're just guesses.

Some ideas, based on looking at the code: There are two ways I think this could happen: if we reach cpuworker_queue_work() without having called cpu_init(), or if we somehow fail to create a threadpool in cpu_init() when we do call it. But I don't think it can be the second case, since that would have created a nonfatal assertion from threadpool_new().

We call cpu_init() in two cases: when our settings change, the transition affects workers, and we have become a server; or when we start as a server in main.c.

I think that the check in the first cpu_init() case might be wrong: if we start as a client, and then transition to a bridge (not a public server), I don't think we will trigger options_transition_affects_workers().
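
The transition gap suspected above can be modeled in a few lines of C. This is a minimal sketch, not Tor's code: model_options_t, model_server_mode(), model_public_server_mode(), model_cpu_init(), transition_affects_workers_old(), transition_affects_workers_new() and pool_exists_after() are all hypothetical stand-ins. It assumes, as the comment suggests, that a bridge counts for server_mode() but not for public_server_mode(), so a check that only compares public server status never fires on a client-to-bridge transition and the pool is never created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the state discussed above.
 * None of these names are Tor's real structs or functions. */
typedef struct {
  bool orport_set;    /* an ORPort is configured: we run as some kind of server */
  bool bridge_relay;  /* BridgeRelay 1: a server that is not published */
} model_options_t;

static int pool_storage;          /* placeholder object for the "pool" */
static void *threadpool = NULL;   /* stands in for the cpuworker threadpool */

static bool model_server_mode(const model_options_t *o) {
  return o->orport_set;
}

static bool model_public_server_mode(const model_options_t *o) {
  return model_server_mode(o) && !o->bridge_relay;
}

/* Create the pool if it does not exist yet; stands in for cpu_init(). */
static void model_cpu_init(void) {
  if (threadpool == NULL)
    threadpool = &pool_storage;
}

/* Suspected gap: only a change in *public* server status counts. */
static bool transition_affects_workers_old(const model_options_t *a,
                                           const model_options_t *b) {
  return model_public_server_mode(a) != model_public_server_mode(b);
}

/* Candidate fix: a change in plain server status counts too. */
static bool transition_affects_workers_new(const model_options_t *a,
                                           const model_options_t *b) {
  return model_server_mode(a) != model_server_mode(b) ||
         model_public_server_mode(a) != model_public_server_mode(b);
}

typedef bool (*affects_fn)(const model_options_t *, const model_options_t *);

/* Reset to "no pool" (as a freshly started client would be), apply the
 * reconfiguration old_o -> new_o, and report whether a pool now exists,
 * i.e. whether queueing CPU work would be safe. */
static bool pool_exists_after(const model_options_t *old_o,
                              const model_options_t *new_o,
                              affects_fn affects) {
  assert(old_o != NULL && new_o != NULL && affects != NULL);
  threadpool = NULL;
  if (affects(old_o, new_o) && model_server_mode(new_o))
    model_cpu_init();
  return threadpool != NULL;
}
```

With client = { false, false } and bridge = { true, true }, pool_exists_after(&client, &bridge, transition_affects_workers_old) returns false, while the variant that also compares server_mode() returns true. A client-to-public-relay transition creates the pool under either check, which would explain why the gap only shows up for bridges. The actual patch in bug23693_029 may differ in detail.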

comment:7 Changed 2 months ago by nickm

Owner: set to nickm
Status: new → accepted

comment:8 Changed 2 months ago by nickm

Status: accepted → needs_review

Possible fix in branch bug23693_029 in my public repository, assuming I have the diagnosis right.

comment:9 Changed 2 months ago by nickm

Keywords: 029-backport 030-backport 031-backport added

comment:10 Changed 2 months ago by alif

Well, I'm no longer able to reproduce this, nickm! Sorry.
It persisted for a couple of days after I updated Tor to 0.3.1.7 using a deb from the project's repository, until I had to reboot for a different reason.

Now I'm back to "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)", which is preventing me from building a circuit via obfs3, even though I'm able to do so in Tor Browser via obfs4. But that's a different issue.

Anyway, my torrc at the time of the errors is the following (I had disabled bridges to try to debug and to make the report less complicated). I removed commented lines for clarity and redacted secrets:

<begin torrc>

Log notice file /var/log/tor/notices.log

ControlPort 9051
HashedControlPassword 16:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

PortForwarding 1

Address redacted.example.com

Nickname XXXXXXXX

ContactInfo XXXXXXXXelsewhereXXXXXX

DirPort 9030 # what port to advertise for directory connections
DirPortFrontPage /usr/share/doc/tor/tor-exit-notice.html

ExitPolicy reject *:* # no exits allowed

HiddenServiceStatistics 1

UseBridges 0
UpdateBridgesFromAuthority 1

ClientTransportPlugin obfs2,obfs3,ScrambleSuit exec /usr/bin/obfsproxy managed

#Some bridge definitions go here; obfs3 and plain

<end torrc>

Also, here's my /etc/apparmor.d/abstractions/tor, since I had modified it to be able to run obfsproxy on Ubuntu 14.04:
<begin /etc/apparmor.d/abstractions/tor>

# vim:syntax=apparmor

  #include <abstractions/base>
  #include <abstractions/nameservice>

  network tcp,
  network udp,

  capability chown,
  capability dac_read_search,
  capability fowner,
  capability fsetid,
  capability setgid,
  capability setuid,

  /usr/bin/tor r,
  /usr/sbin/tor r,

  # Needed by obfs4proxy
  /proc/sys/net/core/somaxconn r,

  /proc/sys/kernel/random/uuid r,
  /sys/devices/system/cpu/ r,
  /sys/devices/system/cpu/** r,

  /etc/tor/* r,
  /usr/share/tor/** r,

  /usr/bin/obfsproxy PUx,
  /usr/bin/obfs4proxy Pix,

<end /etc/apparmor.d/abstractions/tor>

Last edited 2 months ago by alif

comment:11 Changed 2 months ago by alif

Now, trying to solve my connectivity problem, I installed obfs4proxy from the Xenial repository and copied over the obfs4 bridge definitions from Tor Browser's torrc to /var/lib/tor, but still nothing changed. I still got "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)".

But after I copied the "cached-x" files from Tor Browser's Data directory to my system and restarted the Tor service, the assertion failure occurred again:

Oct 03 00:54:11.000 [notice] Tor 0.3.1.7 (git-5fa14939bca67c23) opening log file.
Oct 03 00:54:11.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Oct 03 00:54:11.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Oct 03 00:54:11.000 [warn] Could not open "/usr/share/doc/tor/tor-exit-notice.html": Permission denied
Oct 03 00:54:11.000 [warn] DirPortFrontPage file '/usr/share/doc/tor/tor-exit-notice.html' not found. Continuing anyway.
Oct 03 00:54:11.000 [notice] Bootstrapped 0%: Starting
Oct 03 00:54:12.000 [notice] Starting with guard context "bridges"
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXXXXXX at XX.XX.XXX.XX
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXXXXXXXXXX at XX.XXX.XX.XX
…
…
…
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXX at XX.XX.XX.XX
Oct 03 00:54:12.000 [notice] new bridge descriptor 'XXXXX' (cached): $XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX~XXXXX at XXX.XX.XX.XX
Oct 03 00:54:12.000 [notice] Delaying directory fetches: Pluggable transport proxies still configuring
Oct 03 00:54:12.000 [notice] Opening Socks listener on /var/run/tor/socks
Oct 03 00:54:12.000 [notice] Opening Control listener on /var/run/tor/control
Oct 03 00:54:13.000 [err] tor_assertion_failed_(): Bug: ../src/or/cpuworker.c:499: cpuworker_queue_work: Assertion threadpool failed; aborting. (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug: Assertion threadpool failed in cpuworker_queue_work at ../src/or/cpuworker.c:499. Stack trace: (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(log_backtrace+0x42) [0x55fb088902b2] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(tor_assertion_failed_+0x94) [0x55fb088a8904] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(cpuworker_queue_work+0x65) [0x55fb0884c395] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(consdiffmgr_rescan+0x9a7) [0x55fb0883f037] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(+0x4ec7d) [0x55fb08776c7d] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x754) [0x7fa5e1eecf24] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(do_main_loop+0x24d) [0x55fb08777a4d] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(tor_main+0x1c35) [0x55fb0877b215] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(main+0x19) [0x55fb087733c9] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fa5e1099f45] (on Tor 0.3.1.7 )
Oct 03 00:54:13.000 [err] Bug:     /usr/bin/tor(+0x4b41b) [0x55fb0877341b] (on Tor 0.3.1.7 )

The files I copied are:

  • cached-certs
  • cached-descriptors
  • cached-descriptors.new
  • cached-microdesc-consensus
  • cached-microdescs
  • cached-microdescs.new

Lines changed in torrc:

ClientTransportPlugin obfs2,obfs3,obfs4,scramblesuit exec /usr/bin/obfs4proxy
#ClientTransportPlugin obfs2,obfs3,ScrambleSuit exec /usr/bin/obfsproxy managed

Following these two lines in torrc are some obfs4 bridge definitions copied from Tor Browser.

Last edited 2 months ago by alif

comment:12 Changed 2 months ago by alif

Commenting out DirPort 9030 (#DirPort 9030) solves it. Re-enabling it reproduces the assertion failure.

I now have a working Tor service that is able to go all the way to Bootstrapped 100%: Done.

Please note that I haven't tested commenting out DirPort in my original configuration, before I introduced obfs4, the bridge definitions, and the data files copied from Tor Browser.

Also note that when the assertion failure disappeared and I was left with "[notice] While fetching directory info, no running dirservers known. Will try again later. (purpose 6)" in comment:10, I had DirPort 9030 enabled!

Last edited 2 months ago by alif

comment:13 Changed 7 weeks ago by nickm

Keywords: review-group-24 added

review-group-24 is now open.

comment:14 in reply to: 8 Changed 6 weeks ago by dgoulet

Status: needs_review → merge_ready

Replying to nickm:

Possible fix in branch bug23693_029 in my public repository, assuming I have the diagnosis right.

lgtm; I confirm that going from client -> bridge is working properly.

Agree on the backport.

comment:15 Changed 6 weeks ago by nickm

Thanks! I've merged this to 0.2.9 and forward.

comment:16 Changed 6 weeks ago by nickm

Milestone: Tor: 0.3.2.x-final → Tor: 0.2.9.x-final
Resolution: fixed
Status: merge_ready → closed

(please reopen if this bug occurs in any version released _after_ today.)

comment:17 Changed 3 days ago by rustybird

Resolution: fixed
Status: closed → reopened

(please reopen if this bug occurs in any version released _after_ today.)

It still occurs if server_mode() is false but dir_server_mode() is true. It doesn't seem to make a difference (with 0.3.1.9) whether it is set up like that in torrc at startup or is the result of a reconfiguration.
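
The second gap can be modeled the same way. Again this is a hypothetical sketch, not Tor's code: dir_model_options_t, dm_server_mode(), dm_dir_server_mode(), pool_if_server(), pool_if_server_or_dircache() and would_hit_assertion() are illustrative names. It assumes that setting DirPort alone makes dir_server_mode() true while server_mode() needs an ORPort, that the consensus-diff manager queues CPU work whenever dir_server_mode() is true, and that the pool is only created when server_mode() is true. Under those assumptions a DirPort-only configuration trips the assertion, which would also be consistent with alif's torrc (DirPort 9030, no ORPort) and with the DirPort observation in comment:12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the configuration described above; not Tor's code.
 * Assumption: DirPort alone makes dir_server_mode() true, while
 * server_mode() requires an ORPort. */
typedef struct {
  bool orport_set;
  bool dirport_set;
} dir_model_options_t;

static bool dm_server_mode(const dir_model_options_t *o) {
  return o->orport_set;
}

static bool dm_dir_server_mode(const dir_model_options_t *o) {
  return o->orport_set || o->dirport_set;
}

/* Two candidate policies for when the worker pool gets created. */
static bool pool_if_server(const dir_model_options_t *o) {
  return dm_server_mode(o);
}

static bool pool_if_server_or_dircache(const dir_model_options_t *o) {
  return dm_server_mode(o) || dm_dir_server_mode(o);
}

typedef bool (*pool_policy_fn)(const dir_model_options_t *);

/* Assumption: the consensus-diff manager queues CPU work whenever we act
 * as a directory server; the assertion fires if it does so with no pool. */
static bool would_hit_assertion(const dir_model_options_t *o,
                                pool_policy_fn pool_created) {
  assert(o != NULL && pool_created != NULL);
  bool queues_work = dm_dir_server_mode(o);
  return queues_work && !pool_created(o);
}
```

For a DirPort-only client ({ false, true }), would_hit_assertion() returns true under pool_if_server and false under pool_if_server_or_dircache; plain clients and full relays are unaffected either way. Which fix the Tor developers chose is not stated in this ticket.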

(Use case for this configuration: http://github.com/rustybird/corridor calls SETCONF DirPort="127.0.0.1:9030 NoAdvertise" to ensure the client continues to refresh the consensus even when dormant.)
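
The command corridor sends is an ordinary control-protocol SETCONF whose value must be quoted because it contains a space. The helper below is a hypothetical sketch (build_setconf() is not part of Tor or any library) that formats such a line, assuming the control spec's QuotedString rules (double quotes, with `"` and `\` backslash-escaped) and CRLF line endings. It does not open a connection or authenticate; on a real ControlPort that must happen first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one control-protocol line of the form
 *   SETCONF key="value" CRLF
 * into out. Double quotes and backslashes inside value are escaped.
 * Returns 0 on success, or -1 if out is too small. */
static int build_setconf(char *out, size_t outlen,
                         const char *key, const char *value) {
  assert(out != NULL && key != NULL && value != NULL);

  /* Command, keyword, '=' and the opening quote. */
  int w = snprintf(out, outlen, "SETCONF %s=\"", key);
  if (w < 0 || (size_t)w >= outlen)
    return -1;
  size_t n = (size_t)w;

  /* Copy the value, escaping characters that are special in a QuotedString. */
  for (const char *p = value; *p != '\0'; ++p) {
    if (*p == '"' || *p == '\\') {
      if (n + 1 >= outlen)
        return -1;
      out[n++] = '\\';
    }
    if (n + 1 >= outlen)
      return -1;
    out[n++] = *p;
  }

  /* Closing quote, CR, LF and the terminating NUL: 4 bytes in total. */
  if (n + 4 > outlen)
    return -1;
  memcpy(out + n, "\"\r\n", 4);
  return 0;
}
```

Called as build_setconf(buf, sizeof buf, "DirPort", "127.0.0.1:9030 NoAdvertise"), it fills buf with SETCONF DirPort="127.0.0.1:9030 NoAdvertise" followed by CRLF.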

comment:18 Changed 2 days ago by nickm

Milestone: Tor: 0.2.9.x-final → Tor: 0.3.2.x-final