SETCONF Log results in segfault when not running a relay
To be precise, we get a segfault when SETCONF
:ing Log
so its value changes in any way (but only when no running as a relay). Here's a backtrace:
#0 crash_handler (sig=11, si=0xbfffeb4c, ctx_=0xbfffebcc)
at ../src/common/backtrace.c:121
#1 <signal handler called>
#2 0xb7bb9e8d in pthread_mutex_lock ()
from /lib/i386-linux-gnu/libpthread.so.0
#3 0xb7cadd76 in pthread_mutex_lock () from /lib/i386-linux-gnu/libc.so.6
#4 0x8014323c in tor_mutex_acquire (m=m`entry=0x50)
at ../src/common/compat_pthreads.c:125
#5 0x80142175 in threadpool_queue_update (pool=0x0,
dup_fn=dup_fn`entry=0x800f1e50 <worker_state_new>,
fn=fn`entry=0x800f1de0 <update_state_threadfn>,
free_fn=free_fn`entry=0x800f1d80 <worker_state_free>, arg=arg`entry=0x0)
at ../src/common/workqueue.c:329
#6 0x800f247f in cpuworkers_rotate_keyinfo () at ../src/or/cpuworker.c:181
#7 0x800c9431 in options_act (old_options=0x801ebf48)
at ../src/or/config.c:1731
#8 set_options (new_val=new_val`entry=0x80a52ee8, msg=msg`entry=0xbffff174)
at ../src/or/config.c:653
#9 0x800cb4e7 in options_trial_assign (list=0x80a503c0,
use_defaults=use_defaults`entry=0, clear_first=clear_first`entry=1,
msg=msg`entry=0xbffff174) at ../src/or/config.c:2066
#10 0x800e98f8 in control_setconf_helper (conn=conn`entry=0x80a51978,
len=len`entry=5, body=<optimized out>, body`entry=0x80a51ae0 "Log\r\n",
use_defaults=0) at ../src/or/control.c:739
#11 0x800ee215 in handle_control_resetconf (body=<optimized out>,
len=<optimized out>, conn=<optimized out>) at ../src/or/control.c:786
#12 connection_control_process_inbuf (conn=conn`entry=0x80a51978)
at ../src/or/control.c:3444
#13 0x800cfb0c in connection_process_inbuf (conn=conn`entry=0x80a51978,
package_partial=package_partial`entry=1) at ../src/or/connection.c:4584
#14 0x800d5cfb in connection_handle_read_impl (conn=0x80a51978)
at ../src/or/connection.c:3340
#15 connection_handle_read (conn=conn`entry=0x80a51978)
at ../src/or/connection.c:3381
#16 0x8002d5e1 in conn_read_callback (fd=21, event=2, _conn=0x80a51978)
at ../src/or/main.c:777
#17 0xb7f4f522 in event_base_loop ()
from /usr/lib/i386-linux-gnu/libevent-2.0.so.5
#18 0x8002dfbf in do_main_loop () at ../src/or/main.c:2104
#19 0x80031955 in tor_main (argc=argc`entry=3, argv=argv`entry=0xbffff724)
at ../src/or/main.c:3078
#20 0x8002a1e3 in main (argc=3, argv=0xbffff724) at ../src/or/tor_main.c:30
Note that in #5 (closed), pool
is NULL
, and then we have tor_mutex_acquire(&pool->lock);
where &pool->lock
becomes the pointer 0x50
, that we later use as an argument in pthread_mutex_lock()
which of course leads to this segfault. That pool
is NULL
is because when we call cpuworkers_rotate_keyinfo()
earlier in the stack, and the global variable threadpool
is NULL
(from initialization) because cpu_init()
was never called. I set a breakpoint to verify, but from reading the code around all calls of cpu_init()
it's clear it will only be called when the Tor client is acting as a relay since it's always wrapped inside a if (server_mode(...))
.
So, in options_act()
, we'll end up call cpuworkers_rotate_keyinfo()
because transition_affects_workers
is set, which is done like this:
const int transition_affects_workers =
old_options && options_transition_affects_workers(old_options, options);
Changing the Log
option is something that probably rightfully will cause options_transition_affects_workers
to return true, resulting in this variable being set, and hence doing all sorts of cpuworker-related stuff even though they aren't used since we're not running a relay. At least in the context where we end up calling cpuworkers_rotate_keyinfo()
, this seems wrong, but if it's also wrong in the other places when code is run because transition_affects_workers
, then it feels like we should instead do:
const int transition_affects_workers =
old_options && server_mode(options) &&
options_transition_affects_workers(old_options, options);
Or something similar. My point is that we need to be running a relay for the concept of cpuworkers (and options transitions affecting them) to be relevant. Otherwise a server_mode()
check is needed somewhere around that cpuworkers_rotate_keyinfo()
call, and possibly at other places.