Opened 7 weeks ago

Closed 4 weeks ago

#31898 closed defect (fixed)

Occasional crash during shutdown when using Tor API

Reported by: sysrqb Owned by: nickm
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.1-alpha
Severity: Normal Keywords: crash, 042-must 041-backport consider-backport-after-0423, 041-regression dgoulet-merge
Cc: catalyst, hans@… Actual Points: .3
Parent ID: Points: 1
Reviewer: teor Sponsor: Sponsor31-must

Description

This is using a JNI wrapper.

I can probably create a test case for easier debugging. It seems like this is a race condition within the new pubsub system. I can reproduce this with an example program that configures Tor via the API and sends a SIGNAL TERM shortly after tor begins running and the control socket becomes available.

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8bcc822535 in __GI_abort () at abort.c:79
#2  0x00007f8b94b01dbe in tor_raw_abort_ () at src/lib/err/torerr.c:227
#3  0x00007f8b94b016d0 in crash_handler (sig=<optimized out>, si=<optimized out>, ctx_=<optimized out>) at src/lib/err/backtrace.c:175
#4  <signal handler called>
#5  0x00007f8b94ae0ae4 in max_in_sl (sl=<optimized out>, sl=<optimized out>, dflt=0) at src/lib/dispatch/dispatch_new.c:36
#6  dispatch_new (cfg=0x7f8b64033520) at src/lib/dispatch/dispatch_new.c:121
#7  0x00007f8b94ade9ac in pubsub_builder_finalize (builder=0x7f8b64029eb0, items_out=0x7f8b94c193b8 <the_pubsub_items>) at src/lib/pubsub/pubsub_build.c:293
#8  0x00007f8b9496666c in tor_mainloop_connect_pubsub (builder=0x7f8b64029eb0) at src/core/mainloop/mainloop_pubsub.c:56
#9  0x00007f8b94952741 in pubsub_install () at src/app/main/main.c:1246
#10 tor_run_main (tor_cfg=<optimized out>) at src/app/main/main.c:1297
#11 0x00007f8b9495034c in RunMain (env=0x7f8bc4250b18, thisObj=0x7f8b9431d9b8) at src/feature/api/TorApi.cc:179
#12 0x00007f8b949502dd in Java_api_TorApi_runMain (env=0x7f8bc4250b18, thisObj=0x7f8b9431d9b8) at src/feature/api/TorApi.cc:279
#13 0x00007f8bac8d4990 in ?? ()
#14 0x00007f8bac8ce450 in ?? ()
#15 0x0000000718c7c3a8 in ?? ()
#16 0x00007f8b9431d950 in ?? ()
#17 0x0000000000000000 in ?? ()

Bisecting found a8c0f4ddfe3f0a63bd499959c8d921346aa9766e is the first bad commit.

The fault is at the first dereference (i don't know if it's *u or *maxptr). I'm guessing the memory was already freed for exit by the time this procedure is called.

 29 static int
 30 max_in_sl(const smartlist_t *sl, int dflt)
 31 {
 32   uint16_t *maxptr = NULL;
 33   SMARTLIST_FOREACH_BEGIN(sl, uint16_t *, u) {
 34     if (!maxptr)
 35       maxptr = u;
 36     else if (*u > *maxptr)
 37       maxptr = u;
 38   } SMARTLIST_FOREACH_END(u);
 39 
 40   return maxptr ? *maxptr : dflt;
 41 }

Tor was configured with --Log "notice stdout" --AvoidDiskWrites 1 --SocksPort unix:${HOME}/private_tor/test_socks --DisableNetwork 0.

This shouldn't be a common edge case, but I stumbled upon it while testing something else.

Child Tickets

Change History (16)

comment:1 Changed 7 weeks ago by arma

Cc: catalyst added
Milestone: Tor: 0.4.2.x-final
Version: Tor: 0.4.2.1-alpha
$ git describe --contains a8c0f4ddf
tor-0.4.2.1-alpha~170^2~2

so it looks like it's in the 0.4.2 tree, not the 0.4.1 stable. Perfect time to fix it. :)

comment:2 Changed 7 weeks ago by teor

Keywords: 042-regression crash 042-must added
Points: 1
Sponsor: Sponsor31-must

Marking with sponsor 31, because this is a sponsor 31 bug.

comment:3 Changed 7 weeks ago by nickm

Keywords: 041-backport? 041-regression? added
Owner: set to nickm
Status: newaccepted

comment:4 Changed 7 weeks ago by nickm

A smaller test case might help here, or maybe a link to the RunMain source.

One thing to note is that the stack trace that you're describing seems to be happening inside the initialization code, before Tor is actually running. Maybe this is something that happens on a second initialization and not a first?

comment:5 in reply to:  4 Changed 7 weeks ago by sysrqb

Replying to nickm:

A smaller test case might help here, or maybe a link to the RunMain source.

I rebased the branch I used for testing the API on top of current master. The configuration is handled here and RunMain is there.

This is basically a thin wrapper around the API.

One thing to note is that the stack trace that you're describing seems to be happening inside the initialization code, before Tor is actually running. Maybe this is something that happens on a second initialization and not a first?

Yes, that is what this code is doing. On the Java-side, it takes in commandline args which are passed into tor for configuration, but it configures and runs tor twice. The first time the program inserts --verify-config as the second argument, and then after tor_run_main returns, it calls tor_main_configuration_set_command_line again except this time with the original args (without --verify-config) and then calls tor_run_main again.

Is reusing a tor_main_configuration_t with different args a supported use case? I haven't looked at the code yet. I don't see this crash during every execution, I usually run the program in a while loop until it crashes (which usually takes between 5 and 30 iterations).

comment:6 Changed 7 weeks ago by nickm

Is reusing a tor_main_configuration_t with different args a supported use case?

Hm. I hadn't planned for that. I'm not sure why it wouldn't work, though.

comment:7 Changed 7 weeks ago by nickm

Okay, I think I found the bug: max_in_sl() is implemented incorrectly.

comment:8 Changed 7 weeks ago by nickm

Actual Points: .3
Keywords: 041-backport added; 042-regression 041-backport? removed
Priority: MediumHigh

Branch is bug31898_041; PR at https://github.com/torproject/tor/pull/1385 . See the commit messages for an explanation of why the bug took us so long to hit in practice.

I'll put this in needs_review once the tests pass.

comment:9 Changed 7 weeks ago by nickm

Status: acceptedneeds_review

comment:10 Changed 7 weeks ago by eighthave

Actual Points: .3
Cc: hans@… added
Keywords: 042-regression 041-backport? added; 041-backport removed
Priority: HighMedium

comment:11 Changed 7 weeks ago by nickm

Actual Points: .3
Keywords: 041-backport 041-regression added; 042-regression 041-backport? 041-regression? removed

comment:12 Changed 7 weeks ago by teor

Keywords: consider-backport-after-0423 added
Reviewer: teor
Status: needs_reviewmerge_ready
Version: Tor: 0.4.2.1-alphaTor: 0.4.1.1-alpha

Looks fine to me.

The actual fix is small, this PR is mostly tests and test refactoring.

comment:13 Changed 7 weeks ago by nickm

Keywords: asn-merge added

comment:14 Changed 7 weeks ago by dgoulet

Keywords: dgoulet-merge added; asn-merge removed

comment:15 Changed 7 weeks ago by dgoulet

Milestone: Tor: 0.4.2.x-finalTor: 0.4.1.x-final

Merged! Moving to 041 milestone for backport.

comment:16 Changed 4 weeks ago by nickm

Resolution: fixed
Status: merge_readyclosed

Merged to 0.4.1.

Note: See TracTickets for help on using tickets.