Opened 10 months ago

Closed 2 months ago

#27073 closed defect (fixed)

conditionvar_timeout intermittently fails on maint-0.3.3

Reported by: teor Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version: Tor: 0.3.3.8
Severity: Normal Keywords: tor-test, 029-backport, 034-backport, 032-unreached-backport, 033-backport-unreached
Cc: nickm Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description

util/thread/conditionvar_timeout failed in the last maint-0.3.3 build:
https://travis-ci.org/torproject/tor/builds/411701508

With the following message:

util/thread/conditionvar_timeout: [forking] 
  FAIL src/test/test_threads.c:285: assert(ti->n_timeouts OP_EQ 2): 1 vs 2Aug 03 13:03:59.190 [err] Error 16 destroying a mutex.
Aug 03 13:03:59.190 [err] tor_assertion_failed_(): Bug: src/common/compat_pthreads.c:172: tor_mutex_uninit: Assertion 0 failed; aborting. (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug: Assertion 0 failed in tor_mutex_uninit at src/common/compat_pthreads.c:172. Stack trace: (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(log_backtrace+0x46) [0x55cd7c253ec6] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_assertion_failed_+0xe4) [0x55cd7c294434] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_mutex_uninit+0xa6) [0x55cd7c29d736] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_mutex_free_+0x25) [0x55cd7c262dc5] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x54508a) [0x55cd7bed908a] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x5c840d) [0x55cd7bf5c40d] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(testcase_run_one+0x310) [0x55cd7bf5c7d0] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tinytest_main+0x1eb) [0x55cd7bf5d58b] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(main+0x4e9) [0x55cd7bbb61b9] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2b2813951f45] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x224fdb) [0x55cd7bbb8fdb] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
[Lost connection!] 
  [conditionvar_timeout FAILED]

Child Tickets

Change History (12)

comment:1 Changed 10 months ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.5.x-final

The variability of the result here, combined with the test in which it's happening, make me suspect that this is another one of those failures based on timing that can only happen on heavily loaded systems.

If that guess is right, the usual (bad) fix would be to increase a timeout in the test. The better fix would be to revise the test so that it survives when the process is paused for a long time.

comment:3 Changed 8 months ago by nickm

Keywords: 029-backport 032-backport 033-backport 034-backport added; regression? removed
Owner: set to nickm
Status: newaccepted

Oh, I've found two problems here: first, the test launches 4 threads, but it doesn't make sure they're all done initializing before it starts signalling them. That probably isn't what's going wrong here.

The problem is that waiting for 200 msec is not enough when the system is under heavy load. Instead, we should watch for the threads exiting, and require that they do so in some reasonable amount of time, but put the boundary very high on what counts as "reasonable".

I've got a patch for the test to do this in bug27073_029, with PR at https://github.com/torproject/tor/pull/331

comment:4 Changed 8 months ago by nickm

Status: acceptedneeds_review

comment:5 Changed 8 months ago by asn

Reviewer: dgoulet

comment:6 Changed 8 months ago by dgoulet

Status: needs_reviewmerge_ready

lgtm;

comment:7 Changed 8 months ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.4.x-final

merging to master; marking for possible backport.

comment:8 Changed 6 months ago by teor

Keywords: 032-unreached-backport added; 032-backport removed

0.3.2 is end of life, so 032-backport is now 032-unreached-backport.

comment:9 Changed 3 months ago by teor

Keywords: 033-backport-unreached added; 033-backport removed

These merge_ready tickets did not get backported to 0.3.3, and 0.3.3 is now unsupported.

comment:10 Changed 3 months ago by teor

I re-opened the pull request https://github.com/torproject/tor/pull/331 to get a recent merge built in CI.

comment:11 Changed 2 months ago by teor

This merge to 0.2.9 conflicts with 0.3.4, here is the 0.3.4 pull request:
https://github.com/torproject/tor/pull/794

comment:12 Changed 2 months ago by teor

Resolution: fixed
Status: merge_readyclosed

#27073, #28096, #25773, #23681, and #23512 were merged to 0.2.9 and later after testing the new maint-0.2.9, maint-0.3.4, and master on CI.

Note: See TracTickets for help on using tickets.