Opened 2 months ago

Last modified 4 weeks ago

#27073 merge_ready defect

conditionvar_timeout intermittently fails on maint-0.3.3

Reported by: teor Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor Version: Tor: 0.3.3.8
Severity: Normal Keywords: tor-test, 029-backport 032-backport 033-backport 034-backport
Cc: nickm Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description

util/thread/conditionvar_timeout failed in the last maint-0.3.3 build:
https://travis-ci.org/torproject/tor/builds/411701508

With the following message:

util/thread/conditionvar_timeout: [forking] 
  FAIL src/test/test_threads.c:285: assert(ti->n_timeouts OP_EQ 2): 1 vs 2Aug 03 13:03:59.190 [err] Error 16 destroying a mutex.
Aug 03 13:03:59.190 [err] tor_assertion_failed_(): Bug: src/common/compat_pthreads.c:172: tor_mutex_uninit: Assertion 0 failed; aborting. (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug: Assertion 0 failed in tor_mutex_uninit at src/common/compat_pthreads.c:172. Stack trace: (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(log_backtrace+0x46) [0x55cd7c253ec6] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_assertion_failed_+0xe4) [0x55cd7c294434] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_mutex_uninit+0xa6) [0x55cd7c29d736] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tor_mutex_free_+0x25) [0x55cd7c262dc5] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x54508a) [0x55cd7bed908a] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x5c840d) [0x55cd7bf5c40d] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(testcase_run_one+0x310) [0x55cd7bf5c7d0] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(tinytest_main+0x1eb) [0x55cd7bf5d58b] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(main+0x4e9) [0x55cd7bbb61b9] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2b2813951f45] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
Aug 03 13:03:59.191 [err] Bug:     ./src/test/test(+0x224fdb) [0x55cd7bbb8fdb] (on Tor 0.3.3.8-dev 2a6c1585b0f13e03)
[Lost connection!] 
  [conditionvar_timeout FAILED]

Child Tickets

Change History (7)

comment:1 Changed 2 months ago by nickm

Milestone: Tor: unspecifiedTor: 0.3.5.x-final

The variability of the result here, combined with the test in which it's happening, make me suspect that this is another one of those failures based on timing that can only happen on heavily loaded systems.

If that guess is right, the usual (bad) fix would be to increase a timeout in the test. The better fix would be to revise the test so that it survives when the process is paused for a long time.

comment:3 Changed 5 weeks ago by nickm

Keywords: 029-backport 032-backport 033-backport 034-backport added; regression? removed
Owner: set to nickm
Status: newaccepted

Oh, I've found two problems here: first, the test launches 4 threads, but it doesn't make sure they're all done initializing before it starts signalling them. That probably isn't what's going wrong here.

The problem is that waiting for 200 msec is not enough when the system is under heavy load. Instead, we should watch for the threads exiting, and require that they do so in some reasonable amount of time, but put the boundary very high on what counts as "reasonable".

I've got a patch for the test to do this in bug27073_029, with PR at https://github.com/torproject/tor/pull/331

comment:4 Changed 5 weeks ago by nickm

Status: acceptedneeds_review

comment:5 Changed 4 weeks ago by asn

Reviewer: dgoulet

comment:6 Changed 4 weeks ago by dgoulet

Status: needs_reviewmerge_ready

lgtm;

comment:7 Changed 4 weeks ago by nickm

Milestone: Tor: 0.3.5.x-finalTor: 0.3.4.x-final

merging to master; marking for possible backport.

Note: See TracTickets for help on using tickets.