Opened 3 years ago

Closed 3 years ago

#19974 closed defect (fixed)

Test failure for monotonic time on some machines

Reported by: cypherpunks Owned by: nickm
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version: Tor: 0.2.9.2-alpha
Severity: Normal Keywords: regression
Cc: s7r Actual Points:
Parent ID: Points: .1
Reviewer: Sponsor:

Description

  • git commit 20136a8207f0cda0d23093a884caae0358f98bac
  • debian unstable 32bit
util/monotonic_time: 
  FAIL ../src/test/test_util.c:5356: assert(usec1 OP_LE nsec1 / 1000 +10): 140514332 vs 140514092
  [monotonic_time FAILED]

Child Tickets

Change History (11)

comment:1 Changed 3 years ago by nickm

Milestone: Tor: 0.2.9.x-final

Can you describe the system this was running on?

comment:2 Changed 3 years ago by cypherpunks

The machine is a "classic" i686 ; tor is used as a transparent proxy for other machines on the local network.

Anyway the test did not fail again after restarting the process without modifying anything else ... maybe the machine is too old and starts to be prone to hardware errors.

comment:3 Changed 3 years ago by nickm

Keywords: regression added
Summary: Test failureTest failure for monotonic time on slow machines

comment:4 Changed 3 years ago by nickm

Owner: set to nickm
Points: .1
Status: newaccepted

comment:5 Changed 3 years ago by arma

Do we have a guess what this is a 'regression' from?

comment:6 Changed 3 years ago by teor

0.2.9.1-alpha:

    - Tor now uses the operating system's monotonic timers (where
      available) for internal fine-grained timing. Previously we would
      look at the system clock, and then attempt to compensate for the
      clock running backwards. Closes ticket 18908.

comment:8 Changed 3 years ago by s7r

Are we sure it occurs as described on slow machines only? Because the machine under my control which experienced this isn't old and shouldn't be so slow at all.

I have tried with Tor process off and absolutely nothing else running on the machine, got two times in a row:

util/monotonic_time:
  FAIL src/test/test_util.c:5536: assert(usecc1 OP_LE nsecc1 / 1000 + 10): 10601698 vs 10601677
  [monotonic_time FAILED]
util/monotonic_time:
  FAIL src/test/test_util.c:5536: assert(usecc1 OP_LE nsecc1 / 1000 + 10): 7245695 vs 7245672
  [monotonic_time FAILED]

3rd time it reported OK without restarting the server or any process. After 3 minutes 4th time again fails:

util/monotonic_time:
  FAIL src/test/test_util.c:5534: assert(usec1 OP_LE nsec1 / 1000 +10): 13577362 vs 13577356
  [monotonic_time FAILED]
util/monotonic_time:
  FAIL src/test/test_util.c:5534: assert(usec1 OP_LE nsec1 / 1000 +10): 7813125 vs 7813119
  [monotonic_time FAILED]

Before and after each failed test I've checked the date on the machine against a time server and it was synced exactly.

I think we can eliminate the variant that the Tor process hangs, as described in https://trac.torproject.org/projects/tor/ticket/20423#comment:21 because as we see it happens even when Tor isn't actually running as a process on the server. Tried several times on another machine running Debian but got OK result every time.

comment:9 Changed 3 years ago by teor

Summary: Test failure for monotonic time on slow machinesTest failure for monotonic time on some machines

comment:10 Changed 3 years ago by arma

Does this mean that the monotonic time implementation is broken, on at least some machines, and we should reconsider whether we want to use it?

comment:11 Changed 3 years ago by nickm

Resolution: fixed
Status: acceptedclosed

I think it means that the monotonic time implementation just takes longer than 10 us sometimes. My commit in fce425e3ff0281de29f04ac46d8c395befee607d is an attempt to work around that in the tests. Please reopen if it doesn't work.

Note: See TracTickets for help on using tickets.