Opened 7 months ago

Closed 6 months ago

#25450 closed defect (fixed)

Intermittent test failures for hs_service/service_intro_point

Reported by: isis Owned by: isis
Priority: Very High Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version: Tor: 0.3.1.3-alpha
Severity: Minor Keywords: tor-unittests review-group-35 031-backport 032-backport
Cc: Actual Points:
Parent ID: Points: .5
Reviewer: Sponsor: Sponsor8-can

Description

weasel and teor both noted this. I've also experienced this test failing about 1 out of 30 or so times on my development laptop.

10:32               weasel+ | 10:28:33 hs_service/service_intro_point: [forking] 
10:32               weasel+ | 10:28:33   FAIL ../src/test/test_hs_service.c:420: assert(ip->time_to_expire OP_GE now + INTRO_POINT_LIFETIME_MIN_SECONDS + 5): 1520483305 vs 1520483309
10:32               weasel+ | 10:28:33   [service_intro_point FAILED]
10:32               weasel+ | https://jenkins.torproject.org/job/tor-debian-0.3.2-nightly-binaries/ARCHITECTURE=i386,SUITE=buster/179/console
10:32               weasel+ | maybe I'm imagining or misattributing things, but hasn't this test ALWAYS been buggy?
10:33                teor4  | I remember it being an issue before. Probably one for asn or dgoulet

It's because there is a hardcoded time_t that uses the real system time and then later the test expects an HS intro point to have been established within 5 seconds of real clock time.

I'm going to do the stupid fix of increasing it to 500 seconds. Possibly in the future there should be a more elaborate solution involving mocking the time() function by passing in a function pointer to its implementation into the functions being tested?

Child Tickets

Change History (14)

comment:1 Changed 7 months ago by isis

Keywords: review-group-35 added
Status: assignedneeds_review

The easy version of the fix is in my bug25450 branch.

comment:2 Changed 7 months ago by teor

Keywords: 031-backport 032-backport added
Status: needs_reviewmerge_ready

Looks trivial, and correct.

If we still want the 0.3.1 and 0.3.2 unit tests to work reliably, we should backport this fix.

comment:3 in reply to:  2 ; Changed 7 months ago by isis

Replying to teor:

Looks trivial, and correct.

If we still want the 0.3.1 and 0.3.2 unit tests to work reliably, we should backport this fix.


Actually wait, it's incorrect. The test fails more now. (See the TravisCI results here.)

This was because this test was testing that the IP was created, and its expiration time was set to a time greater than or equal to now + INTRO_POINT_LIFETIME_MIN_SECONDS + 5 (original code):

    /* Time to expire MUST also be in that range. We add 5 seconds because
     * there could be a gap between setting now and the time taken in
     * service_intro_point_new. On ARM, it can be surprisingly slow... */
    tt_u64_op(ip->time_to_expire, OP_GE,
              now + INTRO_POINT_LIFETIME_MIN_SECONDS + 5);

However, this appears to be a typo, since, according to the comment above it, adding five seconds was done because the IP creation can be slow on some systems. But the five seconds is added to the minimum time we're comparing against, and so it actually functions to make this test more likely to fail on slower systems. (It should either subtract five seconds, or instead add it to ip->time_to_expire.)

I've fixed up my commit so that it now subtracts 500 seconds instead of adding, and now it passes.

comment:4 in reply to:  3 Changed 7 months ago by isis

Replying to isis:

I've fixed up my commit so that it now subtracts 500 seconds instead of adding, and now it passes.


Okay, this was a lie, make check-changes failed because it didn't like me insulting my patch, apparently. (I wrote Fixes bug 25450 (shoddily); bugfix on 0.3.1.3-alpha.)

It passes now.

comment:5 Changed 7 months ago by teor

I don't think we taught the script to ignore snark.

Also, I bet there are a few more sign inversion errors hiding in our codebase.

comment:6 in reply to:  5 Changed 7 months ago by isis

Replying to teor:

I don't think we taught the script to ignore snark.

Also, I bet there are a few more sign inversion errors hiding in our codebase.


After a brief foray into parts of the codebase I don't normally touch… yes, very likely. More testing! :)

It seems this test doesn't exist in the maint-0.3.1 branch, even though git describe b547c542393 tells me it was 0.3.1.3-alpha? It does exist in 0.3.2, so I've backported to that in my bug25450_032 branch (tests pass).

comment:7 Changed 7 months ago by teor

I think you want git describe --contains, which I can't run on my current device to check the output.

comment:8 in reply to:  7 Changed 6 months ago by isis

Replying to teor:

I think you want git describe --contains, which I can't run on my current device to check the output.


Aha, thanks! I used to alias git describe to that in my .gitconfig, but past me seems to have commented that out for unknown reasons. 🙄 🤷🏻‍

comment:9 Changed 6 months ago by nickm

looks okay; merging.

comment:10 Changed 6 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final
Resolution: fixed
Status: merge_readyclosed

comment:11 Changed 6 months ago by nickm

Resolution: fixed
Status: closedreopened

I've seen this on one of the jenkins builders:

10:21:25 hs_service/service_intro_point: [forking] 
10:21:25   FAIL ../tor/src/test/test_hs_service.c:423: assert(ip->time_to_expire OP_LE now + 
INTRO_POINT_LIFETIME_MAX_SECONDS - 500): 1521540810 vs 1521540658

Should that - be a +?

comment:12 in reply to:  11 Changed 6 months ago by isis

Replying to nickm:

I've seen this on one of the jenkins builders:

10:21:25 hs_service/service_intro_point: [forking] 
10:21:25   FAIL ../tor/src/test/test_hs_service.c:423: assert(ip->time_to_expire OP_LE now + 
INTRO_POINT_LIFETIME_MAX_SECONDS - 500): 1521540810 vs 1521540658

Should that - be a +?


Hmmm… originally it was a +, but that appeared to cause more failures, so I changed it to a - with this reasoning (above).

EDIT: Oh, I see now, one of them is testing again the MAX. Yes, you're correct. I'll change it.

Last edited 6 months ago by isis (previous) (diff)

comment:13 Changed 6 months ago by isis

Status: reopenedmerge_ready

Okay, there's a fix for that, as another commit in my bug25450 branch (along with a comment above the test so that nobody misses the distinction between the two assertions like I just did again).

comment:14 Changed 6 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

Okay, cool! I've cherry-picked that to 0.3.2 and merged it forward. Marking this ticket closed again ... this time for reals?

Note: See TracTickets for help on using tickets.