Opened 3 years ago

Closed 3 years ago

#22209 closed defect (fixed)

Channelpadding tests fail on kevent-based systems

Reported by: nickm Owned by: nickm
Priority: High Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords:
Cc: mikeperry Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

First observed on freebsd, reproduced on osx:

[532]$ ./src/test/test channelpadding/..
channelpadding/channelpadding_decide_to_pad_channel: [forking] [warn] kevent: Bad file descriptor

  FAIL src/test/test_channelpadding.c:759: assert(tried_to_write_cell OP_EQ 1): 0 vs 1May 09 13:07:08.604 [err] tor_assertion_failed_: Bug: src/or/scheduler.c:398: scheduler_release_channel__real: Assertion channels_pending failed; aborting. (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug: Assertion channels_pending failed in scheduler_release_channel__real at src/or/scheduler.c:398. Stack trace: (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     0   test                                0x000000010f350f49 log_backtrace + 73 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     1   test                                0x000000010f3b600e tor_assertion_failed_ + 318 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     2   test                                0x000000010f2de166 scheduler_release_channel__real + 310 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     3   test                                0x000000010ee2b4bc channel_change_state + 1180 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     4   test                                0x000000010ee2ac52 channel_mark_for_close + 386 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     5   test                                0x000000010ee3b0e4 channel_free_list + 2308 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     6   test                                0x000000010ee3a4ef channel_free_all + 223 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     7   test                                0x000000010ea8a673 test_channelpadding_decide_to_pad_channel + 1283 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     8   test                                0x000000010edf25d9 testcase_run_bare_ + 329 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     9   test                                0x000000010edf2374 testcase_run_one + 2932 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     10  test                                0x000000010edf3861 tinytest_main + 1553 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     11  test                                0x000000010edf0397 main + 1703 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     12  libdyld.dylib                       0x00007fffa1660235 start + 1 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
May 09 13:07:08.612 [err] Bug:     13  ???                                 0x0000000000000002 0x0 + 2 (on Tor 0.3.1.0-alpha-dev 9f5b71a7cafbaf14)
[Lost connection!] 
  [channelpadding_decide_to_pad_channel FAILED]
channelpadding/channelpadding_negotiation: [forking] OK
channelpadding/channelpadding_consensus: [forking] [warn] kevent: Bad file descriptor

  FAIL src/test/test_channelpadding.c:398: assert(tried_to_write_cell OP_EQ 1): 0 vs 1
  [channelpadding_consensus FAILED]
channelpadding/channelpadding_timers: [forking] [warn] kevent: Bad file descriptor

  FAIL src/test/test_channelpadding.c:324: assert(tried_to_write_cell OP_EQ CHANNELS_TO_TEST): 0 vs 100
  [channelpadding_timers FAILED]
3/4 TESTS FAILED. (0 skipped)

The problem might be because of the "kevent: bad file descriptor" thing; there might be a libevent mainloop that is not surviving across fork(), or is getting closed somewhere.

Child Tickets

Change History (7)

comment:2 Changed 3 years ago by nickm

mike -- could you look at the stack trace? I'll do the event_reinit() thing, since it's backend stuff.

comment:3 Changed 3 years ago by nickm

95fa7d1cf82c68d7d39f423cf71fa8e097662de3 in master fixes these failures on OSX for me, by adding the event_reinit() call.

I'm leaving this ticket open, though, until we do something about the stack trace that the missing event_reinit() had caused.

comment:4 Changed 3 years ago by mikeperry

I am having trouble finding the branch that matches those line numbers exactly, but those asserts are checking to see if cells were written due to timers firing. If not re-initting libevent was causing timers not to fire, that would explain those test assert fails.

The backtrace assert is also hard to tell. Is chan null, or is channels_pending NULL? It might make sense that channels_pending was NULL if libevent never got set up properly for anything to ever get written?

Or is the backtrace still happening and is something different?

(Also, do we have jenkins bots to watch for this? I don't see any for freebsd, unless they are just named weird).

comment:5 Changed 3 years ago by nickm

Owner: set to nickm
Status: newassigned

On IRC I've said I'll look at the users of channel_pending and see if they should handle this failure more gracefully.

comment:6 Changed 3 years ago by nickm

Mike says that the problem seems to be that the tests weren't calling scheduler_init() anywhere sensible ... but it isn't apparent how fixing the libevent issue made it get called, or made its not getting called not matter?

comment:7 Changed 3 years ago by nickm

Resolution: fixed
Status: assignedclosed

I've looked over this some more, and I don't think that there's more to do in 0.3.1. I think we'll be okay.

Note: See TracTickets for help on using tickets.