Opened 6 months ago

Closed 6 months ago

Last modified 5 months ago

#30117 closed defect (fixed)

Support stem's backtrace signals in Travis

Reported by: teor
Owned by: teor
Priority: Medium
Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.4.8-alpha
Severity: Normal
Keywords: tor-ci-fail-sometimes, 035-backport, 040-backport, asn-merge
Cc:
Actual Points: 0.4
Parent ID: #29437
Points: 0.2
Reviewer: nickm
Sponsor: Sponsor31-can

Description

In #29437, we're trying to track down a stem hang in our CI.
In #30012, I added backtrace support to stem.

But we need that backtrace support in our CI to see why stem is hanging.

Change History (11)

comment:1 Changed 6 months ago by teor

Actual Points: 0.2 → 0.3
Status: assigned → needs_revision

Timelimit doesn't signal the whole process group, so we don't actually see the backtrace in the individual tests. We'll need to use kill -pid instead, with some kind of sleep command.

But I did get this far:
https://github.com/torproject/tor/pull/928
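
The kill-based approach mentioned in this comment could look roughly like the sketch below. It is an illustration only, not the code from the pull request: signal_group_after is a hypothetical helper name, and the sketch assumes the test runner leads its own process group, so that its PID doubles as the process group ID.

```shell
# signal_group_after DELAY SIGNAL PGID
# Hypothetical helper: wait DELAY seconds, then send SIGNAL to every
# process in the process group PGID.
signal_group_after() {
    delay=$1
    sig=$2
    pgid=$3
    sleep "$delay"
    # A negative PID makes kill target the whole process group.
    # "--" stops the negative number from being parsed as an option.
    kill -s "$sig" -- "-$pgid"
}

# Example (assumes $runner_pid is a process group leader):
#   signal_group_after 540 USR1 "$runner_pid" &
```

The helper returns kill's exit status, so a caller can tell when the group no longer exists.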

comment:2 Changed 6 months ago by teor

Summary: Temporarily use a stem branch with backtrace support → Support stem's backtrace signals in Travis

#30012 has been merged into stem, so we don't need to use a special stem branch any more.

comment:3 in reply to:  1 Changed 6 months ago by teor

Actual Points: 0.3 → 0.4
Keywords: 035-backport 040-backport added
Reviewer: nickm
Status: needs_revision → needs_review

I changed timelimit's signals so they trigger a stem backtrace, and unwrapped "make test-stem" so that timelimit can signal python directly.

See my pull request on 0.3.5:
https://github.com/torproject/tor/pull/932

Assigning review to nickm, because he's on CI this week.
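
For readers unfamiliar with timelimit, the two-stage behaviour relied on here (a warning signal first, then a kill signal after a grace period) can be approximated in plain shell. This is a rough stand-in, not timelimit itself and not the code from the pull request; run_with_deadline is a hypothetical name.

```shell
# run_with_deadline WARN_SECS WARN_SIG KILL_SECS KILL_SIG COMMAND [ARGS...]
# Hypothetical stand-in for a two-stage timeout: after WARN_SECS send
# WARN_SIG to the command, and KILL_SECS later send KILL_SIG.
run_with_deadline() {
    warn_secs=$1
    warn_sig=$2
    kill_secs=$3
    kill_sig=$4
    shift 4

    # Start the command directly, with no wrapper in between, so the
    # signals below reach it and not an intermediate process.
    "$@" &
    child=$!

    # Watchdog: warn first, then kill if the command is still running.
    (
        sleep "$warn_secs"
        kill -s "$warn_sig" "$child" 2>/dev/null || exit 0
        sleep "$kill_secs"
        kill -s "$kill_sig" "$child" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog=$!

    rc=0
    wait "$child" || rc=$?

    # The command has finished, so the watchdog is no longer needed.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$rc"
}

# Example, mirroring the signals and timings used in this ticket:
#   run_with_deadline 540 USR1 30 ABRT python run_tests.py --integ
```

Starting the command directly matters for the same reason as unwrapping "make test-stem": the signal is delivered to the process that was launched, not to its children.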

Replying to teor:

Timelimit doesn't signal the whole process group, so we don't actually see the backtrace in the individual tests. We'll need to use kill -pid instead, with some kind of sleep command.

We might need to make stem's tests propagate the signal to their child processes. See #30122 for that change in stem.

Edit: fix ticket number


comment:4 Changed 6 months ago by nickm

Status: needs_review → needs_revision

The patches here look plausible, but the CI is failing:

To run stem's tests you'll need mock...
https://pypi.python.org/pypi/mock/
You can get it by running 'sudo pip install mock'.
The command "if [[ "$TEST_STEM" != "" ]]; then make src/app/tor; timelimit -p -t 540 -s USR1 -T 30 -S ABRT python "$STEM_SOURCE_DIR"/run_tests.py --tor src/app/tor --integ --log notice --target RUN_ALL; fi" exited with 1.

comment:5 Changed 6 months ago by nickm

Is it possible that this is an issue where we've installed mock for python3 but not python2, or something like that?

comment:6 in reply to:  5 Changed 6 months ago by teor

Status: needs_revision → needs_review

Replying to nickm:

Is it possible that this is an issue where we've installed mock for python3 but not python2, or something like that?

python3 includes mock; it's an optional package in python2.
I pushed a fixup.

Related: from next week, "python" will mean python3 in Travis:
https://changelog.travis-ci.com/upcoming-python-default-version-update-96873
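
One way to check which interpreter can import mock, as discussed above, is sketched below. has_mock is a hypothetical helper, not part of the fixup. It assumes the interpreter name passed in is on the PATH, and it tries the standard-library unittest.mock first, then the separate mock package.

```shell
# has_mock PYTHON
# Hypothetical helper: succeed if the given interpreter can import mock,
# either from the standard library (unittest.mock, python3) or from the
# separately installed "mock" package (python2).
has_mock() {
    "$1" -c 'import unittest.mock' 2>/dev/null ||
        "$1" -c 'import mock' 2>/dev/null
}

# Example:
#   for py in python python2 python3; do
#       if has_mock "$py"; then echo "$py: mock available"; fi
#   done
```

Because "python" can resolve to either major version depending on the CI image, checking each name explicitly avoids guessing which one the test command will use.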

comment:7 Changed 6 months ago by nickm

Keywords: asn-merge added

Mostly LGTM, except the fixup commit is targeted to the wrong commit. I've made a new branch that squashes it correctly; if it passes CI, let's merge. The branch is ticket30117_035_squashed with PR at https://github.com/torproject/tor/pull/941

comment:8 Changed 6 months ago by nickm

Status: needs_review → merge_ready

comment:9 Changed 6 months ago by asn

Resolution: fixed
Status: merge_ready → closed

merged!

comment:10 Changed 6 months ago by teor

Milestone: Tor: 0.4.1.x-final → Tor: 0.3.5.x-final

Please don't close tickets until they have been backported!

Merged to 0.3.5 and merged forward to 0.4.0.
(The change was already in master.)

Strictly, I shouldn't have merged to 0.4.0 now that dgoulet is back. But nickm and asn have looked at this branch, and CI is important.

comment:11 Changed 5 months ago by teor

Sponsor: Sponsor31-can

Setting the sponsor to Sponsor31-can, because we use these jobs to make sure our refactoring works.
