Opened 8 months ago

Closed 4 months ago

#32804 closed defect (fixed)

Travis CI hangs during compile or test

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor
Version: Tor: unspecified
Severity: Normal
Keywords: tor-ci-fail-rarely, tor-test, hang, tor-ci
Cc:
Actual Points:
Parent ID: #29645
Points: 1
Reviewer:
Sponsor:

Description

Like #29645, sometimes src/test/test hangs in Travis CI:
https://travis-ci.org/torproject/tor/jobs/626796156#L3585


Change History (18)

comment:1 Changed 7 months ago by ahf

Keywords: tor-ci added

comment:2 Changed 7 months ago by nickm

Keywords: 043-should added

comment:3 Changed 7 months ago by teor

I saw this happen again today on macOS. It seems rare enough that it could be a Travis VM issue.

Alternatively, we could put a time limit on src/test/test and try to get a stack trace when it hangs.
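A minimal sketch of the time-limit idea, in Python for illustration only. The helper name `run_with_time_limit`, its parameters, and the choice of SIGABRT are assumptions, not something the Tor CI scripts are known to use. The idea is that an abort signal gives the hung process a chance to run any crash handler it has installed, or to dump core, before it is killed outright.

```python
import signal
import subprocess


def run_with_time_limit(cmd, limit_seconds, grace_seconds=10):
    """Run cmd; if it runs longer than limit_seconds, abort it.

    Returns (exit_code, timed_out). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the command finishes within the limit.
        return proc.wait(timeout=limit_seconds), False
    except subprocess.TimeoutExpired:
        # Assumed hang: ask the process to abort so that any crash
        # handler (or core dump) can record where it was stuck.
        proc.send_signal(signal.SIGABRT)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # The process ignored the abort; force it to stop.
            proc.kill()
            proc.wait()
        return proc.returncode, True
```

A CI step could call this with the test binary as `cmd` and treat `timed_out` as a failure, so the hang shows up with whatever diagnostics the process emitted instead of as a silent stall.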

comment:4 Changed 5 months ago by teor

We're seeing hangs during compilation and chutney on macOS, in maybe 1 in 10 jobs.

I'll start keeping a full list on this ticket:
https://travis-ci.org/github/teor2345/tor/jobs/662962958 (restarted)

comment:5 Changed 5 months ago by teor

Summary: changed from "test hangs in Travis CI" to "Travis CI hangs during compile or test"

comment:6 Changed 5 months ago by teor

Status: changed from new to needs_information

I sent this email to Travis CI support, following their CI support issue template:

Hi,

We are seeing about half our macOS jobs hang, even though
https://www.traviscistatus.com/ shows:
* no incidents
* 100% uptime for macOS builds

Hangs have been happening intermittently for the past week or two,
but it became severe from approximately Monday 16 March 1100 UTC
to Monday 16 March 1600 UTC.

Since Monday 16 March 1600 UTC we have run 6 macOS jobs, and they
have all passed.

Last time a similar issue happened, I asked:

Are you going to make changes to http://www.traviscistatus.com/ so
that this kind of failure is visible?

Are you going to change your monitoring so you discover issues
like this earlier?

- GitHub accounts

torproject

- Repository names

torproject/tor

- Explanation of the issue

macOS jobs hang during the build, at:
* git clone x2
* setting up build cache
* configure x2
* compile x2

There doesn't seem to be any particular CI stage or command
that triggers the issue.

- Copy of the error message (if any)

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

- Link to a specific build showing the issue

Here are the failed jobs:

https://travis-ci.org/github/torproject/tor/jobs/663014823
https://travis-ci.org/github/torproject/tor/jobs/663014930
https://travis-ci.org/github/torproject/tor/jobs/663015119
https://travis-ci.org/github/torproject/tor/jobs/663015215
https://travis-ci.org/github/torproject/tor/jobs/663015346
https://travis-ci.org/github/torproject/tor/jobs/663015421
https://travis-ci.org/github/torproject/tor/jobs/663015422

Here are the first few successful jobs since the failure rate decreased:

https://travis-ci.org/github/torproject/tor/jobs/663098299
https://travis-ci.org/github/torproject/tor/jobs/663098300
https://travis-ci.org/github/torproject/tor/jobs/663144316
https://travis-ci.org/github/torproject/tor/jobs/663144317

Here is a failed job from last week:

https://travis-ci.org/github/torproject/tor/jobs/661800891

comment:7 Changed 5 months ago by teor

Here's another hang from last week:
https://travis-ci.org/github/torproject/tor/jobs/661110237

Affects ticket #33032.

comment:8 Changed 5 months ago by teor

Here's a Linux network timeout hang on #33633:
https://travis-ci.org/github/torproject/tor/jobs/663854119#L203

It would be nice if Travis wrapped its own apt-get in travis_retry.
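For illustration, the retry behaviour being asked for can be sketched as below. This is a hedged Python sketch, not the implementation of `travis_retry`: the function name `retry`, the default of three attempts, and the fixed delay are all assumptions chosen for the example.

```python
import subprocess
import time


def retry(cmd, attempts=3, delay_seconds=1.0):
    """Run cmd until it succeeds, at most `attempts` times.

    Returns 0 on success, otherwise the exit code of the last attempt.
    Hypothetical helper for illustration.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.call(cmd)
        if code == 0:
            return 0
        if attempt < attempts:
            # Pause briefly so a transient network failure can clear.
            time.sleep(delay_seconds)
    return code
```

Wrapping a network-dependent step such as a package install this way turns a single transient timeout into a retried command rather than a failed job.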


comment:9 Changed 5 months ago by teor

New issues:

"An error occurred while generating the build script."

https://travis-ci.org/github/teor2345/tor/jobs/664737699

comment:10 Changed 5 months ago by teor

Same with #33428 in chutney:

"An error occurred while generating the build script."

https://travis-ci.com/github/ANURADHAJHA99/chutney/jobs/300248520

Edit: I can't restart the job, it's a user repository.


comment:12 Changed 5 months ago by teor

MacStadium and Travis CI have both reported macOS outages from March 19 1930-1950 UTC.

They believe it's fixed, but we're still seeing issues.

comment:13 Changed 5 months ago by catalyst

Keywords: tor-ci-fail-rarely added; tor-ci-rarely-fail removed

Swap the keyword so this ticket will show up in tor-ci-fail searches.

comment:14 Changed 5 months ago by catalyst

There has been a series of macOS failures in the Travis infrastructure. The latest is https://www.traviscistatus.com/incidents/h898fkzlp6xf which is supposedly resolved now. Restarted a few errored macOS builds in an attempt to verify this.

comment:15 Changed 5 months ago by catalyst

Restarted about 3 errored macOS jobs. They all succeeded.

comment:16 Changed 5 months ago by teor

I still haven't received any response from Travis support.

Please continue linking failed jobs in this ticket, so I can use those links in a follow-up email.
(They like fresh failures in each email.)


comment:17 Changed 4 months ago by teor

Keywords: 043-should removed
Milestone: changed from Tor: 0.4.3.x-final to Tor: 0.4.4.x-final

This appears to be a Travis infrastructure issue. It shouldn't block our releases.

comment:18 Changed 4 months ago by teor

Resolution: fixed
Status: changed from needs_information to closed

Travis replied, they believe the issue is resolved on their end:

Hello Teor,

Thanks for your patience on this issue and sorry for the delayed response.

The errors were due to a temporary glitch in our Mac infrastructure that let errors go undetected. We have identified areas of our systems that require additional monitoring to assist us be more proactive in identifying and resolving these kind of errors. Sorry about this please.

We observed your builds are fine now. Please let us know if you run into any more issues.

Thanks and happy building!
