Opened 9 months ago

Closed 5 weeks ago

Last modified 5 weeks ago

#24629 closed enhancement (implemented)

Activate osx builds on travis, at low priority

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: not-just-linux, tor-ci, teor-was-assigned, 034-triage-20180328, 034-removed-20180328, 034-backport, 035-removed-20180711, fast-fix
Cc: dmr Actual Points:
Parent ID: Points: 0.5
Reviewer: catalyst Sponsor:

Description

We want to activate osx builds on Travis CI.
But they can be very slow, so we need to customise the settings.

If possible, we want osx builds to behave as follows:

  • if there is a pending build, and more commits are pushed to the branch, cancel the pending build and re-queue the latest commits
  • let travis show builds as "complete" if osx takes a long time, but still show osx failures eventually

Child Tickets

TicketTypeStatusOwnerSummary
#26497defectclosedset CARGO_HOME correctly or stop using it
#26952enhancementclosedteorTry enabling ccache on Travis
#26972enhancementclosedteorCreate make target to ensure that all Rust files have been formatted with rustfmt
#27087defectclosedteorRun a single asciidoc build in Travis
#27088defectclosedteorPass MODULES_OPTIONS in DISTCHECK_CONFIGURE_FLAGS
#27090defectclosedteorTravis: enable lzma and zstd in configure, if available
#27091defectclosedConfigure jenkins with CARGO_HOME in a writeable directory
#27093defectclosedteorConsistently use ${abs_top_srcdir:-../../..} in test_rust.sh

Change History (38)

comment:1 Changed 9 months ago by Hello71

this can be configured by:

comment:2 Changed 9 months ago by teor

I wish auto-cancellation was available as a default in the travis config file.

Also, I am not sure we want to make osx "allow_failures" just so we can set "fast_finish" on it.

Are there any other options?

comment:3 in reply to:  2 Changed 9 months ago by teor

Status: assignedneeds_information
Type: defectenhancement

Replying to teor:

I wish auto-cancellation was available as a default in the travis config file.

I just checked my travis settings, and auto-cancellation is on by default.
I don't think we need to do anything here.

Also, I am not sure we want to make osx "allow_failures" just so we can set "fast_finish" on it.

The comments in travis.yml don't seem to match the documentation:
https://docs.travis-ci.com/user/customizing-the-build/#Fast-Finishing

Are there any other options?

Apparently not.

See my branch travis-osx, building now at https://travis-ci.org/teor2345/tor

Edit: add link to documentation

Last edited 9 months ago by teor (previous) (diff)

comment:4 Changed 9 months ago by teor

Status: needs_informationneeds_revision

This rust failure shows that the fast_finish comment is wrong:
https://travis-ci.org/teor2345/tor/jobs/316721014

The other builds continued despite the failure.

comment:5 Changed 9 months ago by Hello71

we should try to minimize the number of macOS jobs that run, both to be a better netizen and pragmatically because they are limited to two at a time, so if you have a bunch they will not run in a good order.

for example, I think it is probably unnecessary to test three ways of compiling rust with both gcc *and* clang.

comment:6 Changed 9 months ago by teor

I think we could do rust / gcc and plain c / clang, and call it done.

comment:7 Changed 9 months ago by Hello71

I think we should do:

  • Linux:
    • gcc, rust online
    • gcc, rust offline
    • gcc, no rust
    • clang, no rust
  • macOS:
    • gcc, rust online
    • clang, no rust

with a view to changing number five to offline, whenever that gets fixed. this will utilise all of the allowed parallel build slots (4/6 Linux, 2/6 macOS) and ensure the best immediate feedback. for better immediate feedback, we could also enable cargo cache (https://docs.travis-ci.com/user/caching/#Rust-Cargo-cache).

comment:8 Changed 9 months ago by teor

Ok, I've run out of time for this, there are commits for this issue and #24630 in my branch travis-osx-v2. Feel free to keep hacking on it!

comment:9 Changed 9 months ago by ahf

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

comment:10 Changed 8 months ago by teor

Keywords: teor-was-assigned added
Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Owner: teor deleted
Status: needs_revisionassigned

Unassigning myself from less important tickets, and moving them to 0.3.4

comment:11 Changed 6 months ago by nickm

Keywords: 034-triage-20180328 added

comment:12 Changed 6 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:13 Changed 6 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:14 Changed 4 months ago by teor

Milestone: Tor: unspecifiedTor: 0.3.5.x-final

Travis has recently upgraded its macOS builders, so we should try enabling macOS again:
https://blog.travis-ci.com/2018-01-30-macos-update

This should only take an hour or two to rebase onto master and test.

Putting this in 0.3.5, because a lack of macOS CI is holding up tickets like #17949.

comment:15 Changed 4 months ago by nickm

Keywords: 034-backport added

comment:16 Changed 3 months ago by nickm

Keywords: 035-removed-20180711 added
Milestone: Tor: 0.3.5.x-finalTor: unspecified

These tickets are being triaged out of 0.3.5. The ones marked "035-roadmap-proposed" may return.

comment:17 Changed 2 months ago by dmr

Cc: dmr added
Status: assignednew

assigned without an owner seems like an invalid (or at least misleading) state.
Bumping to new again.

(Feel free to correct me if this is wrong.)

comment:18 Changed 2 months ago by teor

Keywords: fast-fix added
Milestone: Tor: unspecifiedTor: 0.3.5.x-final
Owner: set to teor
Status: newassigned

Assigning to myself, because working i386 macOS builds will be really helpful for testing PrivCount in Tor.

comment:19 Changed 2 months ago by teor

Status: assignedneeds_review

Please see my branch travis-osx-v7 on https://github.com/teor2345/tor.git

I rewrote and simplified the build matrix.

We have the following builds on all platforms (Linux, macOS) and compilers (gcc, clang):

  • no Rust (with hardening, which is the default)
  • Rust online (without hardening due to #25386/#26398)

We have the following builds on Linux gcc:

  • coverage
  • no hardening
  • distcheck
  • disable module dirauth
  • Rust offline (without hardening due to #25386/#26398)
  • distcheck with Rust online (without hardening due to #25386/#26398)
  • disable module dirauth with Rust online (without hardening due to #25386/#26398)

That gives us 15 builds rather than the previous 13, so CI should complete in a similar time. (I'm seeing 5 builders allocated at a time when building this branch.)

comment:20 Changed 2 months ago by teor

Status: needs_reviewneeds_revision

Oops, these builds are failing.

comment:21 in reply to:  19 Changed 2 months ago by teor

Status: needs_revisionneeds_review

i386 macOS builds were really complex, so I didn't set them up. I'm happy to proceed with rust building on x86_64 for now.

Please see my branch travis-osx-v8 on https://github.com/teor2345/tor.git

Travis CI is at:
https://travis-ci.org/teor2345/tor/builds/408372308

Replying to teor:

I rewrote and simplified the build matrix.

We have the following builds on all platforms (Linux, macOS) and compilers (gcc, clang):

  • no Rust (with hardening, which is the default)
  • Rust online (without hardening due to #25386/#26398)

We have the following builds on Linux gcc:

  • coverage
  • no hardening
  • distcheck
  • disable module dirauth
  • Rust offline (without hardening due to #25386/#26398)
  • distcheck with Rust online (without hardening due to #25386/#26398)
  • disable module dirauth with Rust online (without hardening due to #25386/#26398)

That gives us 15 builds rather than the previous 13, so CI should complete in a similar time. (I'm seeing 5 builders allocated at a time when building this branch.)

The previous commit on master ran for 29 minutes:
https://travis-ci.org/torproject/tor/builds/408312564

These new builds ran for 24 minutes.
So I think we're fine.

comment:22 in reply to:  19 Changed 2 months ago by Hello71

Replying to teor:

Please see my branch travis-osx-v7 on https://github.com/teor2345/tor.git

I rewrote and simplified the build matrix.

We have the following builds on all platforms (Linux, macOS) and compilers (gcc, clang):

  • no Rust (with hardening, which is the default)
  • Rust online (without hardening due to #25386/#26398)

We have the following builds on Linux gcc:

  • coverage
  • no hardening
  • distcheck
  • disable module dirauth
  • Rust offline (without hardening due to #25386/#26398)
  • distcheck with Rust online (without hardening due to #25386/#26398)
  • disable module dirauth with Rust online (without hardening due to #25386/#26398)

That gives us 15 builds rather than the previous 13, so CI should complete in a similar time.

overall, I like this. I'm not super convinced that we need all these miscellaneous builds, e.g. no hardening when we already have Rust without hardening, but I guess they could help diagnose some CI failures, and I think we're far from exceeding the total maximum jobs.

(I'm seeing 5 builders allocated at a time when building this branch.)

unfortunately, I can't find where I found this information, but I believe the maximum at the time that I wrote "4/6" and "2/6" was the lower of: 5 Linux jobs, 2 Mac jobs, 6 total jobs (at any given time). therefore, adding Mac jobs and decreasing Linux jobs is expected to lower the total time.

comment:23 Changed 2 months ago by teor

I fixed a comment typo and some redundant env initialisations.

I also added a commit that sorts the config into chronological order, and groups related items together.

Then I added extra logging:

  • rustup version for rust builds
  • config.log for failures

And caching (see #26952):

  • ccache
  • cargo cache for rust builds

Cached builds are about 30% faster.

Please see travis-osx-v9.
There were some fixups to earlier commits, so I force-pushed it.

comment:24 Changed 8 weeks ago by asn

Reviewer: catalyst

comment:25 Changed 7 weeks ago by catalyst

Status: needs_reviewneeds_revision

I like a lot of these changes. Simplifying stuff so we don't have to list every matrix row explicitly seems quite helpful. I think there are a few minor changes remaining.

I think $TRAVIS_BUILD_DIR/target isn't the right directory for caching Rust build output; it's probably more correct to use $TRAVIS_BUILD_DIR/src/rust/target, because that's where we point CARGO_TARGET_DIR in all our cargo invocations.

Also we should probably stop setting CARGO_HOME explicitly so more stuff ends up in $HOME/.cargo to cache. (see #26497; we could put the fix in that ticket, or maybe do it in this branch but make that ticket a child of this one)

comment:26 Changed 7 weeks ago by teor

Travis updated their default macOS image to macOS 10.13 / Xcode 9.4 last Tuesday 31 July:
https://blog.travis-ci.com/2018-07-19-xcode9-4-default-announce

So I restarted the build of travis-osx-v9, and it seems to be fine:
https://travis-ci.org/teor2345/tor/builds/408771608

comment:27 Changed 7 weeks ago by teor

Status: needs_revisionneeds_review

I have two branches working their way through Travis right now:

  • travis-osx-master
  • travis-osx-034

They include all the changes from #26497, #26952, and they merge hello71's changes from #26038.

I would like to backport earlier than 0.3.4, but it's a bit more complicated, so I'm going to take a break and let travis complete.

comment:28 Changed 7 weeks ago by teor

Status: needs_reviewneeds_revision

Ok, so this was quite challenging to backport, because there were a lot of little bugs. But I think it is worth it to have working CI.

Here are the branches I created today:

I'm going to leave Travis to run overnight, and hopefully it's all successful in the morning. I expect 0.3.3 to take a little while, and then 0.3.4 and 0.3.5 should be trivial.

comment:29 Changed 7 weeks ago by catalyst

Thanks! I'm commenting only on master, because I haven't looked at the others yet.

This mostly looks good.

mkdir in before_install should create $TRAVIS_BUILD_DIR/src/rust/target, not $TRAVIS_BUILD_DIR/target.

Please explicitly delete the CARGO_HOME settings from src/rust/tor_rust/include.am and src/test/test_rust.sh instead of merging the changes from #26038. There are other changes in #26038 that I think we don't want, like:

  • deleting the registry setting in [source.crates-io] in src/rust/.cargo/config.in; I'm not sure whether this breaks anything, but I'd rather play it safe and leave it there until we know why we should delete it.
  • warning if CARGO_HOME is set; I'm not sure the added complexity is worth it, especially given that the checks are added in at least four places.
  • explicit setting of target-dir in [build] in src/rust/.cargo/config.in; this seems redundant with the explicit setting of CARGO_TARGET_DIR in various cargo invocations.

We could use target-dir in the cargo config instead of explicitly setting CARGO_TARGET_DIR for each cargo invocation, but in that case we should make a comment that it assumes that cargo is always invoked from a specific directory (which it should be after this change set).

comment:30 in reply to:  29 Changed 7 weeks ago by Hello71

Replying to catalyst:

(I think these comments should go in #26038, but I'm responding to them here because you made them here.)

Please explicitly delete the CARGO_HOME settings from src/rust/tor_rust/include.am and src/test/test_rust.sh instead of merging the changes from #26038. There are other changes in #26038 that I think we don't want, like:

  • deleting the registry setting in [source.crates-io] in src/rust/.cargo/config.in; I'm not sure whether this breaks anything, but I'd rather play it safe and leave it there until we know why we should delete it.

I believe cargo vendor used to print this line a long time ago, and now it has been the default for some time, so they stopped printing it. I could be remembering wrong though.

  • warning if CARGO_HOME is set; I'm not sure the added complexity is worth it, especially given that the checks are added in at least four places.

This is not correct. My patch only checks if CARGO_HOME is set once. It is supposed to check separately if CARGO_HOME is a relative path at use time, on the basis that the user may have changed their CARGO_HOME since they used configure. However, it does this wrong, since it should actually error instead of just warn, and also there is more than one place that CARGO_HOME is used. I think I do agree that checking for relative path can be done a single time in configure with AC_ARG_VAR though.

  • explicit setting of target-dir in [build] in src/rust/.cargo/config.in; this seems redundant with the explicit setting of CARGO_TARGET_DIR in various cargo invocations.

We could use target-dir in the cargo config instead of explicitly setting CARGO_TARGET_DIR for each cargo invocation, but in that case we should make a comment that it assumes that cargo is always invoked from a specific directory (which it should be after this change set).

This is also not correct, although it is my fault for writing a poor comment. Paths in cargo configuration are interpreted relative to the parent of the .cargo directory, so as long as the correct cargo config is read, the target will always be set to the same place. Therefore, "./target" is equivalent to writing "@abs_top_builddir@/src/rust/target". Whether it is read or not, however, is unrelated to CARGO_TARGET_DIR, but in fact the CARGO_HOME. I assume however that if you can set CARGO_HOME correctly then you can also set the current directory correctly.

comment:31 Changed 7 weeks ago by teor

Let's deal with #26038 in a separate branch after these branches merge.

comment:32 in reply to:  29 ; Changed 7 weeks ago by teor

Status: needs_revisionneeds_review

Replying to catalyst:

Thanks! I'm commenting only on master, because I haven't looked at the others yet.

This mostly looks good.

mkdir in before_install should create $TRAVIS_BUILD_DIR/src/rust/target, not $TRAVIS_BUILD_DIR/target.

Done in my 0.3.2 and later branches.

Please explicitly delete the CARGO_HOME settings from src/rust/tor_rust/include.am and src/test/test_rust.sh instead of merging the changes from #26038. There are other changes in #26038 that I think we don't want...

Done in my 0.3.2 branch, and re-done in 0.3.3 because a file moved.

Replying to teor:

Ok, so this was quite challenging to backport, because there were a lot of little bugs. But I think it is worth it to have working CI.

(And it's good to have consistent CI as well.)

Here are the revised branches I created:

I don't know if we want to fix #27091 in jenkins or in tor, I've asked weasel on IRC. We shouldn't merge until we work out how to fix jenkins.

comment:33 Changed 7 weeks ago by teor

I added a missing asciidoc dependency to the macOS build and merged it forward.

New builds are under https://travis-ci.org/teor2345/tor/branches . I made some custom travis configs that run macOS asciidoc, their branches end in "-all". (We shouldn't merge the "-all" branches, because their travis configs run all the combinations.)

Here's the first branch where macOS asciidoc worked:

Weasel fixed #27091 for jenkins, but we need a long-term fix based on whatever rust does, or whatever debian does to package rust crates.

comment:34 Changed 6 weeks ago by catalyst

Status: needs_reviewneeds_information

Overall this looks very good! I think this is a lot of work and will make things more maintainable in the future.

I would like to know if it ends up being significantly slower to default to using offline builds for most --enable-rust builds instead of cached online builds. I think the more places we use --enable-cargo-online-mode, the more places we can have trouble with a transient network failure.

comment:35 Changed 5 weeks ago by catalyst

Status: needs_informationmerge_ready

It's probably best to merge this now, and then reduce the number of online subjobs as a separate ticket.

comment:36 in reply to:  32 Changed 5 weeks ago by teor

Please merge my branches from https://github.com/teor2345/tor.git

  • travis-osx-029 to maint-0.2.9
  • travis-osx-032 to maint-0.3.2
  • travis-osx-033 to maint-0.3.3
  • travis-osx-034 to maint-0.3.4
  • travis-osx-master to master

comment:37 Changed 5 weeks ago by nickm

Resolution: implemented
Status: merge_readyclosed

Okay. I've merged this; somebody please open a "reduce online subjobs" ticket at your convenience.

comment:38 Changed 5 weeks ago by teor

The follow-up tickets are:

  • #27252 in 0.3.5 - reduce online rust (and look for other redundant jobs)
  • #27162 in 0.3.6 - consider running Travis on beta, nightly, and tor's lowest supported rust
Note: See TracTickets for help on using tickets.