Opened 5 months ago

Last modified 33 hours ago

#29280 needs_revision task

Use Chutney in Tor's CI

Reported by: cohosh Owned by: teor
Priority: High Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: CI, PTs, 029-backport, 035-backport, 040-backport, network-team-roadmap-2019-Q1Q2, reviewer-was-teor-20190422, tor-ci, 041-deferred-20190530
Cc: cohosh Actual Points: 1.5
Parent ID: #29267 Points: 1
Reviewer: nickm Sponsor: Sponsor19

Description

Get Chutney working for continuous integration. This will also help with Tor in general and so has broader benefits than just CI for pluggable transports.

Child Tickets

TicketStatusOwnerSummaryComponent
#28013closedcatalystrun test-network-all in Travis CICore Tor/Tor
#29703closedteorBackport test-network.sh fixes to 0.2.9Core Tor/Tor
#29748closednickmLet test-network.sh tolerate some number of failuresCore Tor/Chutney
#30212closedAdd extra chutney networks to Tor's CICore Tor/Tor
#30859closednickmSkip "make test" in Travis stem buildsCore Tor/Tor

Change History (32)

comment:1 Changed 4 months ago by nickm

Component: ObfuscationCore Tor/Tor
Milestone: Tor: 0.4.1.x-final

comment:2 Changed 4 months ago by nickm

Actual Points: 1
Keywords: 029-backport 034-backport 035-backport 040-backport added
Status: newneeds_review

Please see the branches below. They add a new make target, "make test-network-forgiving" that tries to run test-network.sh a few times, and succeeds if it succeeds once. They add a builder to do this on travis.

All of them passed on my first attempt when I pushed them to github myself, so let's see whether CI likes them a second time.

comment:3 Changed 4 months ago by nickm

Reviewer: teor

comment:4 Changed 4 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

comment:5 Changed 3 months ago by teor

Status: needs_reviewneeds_revision

I (accidentally) did a review on the 0.3.5 pull request:
https://github.com/torproject/tor/pull/736

In general, the code looks good. I have one design question, and one code question.

comment:6 Changed 3 months ago by teor

I am not sure if we need to fix #29689 as part of this ticket.

It seems like it might be annoying if someone uses tor/src/test/test-network.sh from an unusual path, or with an unusual directory structure.

comment:7 Changed 3 months ago by nickm

I think we can do #29689 separately, as it's logically independent of this?

comment:8 Changed 3 months ago by nickm

Status: needs_revisionneeds_review

Hi! Answered your questions on the tickets; please let me know which way you think I should proceed.

comment:9 in reply to:  7 Changed 3 months ago by teor

Replying to nickm:

I think we can do #29689 separately, as it's logically independent of this?

I think you're right, let's do it some other time.

I'll put it on my fast-fix backlog.

comment:10 in reply to:  8 ; Changed 3 months ago by teor

Status: needs_reviewneeds_revision

Replying to nickm:

Hi! Answered your questions on the tickets; please let me know which way you think I should proceed.

I think we should add new chutney-related code to the chutney repository: that's been our policy since 0.2.9. And in this particular case, we will need test-network-forgiving.sh to have useful CI for the chutney repository.

There's a longer explanation on the pull request.

I opened #29703 to backport some test-network.sh fixes from 0.3.0 to 0.2.9 so that we can reliably call chutney's test-network.sh from tor's test-network.sh.

It's also worth being aware of #29060: I think one of its commits may break tor's chutney CI on master.

comment:11 in reply to:  10 Changed 3 months ago by teor

Replying to teor:

I opened #29703 to backport some test-network.sh fixes from 0.3.0 to 0.2.9 so that we can reliably call chutney's test-network.sh from tor's test-network.sh.

Hmm, there's another option if you don't want to backport:

  1. put test-network-forgiving.sh in 0.2.9, and call it from the 0.2.9 Makefile
  2. delete test-network-forgiving.sh in 0.3.4 and later
  3. implement a test-network-forgiving mode in chutney's test-network.sh, and call it from 0.3.4 and later Makefiles

This seems like the worst of both worlds: we'll be making changes in 0.2.9 and chutney if there are any bugs. Historically, we're not good at keeping tor and chutney in sync.

It's also worth being aware of #29060: I think one of its commits may break tor's chutney CI on master.

I've asked for a revert on that ticket.

comment:12 Changed 3 months ago by teor

nickm, #29703 is ready for you to rebase onto.

comment:13 Changed 3 months ago by nickm

Status: needs_revisionneeds_review

I've made revised branches:

comment:14 Changed 3 months ago by teor

Summary: Use Chutney for CIUse Chutney in Tor's CI

comment:15 Changed 2 months ago by teor

I've just finished #29729, and discovered a lot of interesting things about chutney, tor, and Travis in the process.

I'll have some feedback on these pull requests tomorrow (~18 hours).

comment:16 Changed 2 months ago by teor

Points: 1
Status: needs_reviewneeds_revision

From a commit message:

Previously we had "make check" launched whenever DISTCHECK was
false. Now we'd like to turn it off in a few other circumstances,
like running chutney. Maybe stem too?

Yes, let's turn off "make check" for stem.

The default network in chutney is bridges+hs-v2, but I discovered in #29729 that it is really unstable in chutney's CI. (Sometimes all the jobs work, and sometimes multiple jobs fail. I think it might be due to network or CPU load issues on machines.)

So let's change to a subset of the CI tests that work in chutney?

So for 0.2.9, let's use:

  • bridges-min
  • hs-v2-min

And for 0.3.4 and later, let's add:

  • hs-v3-min

Each test should only take 30-60 seconds, so I think it's worth running these 3 tests. (That's about 2 extra minutes per branch, or an 18 extra minutes for a full merge-forward from 0.2.9.)

Here's the full list of all the tests in chutney from #29729:
https://github.com/torproject/chutney/pull/25/files#diff-354f30a63fb0907d4ad57269548329e3R29

We're missing:

  • basic-min
  • single-onion-v23
  • and all the IPv6 tests
    • We can only run IPv6 tests in macOS Travis, because Linux Travis doesn't support IPv6.

I opened #30212 to add them later.

Then let's let CI run for a while to evaluate the chutney tests.

If we think chutney is too unstable in Tor's Chutney job, or Chutney's CI, we could:

  • run chutney in offline mode (#30183), or
  • add a sleep after failure in test-network --forgiving (#30210).

comment:17 Changed 2 months ago by teor

Oops, I forgot to review 0.3.4 and later:

I am not so sure about this part of the merge commit to 0.3.4:

++    - env: DISTCHECK="yes" RUST_OPTIONS="--enable-rust --enable-cargo-online-mode" HARDENING_OPTIONS="" SKIP_MAKE_CHECK="yes"

I don't think we should skip make check on the only online rust build. We could move online to the dirauth build, and skip make check on distcheck?

Also, in future, let's try to do semantic changes like this in a separate commit?
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/MergePolicy?action=diff&version=11

In 0.3.5, you dropped the SKIP_MAKE_CHECK you added in 0.3.4:

 -    ## But without hardening (see above)
 -    - env: DISTCHECK="yes" RUST_OPTIONS="--enable-rust --enable-cargo-online-mode" HARDENING_OPTIONS="" SKIP_MAKE_CHECK="yes"
 +    - env: DISTCHECK="yes" RUST_OPTIONS="--enable-rust --enable-cargo-online-mode"

We should also re-run all these branches with the latest chutney.

comment:18 Changed 2 months ago by teor

Keywords: reviewer-was-teor-20190422 added
Reviewer: teor

If these tickets go back in to needs_review, and I am on leave, they will need another reviewer.

comment:19 Changed 2 months ago by teor

I'd like to block merging the chutney job in Tor's CI (#29280) until I get back.
(This ticket is in needs_revision, so feel free to do revisions on it!)

I think that's the wisest course of action: I want to be around to fix any CI failures (or instability) after we merge.

comment:20 Changed 5 weeks ago by nickm

Keywords: tor-ci added

comment:21 Changed 3 weeks ago by nickm

Keywords: 041-deferred-20190530 added

Marking these tickets as deferred from 041.

comment:22 Changed 3 weeks ago by nickm

Milestone: Tor: 0.4.1.x-finalTor: 0.4.2.x-final

comment:23 Changed 10 days ago by teor

Hi nickm, would you like to do the revisions in comments 16 and 17, or would you like me to do them?

I'm going to merge #29729 soon, so we can merge #29024 and #30459.

comment:24 Changed 10 days ago by nickm

Sure, I'd be happy to make these changes.

comment:25 Changed 10 days ago by nickm

Summarizing for my benefit, here are the changes I need to make:

  • Make a new test-network-ci target that behaves like test-network-all, but only runs the selected networks for CI (bridges-min, hs-v2-min, and hs-v3-min where supported). This should happen after #30459 is merged, so that we can have chutney handle the skip logic.
  • Open a ticket to disable "make check" for stem. Do this after stem is allow_failure, to avoid conflects.
  • Re-do the merges to 0.3.5 and forward; drop plans to support 0.3.4. Make sure that I get the use of SKIP_MAKE_TEST right.

I think that I am currently blocked on the allow_failure change in stem and the merge of #30459.

comment:26 in reply to:  25 Changed 9 days ago by teor

Replying to nickm:

Summarizing for my benefit, here are the changes I need to make:

  • Make a new test-network-ci target that behaves like test-network-all, but only runs the selected networks for CI (bridges-min, hs-v2-min, and hs-v3-min where supported). This should happen after #30459 is merged, so that we can have chutney handle the skip logic.

Let's just use test-network-all?
All of the networks in test-network-all work fine in chutney's CI.

Let's add a Linux job, and a macOS job (for IPv6).
Once we're sure chutney is stable in Tor's CI, and we know how much time it takes, we can think about adding it to the end of one of the other jobs.
(Or we can remove or combine some of the existing jobs.)

  • Open a ticket to disable "make check" for stem. Do this after stem is allow_failure, to avoid conflects.
  • Re-do the merges to 0.3.5 and forward; drop plans to support 0.3.4. Make sure that I get the use of SKIP_MAKE_TEST right.

I think that I am currently blocked on the allow_failure change in stem and the merge of #30459.

I think they've both been merged, do you need stem allow_failures backported to 0.3.5?

comment:27 Changed 9 days ago by nickm

Keywords: 034-backport removed

Removing 034-backport from all open tickets: 034 has reached EOL.

comment:28 Changed 8 days ago by teor

I talked to nickm on IRC today, here's how we're going to move forward:

In this ticket:

  • start on 0.2.9
  • add a Linux job with CHUTNEY_ALLOW_FAILURES=2 make test-network-all
  • implement SKIP_MAKE_TEST for the chutney job
  • create merge branches from 0.2.9 to master
  • merge to master and see how it goes, then backport all the way

In #30859:

  • start on 0.3.5
  • SKIP_MAKE_TEST for stem jobs
  • create merge branches from 0.3.5 to master
  • merge to master and see how it goes, then backport all the way

In #30860:

  • start on 0.2.9
  • add a macOS job with CHUTNEY_ALLOW_FAILURES=2 make test-network-all
  • implement SKIP_MAKE_TEST for the chutney job
  • create branches from 0.2.9 to master
  • merge to master and see how it goes
  • if it's taking too much time, stick it on the end of one of the other macOS jobs instead
  • backport all the way

comment:29 Changed 8 days ago by nickm

Actual Points: 11.5
Status: needs_revisionneeds_review

I'll check this again in a little while to see whether CI is passing.

comment:30 Changed 6 days ago by teor

Reviewer: teor
Status: needs_reviewneeds_revision

The CI and config look good.
So do the merges.

0.3.5 and later is going to conflict with #30591, let's try to merge this change, then rebase #30591. (#29280 is more complex than #30591, which is a simple patch on 0.3.5 and a clean merge forward.)

I like that we are re-using "make test-network-all", there's no need to mess around with chutney networks vs tor versions.

We're almost there!

There's a few tweaks that we need to do for good diagnostics:

  • cat test_network_log on failure
  • show the python version in 0.2.9 and 0.3.5, like we do in 0.4.0 and later
    • if we cherry-pick c31346ffb4, the merges might be eaiser
    • but the comment about stem is wrong:
      • in 0.2.9, we don't run stem
      • in 0.3.5 and later, that line just shows the stem version, it doesn't actually run stem
  • show the chutney commit hash after we check it out

(We've needed all this information to diagnose stem and test failures in the past, so let's put it in now.)

There's one fix that I would do, but feel free to do whatever you think is easiest to maintain and read:

  • remove the redundant os: linux line (see my comment in the pull request)

Edit: fix ticket number

Last edited 6 days ago by teor (previous) (diff)

comment:31 Changed 33 hours ago by teor

Owner: set to teor
Reviewer: teornickm
Status: needs_revisionassigned

I'll do the remaining changes, and nickm will review.

comment:32 Changed 33 hours ago by teor

Status: assignedneeds_revision
Note: See TracTickets for help on using tickets.