Opened 6 months ago

Last modified 5 months ago

#30759 new task

Create (or edit) the wiki page for the CI role

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: process roles
Cc: gaba
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Here is part of an email I sent about the CI role.

I want to turn it into a proposed process, by editing the CI wiki page.

One of the failure modes of this role is that the CI people end up fixing
a lot of failing tests.

But it's best practice for the original developer to fix the tests that they
wrote: it's more efficient, and people learn from their mistakes.

So I'd like to restrict the scope of this role to "make CI pass, quickly".

Usually that means:

  • reverting a failing commit,
  • marking a failing job as "allow failures", or
  • skipping a failing test.

And then logging a bug for a longer-term fix.

We need a separate process to make sure longer-term fixes happen.

We typically have 3 categories of CI bugs:

  • consistent failures from a recent commit,
  • intermittent failures, which can be from old commits,
  • environmental failures from CI infrastructure changes.

We can assign recent failures to the person who wrote the code.
(Or a paid staff member, if that person is an occasional volunteer.)

I think the CI people should assign the other two categories of
bugs evenly across the team. It's too much for one or two people
to fix all the CI bugs.
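
The assignment rules above can be sketched in code. This is only an illustration of the proposal, not an existing tool: the names `Category`, `CIBug`, and `make_assigner` are hypothetical, "evenly" is assumed to mean a simple round-robin, and a single staff fallback stands in for "a paid staff member".

```python
from dataclasses import dataclass
from enum import Enum
from itertools import cycle
from typing import Callable, Optional, Sequence


class Category(Enum):
    """The three categories of CI bugs listed in the ticket."""
    RECENT_COMMIT = "consistent failure from a recent commit"
    INTERMITTENT = "intermittent failure, possibly from an old commit"
    ENVIRONMENTAL = "environmental failure from a CI infrastructure change"


@dataclass
class CIBug:
    summary: str
    category: Category
    author: Optional[str] = None       # who wrote the failing code, if known
    author_is_volunteer: bool = False  # True for an occasional volunteer


def make_assigner(team: Sequence[str], staff_fallback: str) -> Callable[[CIBug], str]:
    """Return a function that picks an owner for each CI bug.

    Assumption: "evenly across the team" is modelled as round-robin.
    """
    rotation = cycle(team)

    def assign(bug: CIBug) -> str:
        # Recent failures go to the author, or to paid staff if the
        # author is an occasional volunteer.
        if bug.category is Category.RECENT_COMMIT and bug.author:
            return staff_fallback if bug.author_is_volunteer else bug.author
        # Intermittent and environmental failures are shared across the team.
        return next(rotation)

    return assign
```

With a team of two, successive intermittent or environmental bugs alternate between the two people, while a recent failure bypasses the rotation and goes straight to its author.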

If we use this scope, the CI role is similar to the review assigner,
backport decider/merger, and bug triage roles. It's not our job
to fix the bugs, just to triage them, and get CI into a usable state.

Change History (4)

comment:1 Changed 6 months ago by gaba

Cc: gaba added
Keywords: process roles added

It would make sense for people to comment on this ticket so that the discussion makes progress, but we can make a final decision in the monthly retrospective.

comment:2 Changed 5 months ago by nickm

This seems good to me. Comments and clarifications:

I'd like to suggest that if the failure can be fixed within some number of hours, it doesn't require any of the workarounds (reverting, allow-failures, skipping) to actually happen.

We should specify that every revert should come with a re-opened ticket. Every skip or allow-failures should come with a ticket.

We should specify a keyword for all the tickets that represent tests that are disabled for CI. We should have open tickets with this keyword tracked on the CI page.

comment:3 in reply to:  2 Changed 5 months ago by teor

Replying to nickm:

> This seems good to me. Comments and clarifications:
>
> I'd like to suggest that if the failure can be fixed within some number of hours, it doesn't require any of the workarounds (reverting, allow-failures, skipping) to actually happen.

I would suggest "one working day", or "24 hours". Anything more than that, and we're not treating CI failures as urgent.

> We should specify that every revert should come with a re-opened ticket. Every skip or allow-failures should come with a ticket.
>
> We should specify a keyword for all the tickets that represent tests that are disabled for CI. We should have open tickets with this keyword tracked on the CI page.
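
The rules discussed in this thread could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the 24-hour window is the value suggested above and has not been agreed, the keyword `ci-disabled` is a placeholder because no keyword has been chosen, and `Workaround` and `plan` are hypothetical names.

```python
from datetime import timedelta
from enum import Enum
from typing import Dict, Optional

FIX_WINDOW = timedelta(hours=24)  # suggested limit; not yet agreed
DISABLED_KEYWORD = "ci-disabled"  # placeholder; no keyword has been chosen


class Workaround(Enum):
    REVERT = "revert the failing commit"
    ALLOW_FAILURES = "mark the failing job as allow-failures"
    SKIP = "skip the failing test"


def plan(estimated_fix: timedelta, workaround: Workaround) -> Dict[str, Optional[str]]:
    """Decide what the CI role does for one failure."""
    # A failure that can be fixed inside the window needs no workaround.
    if estimated_fix <= FIX_WINDOW:
        return {"action": "fix directly", "ticket": None, "keyword": None}
    # Every revert comes with a re-opened ticket.
    if workaround is Workaround.REVERT:
        return {"action": workaround.value,
                "ticket": "re-open the original ticket",
                "keyword": None}
    # Every skip or allow-failures comes with a ticket, tagged so that
    # disabled tests can be tracked on the CI page.
    return {"action": workaround.value,
            "ticket": "open a ticket",
            "keyword": DISABLED_KEYWORD}
```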

comment:4 Changed 5 months ago by nickm

Milestone: Tor: unspecified