Opened 8 months ago

Last modified 8 days ago

#31957 reopened project

automate upgrades

Reported by: anarcat Owned by: hiro
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords: tpa-roadmap-may
Cc: Actual Points:
Parent ID: Points: 0.5
Reviewer: Sponsor:

Description (last modified by anarcat)

upgrades take up a significant chunk of time every week and distract sysadmins (or at least me) from focusing on other projects.

upgrades should be therefore automated, as much as possible.

see also #31239 about auomated installs and this is part of the wider "ops card questionnaire", where we answered no to a question about this, see #30881.

checklist:

  • [x] install needrestart everywhere, in interactive mode
  • [x] switch needrestart to automatic mode
  • [x] install unattended-upgrades everywhere
  • [ ] fix major upgrades docs to disable unattended-upgrades during the upgrade run
  • [ ] automate reboots see #33406 instead

Child Tickets

Change History (27)

comment:1 Changed 8 months ago by anarcat

Parent ID: #30881

i setup needrestart everywhere, using a puppet module. it's currently in "interactive" mode, which means it will do nothing during automated upgrades and will prompt during manual ones. my hope is to use needrestart manually for a while to see if it works well and, when it does, deploy it automated everywhere.

i also eventually want to run unattended-upgrades everywhere.

between those two tools, we should get rid of 50-75% of the manual work involved here, the remaining being reboots. those could also be automated, if we find a way for the servers to coordinate among themselves.

comment:2 Changed 7 months ago by anarcat

Parent ID: #30881

remove from parent "ops report card" thing, as i want to close that ticket and it will be open forever if it depends on all the tickets generated from it.

Last edited 4 months ago by anarcat (previous) (diff)

comment:3 Changed 6 months ago by anarcat

Description: modified (diff)

link to the auto install and questionnaire bits

comment:4 Changed 4 months ago by anarcat

Description: modified (diff)
Owner: changed from anarcat to hiro

i discussed this with hiro as part of our 2020 roadmap work. she volunteered to followup on this.

i made a checklist in the ticket summary: the next step is to enable needrestart automatically everywhere, which we should look at doing soon. then we deploy unattended-upgrades everywhere, making sure we update the buster major upgrade docs to disable unattended-upgrades while we do major upgrades, on step 4.

so, TL;DR: next step is needrestart auto everywhere.

comment:5 Changed 4 months ago by gaba

Keywords: tpa-roadmap-february added

comment:6 Changed 4 months ago by anarcat

Owner: changed from hiro to anarcat
Points: 0.5

comment:7 Changed 4 months ago by anarcat

Description: modified (diff)

switched needrestart to automatic mode in puppet now and forcibly deployed everywhere.

comment:8 Changed 4 months ago by anarcat

Status: assignedneeds_review

hiro made a patch in the unattended-upgrades patch, which i need to review

comment:9 Changed 4 months ago by anarcat

Status: needs_reviewneeds_revision

quick patch review

  1. the first patch should be the raw import of the upstream repo, *without* the proposed patches... that way it's easier to see what we have done on top of the upstream repo... as things stand now it's unclear to me which patch exactly was used - the commit log points to https://github.com/voxpupuli/puppet-unattended_upgrades/issues/145 but that's an issue, not a patch... i'll go under the assertion that the patch merged is https://github.com/voxpupuli/puppet-unattended_upgrades/pull/148 instead
  1. please do provide a review of the upstream pull request. if you think it's good, just say so in the pull request so I can officially merge it upstream. (note that I *can* merge it upstream without your approval, but i just think it's more transparent that way, plus it gives you some public credits on github and introduces you to the folks paying attention in the org)
  1. i haven't audited the upstream module's source code and will assume you have done due dilligence here :)
  1. did you test the deployment somewhere? how do you plan to do the deployment? just dropping it in hiera/common.yaml is a rather... bold move, I would say... ;) i have written instructions on how to do a progressive deployment here: https://help.torproject.org/tsa/howto/puppet/#Progressive_deployment

note that the progressive deployments notes seem a bit dated now, these days I deploy classes as includes in a role instead of directly in hiera, because hiera includes classes in a non-deterministic way, which can be confusing sometimes. see the way profile::jumphost was progressively deployed for an example (commits 8c1d3087 c2439c7f dd3a1d7b c57b446c cdcc8576, etc)

let me know if I can help any further! :)

comment:10 Changed 4 months ago by anarcat

Owner: changed from anarcat to hiro
Status: needs_revisionassigned

comment:11 Changed 4 months ago by hiro

Hi,
I have pushed a new branch addressing all your comments: unattended-upgrades.
Regarding comment 3. I have audited the upstream code to the best of my knowledge.
I was in the process of updating and commenting on https://github.com/voxpupuli/puppet-unattended_upgrades/pull/148 but I see this has been already merged. Nice.

comment:12 Changed 4 months ago by hiro

Owner: changed from hiro to anarcat

comment:13 Changed 4 months ago by hiro

Status: assignedneeds_review

comment:14 Changed 4 months ago by anarcat

I was in the process of updating and commenting on ​https://github.com/voxpupuli/puppet-unattended_upgrades/pull/148 but I see this has been already merged. Nice.

oh i took that comment as your update:

https://github.com/voxpupuli/puppet-unattended_upgrades/pull/148#issuecomment-584213376

i merged right after I saw that... did i miss something?

going to review the new branch now, thanks!

comment:15 Changed 4 months ago by anarcat

Owner: changed from anarcat to hiro
Status: needs_reviewassigned

I have pushed a new branch addressing all your comments: unattended-upgrades.

it seems we now have three branches for this... i think it would have been preferable to force-push to the topic branch instead of creating new ones... please do cleanup the old ones to leave only the current one.

after you merge, do remove the good branch as well, of course. :)

now as for the review of the unattended-upgrades branch...

I don't think this is necessary:

+# a host that is monitored
+class roles::unattended_upgrades {
+  include profile::unattended_upgrades
+}

we don't need a role at all, we can include the profile in the relevant roles. for example, this:

--- a/hiera/nodes/chives.torproject.org.yaml
+++ b/hiera/nodes/chives.torproject.org.yaml
@@ -1,2 +1,3 @@
 classes:
   - roles::ircbox
+  - roles::unattended_upgrades

... could be turned into an include profile::unattended_upgrades inside the roles::ircbox.

that said, that's how the progressive deployment docs look right now, so I can't really blame you for following it. :)

anyways this looks good and I'd say go ahead with it. you are correctly including the functionality only in one node in that way, that's the important part to get right and it looks like you've done it. :)

(if you're curious about why i'm now hesitant in adding roles to hiera there: it's because those classes get added as prometheus labels which creates needless noise in the prometheus time series and confuses grafana...)

comment:16 Changed 4 months ago by anarcat

Status: assignedmerge_ready

comment:17 in reply to:  15 Changed 4 months ago by hiro

Replying to anarcat:

I have pushed a new branch addressing all your comments: unattended-upgrades.

it seems we now have three branches for this... i think it would have been preferable to force-push to the topic branch instead of creating new ones... please do cleanup the old ones to leave only the current one.

I thought it was easier to just reapply the patches cleanly. My plan was to delete the old branches after merging, but since you have mentioned I have now deleted the other branches.

after you merge, do remove the good branch as well, of course. :)

Sure

now as for the review of the unattended-upgrades branch...

I don't think this is necessary:

+# a host that is monitored
+class roles::unattended_upgrades {
+  include profile::unattended_upgrades
+}

we don't need a role at all, we can include the profile in the relevant roles. for example, this:

--- a/hiera/nodes/chives.torproject.org.yaml
+++ b/hiera/nodes/chives.torproject.org.yaml
@@ -1,2 +1,3 @@
 classes:
   - roles::ircbox
+  - roles::unattended_upgrades

... could be turned into an include profile::unattended_upgrades inside the roles::ircbox.

that said, that's how the progressive deployment docs look right now, so I can't really blame you for following it. :)

anyways this looks good and I'd say go ahead with it. you are correctly including the functionality only in one node in that way, that's the important part to get right and it looks like you've done it. :)

(if you're curious about why i'm now hesitant in adding roles to hiera there: it's because those classes get added as prometheus labels which creates needless noise in the prometheus time series and confuses grafana...)

Ok I'll try to merge this as see how it goes for chives.

comment:18 Changed 4 months ago by anarcat

I thought it was easier to just reapply the patches cleanly. My plan was to delete the old branches after merging, but since you have mentioned I have now deleted the other branches.

i think it's fine to overwrite topic branches like this during the review process. right now it means we lose the older version, but that's fine because if I have cloned the repo, that branch is still technically available locally (through the reflog). if we'd use gitlab for review, it would be even visible through the web UI as well...

also, i think you forgot one branch (auto-updates) :)

thanks for the cleanup!

Ok I'll try to merge this as see how it goes for chives.

excellent! don't forget about --noop ;) (now also patn)

comment:19 Changed 3 months ago by anarcat

before we close this ticket, we should consider automated reboots as well, either as a new ticket, or by documenting the procedure to automate reboots now.

at least the process is not clear to me at all right now: it's taking a long time and it's error-prone.

comment:20 Changed 3 months ago by anarcat

Description: modified (diff)

move automated reboots to another ticket, #33406

comment:21 Changed 3 months ago by hiro

Keywords: tpa-roadmap-march added

comment:22 Changed 2 months ago by anarcat

Description: modified (diff)

hiro deployed unattended upgrades everywhere last week, so this is almost done.

next step here is to fix the docs to mention that we *have* this running (and deprecate the tor-prepare-upgrades stuff) and fix the major upgrades docs, as documented in the summary.

we should also make sure that unattended-upgrades follows backports upgrade. i installed smartd from backports everywhere in #33684 (on new servers) and it would be important to have that work.

comment:23 Changed 4 weeks ago by anarcat

we need to add one more label here, because upgrades to packages in the tpo repo do not get automatically done:

root@scw-arm-par-01:~# dsa-check-packages 
WARNING: 1 updates, 551 ok
1 out of date packages: tor-nagios-checks
551 packages current.
|obs_loc=0;1;5;0 outdated=1;1;5;0 current=551;;;0 obs_ign=0;;;0 rm_unprg=0;;;0 hold=0;;;0 prg_conf=0;1;;0
root@scw-arm-par-01:~# apt-cache policy tor-nagios-checks
tor-nagios-checks:
  Installed: 31
  Candidate: 31
  Version table:
     31 500
        500 https://db.torproject.org/torproject-admin tpo-all/main armhf Packages
 *** 29 500
        500 https://db.torproject.org/torproject-admin tpo-all/main armhf Packages
        100 /var/lib/dpkg/status
     28 500
        500 https://db.torproject.org/torproject-admin tpo-all/main armhf Packages
     27 500
        500 https://db.torproject.org/torproject-admin tpo-all/main armhf Packages

that repository is:

 500 https://db.torproject.org/torproject-admin tpo-all/main armhf Packages
     release o=torproject-admin@torproject.org,a=stable,n=tpo-all,c=main,b=armhf
     origin db.torproject.org

comment:24 Changed 4 weeks ago by hiro

Resolution: fixed
Status: merge_readyclosed

comment:25 Changed 4 weeks ago by hiro

Keywords: tpa-roadmap-may added; tpa-roadmap-february tpa-roadmap-march removed
Resolution: fixed
Status: closedreopened

comment:26 Changed 3 weeks ago by anarcat

there was a problem with ganeti during the last buster point release upgrade. it could be with needrestart or unattended-upgrades, but it should be fixed. i documented what i know in #34185.

comment:27 Changed 8 days ago by hiro

I have started reviewing the documentation.

Note: See TracTickets for help on using tickets.