Opened 9 months ago

Closed 9 months ago

Last modified 9 months ago

#32998 closed task (fixed)

Upgrade metrics hosts to buster

Reported by: karsten
Owned by: anarcat
Priority: Medium
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

It looks like upgrading meronense from stretch to buster would solve #30351. The reason is that there's a bug in one of the R libraries that we get from stretch-backports and that is fixed in a newer version.

Is there a reason why meronense is still on stretch? If not, can we try updating it to buster? Similarly, are there plans to upgrade the other metrics hosts from stretch to buster? Thanks!

Change History (13)

comment:1 Changed 9 months ago by anarcat

Is there a reason why meronense is still on stretch?

No, just time I guess. I also like to check in with the service admins to make sure they're okay with the upgrade beforehand. A ticket like this does wonders!

If not, can we try updating it to buster?

Yes! Absolutely!

Similarly, are there plans to upgrade the other metrics hosts from stretch to buster?

Our current timeline is to upgrade *all* TPO hosts to buster by the end of May, or, worst case, before the stretch EOL (summer 2020), but we're working on a TPA-wide roadmap that will confirm this.

comment:2 in reply to:  1 Changed 9 months ago by karsten

Replying to anarcat:

Is there a reason why meronense is still on stretch?

No, just time I guess. I also like to check in with the service admins to make sure they're okay with the upgrade beforehand. A ticket like this does wonders!

If not, can we try updating it to buster?

Yes! Absolutely!

Cool! How about tomorrow? If so, I'd make some backups and turn off the daily updater. Ideally, the upgrade would happen between 8:00 and 12:00 UTC or between 13:00 UTC and 17:00 UTC.

Similarly, are there plans to upgrade the other metrics hosts from stretch to buster?

Our current timeline is to upgrade *all* TPO hosts to buster by the end of May, or, worst case, before the stretch EOL (summer 2020), but we're working on a TPA-wide roadmap that will confirm this.

Sounds good. Having a heads-up of one or two workdays should be sufficient in most cases.

Thanks!

comment:3 Changed 9 months ago by anarcat

Cool! How about tomorrow? If so, I'd make some backups and turn off the daily updater. Ideally, the upgrade would happen between 8:00 and 12:00 UTC or between 13:00 UTC and 17:00 UTC.

Tomorrow is a little rushed for me, but maybe I could squeeze that in. Could we coordinate on this tomorrow morning?

This ticket mentions "metrics hosts", but in the description you talk only of meronense... are there other boxes you were thinking of?

Sounds good. Having a heads-up of one or two workdays should be sufficient in most cases.

Duly noted! (in https://help.torproject.org/tsa/howto/upgrades/ :))

comment:4 in reply to:  3 Changed 9 months ago by karsten

Replying to anarcat:

Cool! How about tomorrow? If so, I'd make some backups and turn off the daily updater. Ideally, the upgrade would happen between 8:00 and 12:00 UTC or between 13:00 UTC and 17:00 UTC.

Tomorrow is a little rushed for me, but maybe I could squeeze that in. Could we coordinate on this tomorrow morning?

Sure. I'll be on IRC starting at around 8:00 UTC.

This ticket mentions "metrics hosts", but in the description you talk only of meronense... are there other boxes you were thinking of?

I was referring to the other hosts with services run by the metrics team: CollecTor on colchicifolium and corsicum, Onionoo on omeiense and oo-hetzner-03, and ExoneraTor on materculae. These are all unrelated to #30351, and we can do them one by one over the next weeks or months.

In any case, there is just one host for the Metrics website, and that's meronense. That's the one I'd like to get upgraded first.

Sounds good. Having a heads-up of one or two workdays should be sufficient in most cases.

Duly noted! (in https://help.torproject.org/tsa/howto/upgrades/ :))

Thanks! :)

comment:5 Changed 9 months ago by anarcat

Owner: changed from tpa to anarcat
Status: changed from new to assigned

ready to go on this when you are.

our checklist:

https://help.torproject.org/tsa/howto/upgrades/buster/

i'm at step 2 and i checked the backups. they are surprisingly old (the full backup is from december 26th) but there's a recent diff (jan 21st) so we should be good on that step.

comment:6 Changed 9 months ago by anarcat

Status: changed from assigned to needs_information

the upgrade is partially complete: the main operating system was upgraded, but the PostgreSQL cluster still needs to be migrated to the new version. i also wonder if I can remove the old JDK 8 packages. specifically, these packages should be examined:

root@meronense:~# apt-forktracer 
postgresql-client-9.6 (9.6.15-0+deb9u1)
tor-nagios-checks (28) [torproject-admin@torproject.org: 28 27]
openjdk-8-jdk-headless (8u232-b09-1~deb9u1)
postgresql-9.6 (9.6.15-0+deb9u1)
openjdk-8-jre-headless (8u232-b09-1~deb9u1)
userdir-ldap (0.3.93~20181104.1) [torproject-admin@torproject.org: 0.3.93~20181104.1 0.3.90~tpo.20170622 0.3.87~tpo.3 0.3.87~tpo.2 0.3.87~tpo.1 0.3.76pre.tpo.2]
linux-image-4.9.0-11-amd64 (4.9.189-3+deb9u2)

the userdir-ldap and tor-nagios-checks are normal, and I can take care of that kernel, but the OpenJDK-8 and PostgreSQL packages should be removed before this upgrade is completed.

do we have a go from you for that?

thanks!
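
As a rough illustration of the triage above, the removal candidates could be picked out of the apt-forktracer output with a small filter. This is only a sketch: the function name `leftover_packages` and the match patterns are assumptions made for this example, and the sample input is a subset of the output pasted above.

```shell
#!/bin/sh
# Hypothetical helper: read apt-forktracer output on stdin and print the
# names of the stretch-era PostgreSQL 9.6 and OpenJDK 8 packages.
# The patterns are an assumption based on the package list in this ticket.
leftover_packages() {
    awk '$1 ~ /^(postgresql(-client)?-9\.6|openjdk-8-)/ { print $1 }'
}

# Sample input: a subset of the apt-forktracer output shown above.
leftover_packages <<'EOF'
postgresql-client-9.6 (9.6.15-0+deb9u1)
tor-nagios-checks (28) [torproject-admin@torproject.org: 28 27]
openjdk-8-jdk-headless (8u232-b09-1~deb9u1)
postgresql-9.6 (9.6.15-0+deb9u1)
openjdk-8-jre-headless (8u232-b09-1~deb9u1)
linux-image-4.9.0-11-amd64 (4.9.189-3+deb9u2)
EOF
```

The printed names would still need a human review before being handed to something like `apt purge`; the filter only narrows the list.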

comment:7 Changed 9 months ago by karsten

The Java upgrade wouldn't be problematic, but I'd like to do a dry run of the PostgreSQL upgrade tonight. Can I tell you in ~3-4 hours?

comment:8 Changed 9 months ago by anarcat

The Java upgrade wouldn't be problematic, but I'd like to do a dry run of the PostgreSQL upgrade tonight. Can I tell you in ~3-4 hours?

Sure, I should be around. I've removed the Java packages, let me know if something blew up.

comment:9 Changed 9 months ago by karsten

Ah, I just remembered that I'm using PostgreSQL 11 locally, which works just fine with metrics-web. So, yes, upgrading from PostgreSQL 9.6 to 11 should be good.

comment:10 Changed 9 months ago by anarcat

alright, so we had a bit of trouble with the upgrade because of a change in the pg_proc internal table. that table's format changed, which broke a view used by the metrics team for internal tests.

after a bit of wrangling, karsten was able to figure out those views were not in use in production and dropped them from the 9.6 cluster. then the upgrade went along smoothly.
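
One way to look for such views ahead of an upgrade is to search the stored view definitions for the catalog name. The sketch below is not the query used in this ticket: the helper name `views_referencing_sql` and the database name `metrics` in the usage comment are assumptions, and a plain text match can produce false positives.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that lists user-defined views whose
# definition mentions a given system catalog (pg_proc by default).
# Views in pg_catalog and information_schema are skipped because they
# ship with PostgreSQL itself.
views_referencing_sql() {
    catalog="${1:-pg_proc}"
    cat <<EOF
SELECT schemaname, viewname
  FROM pg_views
 WHERE definition LIKE '%${catalog}%'
   AND schemaname NOT IN ('pg_catalog', 'information_schema');
EOF
}

# Example (assumed database name "metrics"):
#   views_referencing_sql pg_proc | psql -X -d metrics
views_referencing_sql pg_proc
```

Printing the query instead of running it keeps the sketch harmless; piping it to psql against the old cluster would show which views to inspect or drop before migrating.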

i've brought the v11 cluster online. the 9.6 cluster is still around in case we need it, and we have full backups of that cluster from before the upgrade. i've also launched a new full/base backup of the new cluster right now, which should take about an hour to complete.

i documented the upgrade procedure here as well:

https://help.torproject.org/tsa/howto/upgrades/buster/#PostgreSQL

we'll wait ~12h for the new batch of results to come into the database and then, if the metrics team is happy with the results, we can remove the old cluster and (maybe after a delay?) the 9.6 backups.
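
For readers without access to the linked procedure, a cluster migration on Debian with the postgresql-common tools might look roughly like the sketch below. It is not a copy of the documented TPA steps: the `run` and `upgrade_cluster` helpers and the dry-run default are assumptions for this example, and it presumes that installing the PostgreSQL 11 packages created an empty "main" cluster that has to be dropped first.

```shell
#!/bin/sh
# Sketch of a Debian cluster upgrade using postgresql-common tools.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_cluster() {
    old="$1"; new="$2"; name="${3:-main}"
    run pg_lsclusters
    # Assumption: an empty $new/$name cluster exists and must go first.
    run pg_dropcluster --stop "$new" "$name"
    # Migrate with pg_upgrade; the old cluster is kept on disk afterwards.
    run pg_upgradecluster -v "$new" -m upgrade "$old" "$name"
    run pg_lsclusters
}

upgrade_cluster 9.6 11
```

Because the old cluster is left in place, it can serve as a fallback during the waiting period described above and be removed later with `pg_dropcluster 9.6 main`.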

comment:11 Changed 9 months ago by karsten

Everything looks good now. Feel free to proceed with removing the old cluster. Maybe keep the backups a few more days just in case. Thanks for your support yesterday!

comment:12 Changed 9 months ago by anarcat

Resolution: fixed
Status: changed from needs_information to closed

great! removed the old cluster and scheduled removal of the backups one week from now:

root@bungei:~# echo 'rm -r /srv/backups/pg/meronense-9.6/' | at now + 7day
warning: commands will be executed using /bin/sh
job 17 at Wed Jan 29 19:18:00 2020

closing now. other metrics hosts will be done as part of the regular buster upgrade schedule and your team will receive two days' advance notice, as requested. :)
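
A deferred `rm -r` like the one above can be guarded so that only paths under the backup directory are ever scheduled. The wrapper below is a hypothetical sketch, not what was run on bungei: `removal_job` and `BACKUP_ROOT` are names invented for this example, and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Hypothetical guard: only emit a delete command for paths that sit
# below the backup root and contain no ".." component.
BACKUP_ROOT="/srv/backups/pg"

removal_job() {
    dir="$1"
    case "$dir" in
        *..*) echo "refusing: $dir" >&2; return 1 ;;
        "$BACKUP_ROOT"/?*) printf 'rm -r -- %s\n' "$dir" ;;
        *) echo "refusing: $dir" >&2; return 1 ;;
    esac
}

# Print the job text; pipe it to at(1) to actually schedule it:
#   removal_job /srv/backups/pg/meronense-9.6/ | at now + 7 days
removal_job /srv/backups/pg/meronense-9.6/
```

Until the job fires, `atq` lists pending jobs and `atrm 17` would cancel job 17 from the transcript above, which is handy if the old backups turn out to be needed after all.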

comment:13 Changed 9 months ago by anarcat

i opened a ticket for the upgrade of the other boxes, please provide feedback on it in #33111, thanks!
