Opened 4 months ago

Last modified 4 weeks ago

#29399 new task

Retire host and services for tordnsel and check

Reported by: ln5
Owned by: tpa
Priority: Medium
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Normal
Keywords:
Cc: metrics-team, gaba
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The metrics team will reimplement the tordnsel and check services and have them deployment-ready by the end of March 2019. Once they are running on a new host, retire chiwui.tpo.


Change History (4)

comment:1 Changed 4 weeks ago by weasel

chiwui can no longer communicate with our backup infrastructure because it is running an ancient Debian release.

Karsten, irl, what's the status of your reimplementation?

comment:2 Changed 4 weeks ago by ln5

History, for the record:

  • The metrics team asked for, and got, an extension until mid-April.
  • They couldn't finish the new implementation by that deadline, so they looked into porting TorDNSEL to a more recent Haskell. That failed.
  • The Vegas team meeting in early April concluded that "letting it die is not a good option".

comment:3 Changed 4 weeks ago by anarcat

Okay, so what's the plan then?

Supporting chiwui is going to get harder and harder. We can probably afford to do so a little longer, but things will progressively break as we go along. Now it's backups: there *might* be a way to backport things to chiwui to make them work, but that would be a waste of time if this gets fixed another way later anyway. And other things might break in the future as well...

For now, I've "acknowledged" the backup warnings in Nagios for this host, which means we will not fix backups for this host in the short term. I assume this is okay-ish: the older backups (from April 23rd) are still there and, from what I understand, the contents of that host are not changing (that's the problem we're trying to solve!).

Could the problem be split in two? Maybe "check" can be upgraded and not the other? Or are the two services equally critical and interdependent?

For the record, someone mentioned "Docker" as a solution here, and I somewhat disagree: it would certainly shift the burden of maintaining the jessie box away from us (TPA), but we would *still* have to maintain *some* environment with the older Haskell, which is the problem we're trying to solve in the first place.

It would allow us to upgrade the box and resume backups, so it's a possible alternative in the mid term, but it just shifts the upgrade problem under a container veil. I'm worried it would make us just forget about it and create another liability.

comment:4 Changed 4 weeks ago by irl

Cc: metrics-team gaba added

I am working on the reimplementation, with completion expected before the end of LTS.

We are doing the reimplementation properly instead of rushing it, which will avoid us having to panic again later.

Gaba indicated that I should not give the reimplementation high priority for now, as we may seek funding for it, and should do the work once it is funded.

There is no critical data on the machine as far as I know. If it dies, I doubt we would be able to bring it back from backups anyway, since we don't know how it works, so the lack of backups is not really an issue.

We can't split the code as it currently stands, because the parts communicate with each other via the filesystem.

Docker sounds like an awful idea for this case.
