Opened 12 months ago

Last modified 6 days ago

#29399 new task

Retire host and services for tordnsel and check (chiwui)

Reported by: ln5 Owned by: tpa
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: metrics-team, gaba Actual Points:
Parent ID: #31686 Points:
Reviewer: Sponsor:


Metrics team will re-implement the tordnsel and check services and have them deployment-ready by end of March 2019. Once they are up on a new host, retire chiwui.tpo.

Child Tickets

Ticket | Status | Owner | Summary | Component
#29650 | new | metrics-team | Rewrite exit scanner to produce exit lists according to new format | Metrics/Exit Scanner
#32553 | closed | anarcat | Group details for exit scanner | Internal Services/Tor Sysadmin Team
#32999 | closed | anarcat | Add irl to the "check" and "tordnsel" LDAP groups | Internal Services/Tor Sysadmin Team

Change History (14)

comment:1 Changed 9 months ago by weasel

chiwui is no longer able to successfully communicate with our backup infrastructure since it's running ancient Debian.

Karsten, irl, what's the status of your reimplementation?

comment:2 Changed 9 months ago by ln5

History, for the record:

  • Metrics team asked for and got an extension until mid April.
  • They couldn't get the new implementation done in time for that deadline and looked into porting TorDNSEL to a more recent Haskell. That failed.
  • Vegas team meeting early April said that "letting it die is not a good option".

comment:3 Changed 9 months ago by anarcat

Okay, so what's the plan then?

Supporting chiwui is going to get harder and harder. We can probably afford to do so a little longer, but things are going to progressively break as we go along. Now it's backups: there *might* be a way to backport things to chiwui to make them work, but it will be a waste of time if we get this fixed otherwise later anyways. But other things might break in the future as well...

For now, I've "acknowledged" the backup warnings in Nagios for this host, which means we will not fix backups for this host in the short term. I assume this is okay-ish: the older backups (from April 23rd) are still there and from what I understand the contents on that host are not changing (it's the problem we're trying to solve!).

Could the problem be split in two? Maybe "check" can be upgraded and not the other? Or are the two services as critical and inter-dependent?

For the record, someone mentioned "Docker" as a solution here, and I somewhat disagree: it would certainly shift the burden of maintaining the jessie box away from us (TPA) but we would *still* have to maintain *some* environment with the older Haskell, which is the problem we're trying to solve in the first place.

It would allow us to upgrade the box and resume backups, so it's a possible alternative in the mid term, but it just shifts the upgrade problem under a container veil. I'm worried it would make us just forget about it and create another liability.

comment:4 Changed 9 months ago by irl

Cc: metrics-team gaba added

I am working on the reimplementation, with completion expected before the end of LTS.

We are doing the reimplementation properly instead of rushing it, which will avoid us having to panic again later.

Gaba indicated that I should not give the reimplementation high priority for now, as we may seek funding for it, and should do the work once it is funded.

As far as I know, there is no critical data on the machine. If it dies, I doubt we would be able to bring it back from backups anyway, since we don't know how it works, so the lack of backups is not really an issue.

We can't split the code as it currently stands, because the parts communicate with each other via the filesystem.

Docker sounds like an awful idea for this case.

comment:5 Changed 6 months ago by gaba

To clarify: we have this in the metrics team roadmap. We will try to discuss a more concrete plan and ETA and let you know.

comment:6 Changed 5 months ago by karsten

This is in our current roadmap. We're going to start in October and expect to be done by end of December.

comment:7 Changed 5 months ago by anarcat

awesome karsten, thanks! should we create a separate ticket for that or assign this one to someone or something?

comment:8 Changed 5 months ago by karsten

We're tracking work related to retiring the host in #29650 and its children.

comment:9 Changed 5 months ago by anarcat

awwwweeeesssooooooome! :)

comment:10 Changed 4 months ago by anarcat

Parent ID: #31686

comment:11 Changed 4 months ago by anarcat

Summary: changed from "Retire host and services for tordnsel and check" to "Retire host and services for tordnsel and check (chiwui)"

comment:12 Changed 6 weeks ago by irl

Summary as we come to the end of the year:

  • We have an exitmap-based scanner that produces results comparable to those of the current exit scanner.
  • We can (untested) run a cron job to fetch the output of this scanner to power check.tpo.
  • We do not currently have a replacement for the DNSBL portion of the service, which will block this for now.
  • In the new year, one of the first things I'd like to do is deploy the new exit scanner software to a TPA host. I will file a new ticket to request that host separately.

comment:13 Changed 6 weeks ago by anarcat

whoohoo! thanks for the updates!

> We do not currently have a replacement for the DNSBL portion of the service, which will block this for now.

What's the plan for that part?

comment:14 Changed 6 days ago by anarcat

we have a hard deadline of June 2020 here, at which point this host *will* be shut down, along with the services hosted on it.
