wiki:org/teams/SysadminTeam

About us

The sysadmin team is responsible for managing machines under the torproject.org domain. It does not operate the Tor network in any form, nor is it responsible for every service running on torproject.org: that is the job of the service admins responsible for each of those services.

Most of the sysadmin team's documentation is in a different wiki for now.

Roadmap

This page documents a possible roadmap for the TPA team for the year 2020.

Items should be SMART, that is:

  • specific
  • measurable
  • achievable
  • relevant
  • time-bound

Main objectives (need to have):

  • decommissioning of old machines (moly in particular)
  • move critical services into ganeti
  • buster upgrades before LTS
  • within budget

Secondary objectives (nice to have):

  • new mail service
  • conversion of the kvm* fleet to ganeti for higher reliability and availability
  • buster upgrade completion before anarcat vacation

Non-objectives:

  • service admin roadmapping?
  • kubernetes cluster deployment?

Assertions:

  • new gnt-fsn nodes with current hardware (PX62-NVMe, 118EUR/mth), cost savings possible with the AX line (-20EUR/mth) or by reducing disk space requirements (-39EUR/mth) per node
  • cymru actually delivers hardware and is used for moly decom
  • gitlab hardware requirements covered by another budget
  • we absorb the extra bandwidth costs from the new hardware design (currently 38EUR per month but could rise when new bandwidth usage comes in) - could be shifted to TBB team or at least labeled as such

TODO

  • nextcloud roadmap
  • identify critical services and realistic improvements #31243 (done)
  • (anarcat & gaba) sort out each month by priority (mostly done for feb/march)
  • (gaba) add keywords #tpa-roadmap- for each month (doing for february and march to test how this would work) (done)
  • (anarcat) create missing tickets for february/march (partially done, missing some from hiro)
  • (at tpa meeting) estimate tickets! (1pt = 1 day)
  • (gaba) reorganize budget file per month
  • (gaba) create a roadmap for gitlab migration
  • (gaba) find service admins for gitlab (nobody for trac in services page) - gaba to talk with isa and alex and look for service admins (sent a mail to las vegas but nobody replied... I will talk with each team lead)
    • have a shell account in the server
    • restart/stop service
    • upgrade services
    • problems with the service

January

  • [x] catchup after holidays
  • [x] agree internally on a roadmap for 2020
  • [x] first phase of installer automation (setup-storage and friends) #31239
  • [x] new FSN node in the Ganeti cluster (fsn-node-03) #32937
  • [x] textile shutdown and VM relocation, 2 VMs to migrate #31686 (+86EUR)
  • [x] enable needrestart fleet-wide (#31957)
  • [x] review website build errors (#32996)
  • [x] evaluate if discourse can be used as comments platform for the blog (#33105) <-- can we move this further down the road (not february) until gitlab is migrated? -->
  • [x] communicate buster upgrade timeline to service admins DONE
  • [x] buster upgrade 63% done: 48 buster, 28 stretch machines
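
The buster upgrade figures here and in the later monthly lists can be checked with a small calculation. This is only a sketch: the machine counts are copied from the monthly bullets on this page, and the function name `buster_share` is made up for illustration.

```python
def buster_share(buster, stretch, other=0):
    """Return the percentage of machines running buster, to one decimal."""
    total = buster + stretch + other
    return round(100 * buster / total, 1)

# (month, buster machines, stretch machines) as listed in the monthly goals
milestones = [
    ("January", 48, 28),
    ("February", 53, 23),
    ("March", 61, 15),
    ("April", 68, 8),
    ("May", 76, 0),
]

for month, buster, stretch in milestones:
    print(f"{month}: {buster_share(buster, stretch)}% buster")
```

On these numbers the April share comes out slightly under the stated 90%, which suggests the percentages in the lists are rounded targets rather than exact ratios.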

February

capacity around 15 days (counting 2.5 days per week for anarcat and 5 days per month for hiro)
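
The capacity estimate can be reproduced as follows. This is a rough sketch that assumes a month counts as 4 working weeks; the page itself only says "around 15 days", and the function name is hypothetical.

```python
# Assumption: one month is treated as 4 working weeks.
WEEKS_PER_MONTH = 4

def monthly_capacity(anarcat_days_per_week=2.5, hiro_days_per_month=5,
                     weeks=WEEKS_PER_MONTH):
    """Return the team's estimated capacity for one month, in days."""
    return anarcat_days_per_week * weeks + hiro_days_per_month

print(monthly_capacity())
```

With tickets estimated at 1pt = 1 day, this figure is the number of points the team can expect to take on in a month.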

  • 2020 roadmap officially adopted - done
  • second phase of installer automation #31239 (esp. puppet automation, e.g. #32901, #32914) - done
  • new gnt-fsn node (fsn-node-04) -118EUR=+40EUR (#33081) - done
  • storm shutdown #32390 - done
  • unifolium decom (after storm), 5 VMs to migrate, #33085 +72EUR=+158EUR - not completed
  • buster upgrade 70% done: 53 buster (+5), 23 stretch (-5) - done: 54 buster (+6), 22 stretch (-6), 1 jessie
  • migrate gitlab-01 to a new VM (gitlab-02) and use the omnibus package instead of ansible (#32949) - done
  • migrate CRM machines to gnt and test with Giant Rabbit #32198 (priority) - not done
  • automate upgrades: enable unattended-upgrades fleet-wide (#31957) - not done
  • anti-censorship monitoring (external prometheus setup assistance) #31159 - not done

Owner: anarcat (9 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33081 new gnt-fsn node (fsn-node-04) closed 2 High Normal 3 months ago
#33141 migrate sysadmin roadmap in trac wiki closed High Minor 4 months ago
#32283 fix up /etc/aliases with puppet closed 0.5 0.1 Medium Normal 3 months ago
#32390 decomission storm / bracteata on February 11, 2020 closed Medium Normal 3 months ago
#32914 review the puppet bootstrapping process closed 1 Medium Minor 3 months ago
#33143 ferm: convert BASE_SSH_ALLOWED rules into puppet exported rules closed Medium Normal 4 months ago
#33441 decomission savii closed Medium Normal 3 months ago
#33442 decomission build-x86-07 closed Medium Normal 3 months ago
#33277 adopt puppetlabs apt module closed 1 0.5 Low Major 3 months ago

Owner: hiro (2 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#32949 Migrate dip from gitlab-01 to gitlab-02 closed 5 High Normal 3 months ago
#32198 upgrade CRM* machines to buster closed Medium Normal 6 weeks ago

March

capacity around 15 days (counting 2.5 days per week for anarcat and 5 days per month for hiro)

High possibility of overload here (two major decoms and many machine setups). Is it possible to push the moly/cymru work to April?

  • 2021 budget proposal?
  • possible gnt-cymru cluster setup (~6 machines) #29397
  • moly decom #29974, 5 VMs to migrate
  • kvm3 decom, 7 VMs to migrate (inc. crm-int and crm-ext), #33082 +72EUR=+112EUR
  • new gnt-fsn node (fsn-node-05) #33083 -118EUR=-6EUR
  • eugeni VM migration to gnt-fsn #32803
  • buster upgrade 80% done: 61 buster (+8), 15 stretch (-8)
  • solr deployment (#33106)
  • anti-censorship monitoring (external prometheus setup assistance) #31159
  • nc.riseup.net cleanup #32391
  • SVN shutdown? #17202

Owner: anarcat (10 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33085 decomission unifolium/kvm2, 6 VMs to migrate closed 20 High Normal 6 weeks ago
#33446 migrate cupani/git-rw to the ganeti cluster, triggering an IP address change closed High Major 2 months ago
#33447 migrate omeiense to the ganeti cluster, triggering an IP change closed High Major 2 months ago
#33587 puppet certificate revocation anomaly closed 0.2 High Major 3 months ago
#33729 forestii IP address change planned for Ganeti migration closed High Major 6 weeks ago
#33730 vineale IP address change planned for Ganeti migration closed High Major 2 months ago
#33731 troodi IP address change planned for Ganeti migration closed High Major 6 weeks ago
#33834 nevii IP address change planned for Ganeti migration closed High Major 7 weeks ago
#33083 new gnt-fsn node (fsn-node-05) closed Medium Normal 2 months ago
#33448 Migrate IP address of polyanthum.torproject.org (BridgeDB) closed Medium Normal 3 months ago

Owner: hiro (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#32198 upgrade CRM* machines to buster closed Medium Normal 6 weeks ago

Owner: micah (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#32391 Purge test accounts and data from riseup in February 4, 2020 closed Medium Normal 3 months ago

April

  • kvm4 decom, 9 VMs to migrate #32802 (w/o eugeni), +121EUR=+115EUR
  • new gnt-fsn node (fsn-node-06) -118EUR=-3EUR
  • buster upgrade 90% done: 68 buster (+7), 8 stretch (-7)
  • solr configuration

Owner: anarcat (20 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#34098 crm-int-01 running out of disk space closed Very High Major 2 weeks ago
#33776 Please change email forwarding for jon@ closed High Normal 2 weeks ago
#33869 update spreadsheet after migrations closed High Trivial 2 weeks ago
#29399 Retire host and services for tordnsel and check (chiwui) closed Medium Normal 2 weeks ago
#32275 Close down tor-reports list, or at least remove the postfix lines closed Medium Normal 2 weeks ago
#33823 Can we block phishing emails when forwarding emails? closed Medium Normal 2 weeks ago
#33829 Please refresh my key certifications closed Medium Normal 2 weeks ago
#33940 Please remove irl from tordeb LDAP group (and any other deb.tpo related group) closed Medium Normal 2 weeks ago
#33951 mtail floods its logs with garbage closed Medium Normal 2 weeks ago
#33967 Add phw to tordnsel and check groups closed Medium Normal 2 weeks ago
#33970 Really remove irl from deb.tpo closed Medium Normal 2 weeks ago
#34016 Please add DNS entries for new OnionPerf hosts closed Medium Normal 2 weeks ago
#34019 Please remove irl from groups closed Medium Normal 2 weeks ago
#34020 Please remove the DNS entry for op-ab.onionperf.torproject.net closed Medium Normal 2 weeks ago
#34047 Allow sysrqb access to majus closed Medium Normal 2 weeks ago
#34056 Please remove irl from gitolite LDAP group closed Medium Normal 2 weeks ago
#34074 [RT-admin] Redirect email alias giving@ and newsletter@ to RT closed Medium Normal 2 weeks ago
#34114 Please give gk access to staticiforme and torwww groups closed Medium Normal 2 weeks ago
#16214 Upgrade Sphinx for Stem's site closed Low Minor 2 weeks ago
#33868 fabric (incorrectly) asumes User root ssh_config closed Low Major 2 weeks ago

Owner: hiro (2 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33082 decomission kvm3 AKA macrum, 7 VMs to migrate closed Medium Normal 3 weeks ago
#33539 retire gitlab-01 closed Medium Normal 4 weeks ago

May

  • kvm5 decom, 9 VMs to migrate #33084, +111EUR=+108EUR
  • new gnt-fsn node (fsn-node-07) -118EUR=-10EUR
  • buster upgrade 100% done: 76 buster (+8), 0 stretch (-8)
  • current planned completion date of Buster upgrades
  • start ramping down work, training and documentation
  • solr text updates and maintenance
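
The EUR figures in the January to May lists read as a running balance of monthly costs: each decommissioning frees up money (positive) and each new gnt-fsn node costs 118EUR per month (negative). The sketch below replays that balance under this interpretation. The ordering of the February entries is an assumption chosen to reproduce the totals shown on this page, and `running_balance` is a name made up for illustration.

```python
# (month, item, change in monthly cost balance, in EUR)
ledger = [
    ("January", "textile shutdown", +86),
    ("February", "unifolium decom", +72),
    ("February", "fsn-node-04", -118),
    ("March", "kvm3 decom", +72),
    ("March", "fsn-node-05", -118),
    ("April", "kvm4 decom", +121),
    ("April", "fsn-node-06", -118),
    ("May", "kvm5 decom", +111),
    ("May", "fsn-node-07", -118),
]

def running_balance(entries):
    """Return (month, item, balance) tuples with the cumulative balance."""
    balance = 0
    result = []
    for month, item, delta in entries:
        balance += delta
        result.append((month, item, balance))
    return result

for month, item, balance in running_balance(ledger):
    print(f"{month}: {item} -> {balance:+d}EUR")
```

The balances come out as +86, +158, +40, +112, -6, +115, -3, +108 and -10, matching the totals after the "=" signs in the monthly lists.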

Owner: anarcat (9 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#32802 retire kvm4, 8 VMs to migrate merge_ready High Major 32 minutes ago
#31243 TPA-RFC-2: define how users get support, what's an emergency and what is supported assigned Medium Normal 2 weeks ago
#31244 long term prometheus metrics closed Medium Normal 7 days ago
#33084 decomission kvm5, 9 VMs to migrate new Medium Normal 2 months ago
#33387 establish tmpfs policy closed Medium Normal 45 hours ago
#33907 new gnt-fsn node (fsn-node-06) closed Medium Normal 2 weeks ago
#33911 oo-hetzner-03 retirement closed Medium Normal 2 days ago
#34304 new gnt-fsn node (fsn-node-07) accepted Medium Normal 2 days ago
#33406 automate reboots accepted Low Major 4 weeks ago

Owner: hiro (8 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#34185 ganeti clusters don't like automatic upgrades assigned High Major 23 hours ago
#31159 Monitor anti-censorship www services with prometheus needs_information 1 Medium Normal 2 weeks ago
#31957 automate upgrades reopened 0.5 Medium Normal 22 hours ago
#33922 hardware requirements planning for gitlab launch closed Medium Normal 22 hours ago
#33941 Nagios checks for op-??.onionperf.torproject.net closed Medium Normal 2 weeks ago
#34123 Provide secrets/passwords management for Tor Browser Nightly signing assigned Medium Normal 3 weeks ago
#33332 move root passwords to trocla? assigned Low Major 3 weeks ago
#33921 gitlab monitoring assigned Low Normal 2 weeks ago

Owner: weasel (7 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#32803 migrate eugeni to the gnt-fsn cluster closed High Major 2 days ago
#33908 migrate alberti to the ganeti cluster closed Medium Normal 2 days ago
#33909 meronense IP address change planned for Ganeti migration closed Medium Normal 2 days ago
#33910 migrate neriniflorum to the ganeti cluster closed Medium Normal 2 days ago
#33912 migrate pauli to the ganeti cluster closed Medium Normal 2 days ago
#33913 migrate rouyi to the ganeti cluster closed Medium Normal 9 days ago
#33914 migrate weissii to the ganeti cluster closed Medium Normal 2 days ago

June

  • Debian jessie LTS EOL, chiwui forcibly shut down #29399
  • finish ramp-down, final bugfixing and training before vacation
  • search.tp.o soft launch

Owner: anarcat (2 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#29397 Make use of some donated hardware assigned Medium Normal 2 weeks ago
#29974 move critical services off, and then replace, moly assigned Medium Normal 2 weeks ago

Owner: gaba (2 matches)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33537 audit SVN accesses assigned Very High Major 2 weeks ago
#17202 Shut down SVN and decomission the host (gayi) assigned Medium Normal 2 weeks ago

July

  • Debian stretch EOL, final deadline for buster upgrades
  • anarcat vacation
  • tor meeting?
  • hiro tentative vacations

Ticket Summary Status Points Actual Points Priority Severity Modified
No tickets found

August

  • anarcat vacation
  • web metrics R&D (investigate a platform for web metrics) (#32996)

Ticket Summary Status Points Actual Points Priority Severity Modified
No tickets found

September

  • plan contingencies for christmas holidays
  • catchup following vacation
  • web metrics deployment

Ticket Summary Status Points Actual Points Priority Severity Modified
No tickets found

October

  • puppet work (finish prometheus module development, puppet environments, trocla, Hiera, publish code #29387)
  • varnish to nginx conversion #32462
  • web metrics soft launch (in time for eoy campaign)
  • submit service R&D #30608

Owner: hiro (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33115 Migrating the blog to a static web site with Lektor new 10 Medium Normal 2 weeks ago

Owner: tpa (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33588 migrate to puppetserver and Puppet 6 before EOL new Low Major 3 months ago

November

  • first submit service prototype? #30608

Owner: anarcat (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#31239 automate installs assigned Low Normal 2 weeks ago

Owner: tpa (1 match)

Ticket Summary Status Points Actual Points Priority Severity Modified
#33276 decomission listera new 1 High Major 2 weeks ago

December

  • stabilisation & bugfixing
  • 2021 roadmapping
  • one or two week xmas holiday
  • CCC?

Ticket Summary Status Points Actual Points Priority Severity Modified
No tickets found

2021 preview

Objectives:

  • complete puppetization
  • experiment with containers/kubernetes?
  • close and merge more services
  • replace nagios with prometheus? #29864
  • new hire?

Monthly goals:

  • january: roadmap approval
  • march/april: anarcat vacation
Last modified on Mar 10, 2020, 8:32:04 PM