Opened 3 months ago

Closed 3 months ago

#33834 closed task (fixed)

nevii IP address change planned for Ganeti migration

Reported by: anarcat
Owned by: anarcat
Priority: High
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Major
Keywords: tpa-roadmap-march
Cc:
Actual Points:
Parent ID: #33082
Points:
Reviewer:
Sponsor:

Description

I'm migrating nevii, our primary DNS server, to the Ganeti cluster. this implies an IP address change, and therefore all sorts of shenanigans.

after inspection, the changes are fairly "minimal": glue records should not change as the primary DNS server is not publicly exposed. we will need to change all secondary servers, but most of those are in Puppet.

we did have to request extra address space from Hetzner, but this was done in ticket 2020032503025825.

Change History (6)

comment:1 Changed 3 months ago by anarcat

Owner: changed from tpa to anarcat
Status: changed from new to accepted

this is the current DNS landscape:

  • ns1.torproject.org, 38.229.72.12, fallax.torproject.org, cymru, in Puppet
  • ns2, 95.216.159.212, hetzner-hel1-02.torproject.org, hetzner cloud, in Puppet
  • ns3, 95.216.159.212, *also* hetzner-hel1-02.torproject.org.
  • ns4, 94.130.28.203, neriniflorum.torproject.org., kvm4@hetzner, in puppet
  • nsp.dnsnode.net, 194.58.198.32, DNSNODE, needs manual configuration

So I will, in theory, only need to update the last nameserver by hand: all others should follow happily without a problem.

That is, of course, unless some other server hardcodes the IP address, but I couldn't find anything nasty in /etc so far so we should be good.

comment:2 Changed 3 months ago by anarcat

first test adoption tells me:

Mon Apr  6 19:24:35 2020  - INFO: Chose IP 49.12.57.130 from network gnt-fsn13-02

i also had a crash in renumber-instances because the new network (gnt-fsn13-02) did not have an IPv6 configuration. i hacked around that problem by reassigning the same /64 as the gnt-fsn network, but that seems totally gross right now. i also improved error handling (in cd31b77) so that it would warn the user instead of just crashing in such situations in the future.

still, it worked as far as ganeti/debian is concerned and yielded a diff like this:

--- /mnt/etc/network/interfaces.bak	2020-04-06 19:30:55.884709093 +0000
+++ /mnt/etc/network/interfaces	2020-04-06 19:30:57.076699317 +0000
@@ -6,11 +6,11 @@
 iface lo inet loopback
 
 # The primary network interface
-allow-hotplug eth0
+auto eth0
 iface eth0 inet static
-    address 138.201.212.229/28
-    gateway 138.201.212.225
+    address 49.12.57.130/27
+    gateway 49.12.57.129
 iface eth0 inet6 static
     accept_ra 0
-    address 2a01:4f8:172:39ca:0:dad3:5:1/96
-    gateway 2a01:4f8:172:39ca:0:dad3:0:1
+    address 2a01:4f8:fff0:4f:266:37ff:febd:dd6/64
+    gateway 2a01:4f8:fff0:4f::1
copying /mnt/etc/hosts to /mnt/etc/hosts.bak on fsn-node-05.torproject.org
rewriting host file /mnt/etc/hosts on <Connection host=fsn-node-05.torproject.org user=root>
--- /mnt/etc/hosts.bak	2020-04-06 19:31:00.600670406 +0000
+++ /mnt/etc/hosts	2020-04-06 19:31:02.688653278 +0000
@@ -3,7 +3,7 @@
 ##
 
 127.0.0.1       localhost
-138.201.212.229        nevii.torproject.org nevii
+49.12.57.130 nevii.torproject.org nevii
 
 # The following lines are desirable for IPv6 capable hosts
 ::1     localhost ip6-localhost ip6-loopback
@@ -12,3 +12,4 @@
 ff02::1 ip6-allnodes
 ff02::2 ip6-allrouters
 ff02::3 ip6-allhosts
+2a01:4f8:fff0:4f:266:37ff:febd:dd6 nevii.torproject.org nevii

ipv6 networking seems to work, so i think we're all clear for a live migration next.
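
A quick sanity check on the renumbered interface is that each gateway sits inside the subnet of its new address. The sketch below does that arithmetic with Python's standard `ipaddress` module, using the values from the diff above. The helper name `gateway_on_link` is made up for illustration; it is not part of renumber-instances.

```python
import ipaddress


def gateway_on_link(address_with_prefix: str, gateway: str) -> bool:
    """Return True if the gateway lies inside the subnet of the interface address."""
    iface = ipaddress.ip_interface(address_with_prefix)
    return ipaddress.ip_address(gateway) in iface.network


# (address, gateway) pairs taken from the new /etc/network/interfaces above.
new_config = [
    ("49.12.57.130/27", "49.12.57.129"),
    ("2a01:4f8:fff0:4f:266:37ff:febd:dd6/64", "2a01:4f8:fff0:4f::1"),
]

for address, gateway in new_config:
    network = ipaddress.ip_interface(address).network
    print(network, gateway_on_link(address, gateway))
# prints:
# 49.12.57.128/27 True
# 2a01:4f8:fff0:4f::/64 True
```

The /27 covers 49.12.57.128 through 49.12.57.159, so both the host (.130) and the gateway (.129) are on the same link.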

comment:3 Changed 3 months ago by anarcat

lowered TTL, step 7.

comment:4 Changed 3 months ago by anarcat

final migration and renumbering (step 8) done:

--- /mnt/etc/network/interfaces.bak	2020-04-08 20:00:41.040845499 +0000
+++ /mnt/etc/network/interfaces	2020-04-08 20:00:42.292835366 +0000
@@ -1,16 +1,20 @@
 # This file describes the network interfaces available on your system
 # and how to activate them. For more information, see interfaces(5).
 
+source /etc/network/interfaces.d/*
+
 # The loopback network interface
 auto lo
 iface lo inet loopback
 
 # The primary network interface
-allow-hotplug eth0
+auto eth0
 iface eth0 inet static
-    address 138.201.212.229/28
-    gateway 138.201.212.225
+    address 49.12.57.130/27
+    gateway 49.12.57.129
+
+# IPv6 configuration
 iface eth0 inet6 static
     accept_ra 0
-    address 2a01:4f8:172:39ca:0:dad3:5:1/96
-    gateway 2a01:4f8:172:39ca:0:dad3:0:1
+    address 2a01:4f8:fff0:4f:266:37ff:fee9:5df8/64
+    gateway 2a01:4f8:fff0:4f::1
copying /mnt/etc/hosts to /mnt/etc/hosts.bak on fsn-node-05.torproject.org
rewriting host file /mnt/etc/hosts on <Connection host=fsn-node-05.torproject.org user=root>
--- /mnt/etc/hosts.bak	2020-04-08 20:00:45.304810995 +0000
+++ /mnt/etc/hosts	2020-04-08 20:00:47.636792120 +0000
@@ -3,7 +3,7 @@
 ##
 
 127.0.0.1       localhost
-138.201.212.229        nevii.torproject.org nevii
+49.12.57.130 nevii.torproject.org nevii
 
 # The following lines are desirable for IPv6 capable hosts
 ::1     localhost ip6-localhost ip6-loopback
@@ -12,3 +12,4 @@
 ff02::1 ip6-allnodes
 ff02::2 ip6-allrouters
 ff02::3 ip6-allhosts
+2a01:4f8:fff0:4f:266:37ff:fee9:5df8 nevii.torproject.org nevii

drbd in progress.

comment:5 Changed 3 months ago by anarcat

Status: changed from accepted to needs_review

step 10 done.

step 11, final renumbering:

  1. changed IP in LDAP, DNS TTL kept low in case of problems.
  2. changed in puppet (in the secondaries zonefile), ran puppet on puppet
  3. not present in DNS (!), will be changed in puppet for our secondaries, changed on the nsnode DNS server
  4. changed in Nagios
  5. reverse DNS added in hetzner
  6. no traces left in /etc/ on host, present in nodes because ud-replicate hadn't run, fixed
  7. ran puppet everywhere
  8. grepped for the old IPs in all of /etc everywhere, found a hardcoded from on pauli that didn't come from puppet, fixed by hand.
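
The last step above (looking for leftover copies of the old addresses) can be sketched in Python as follows. This is only an illustration of the idea, not the command that was actually run: the function name `find_hardcoded` is hypothetical, and it does plain substring matching, so an IPv6 address written in a different but equivalent form would be missed.

```python
import os

# Old addresses of nevii, as shown in the interfaces diff earlier in this ticket.
OLD_ADDRESSES = ("138.201.212.229", "2a01:4f8:172:39ca:0:dad3:5:1")


def find_hardcoded(root, needles=OLD_ADDRESSES):
    """Yield (path, line_number, line) for each line under root that contains a needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file (sockets, FIFOs).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if any(needle in line for needle in needles):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, vanished meanwhile): move on.
                continue


# Example use on a host (run as root to read all of /etc):
#     for path, number, line in find_hardcoded("/etc"):
#         print(f"{path}:{number}: {line}")
```

Running something like this on every host after the final puppet run is what would surface a stray entry like the one on pauli.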

i'm going to do a few more tests (mostly creating a new entry and checking whether SOAs follow everywhere, along with let's encrypt magic) and this can be considered done.
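
One way to check that SOAs follow everywhere is to ask each nameserver for the zone's SOA record and compare serials. The sketch below shells out to `dig +short SOA`, which prints the SOA fields on one line with the serial third. The function names are made up for illustration, and comparing with `max()` ignores serial number wraparound (RFC 1982), which is good enough for a quick check.

```python
import subprocess


def parse_soa_serial(dig_short_output: str) -> int:
    """Extract the serial from `dig +short SOA` output.

    The expected field order is: mname rname serial refresh retry expire minimum.
    """
    fields = dig_short_output.split()
    if len(fields) < 7:
        raise ValueError(f"unexpected SOA answer: {dig_short_output!r}")
    return int(fields[2])


def query_soa_serial(zone: str, server: str) -> int:
    """Ask one nameserver for the SOA serial of a zone (requires the dig binary)."""
    result = subprocess.run(
        ["dig", "+short", "SOA", zone, f"@{server}"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_soa_serial(result.stdout)


def out_of_sync(serials: dict) -> dict:
    """Return the servers whose serial differs from the highest serial seen."""
    newest = max(serials.values())
    return {server: serial for server, serial in serials.items() if serial != newest}


# Example use, with the secondary addresses listed in comment:1:
#     servers = ["38.229.72.12", "95.216.159.212", "94.130.28.203", "194.58.198.32"]
#     serials = {s: query_soa_serial("torproject.org", s) for s in servers}
#     print(out_of_sync(serials) or "all secondaries in sync")
```

An empty result from `out_of_sync` means every queried server reports the same serial.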

comment:6 Changed 3 months ago by anarcat

Resolution: fixed
Status: changed from needs_review to closed

DNS tests worked: i created a random record (railingarmlessremnant.torproject.org) in DNS and let's encrypt; it was properly propagated to DNS, certified by let's encrypt, *and* propagated to puppet.

so:

step 12 done: scheduled removal on macrum.

also sending a notice to TPA.

we're all done here. whoohoo!
