Opened 8 weeks ago

Last modified 8 weeks ago

#33785 assigned defect

cannot create new machines in ganeti cluster

Reported by: anarcat
Owned by: anarcat
Priority: High
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Major
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

for some reason, I can't create new instances in the ganeti cluster:

root@fsn-node-01:~# gnt-instance add   -o debootstrap+buster   -t drbd --no-wait-for-sync   --disk 0:size=10G   --disk 1:size=2G,name=swap --backend-parameters memory=2g,vcpus=2   --net 0:ip=pool,network=gnt-fsn   --no-name-check   --no-ip-check test-01.torproject.org 

Failure: prerequisites not met for this operation:
error type: insufficient_resources, error details:
Can't compute nodes using iallocator 'hail': Request failed: Group default (preferred): No valid allocation solutions, failure reasons: FailMem: 8, FailN1: 12

The gnt-fsn network is getting full, but it still had one spare IP when that command was run. I see the same behavior with gnt-fsn13-02, the new network created to cover the new IP allocation from Hetzner, which has plenty of room as well.
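
For the record, the address pools can be double-checked with gnt-network, which should show the reservation map and how many addresses are still free:

gnt-network list
gnt-network info gnt-fsn gnt-fsn13-02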

The nodes do have plenty of disk space and memory to meet the demand:

root@fsn-node-01:~# gnt-node list
Node                       DTotal  DFree MTotal MNode MFree Pinst Sinst
fsn-node-01.torproject.org 893.1G 451.9G  62.8G 38.5G 23.7G     7    14
fsn-node-02.torproject.org 893.1G 561.9G  62.8G 22.8G 39.6G     6    15
fsn-node-03.torproject.org 893.6G 151.4G  62.8G 18.2G 43.6G     5    22
fsn-node-04.torproject.org 893.6G 450.2G  62.8G 24.0G 38.4G     6    12
fsn-node-05.torproject.org 893.6G 232.1G  62.8G  832M 60.8G     3     6

It's not clear to me why the allocator is failing.

Note that I've been *adopting* new instances without problems for the past few weeks, so this could be specifically about *creating* new disks.
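
If "adopting" here means the adopt= disk parameter, those additions never exercise the allocator at all: as far as I understand, adoption requires naming the target node by hand, roughly like this (hypothetical LV and instance names):

gnt-instance add -t plain -o debootstrap+buster --no-install \
  --disk 0:adopt=some-instance-root,vg=vg_ganeti \
  --net 0:ip=pool,network=gnt-fsn \
  --no-name-check --no-ip-check \
  -n fsn-node-01.torproject.org some-instance.torproject.org

That would be consistent with adoption working fine while regular creation trips over hail.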


Change History (3)

comment:1 Changed 8 weeks ago by anarcat

note that allocating the instance to specific nodes works properly:

root@fsn-node-01:~# gnt-instance add   -o debootstrap+buster   -t drbd --no-wait-for-sync   --disk 0:size=10G   --disk 1:size=2G,name=swap --backend-parameters memory=2g,vcpus=2   --net 0:ip=pool,network=gnt-fsn   --no-name-check   --no-ip-check -n fsn-node-05.torproject.org:fsn-node-04.torproject.org test-01.torproject.org 
Wed Apr  1 20:11:54 2020  - INFO: NIC/0 inherits netparams ['br0', 'openvswitch', '4000']
Wed Apr  1 20:11:54 2020  - INFO: Chose IP 116.202.120.188 from network gnt-fsn
Wed Apr  1 20:11:55 2020 * creating instance disks...
Wed Apr  1 20:12:07 2020 adding instance test-01.torproject.org to cluster config
Wed Apr  1 20:12:07 2020 adding disks to cluster config
Wed Apr  1 20:12:07 2020 * checking mirrors status
Wed Apr  1 20:12:07 2020  - INFO: - device disk/0:  2.20% done, 3m 47s remaining (estimated)
Wed Apr  1 20:12:07 2020  - INFO: - device disk/1:  1.00% done, 2m 6s remaining (estimated)
Wed Apr  1 20:12:07 2020 * checking mirrors status
Wed Apr  1 20:12:08 2020  - INFO: - device disk/0:  2.40% done, 4m 16s remaining (estimated)
Wed Apr  1 20:12:08 2020  - INFO: - device disk/1:  1.80% done, 1m 8s remaining (estimated)
Wed Apr  1 20:12:08 2020 * pausing disk sync to install instance OS
Wed Apr  1 20:12:08 2020 * running the instance OS create scripts...

creating a standalone (plain, non-DRBD) instance on the new network also works fine:

root@fsn-node-01:~# gnt-instance add   -o debootstrap+buster   -t plain --no-wait-for-sync   --disk 0:size=10G   --disk 1:size=2G,name=swap --backend-parameters memory=2g,vcpus=2   --net 0:ip=pool,network=gnt-fsn13-02   --no-name-check   --no-ip-check -n fsn-node-05.torproject.org test-02.torproject.org 
Wed Apr  1 20:17:03 2020  - INFO: NIC/0 inherits netparams ['br0', 'openvswitch', '4000']
Wed Apr  1 20:17:03 2020  - INFO: Chose IP 49.12.57.130 from network gnt-fsn13-02
Wed Apr  1 20:17:04 2020 * disk 0, size 10.0G
Wed Apr  1 20:17:04 2020 * disk 1, size 2.0G
Wed Apr  1 20:17:04 2020 * creating instance disks...
Wed Apr  1 20:17:05 2020 adding instance test-02.torproject.org to cluster config
Wed Apr  1 20:17:05 2020 adding disks to cluster config
Wed Apr  1 20:17:05 2020 * running the instance OS create scripts...
Wed Apr  1 20:17:18 2020 * starting instance...

so this is strictly a problem related to the allocator.

It also seems that there are ways of debugging the allocator, as explained here:

https://github.com/ganeti/ganeti/wiki/Common-Issues#htools-debugging-hailhbal

Most notably, it suggests using the hspace -L command, which, in our case, gives us worrisome warnings:

root@fsn-node-01:~# hspace -L
Warning: cluster has inconsistent data:
  - node fsn-node-05.torproject.org is missing -3049 MB ram and 470 GB disk
  - node fsn-node-04.torproject.org is missing -5797 MB ram and 2 GB disk
  - node fsn-node-03.torproject.org is missing -14155 MB ram and 162 GB disk

The cluster has 5 nodes and the following resources:
  MEM 321400, DSK 4574256, CPU 60, VCPU 240.
There are 27 initial instances on the cluster.
Tiered (initial size) instance spec is:
  MEM 32768, DSK 1048576, CPU 8, using disk template 'drbd'.
Tiered allocation results:
  -   1 instances of spec MEM 19200, DSK 460800, CPU 8
  -   1 instances of spec MEM 19200, DSK 154880, CPU 8
  - most likely failure reason: FailDisk
  - initial cluster score: 7.92595903
  -   final cluster score: 7.26099873
  - memory usage efficiency: 50.50%
  -   disk usage efficiency: 85.56%
  -   vcpu usage efficiency: 57.08%
Standard (fixed-size) instance spec is:
  MEM 128, DSK 1024, CPU 1, using disk template 'drbd'.
Normal (fixed-size) allocation results:
  -  44 instances allocated
  - most likely failure reason: FailDisk
  - initial cluster score: 7.92595903
  -   final cluster score: 20.56542169
  - memory usage efficiency: 40.30%
  -   disk usage efficiency: 60.61%
  -   vcpu usage efficiency: 68.75%
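
A quick way to dig into those "missing RAM/disk" warnings is to put Ganeti's view of each node next to what the node itself reports, e.g. (a sketch):

gnt-node list -o name,mtotal,mnode,mfree,dtotal,dfree
ssh fsn-node-05.torproject.org free -m
ssh fsn-node-05.torproject.org vgs vg_ganeti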

I also tried creating a tracing allocator wrapper that saves its input, in /usr/lib/ganeti/iallocators/hail-trace:

#!/bin/sh
# save the allocator request for later inspection, then hand it to the real hail
cp "$1" /tmp/allocator-input.json
/usr/lib/ganeti/iallocators/hail "$1"
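
The wrapper presumably also needs to be marked executable:

chmod +x /usr/lib/ganeti/iallocators/hail-trace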

then it can be used with the -I hail-trace parameter:

gnt-instance add   -o debootstrap+buster   -t drbd --no-wait-for-sync   --disk 0:size=10G   --disk 1:size=2G,name=swap --backend-parameters memory=2g,vcpus=2   --net 0:ip=pool,network=gnt-fsn   --no-name-check   --no-ip-check -I hail-trace test-01.torproject.org

That allows us to re-run the allocator by hand on the captured input:

root@fsn-node-01:~# /usr/lib/ganeti/iallocators/hail --verbose /tmp/allocator-input.json  
Warning: cluster has inconsistent data:
  - node fsn-node-05.torproject.org is missing -3046 MB ram and 470 GB disk
  - node fsn-node-04.torproject.org is missing -5801 MB ram and 2 GB disk
  - node fsn-node-03.torproject.org is missing -14158 MB ram and 162 GB disk

Received request: Allocate (Instance {name = "test-01.torproject.org", alias = "test-01.torproject.org", mem = 2048, dsk = 12544, disks = [Disk {dskSize = 10240, dskSpindles = Nothing},Disk {dskSize = 2048, dskSpindles = Nothing}], vcpus = 2, runSt = Running, pNode = 0, sNode = 0, idx = -1, util = DynUtil {cpuWeight = 1.0, memWeight = 1.0, dskWeight = 1.0, netWeight = 1.0}, movable = True, autoBalance = True, diskTemplate = DTDrbd8, spindleUse = 1, allTags = [], exclTags = [], dsrdLocTags = fromList [], locationScore = 0, arPolicy = ArNotEnabled, nics = [Nic {mac = Just "00:66:37:8b:0a:ba", ip = Just "pool", mode = Nothing, link = Nothing, bridge = Nothing, network = Just "f96e8644-a473-43db-874b-99f90e20af7b"}], forthcoming = False}) (AllocDetails 2 Nothing) Nothing
{"success":false,"info":"Request failed: Group default (preferred): No valid allocation solutions, failure reasons: FailMem: 8, FailN1: 12","result":[]}

This, interestingly, gives us the same warnings.
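
The saved request can also be inspected directly to see exactly what hail believes each node has, for example (the field names here are from my reading of the iallocator protocol, worth checking against the file itself):

python3 -m json.tool /tmp/allocator-input.json | less
jq '.nodes | to_entries[] | [.key, .value.total_memory, .value.free_memory, .value.free_disk]' /tmp/allocator-input.json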

I'm still not sure where those warnings are coming from, but I can't help but wonder if the problem would go away after rebalancing the cluster.

comment:2 Changed 8 weeks ago by anarcat

Owner: changed from tpa to anarcat
Status: new → assigned

It looks like gnt-cluster verify agrees with the allocator in that some nodes are not quite set up properly:

root@fsn-node-01:~# gnt-cluster verify
Submitted jobs 70114, 70115
Waiting for job 70114 ...
Wed Apr  1 20:43:00 2020 * Verifying cluster config
Wed Apr  1 20:43:00 2020 * Verifying cluster certificate files
Wed Apr  1 20:43:00 2020 * Verifying hypervisor parameters
Wed Apr  1 20:43:00 2020 * Verifying all nodes belong to an existing group
Waiting for job 70115 ...
Wed Apr  1 20:43:00 2020 * Verifying group 'default'
Wed Apr  1 20:43:00 2020 * Gathering data (5 nodes)
Wed Apr  1 20:43:00 2020 * Gathering information about nodes (5 nodes)
Wed Apr  1 20:43:03 2020 * Gathering disk information (5 nodes)
Wed Apr  1 20:43:03 2020 * Verifying configuration file consistency
Wed Apr  1 20:43:03 2020 * Verifying node status
Wed Apr  1 20:43:03 2020 * Verifying instance status
Wed Apr  1 20:43:03 2020 * Verifying orphan volumes
Wed Apr  1 20:43:03 2020   - WARNING: node fsn-node-05.torproject.org: volume vg_ganeti/troodi.torproject.org-root is unknown
Wed Apr  1 20:43:03 2020   - WARNING: node fsn-node-05.torproject.org: volume vg_ganeti/srv-tmp is unknown
Wed Apr  1 20:43:03 2020   - WARNING: node fsn-node-05.torproject.org: volume vg_ganeti/troodi.torproject.org-swap is unknown
Wed Apr  1 20:43:03 2020   - WARNING: node fsn-node-05.torproject.org: volume vg_ganeti/troodi.torproject.org-lvm is unknown
Wed Apr  1 20:43:03 2020   - WARNING: node fsn-node-03.torproject.org: volume vg_ganeti/srv-tmp is unknown
Wed Apr  1 20:43:03 2020 * Verifying N+1 Memory redundancy
Wed Apr  1 20:43:03 2020 * Other Notes
Wed Apr  1 20:43:03 2020   - NOTICE: 3 non-redundant instance(s) found.
Wed Apr  1 20:43:04 2020 * Hooks Results
root@fsn-node-01:~# 

the key part here being:

Wed Apr  1 20:43:03 2020   - NOTICE: 3 non-redundant instance(s) found.

It doesn't say which instances those are, but I suspect they correspond to the three nodes hspace -L identified.
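
Listing instances by disk template should confirm which ones are non-redundant, something like:

gnt-instance list --no-headers -o name,disk_template,pnode | grep -vw drbd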

The solution here might simply be to rebalance the cluster. I don't want to do this right now because it takes time and would throw a lot of machines onto fsn-node-05, which I'm going to fill with boxes from macrum first.

But the problem might be solvable that way. That, plus documenting this entire process for the next time I stumble upon it.
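
When the time comes, hbal can preview the rebalancing before committing to anything, roughly:

hbal -L        # dry run: show the proposed moves and the score improvement
hbal -L -C     # also print the equivalent gnt-instance commands
hbal -L -X     # actually execute the moves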

comment:3 Changed 8 weeks ago by anarcat

some feedback from a ganeti maintainer:

03:40:48 <apoikos> failure reasons: FailMem: 1, FailN1: 4
03:41:18 <apoikos> part indicates that there's no N+1 redundancy, probably due to not enough memory being available on the cluster to accommodate it
03:42:05 <apoikos> You can try a manual allocation, or passing flags like --ignore-soft-errors and --no-capacity-checks to hail
[...]
10:36:12 <apoikos> I doubt rebalancing will fix it
10:36:31 <apoikos> The thing is, the whole htools logic was built around Xen which does hard commit on memory
[...]
10:37:08 <apoikos> That's the -14GB of RAm you're seeing
10:37:11 <anarchat> so what you're saying is that i *am* effectively using too much memory
10:37:13 <anarchat> oh weird
10:37:24 <anarchat> like the memory use from /proc doesn't match what ganeti expects?
10:37:28 <apoikos> no, I'm saying you're using less memory than Ganeti thinks
10:37:34 <apoikos> exactly
10:37:41 <apoikos> because KVMs VSZ != RSS
[...]
10:38:17 <apoikos> Let's say it computes the worst-case scenario
10:38:46 <apoikos> And in the worst-case scenario, where each instance will indeed use all of its configured memory and KSM won't save you, you don't have N+1
10:39:08 <apoikos> As for the 162GB of disk, these are probably your root LVs, if they live on the same LVM VG as the Ganeti instance disks
10:39:39 <anarchat> well there's also a secondary VG (vg_ganeti_hdd) for spinning rust that we don't see in gnt-node-list
10:39:48 <anarchat> i wonder if that's related
10:39:52 <apoikos> nope
10:40:08 <apoikos> If your primary VG has anything else than Ganeti VMs on it, you'll see that message
10:40:20 <anarchat> darn
10:40:27 <anarchat> so i'd need to rebuild my nodes to fix this
10:40:34 <apoikos> the good news is, you can tell ganeti to ignore specific LVs using gnt-cluster modify --reserved-lvs
10:40:41 <anarchat> oh cool
10:41:19 <anarchat> so i'd ignore what... vg_ganeti/root and vg_ganeti/swap i guess
10:41:29 <apoikos> I guess
10:41:50 <apoikos> The option --reserved-lvs specifies a list (comma-separated) of logical volume group names (regular expressions) that will be ignored by the 
                   cluster verify operation
10:41:53 <anarchat> i alreayd have   lvm reserved volumes: vg_ganeti/root, vg_ganeti/swap
10:42:19 <anarchat> oh but maybe i have extra LVs on those nodes, that's true
10:43:47 <anarchat> on fsn-node-03 and fsn-node-05, but not fsn-node-04
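
Following up on the --reserved-lvs suggestion, the current setting can be checked and then extended to cover the stray LVs flagged by gnt-cluster verify above. A sketch (note that the option replaces the whole list, so the existing entries have to be repeated; the troodi.* volumes look more like leftovers to clean up than something to reserve):

gnt-cluster info | grep -i 'reserved volumes'
gnt-cluster modify --reserved-lvs 'vg_ganeti/root,vg_ganeti/swap,vg_ganeti/srv-tmp'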

They also pointed at upstream issue 1399, which is about the Sinst field being incorrect in gnt-node list output.
