the VMs set up on the gnt-fsn cluster seem to have a broken ping.

ping is neither setuid root nor does it have the cap_net_raw capability set, on at least web-fsn-02 and loghost01:

weasel@loghost01:~$ /sbin/getcap /usr/bin/ping 

meaning ping does not work as an unprivileged user:

weasel@loghost01:~$ ping localhost
ping: socket: Operation not permitted

This could likely be fixed by reinstalling iputils-ping or by setting cap_net_raw+ep manually, but we should figure out why machines installed with instance-debootstrap are broken this way.
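A minimal sketch of the check described above, assuming a POSIX shell. The function name has_raw_socket_privilege and the PING_BIN variable are made up for illustration; /sbin is appended to PATH because getcap is not in an unprivileged user's PATH on these hosts.

```shell
#!/bin/sh
# Sketch: report whether a binary has either the setuid bit or the
# cap_net_raw file capability, the two mechanisms discussed in this ticket.
# has_raw_socket_privilege and PING_BIN are hypothetical names.
PATH="$PATH:/sbin:/usr/sbin"

has_raw_socket_privilege() {
    bin="$1"
    # older mechanism: setuid bit on the binary
    if [ -u "$bin" ]; then
        return 0
    fi
    # newer mechanism: file capability; getcap ships in libcap2-bin
    if command -v getcap >/dev/null 2>&1; then
        getcap "$bin" 2>/dev/null | grep -q cap_net_raw && return 0
    fi
    return 1
}

PING_BIN="${PING_BIN:-/usr/bin/ping}"
if has_raw_socket_privilege "$PING_BIN"; then
    echo "$PING_BIN: ok"
else
    echo "$PING_BIN: no setuid bit and no cap_net_raw found"
    echo "manual fix (as root): setcap cap_net_raw+ep $PING_BIN"
fi
```

The setcap line is only printed, not run, so the sketch is safe to execute as a regular user. Note that if getcap is missing, the capability check is skipped and the result only reflects the setuid bit.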

comment:1 Changed 8 months ago by anarcat



I'm not sure how this happened on loghost, but it didn't happen on bacula-director-01, as created in #31786 (which has the detailed commandline used in the creation).

root@bacula-director-01:~# /sbin/getcap /usr/bin/ping 
/usr/bin/ping = cap_net_raw+ep
root@bacula-director-01:~# stat /usr/bin/ping
  File: /usr/bin/ping
  Size: 65272     	Blocks: 128        IO Block: 4096   regular file
Device: 801h/2049d	Inode: 137498      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-03 20:31:00.957049432 +0000
Modify: 2018-08-03 16:53:09.000000000 +0000
Change: 2019-10-03 20:30:49.057153189 +0000
 Birth: -

On the other hand, gettor-01, created as part of #31785, *does* have the problem so I'm not sure what's going on here.

root@gettor-01:~# getcap /usr/bin/ping
root@gettor-01:~# stat /usr/bin/ping
  File: /usr/bin/ping
  Size: 65272     	Blocks: 128        IO Block: 4096   regular file
Device: 801h/2049d	Inode: 393547      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-04 11:56:47.972263612 +0000
Modify: 2018-08-03 16:53:09.000000000 +0000
Change: 2019-10-02 17:16:26.600316666 +0000
 Birth: -

It seems to me we don't have a clear way of reproducing this. From here on, maybe we need VM creators to *explicitly* document exactly which steps were taken to create the box? Maybe there's a slight change in the way partitions are created or something...

comment:2 Changed 8 months ago by anarcat

I reinstalled iputils-ping on gettor-01 and confirmed that it fixes the problem. I have not done that on bacula-director-01, just to be clear.

comment:3 Changed 8 months ago by anarcat




i confirm the problem occurs when the filesystem is cached. filed the bug in Debian and wrote a patch that fixes the problem; the patch is deployed on both nodes.

comment:4 Changed 8 months ago by anarcat



i tested this and it seems to work. next step is to have another person confirm that, to get this merged in the debian package, have an upload to fix it in unstable and then stable, and then deploy this debian package on the nodes.

in the meantime, we should at least add a note in the new-machine docs before this ticket is marked as resolved.

comment:5 Changed 8 months ago by anarcat

I audited all the machines accessible through Cumin, and found the following machines without proper caps set:


It's strange, because the getcap binary (from the libcap2-bin package) was also missing on them. So I ran this everywhere:

apt install libcap2-bin; getcap /bin/ping | grep -q . || apt install --reinstall iputils-ping; getcap /bin/ping
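
For readability, the one-liner above can be unrolled roughly as follows. This is only a sketch: the DRY_RUN knob and the run and fix_ping_caps helpers are hypothetical additions, not part of what was actually run. DRY_RUN defaults to 1 so the apt commands are printed rather than executed.

```shell
#!/bin/sh
# Unrolled sketch of the one-liner above.
# DRY_RUN, run and fix_ping_caps are hypothetical names for illustration.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fix_ping_caps() {
    # getcap comes from libcap2-bin, which was missing on the affected hosts
    run apt install libcap2-bin
    # empty getcap output means no file capability is set on ping
    if ! getcap /bin/ping 2>/dev/null | grep -q .; then
        run apt install --reinstall iputils-ping
    fi
    # show the final state, as the one-liner does
    getcap /bin/ping 2>/dev/null || true
}

fix_ping_caps
```

In dry-run mode on a host without libcap2-bin, getcap is not found, so the sketch reports that it would reinstall iputils-ping even if the capability happens to be set.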

Particularly interesting are the ones that were set up using the Hetzner cloud stuff, because that means our install procedure is broken there as well.

So, long story short, remaining todo is:

  1. document the workaround in the ganeti installer (done, in ganeti, will need to be reverted when the package is fixed)
  2. ship the new package in Debian, get it to stable (in progress: announced intention to NMU)
  3. fix the hetzner-cloud installer or at least add a note to check it
  4. check the other installers

comment:6 Changed 7 months ago by anarcat

filed bug 944538 in the debian bts to get stable updated...

comment:7 Changed 5 months ago by anarcat

Summary: changed from "ping on new VMs" to "ping fails as a regular user on new VMs"

comment:8 Changed 4 months ago by anarcat

ping'd bts again, will upload the hotfix on our debian repo monday because we are missing the fix on new ganeti machines in the cluster.

comment:9 Changed 4 months ago by anarcat

uploaded the hotfix to our own debian repo so that changes get deployed on new boxes automatically. hot-fixed two new machines (tbb-nightlies-master and onionoo-backend-01) because they were deployed from fsn-node-03, which didn't have the fix then.

