Opened 8 weeks ago

Last modified 3 days ago

#31781 needs_review defect

ping on new VMs

Reported by: weasel Owned by: anarcat
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The VMs set up on the gnt-fsn cluster seem to have a broken ping.

On at least web-fsn-02 and loghost01, ping is neither setuid root nor has the cap_net_raw capability set:

weasel@loghost01:~$ /sbin/getcap /usr/bin/ping 
weasel@loghost01:~$ 

This means ping does not work for an unprivileged user:

weasel@loghost01:~$ ping localhost
ping: socket: Operation not permitted
e2:weasel@loghost01:~$ 

This could likely be fixed by reinstalling iputils-ping or by setting cap_net_raw+ep manually, but we should figure out why the machines installed with instance-debootstrap end up broken this way.
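A minimal sketch for checking a host for this problem: it treats ping as usable by unprivileged users if the binary has the setuid bit or carries cap_net_raw. The function name ping_is_privileged is hypothetical, and matching on the string cap_net_raw (rather than on getcap's exact output format, which differs between libcap versions) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if unprivileged users should be able to ping.
# /bin/ping is used as the default because it also resolves on usrmerge'd
# systems, where /bin points to /usr/bin.
ping_is_privileged() {
    bin="${1:-/bin/ping}"
    # Traditional setup: setuid bit on the (root-owned) binary.
    if [ -u "$bin" ]; then
        return 0
    fi
    # Capability setup: getcap prints nothing when no capability is set.
    # getcap lives in /sbin, often outside an unprivileged user's PATH; if it
    # is missing entirely (no libcap2-bin), this also counts as "not set".
    PATH="$PATH:/sbin:/usr/sbin" getcap "$bin" 2>/dev/null | grep -q cap_net_raw
}

if ping_is_privileged "$@"; then
    echo "ping OK"
else
    echo "ping lacks setuid bit and cap_net_raw" >&2
fi
```

Where it reports a problem, either workaround from the description applies, run as root: setcap cap_net_raw+ep /bin/ping, or apt install --reinstall iputils-ping.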

Change History (6)

comment:1 Changed 5 weeks ago by anarcat

Status: new → needs_information

I'm not sure how this happened on loghost, but it didn't happen on bacula-director-01, as created in #31786 (which has the detailed command line used in the creation).

root@bacula-director-01:~# /sbin/getcap /usr/bin/ping 
/usr/bin/ping = cap_net_raw+ep
root@bacula-director-01:~# stat /usr/bin/ping
  File: /usr/bin/ping
  Size: 65272     	Blocks: 128        IO Block: 4096   regular file
Device: 801h/2049d	Inode: 137498      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-03 20:31:00.957049432 +0000
Modify: 2018-08-03 16:53:09.000000000 +0000
Change: 2019-10-03 20:30:49.057153189 +0000
 Birth: -
root@bacula-director-01:~# 

On the other hand, gettor-01, created as part of #31785, *does* have the problem, so I'm not sure what's going on here.

root@gettor-01:~# getcap /usr/bin/ping
root@gettor-01:~# stat /usr/bin/ping
  File: /usr/bin/ping
  Size: 65272     	Blocks: 128        IO Block: 4096   regular file
Device: 801h/2049d	Inode: 393547      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-10-04 11:56:47.972263612 +0000
Modify: 2018-08-03 16:53:09.000000000 +0000
Change: 2019-10-02 17:16:26.600316666 +0000
 Birth: -
root@gettor-01:~# 

It seems we don't have a clear way of reproducing this. From now on, maybe VM creators should *explicitly* document the exact steps taken to create each box? Perhaps something changed slightly in the way partitions are created...

comment:2 Changed 5 weeks ago by anarcat

I reinstalled iputils-ping on gettor-01 and confirmed it fixes the problem. To be clear, I have not done that on bacula-director-01.

comment:3 Changed 5 weeks ago by anarcat

Owner: changed from tpa to anarcat
Status: needs_information → assigned

I confirmed the problem occurs when the cached filesystem is used. I filed the bug in Debian at https://bugs.debian.org/942114 and wrote a patch that fixes the problem, which is now deployed on both nodes.
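If the cache is a tarball of the debootstrapped tree (an assumption here), one plausible mechanism is that GNU tar drops extended attributes unless told otherwise, and file capabilities such as ping's cap_net_raw are stored in the security.capability attribute. The sketch below shows a cache save/restore pair that keeps them; save_cache and restore_cache are hypothetical names, and this is not necessarily what the patch in bug 942114 does.

```shell
#!/bin/sh
# Hypothetical cache helpers, assuming the cache is a gzip'd tarball of the
# debootstrapped root filesystem.
#
# GNU tar does not archive extended attributes unless --xattrs is given, so
# security.capability (where ping's cap_net_raw lives) would be lost.
# --xattrs-include='*' keeps extraction from being limited to user.* attributes.
# Restoring security.* attributes requires running as root.

save_cache() {      # usage: save_cache ROOTFS_DIR CACHE_FILE
    tar --xattrs --xattrs-include='*' -czf "$2" -C "$1" .
}

restore_cache() {   # usage: restore_cache CACHE_FILE TARGET_DIR
    tar --xattrs --xattrs-include='*' -xzf "$1" -C "$2"
}
```

After a restore done as root, getcap on the target's bin/ping should again report cap_net_raw; with plain tar -czf / -xzf it would print nothing, matching what we see on the broken VMs.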

comment:4 Changed 5 weeks ago by anarcat

Status: assigned → needs_review

I tested this and it seems to work. The next steps are to have another person confirm that, get the fix merged into the Debian package, have an upload fix it in unstable and then stable, and then deploy the updated package on the nodes.

In the meantime, we should at least add a note to the new-machine docs before this ticket is marked as resolved.

comment:5 Changed 5 weeks ago by anarcat

I audited all the machines accessible through Cumin, and found the following machines without proper caps set:

  • crm-ext-01.torproject.org
  • crm-int-01.torproject.org
  • gitlab-01.torproject.org
  • hetzner-hel1-01.torproject.org
  • hetzner-nbg1-01.torproject.org
  • oo-hetzner-03.torproject.org

Strangely, they also lacked the getcap binary, because the libcap2-bin package was missing. So I ran this everywhere:

apt install libcap2-bin; getcap /bin/ping | grep -q . || apt install --reinstall iputils-ping; getcap /bin/ping

Particularly interesting are hetzner-hel1-01.torproject.org and hetzner-nbg1-01.torproject.org, because they were set up using the Hetzner Cloud tooling, so our install procedure is broken there as well.

So, in short, the remaining work is:

  1. document the workaround in the ganeti installer (done, in ganeti; this will need to be reverted once the package is fixed)
  2. ship the new package in Debian and get it into stable (in progress: announced intention to NMU)
  3. fix the hetzner-cloud installer, or at least add a note to check it
  4. check the other installers
Last edited 3 weeks ago by anarcat

comment:6 Changed 3 days ago by anarcat

Filed bug 944538 in the Debian BTS to get the fix into stable...
