Opened 5 months ago

Closed 5 months ago

Last modified 5 months ago

#34098 closed defect (fixed)

crm-int-01 running out of disk space

Reported by: anarcat
Owned by: anarcat
Priority: Very High
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Major
Keywords: tpa-roadmap-april
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We're suddenly at 92% disk use on crm-int-01... It seems that disk usage started growing sharply three days ago:

https://grafana.torproject.org/d/ER3U2cqmk/node-exporter-server-metrics?panelId=31&fullscreen&orgId=1&var-node=crm-ext-01.torproject.org:9100&var-node=crm-int-01.torproject.org:9100&from=1585834161059&to=1588426161060
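
For a quick check outside Grafana, the same node exporter feeding that dashboard can be queried directly; a sketch, assuming the exporter listens on port 9100 as in the URL above (metric names vary a bit across node_exporter versions):

curl -s http://crm-int-01.torproject.org:9100/metrics \
  | grep -E '^node_filesystem_(size|avail)_bytes.*mountpoint="/"'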

The MariaDB server was also stopped this morning. It seems InnoDB crashes with the following assertion:

2020-05-02  9:08:31 41 [ERROR] InnoDB: preallocating 65536 bytes for file ./torcrm_prod/civicrm_acl_contact_cache.ibd failed with error 28
2020-05-02  9:08:31 41 [Warning] InnoDB: Cannot create table `torcrm_prod`.`civicrm_acl_contact_cache` because tablespace full
2020-05-02 09:08:31 0x7f6c50252700  InnoDB: Assertion failure in file /build/mariadb-10.3-qB78gy/mariadb-10.3-10.3.22/storage/innobase/dict/dict0dict.cc line 491
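
For what it's worth, error 28 is the OS-level ENOSPC; if the MariaDB client utilities are installed on the host, perror confirms it:

perror 28
# -> OS error code  28:  No space left on device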

There are also warnings on startup:

2020-05-02 13:23:17 0 [Note] InnoDB: Ignoring data file './torcrm_prod/#sql-ib1158381.ibd' with space ID 1137566. Another data file called ./torcrm_prod/civicrm_acl_contact_cache.ibd exists with the same space ID.
2020-05-02 13:23:17 0 [Note] InnoDB: Ignoring data file './torcrm_prod/civicrm_acl_contact_cache.ibd' with space ID 1137566. Another data file called ./torcrm_prod/#sql-ib1158381.ibd exists with the same space ID.

We have ~1.6GB left on the server:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        20G   18G  1.6G  92% /

Most of that 18G is in /var, with about 5GB split between the two databases:

    2.4 GiB [##########] /torcrm_prod
    2.3 GiB [######### ] /torcrm_staging

Here are the top 10 tables by disk usage:

  528.0 MiB [##########]  civicrm_mailing_event_queue.ibd
  432.0 MiB [########  ]  civicrm_mailing_recipients.ibd 
  340.0 MiB [######    ]  civicrm_activity_contact.ibd  
  172.0 MiB [###       ]  civicrm_contact.ibd         
  168.0 MiB [###       ]  civicrm_log.ibd    
  148.0 MiB [##        ]  civicrm_mailing_event_delivered.ibd
   80.0 MiB [#         ]  civicrm_activity.ibd               
   60.0 MiB [#         ]  civicrm_group_contact.ibd
   52.0 MiB [          ]  civicrm_subscription_history.ibd
   52.0 MiB [          ]  civicrm_email.ibd               
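
For reference, a similar per-table breakdown can be pulled from information_schema instead of from the .ibd files; the figures won't match the on-disk sizes exactly, since tablespaces carry some free space. A sketch, assuming root can authenticate over the local socket:

mysql -e "SELECT table_name,
                 ROUND((data_length + index_length) / 1048576) AS size_mib
          FROM information_schema.tables
          WHERE table_schema = 'torcrm_prod'
          ORDER BY data_length + index_length DESC
          LIMIT 10;"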

Backups take up the most space, however, at about 10GB. I am not familiar with how the backup system works on that host, but there are about 7.5GB of SHA256-* files in /var/backups/local/mysql:

root@crm-int-01:/var/backups/local/mysql# du -sch SHA256-* | tail -1
7.6G	total

Some of those are fairly old too:

root@crm-int-01:/var/backups/local/mysql# ls -alt SHA256-*  | tail -2
-rw-r----- 2 root root 16130992 Feb  8  2019 SHA256-f6810ff0245807455347d88a1a0d7eaf29368e64188e7c1766b64c0cc143570e
-rw-r----- 2 root root   130308 Feb  8  2019 SHA256-c30b262677ed796d892b888f7f417690ca55dd83bf17fa421fdd32438ca2203a
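
To get a quick picture of how stale those files are, something like the following works (inspection only; the retention logic lives in the backup system itself, so nothing should be deleted by hand based on this):

find /var/backups/local/mysql -maxdepth 1 -name 'SHA256-*' -mtime +90 \
  -printf '%TY-%Tm-%Td %12s %p\n' | sort | head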

It also seems that the database grew quite a bit -- doubled in size -- in the last few months, according to the backup sizes:

  800.9 MiB [##########]  20200415-190301-torcrm_prod                                                                                                                                          
  670.4 MiB [########  ]  20200109-190301-torcrm_prod
  398.9 MiB [####      ]  20191002-190301-torcrm_prod

Obviously, a database server running out of disk space is an... undesirable condition, to say the least. :) Should we expand the disk space on that server (which I would rather avoid doing over the weekend), or is there something you can do on your end to clean stuff up?

Alternatively, maybe we should improve the backup system here so it doesn't take up twice as much disk space as the databases themselves. Or, even better, so that backups don't sit on the same partition as the production data...


Change History (5)

comment:1 Changed 5 months ago by weasel

I removed a few failed mysql backups from /var/backups.

We did hourly backups of MySQL, keeping them for a day or so (then dailies for a week or two, weeklies for months, etc.).

Changed Puppet to only do dumps every four hours instead.
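
Roughly, the dump schedule went from hourly to every four hours. As a cron-style sketch only (the real change lives in Puppet, and the dump script name below is made up for illustration):

# before: 0 * * * *    root  /usr/local/sbin/mysql-backup   # hypothetical script name
# after:  0 */4 * * *  root  /usr/local/sbin/mysql-backup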

comment:2 Changed 5 months ago by peterh

I think we should also up the disk space. The size of the mailings has really increased since the second half of last year, and I expect it to keep growing at the pace we've seen over the last six months.

comment:3 Changed 5 months ago by anarcat

Owner: changed from tpa to anarcat
Status: new → accepted

> I think we should also up the disk space. The size of the mailings has really increased since the second half of last year, and I expect it to keep growing at the pace we've seen over the last six months.

Okay, that's the confirmation I needed. We have enough space in the cluster, I'll just double it.

comment:4 Changed 5 months ago by anarcat

Resolution: fixed
Status: accepted → closed

Actually, there was enough space *inside the server* already, just on a different partition. I have moved the MySQL database to /srv/mysql and bind-mounted it at /var/lib/mysql, which gives a much more reasonable disk usage:

root@crm-int-01:~# df -h  / /srv
Filesystem                       Size  Used Avail Use% Mounted on
/dev/sda1                         20G  9.2G  9.5G  50% /
/dev/mapper/vg_crm--int--01-srv   20G   13G  6.5G  66% /srv

... compared to previously:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        20G   18G  1.6G  92% /

(I don't have numbers on hand for the /srv disk usage before the switch, but it was minimal, around 2GB.)

So this should give us a few months, if not years.

There was a short downtime (about 5 minutes) during the switchover, and everything should be back online.
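
For the record, the move was roughly of the following shape; this is a sketch only, and the exact commands, flags and ordering from the actual maintenance are not captured in this ticket:

systemctl stop mariadb
mkdir /srv/mysql
rsync -aHAX /var/lib/mysql/ /srv/mysql/
chown mysql:mysql /srv/mysql
mv /var/lib/mysql /var/lib/mysql.old    # keep the old copy until the server is verified
mkdir /var/lib/mysql
echo '/srv/mysql /var/lib/mysql none bind 0 0' >> /etc/fstab
mount /var/lib/mysql
systemctl start mariadb
# once everything checks out: rm -rf /var/lib/mysql.old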

comment:5 Changed 5 months ago by anarcat

Keywords: tpa-roadmap-april added