Opened 7 months ago

Last modified 5 months ago

#33921 assigned task

gitlab monitoring

Reported by: anarcat
Owned by: hiro
Priority: Low
Milestone:
Component: Internal Services/Services Admin Team
Version:
Severity: Normal
Keywords: tpa-roadmap-june
Cc:
Actual Points:
Parent ID: #29400
Points:
Reviewer:
Sponsor:

Description

we need to have gitlab metrics in grafana. we also need to make sure things work (nagios).

this implies monitoring the webserver (nginx i guess?) and it would also be nice to have internal gitlab metrics. there's a builtin prometheus exporter in gitlab so we should be able to reuse that. see:

https://docs.gitlab.com/ee/administration/monitoring/prometheus/
https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html

Change History (9)

comment:1 Changed 7 months ago by anarcat

Keywords: tpa-roadmap-april added
Owner: set to hiro
Status: changed from new to assigned

comment:2 Changed 6 months ago by hiro

Keywords: tpa-roadmap-may added; tpa-roadmap-april removed

comment:3 Changed 6 months ago by anarcat

Priority: changed from Medium to Low

i deprioritized that work because we already have alerting set up. this is about hooking up gitlab's Prometheus endpoint into prometheus, so it's less of a priority than other work we have.

comment:4 Changed 5 months ago by hiro

The patch linked below should start enabling the gitlab exporters and connecting them to our prometheus server.

https://share.riseup.net/#66tKNKzMwngju3kXXjLfIg

Currently WIP.

comment:5 Changed 5 months ago by anarcat

+        monitoring_whitelist      => [
+          '195.201.139.202',
+        ],

this should have a # XXX MAGIC-IP-ADDRESS comment over it, in the short term. but even better, it should not be hardcoded at all and somehow guessed correctly. weasel had some tricks to fetch the IP address of another node, but I forgot how he pulled it off.
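
A minimal sketch of the short-term option, assuming the address is moved into a class parameter so it can at least be overridden from Hiera instead of living in the manifest body. The class name `profile::gitlab_monitoring_example` and the parameter `prometheus_addresses` are hypothetical and not part of the patch; this does not discover the address automatically.

```puppet
# Hypothetical wrapper class; the name and parameter are illustrative only.
class profile::gitlab_monitoring_example (
  # XXX MAGIC-IP-ADDRESS: address of the Prometheus server. Kept only as a
  # default; override it from Hiera with the key
  # profile::gitlab_monitoring_example::prometheus_addresses.
  Array[String] $prometheus_addresses = ['195.201.139.202'],
) {
  # Pass $prometheus_addresses to whatever resource manages the GitLab
  # configuration, in place of the literal list, e.g.:
  #   monitoring_whitelist => $prometheus_addresses,
}
```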

@@ -38,6 +38,10 @@ class profile::prometheus::server::internal (
     { 'job_name' => 'postfix' },
     { 'job_name' => 'postgres' },
     { 'job_name' => 'mtail' },
+    { 'job_name' => 'gitlab_exporter' },
+    { 'job_name' => 'redis_exporter' },
+    { 'job_name' => 'gitaly' },
+    { 'job_name' => 'gitlab_workhorse' },
   ]
   class { 'profile::prometheus::server::common':
     vhost_name          => $vhost_name,

that, in itself, won't be sufficient for the server to talk to the exporters. you'd need to export (as in @@rule in Puppet) the right resource for that to work. see profile::prometheus::blackbox_exporter (and, really, whatever is going on in prometheus::blackbox_exporter) for an example of how to do this.
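
A rough sketch of what exporting such a resource could look like, assuming the Prometheus module in use provides a `prometheus::scrape_job` defined type (as the Vox Pupuli puppet-prometheus module does) and that the server is set up to collect exported scrape jobs. The class name `profile::prometheus::gitlab_exporter` is hypothetical; the job name and port are the ones from the patch.

```puppet
# Hypothetical profile, meant to be applied on the GitLab host.
class profile::prometheus::gitlab_exporter (
  Integer $port = 9168,
) {
  $fqdn = $facts['networking']['fqdn']

  # Export (@@) a scrape job so the Prometheus server can collect it,
  # rather than listing this host by hand on the server side.
  @@prometheus::scrape_job { "gitlab_exporter_${fqdn}_${port}":
    job_name => 'gitlab_exporter',
    targets  => ["${fqdn}:${port}"],
    labels   => { 'alias' => $fqdn },
  }
}
```

The same pattern would be repeated for redis_exporter, gitaly and gitlab_workhorse with their respective ports.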

@@ -57,5 +61,9 @@ class profile::prometheus::server::internal (
     'postgres': port => 9187;
     'bind': port => 9119;
     'mtail': port => 3903;
+    'gitlab_exporter': port => 9168;
+    'redis_exporter': port => 9121;
+    'gitaly': port => 9236;
+    'gitlab_workhorse': port => 9229;
   }
 }

conversely, this will not do anything either unless you create a rule like the one in prometheus::blackbox_exporter, e.g.:

  # realize the allow rules defined on the prometheus server(s)
  Ferm::Rule <<| tag == 'profile::prometheus::server-blackbox-exporter' |>>
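
For the GitLab exporter, the export/realize pair could be sketched as follows. This is only an illustration under assumptions: the tag `profile::prometheus::server-gitlab-exporter` is made up by analogy with the blackbox one, the `description` and `rule` parameter names are assumed to match the local ferm module, and the exact rule string (including whether a trailing semicolon is needed) depends on that module's template.

```puppet
# On the Prometheus server (hypothetical snippet): export an allow rule
# that the scraped hosts can collect.
@@ferm::rule { "prometheus-gitlab-exporter-${facts['networking']['fqdn']}":
  description => 'allow the Prometheus server to scrape the GitLab exporter',
  rule        => "saddr ${facts['networking']['ip']} proto tcp dport 9168 ACCEPT",
  tag         => 'profile::prometheus::server-gitlab-exporter',
}

# On the GitLab host: realize the allow rule exported above.
Ferm::Rule <<| tag == 'profile::prometheus::server-gitlab-exporter' |>>
```

Because the rule carries the server's own address from its facts, this would also avoid hardcoding that address on the GitLab side of the firewall.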

comment:6 Changed 5 months ago by hiro

I should have specified that my first comment wasn't a request for a review, but rather me logging how I was going about solving this. As a matter of fact, I did not see your comments until today, when I came to update this ticket with the grafana dashboard.

As you mentioned, there are many things that should be improved. Not hardcoding the prometheus IP in the gitlab config is one of them.

Anyways, this is the first iteration. Dashboards available at:

Gitlab dashboard: https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus?orgId=1&refresh=5m

Gitaly dashboard: https://grafana.torproject.org/d/x6Z50y-iz/gitlab-gitaly?orgId=1&refresh=1m

Gitlab Node dashboard: https://grafana.torproject.org/d/Z7T7Cfemz/node-exporter-full?orgId=1&var-job=gitlab&var-node=gitlab-02.torproject.org&var-port=9101

comment:7 Changed 5 months ago by anarcat

> Anyways, this is the first iteration. Dashboards available at:

That looks awesome, thanks! :)

comment:8 Changed 5 months ago by anarcat

> As a matter of fact I did not see your comments till today when I came to update this ticket with the grafana dashboard.

as an aside, I'm a bit concerned by that... don't you get email notification on ticket updates? :) or is there another way i should notify you when i update tickets?

comment:9 Changed 5 months ago by hiro

Keywords: tpa-roadmap-june added; tpa-roadmap-may removed

A few things are still missing on this ticket.

First, the prometheus configuration should be cleaned up.
We shouldn't have the prometheus server IP hardcoded in gitlab.rb.
Also, nginx isn't currently monitored.
