Opened 9 months ago

Closed 5 months ago

Last modified 5 months ago

#33027 closed defect (fixed)

Add blackbox exporter to prometheus ext

Reported by: hiro Owned by: tpa
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: anarcat Actual Points:
Parent ID: #30152 Points:
Reviewer: Sponsor:

Description (last modified by hiro)

I have started working on adding the possibility for our external prometheus to target endpoints via the blackbox exporter.

Patch: https://share.riseup.net/#wd8W-oiIFU3_FRkKBp_TEg

What is missing is the possibility for the service admins to add more targets without having to send patches to our puppet.

Child Tickets

Change History (12)

comment:1 Changed 9 months ago by hiro

Description: modified (diff)

comment:2 Changed 9 months ago by anarcat

the grafana dashboard is, as expected, kind of undecipherable. :) so much JSON! so we can review this once it's pulled in the GUI... no harm done their either way, so that's okay.

you might have missed it, but modules/profile/files/grafana/dashboards is its own repository. it's one of those dark corners of the Puppet repository, which is currently a huge monorepo, but that does have some subrepos somehow. i'm looking into how to deal with that problem in #30770, in general. in particular, my remote for this repo right now is git@gitlab.com:anarcat/grafana-dashboards.git because i didn't want to go through the trouble of creating a project in gitolite and all that stuff. i could either give you access to that, or we could move it somewhere else, or i could just push the changes myself.

in any case, it would be great if you could document the dashboard's in the README file next to it so we know where it comes from and so on.

as for the puppet work, the blackbox exporter class looks sane. i don't see it included anywhere in the patch though, so i'm not sure how it should work.

lower down, the scrape_configs array looks a bit strange to me. normally, the targets there look like example.com:9091, ie. with a port number at least. unless https://bridges.torproject.org/metrics actually gives out prometheus-compatible metrics (aka "OpenMetrics"). that doesn't seem to be the case right now (it's a 404).

What is missing is the possibility for the service admins to add more targets without having to send patches to our puppet.

That's going to be the tough part. Because the blackbox exporter is managed through puppet, in this configuration, we forcibly control the config file and local changes would be lost. So it's really one of three things:

  1. they send us the target and so we need to intervene when that needs to be changed
  2. we patch the module to allow us to *not* manage the blackbox config (yet another patch to the module, we can take it)
  3. we just *not* manage the blackbox exporter through puppet at all, and add it manually to the scrape_configs target.

I would do option 2, eventually, and start with option 1. makes sense?

comment:3 in reply to:  2 Changed 9 months ago by hiro

Replying to anarcat:

the grafana dashboard is, as expected, kind of undecipherable. :) so much JSON! so we can review this once it's pulled in the GUI... no harm done their either way, so that's okay.

Sure. I pulled it from the dashboard database. But I have should have added the link to the dashboard itself. I am going to do this.

you might have missed it, but modules/profile/files/grafana/dashboards is its own repository. it's one of those dark corners of the Puppet repository, which is currently a huge monorepo, but that does have some subrepos somehow. i'm looking into how to deal with that problem in #30770, in general. in particular, my remote for this repo right now is git@gitlab.com:anarcat/grafana-dashboards.git because i didn't want to go through the trouble of creating a project in gitolite and all that stuff. i could either give you access to that, or we could move it somewhere else, or i could just push the changes myself.

I can create a git repository or we can just leave it like it is. Whatever works better for you in this case.

in any case, it would be great if you could document the dashboard's in the README file next to it so we know where it comes from and so on.

as for the puppet work, the blackbox exporter class looks sane. i don't see it included anywhere in the patch though, so i'm not sure how it should work.

I haven't included it yet. This will come in the next iteration.

lower down, the scrape_configs array looks a bit strange to me. normally, the targets there look like example.com:9091, ie. with a port number at least. unless https://bridges.torproject.org/metrics actually gives out prometheus-compatible metrics (aka "OpenMetrics"). that doesn't seem to be the case right now (it's a 404).

What is missing is the possibility for the service admins to add more targets without having to send patches to our puppet.

That's going to be the tough part. Because the blackbox exporter is managed through puppet, in this configuration, we forcibly control the config file and local changes would be lost. So it's really one of three things:

  1. they send us the target and so we need to intervene when that needs to be changed
  2. we patch the module to allow us to *not* manage the blackbox config (yet another patch to the module, we can take it)
  3. we just *not* manage the blackbox exporter through puppet at all, and add it manually to the scrape_configs target.

I would do option 2, eventually, and start with option 1. makes sense?

Yep makes sense.

comment:4 in reply to:  2 Changed 9 months ago by hiro

Replying to anarcat:

lower down, the scrape_configs array looks a bit strange to me. normally, the targets there look like example.com:9091, ie. with a port number at least. unless https://bridges.torproject.org/metrics actually gives out prometheus-compatible metrics (aka "OpenMetrics"). that doesn't seem to be the case right now (it's a 404).

Uhm I read through the documentation of the blackbox expoter: https://github.com/prometheus/blackbox_exporter#prometheus-configuration

As far as I can understand it's about the returned status code?

comment:5 Changed 9 months ago by anarcat

uh! so then you're right i guess! :) move along, sorry :)

comment:7 Changed 9 months ago by anarcat

looks good. you're missing a bit in the README however - that's just the reference for a link, that link should also be added in the table above, IIRC. :)

comment:8 Changed 9 months ago by hiro

comment:9 Changed 9 months ago by anarcat

LGTM.

comment:10 Changed 9 months ago by hiro

Ok PR for the grafana dashboard repository: https://github.com/hiromipaw/grafana-dashboards/pull/1

comment:11 Changed 5 months ago by hiro

Resolution: fixed
Status: newclosed

comment:12 Changed 5 months ago by anarcat

just a note that this conversation carries into #31159 as well.

Note: See TracTickets for help on using tickets.