Can Prometheus help with multiple checks turning into one single alarm?

Trac:
Parent Ticket: #29681 (moved)

added component::internal services/tor sysadmin team owner::tpa parent::29681 priority::medium resolution::fixed severity::normal status::closed type::task labels

It depends what you mean by "multiple checks" or "duplicates". Prometheus's alerting system is designed to be highly available (HA), so even if multiple alerting nodes fire the same alert, the user will receive only one notification (the Alertmanager instances coordinate among themselves to deduplicate notifications).
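To make the deduplication part concrete: Alertmanager identifies an alert by its label set, so identical alerts sent by several Prometheus replicas collapse into one. The sketch below is a toy Python model of that idea, not Alertmanager's actual code. The alert names (`HostDown`, `DiskFull`) and the `instance` value are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# An alert is modelled here as nothing more than its label set (simplification).
Alert = Dict[str, str]


def fingerprint(alert: Alert) -> Tuple[Tuple[str, str], ...]:
    """Order-independent identity for an alert: its sorted label pairs."""
    return tuple(sorted(alert.items()))


def deduplicate(batches: Iterable[List[Alert]]) -> List[Alert]:
    """Merge alert batches from several senders, keeping one alert per label set."""
    seen: Dict[Tuple[Tuple[str, str], ...], Alert] = {}
    for batch in batches:
        for alert in batch:
            # The first sender to report a given label set wins; later copies are dropped.
            seen.setdefault(fingerprint(alert), alert)
    return list(seen.values())


# Two hypothetical Prometheus replicas evaluating the same rules.
replica_a = [{"alertname": "HostDown", "instance": "tpo"}]
replica_b = [
    {"instance": "tpo", "alertname": "HostDown"},  # same labels, different order
    {"alertname": "DiskFull", "instance": "tpo"},  # a genuinely different alert
]

notifications = deduplicate([replica_a, replica_b])
print([a["alertname"] for a in notifications])  # ['HostDown', 'DiskFull']
```

Note that this only merges alerts whose labels are exactly equal; two checks that differ in even one label still produce two notifications, which is the situation described next.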

But I suspect that if you have multiple slightly different checks, they will trigger multiple alerts. The key here, in my experience with Nagios anyway, is to set up alerting dependencies so that if service A (e.g. http://tpo) fails because of service B (e.g. ICMP or ICMPv6 to tpo), you only get warned once (e.g. for service B). But it's tricky to set up and easy to get wrong, and I'm not sure Prometheus supports such dependency chains.
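For reference, Alertmanager does have a related mechanism: inhibition rules (`inhibit_rules`), which mute "target" alerts while a matching "source" alert is firing and both share the labels listed in `equal`. This is per-rule label matching rather than a full Nagios-style dependency graph. The Python sketch below is a simplified model of those semantics (exact-equality matchers only), not Alertmanager's implementation; the alert names `PingDown` and `HttpDown` and the instance values are hypothetical.

```python
from typing import Dict, List, Sequence

# An alert is modelled as its label set (simplification).
Alert = Dict[str, str]


def matches(alert: Alert, matchers: Dict[str, str]) -> bool:
    """True if every matcher label has exactly the given value on the alert."""
    return all(alert.get(k) == v for k, v in matchers.items())


def inhibit(
    alerts: List[Alert],
    source: Dict[str, str],
    target: Dict[str, str],
    equal: Sequence[str],
) -> List[Alert]:
    """Drop target alerts while a source alert with the same `equal` labels is firing."""
    sources = [a for a in alerts if matches(a, source)]
    kept = []
    for alert in alerts:
        suppressed = matches(alert, target) and any(
            # An alert never inhibits itself.
            s is not alert and all(s.get(k) == alert.get(k) for k in equal)
            for s in sources
        )
        if not suppressed:
            kept.append(alert)
    return kept


firing = [
    {"alertname": "PingDown", "instance": "tpo"},    # service B: ICMP fails
    {"alertname": "HttpDown", "instance": "tpo"},    # service A: fails because of B
    {"alertname": "HttpDown", "instance": "other"},  # unrelated host, must still alert
]
kept = inhibit(
    firing,
    source={"alertname": "PingDown"},
    target={"alertname": "HttpDown"},
    equal=["instance"],
)
print([(a["alertname"], a["instance"]) for a in kept])
# [('PingDown', 'tpo'), ('HttpDown', 'other')]
```

The `equal` list is what keeps the rule from over-suppressing: an ICMP failure on one host does not hide HTTP failures on another.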

I found Prometheus to be somewhat lacking in monitoring: the HA design is good, but there's nothing like the vast diversity of service checks that Nagios has. In particular, you can set thresholds, and there are many plugins to monitor a lot of things, but they don't come with predefined limits like "90% disk usage or a load of NCPU*2 is WARNING", so you need to define those on your own.
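The two example limits above amount to simple arithmetic that has to be written down somewhere. In Prometheus they would live in alerting rules written in PromQL; the Python function below only spells out the logic. It assumes inclusive comparisons and a 1-minute load average, neither of which the limits above specify; `classify` is a hypothetical helper name.

```python
def classify(disk_used_fraction: float, load1: float, ncpu: int) -> str:
    """Return "WARNING" if either example limit is reached, else "OK".

    Assumed limits: disk usage >= 90%, or 1-minute load >= 2 * number of CPUs.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    disk_warning = disk_used_fraction >= 0.90
    load_warning = load1 >= 2 * ncpu
    return "WARNING" if disk_warning or load_warning else "OK"


print(classify(0.95, 0.5, 4))  # WARNING (disk at 95%)
print(classify(0.50, 3.0, 4))  # OK (load 3.0 is below the limit of 8)
print(classify(0.50, 8.0, 4))  # WARNING (load reaches 2 * 4 CPUs)
```

Making the CPU count a parameter, rather than a fixed load limit, is what lets one rule cover machines of different sizes.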

I think I answered this question; otherwise, reopen it.

Trac:
Resolution: N/A to fixed
Status: new to closed
Parent: N/A to #29681 (moved)

Further discussion on the topic of alerting should be held in #29864 (moved).

closed

mentioned in issue #29864 (moved)

mentioned in issue #29681 (moved)
