Opened 6 years ago

Closed 3 weeks ago

#12802 closed enhancement (implemented)

BridgeDB needs Nagios checks for the Email Distributor

Reported by: isis
Owned by: phw
Priority: High
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity: Normal
Keywords: bridgedb-email, nagios, anti-censorship-roadmap-2020
Cc: isis, dawuud, sysrqb, Lunar, gaba, hiro
Actual Points: 1
Parent ID: #30152
Points: 5
Reviewer: cohosh
Sponsor: Sponsor30-must

Description

BridgeDB needs Nagios checks that verify the Email Distributor is working. The best way to do this would be to send an email to bridges@… that says "get help".

Child Tickets

Change History (38)

comment:1 Changed 6 years ago by isis

Cc: Lunar added

comment:2 Changed 6 years ago by dawuud

I think what is needed here is a passive-style service check.
The check runs on its own schedule via cron or something similar; it sends an e-mail to the Email Distributor
and then periodically checks its e-mail inbox via IMAP.
If we don't receive an email matching the heuristics we are looking for within X minutes, it sends an alert to the nagios server.
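A Python sketch of that passive check follows. All hosts, addresses, and credentials are placeholders, the helper names are invented for illustration, and this is not the script that was eventually deployed; the OK/CRITICAL status format mirrors the one Nagios checks conventionally read.

```python
# Sketch of a passive Nagios-style check for the Email Distributor:
# send a "get help" probe over SMTP, then poll an IMAP inbox for the
# reply, and report CRITICAL if nothing arrives within the deadline.
# All hosts, addresses, and credentials are placeholders.
import imaplib
import smtplib
import time
from email.mime.text import MIMEText

SMTP_HOST = "mail.example.org"        # placeholder
IMAP_HOST = "mail.example.org"        # placeholder
MONITOR_ADDR = "monitor@example.org"  # placeholder
BRIDGES_ADDR = "bridges@example.org"  # placeholder
PASSWORD = "changeme"                 # placeholder
DEADLINE_SECS = 10 * 60               # the "X minutes" from the comment above

def send_probe():
    """Ask the Email Distributor for bridges."""
    msg = MIMEText("get help")
    msg["From"] = MONITOR_ADDR
    msg["To"] = BRIDGES_ADDR
    msg["Subject"] = "bridges"
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def reply_arrived():
    """Return True if an unseen reply from the distributor is in the inbox."""
    imap = imaplib.IMAP4_SSL(IMAP_HOST)
    try:
        imap.login(MONITOR_ADDR, PASSWORD)
        imap.select("INBOX")
        _, data = imap.search(None, f'(UNSEEN FROM "{BRIDGES_ADDR}")')
        return bool(data[0].split())
    finally:
        imap.logout()

def status_line(ok):
    """Render a two-line status in the OK/CRITICAL format Nagios checks read."""
    if ok:
        return "OK\n0: BridgeDB's email responder works"
    return "CRITICAL\n2: no reply from BridgeDB's email responder"

def run_check():
    """Send the probe, wait for the reply, and return the status text."""
    send_probe()
    deadline = time.time() + DEADLINE_SECS
    while time.time() < deadline:
        if reply_arrived():
            return status_line(True)
        time.sleep(30)
    return status_line(False)
```

The heuristic here is simply "any unseen mail from the distributor"; a real check would match the probe more precisely, e.g. by subject or message-id.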

comment:3 Changed 6 years ago by isis

Status: new → accepted

I marked #10916 as a duplicate of this ticket. The pertinent points made there were:

Replying to sysrqb:

After chatting with lunar about it we began discussing additional monitoring for the email distributor. The check_email_delivery nagios plugin was suggested.


and

Replying to isis:

Replying to sysrqb:

I also wondered if we should consider whitelisting tp.o addresses for use by the monitoring system (among other reasons).

We can't safely whitelist torproject.org email addresses because the torproject.org mailserver doesn't do DKIM. Because of this, I started adding a (email_address, gpg_fingerprint) whitelisting feature, requiring that such whitelisted addresses be signed with a particular key. (See #9332 and note that this feature would present a maintainability nightmare.)


comment:4 Changed 2 years ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:5 Changed 19 months ago by gaba

Owner: changed from isis to hiro
Status: accepted → assigned

comment:6 Changed 17 months ago by gaba

Cc: gaba added

comment:7 Changed 17 months ago by gaba

Sponsor: Sponsor19

comment:8 Changed 16 months ago by gaba

Points: 4

comment:9 Changed 16 months ago by gaba

Points: 4 → 5

comment:10 Changed 16 months ago by dgoulet

Owner: changed from hiro to dgoulet

comment:11 Changed 14 months ago by phw

Parent ID: #30152

comment:12 Changed 13 months ago by phw

For what it's worth, we're now monitoring BridgeDB's SMTP port with sysmon. We will get notified if the SMTP server disappears but we are unable to detect more subtle, application-layer breakage.

comment:13 Changed 12 months ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:14 Changed 12 months ago by phw

Sponsor: Sponsor19 → Sponsor30-must

Moving from Sponsor 19 to Sponsor 30.

comment:15 Changed 12 months ago by gaba

Owner: dgoulet deleted

dgoulet will assign himself to the ones he is working on right now.

comment:16 Changed 11 months ago by gaba

Keywords: anti-censorship-roadmap-october added; ex-sponsor-19 removed

comment:17 Changed 4 months ago by gaba

Keywords: anti-censorship-roadmap-2020Q1 added; anti-censorship-roadmap-october removed

comment:18 Changed 3 months ago by teor

Status: assigned → new

Change tickets that are assigned to nobody to "new".

comment:19 Changed 2 months ago by phw

Status: new → needs_information

I refactored hiro's "check for emails" script in this commit. The script writes its output to /srv/bridges.torproject.org/check/status. I can set up a cron job that runs this script every, say, six hours. Hiro, can you remind me what will happen if nagios considers BridgeDB's email responder down? Will I be able to see this in the nagios web UI? I'm asking because we will probably encounter a few more hiccups with the "check email" script once it's running continuously in production.

comment:20 Changed 2 months ago by hiro

Hi phw,
you will be able to see the web UI, and we can also set up an email alert.
Are you already writing the status file on bridgedb, so I can finish setting everything up on the nagios side?

Last edited 2 months ago by hiro

comment:21 in reply to:  20 Changed 2 months ago by phw

Replying to hiro:

you will be able to see the web UI, and we can also set up an email alert.
Are you already writing the status file on bridgedb, so I can finish setting everything up on the nagios side?


Yes, /srv/bridges.torproject.org/check/status already exists.

comment:22 Changed 2 months ago by phw

For the record, I wrote a patch for bridgedb-admin, which whitelists our test address, and I have a ready-to-merge bridgedb branch that implements our test script.

I configured a cronjob that runs this script every three hours.
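For reference, a crontab entry for a three-hour interval looks roughly like this; the script path is a placeholder, since the ticket doesn't name the deployed location:

```shell
# Placeholder path; runs at minute 0 of every third hour (00:00, 03:00, ...).
0 */3 * * * /usr/local/bin/bridgedb-check-email
```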

comment:23 Changed 2 months ago by phw

Owner: set to phw
Status: needs_information → assigned

comment:24 Changed 8 weeks ago by phw

Cc: hiro added

Is the Nagios check deployed already? Does the email alert work too? Also, what's the URL for the new Nagios page? I only have a URL for gettor-01.

comment:25 Changed 5 weeks ago by hiro

Hi phw,
When I made the check for bridgedb, I was under the impression that the machine was managed via our infra, at least to some extent. We don't have nagios on bridgedb because the machine is managed by you guys. So I guess we can either add nagios to bridgedb or add an email check to prometheus.
Which would you prefer in this case?
Apologies for this.

comment:26 in reply to:  25 Changed 5 weeks ago by phw

Replying to hiro:

When I made the check for bridgedb, I was under the impression that the machine was managed via our infra, at least to some extent. We don't have nagios on bridgedb because the machine is managed by you guys. So I guess we can either add nagios to bridgedb or add an email check to prometheus.
Which would you prefer in this case?


I'm not sure what "managed by you guys" means. Cecylia and I administer the BridgeDB service but not the machine as a whole. We don't have root on polyanthum. Isn't this the same situation as with gettor-01? If so, I suggest installing nagios on polyanthum.

comment:27 Changed 5 weeks ago by hiro

I understand now what is happening with this host. Sorry about the confusion. I will enable the check.

comment:28 in reply to:  27 Changed 5 weeks ago by phw

Replying to hiro:

I understand now what is happening with this host. Sorry about the confusion. I will enable the check.


Thanks! As mentioned in this comment, we now have a mailing list to send service alerts to. Can you please configure nagios to send alerts to anti-censorship-alerts at lists dot tpo?

comment:29 Changed 4 weeks ago by hiro

Hi phw, this is now online: https://nagios.torproject.org/cgi-bin/icinga/extinfo.cgi?type=2&host=polyanthum&service=application+service+-+bridgedb+status

Can you check that the status file is giving the correct status?

For reference, GetTor publishes this in its status file:

OK
0: GetTor is good and sending emails with working links
Last edited 4 weeks ago by hiro

comment:30 in reply to:  29 Changed 4 weeks ago by phw

Replying to hiro:

Hi phw, this is now online: https://nagios.torproject.org/cgi-bin/icinga/extinfo.cgi?type=2&host=polyanthum&service=application+service+-+bridgedb+status

Can you check that the status file is giving the correct status?

For reference, GetTor publishes this in its status file:

OK
0: GetTor is good and sending emails with working links


Thanks, hiro! Yes, the status file should look good. Right now, it says:

OK
0: BridgeDB's email responder works

Is there a problem with the file format?

Also, are email alerts now going to anti-censorship-alerts@…?

comment:31 Changed 4 weeks ago by phw

Status: assigned → needs_review

comment:32 Changed 4 weeks ago by gaba

Keywords: anti-censorship-roadmap-2020 added; anti-censorship-roadmap-2020Q1 removed

No more Q1 for 2020.

comment:33 Changed 4 weeks ago by hiro

Everything looks good to me: +1 on the PR on GitHub, and Nagios shows the check as OK.
I am going to add the notification email now.

comment:34 Changed 3 weeks ago by cohosh

Reviewer: cohosh

comment:35 Changed 3 weeks ago by cohosh

Status: needs_review → needs_information

The script looks good to me. Some thoughts:

  • It's not super urgent in this case, but putting the email password in an environment variable would allow us to not have it in the bash history
  • 60 seconds seems a bit fast for email. How about making this 5-10 minutes? I'm thinking in terms of reducing false positives plus being realistic about our response time to the alert.
  • Is there a reason we're logging in twice (once at L56 and once at L128)?

Other than that, it looks good, but I'll put it in needs_information.
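On the first point, a minimal sketch of reading the secret from the environment instead of passing it on the command line; the `EMAIL_PASSWORD` name is illustrative, not what the actual script uses:

```python
# Read the mailbox password from the environment so it never lands in the
# shell history or in `ps` output. EMAIL_PASSWORD is an illustrative name.
import os
import sys

def get_password():
    password = os.environ.get("EMAIL_PASSWORD")
    if not password:
        sys.exit("EMAIL_PASSWORD is not set; refusing to run without it.")
    return password
```

Invoked as `EMAIL_PASSWORD=... check-email.py`; with bash's `HISTCONTROL=ignorespace`, prefixing the command with a space keeps even that line out of the history.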

comment:36 in reply to:  35 ; Changed 3 weeks ago by phw

Status: needs_information → needs_review

Replying to cohosh:

  • It's not super urgent in this case, but putting the email password in an environment variable would allow us to not have it in the bash history


That's a good point. This commit adds support for passing the key as an environment variable.

  • 60 seconds seems a bit fast for email. How about making this 5-10 minutes? I'm thinking in terms of reducing false positives plus being realistic about our response time to the alert.


I used to have it set to 5 minutes but that turned out to be excessive. BridgeDB reliably responds within a few seconds, so I think it's safe to assume that if the response didn't come within 60 seconds, it won't come at all.

  • Is there a reason we're logging in twice (once at L56 and once at L128)?


One login is for our SMTP connection (to send the email) and the other is for our IMAP connection (to check for the response). Is there a way to simplify this?

comment:37 in reply to:  36 Changed 3 weeks ago by cohosh

Status: needs_review → merge_ready

Replying to phw:

Replying to cohosh:

  • It's not super urgent in this case, but putting the email password in an environment variable would allow us to not have it in the bash history


That's a good point. This commit adds support for passing the key as an environment variable.

Looks good!


  • 60 seconds seems a bit fast for email. How about making this 5-10 minutes? I'm thinking in terms of reducing false positives plus being realistic about our response time to the alert.


I used to have it set to 5 minutes but that turned out to be excessive. BridgeDB reliably responds within a few seconds, so I think it's safe to assume that if the response didn't come within 60 seconds, it won't come at all.

Ah okay, cool :) This sounds good then.


  • Is there a reason we're logging in twice (once at L56 and once at L128)?


One login is for our SMTP connection (to send the email) and the other is for our IMAP connection (to check for the response). Is there a way to simplify this?

Oops thanks for clearing that up. I hadn't noticed the two different connection types.

Looks good to merge!

comment:38 Changed 3 weeks ago by phw

Actual Points: 1
Resolution: implemented
Status: merge_ready → closed

Merged in cc3277b.

Note: See TracTickets for help on using tickets.