#32679 closed task (duplicate)

Create VM to run monitoring software for anti-censorship team

Reported by: phw
Owned by: tpa
Priority: Medium
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity: Normal
Keywords:
Cc: cohosh
Actual Points:
Parent ID: #30152
Points:
Reviewer:
Sponsor:

Description

So far, the anti-censorship team's infrastructure is monitored by a sysmon instance that gman999 generously runs for us. Every five minutes, sysmon establishes TCP connections to a number of machines and if any of these checks fails twice, we get an email alert.
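For reference, the check-and-alert behaviour described above could be sketched roughly as follows. This is not sysmon's actual implementation; the names `tcp_check` and `FailureTracker` are hypothetical, and the sketch assumes that "fails twice" means two consecutive failures.

```python
import socket
from collections import defaultdict


def tcp_check(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailureTracker:
    """Count consecutive failures per target (assumed alert policy)."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.failures = defaultdict(int)

    def record(self, target, ok):
        """Record one check result; return True when an alert is due.

        The alert fires once, at the moment the failure count reaches
        the threshold, and the count resets on the next success.
        """
        if ok:
            self.failures[target] = 0
            return False
        self.failures[target] += 1
        return self.failures[target] == self.threshold
```

A scheduler (cron, or a loop sleeping five minutes) would call `tcp_check` for each target and pass the result to `record`, sending the email whenever `record` returns True.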

The problem is that we cannot directly edit its configuration file, so we email gman999 whenever it needs an update. I would like to avoid this friction. Besides, sysmon is very simple and cannot handle, say, HTTP redirects.

I think it would be best for the anti-censorship team to run its own monitoring service, on a dedicated VM. We can then add monitoring targets ourselves and don't need to block on others.

I have been experimenting with a service called monit. It's free software and lightweight, yet flexible enough to fulfill our needs. I think it would be helpful to run monit on a dedicated VM. Does this make sense?

Change History (8)

comment:1 Changed 11 months ago by anarcat

my first, gut reaction to this is that we shouldn't use another tool than what we already have to do monitoring. more specifically, I think we should embrace the Prometheus infrastructure I set up in March, and instead of setting up Monit, we should set up a Prometheus blackbox exporter service.

I will also note that the anti-censorship team has asked about this in the past (#30929, #29863). To be more precise, you *already* have a VM for monitoring, it has the lovely name of hetzner-nbg1-02 and is currently set up with a bare Prometheus install, but does nothing. :)

I would be *very* happy to get more people involved in managing that thing. How about we use that infrastructure instead? :)

Or, in other words, what's missing from the Prometheus setup so that you can do your work? Tell me, I want to help! But I'm hesitant to deploy a different service than what we're trying to converge upon.

comment:2 Changed 11 months ago by anarcat

oops! i just talked with hiro and just realized she recommended you open this ticket. I'll talk with her about this and get back to you, sorry for the confusion. :)

comment:3 Changed 11 months ago by phw

Oh, I didn't know about hetzner-nbg1-02. I'm fine with using something that's already set up. I just want our team to be able to configure monitoring targets ourselves, so we don't need to block on anyone else.

comment:4 Changed 11 months ago by phw

Parent ID: #30152

comment:5 Changed 11 months ago by phw

We discussed this in today's anti-censorship meeting. Here's a summary:

  • We will use Nagios for internal services: BridgeDB, Snowflake, and GetTor.
  • We will use Prometheus's "blackbox exporter" for default bridges, which are external services.
  • Our admins will handle our Nagios config and the anti-censorship team will handle Prometheus.
  • We will experiment with Prometheus's "alertmanager", which can send notifications if a monitoring target goes offline.
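To illustrate the blackbox exporter part of this plan: the exporter exposes a `/probe` HTTP endpoint that takes `target` and `module` parameters and returns metrics, including `probe_success`. Below is a rough sketch of how a probe could be queried by hand. The exporter address, the target, and the module names are assumptions (the modules must exist in the exporter's configuration), and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def probe_url(exporter, target, module="http_2xx"):
    """Build the URL of a blackbox exporter probe for one target."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter}/probe?{query}"


def probe_succeeded(metrics_text):
    """Return True if the exporter's output reports probe_success 1."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comment lines and unrelated metrics.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False
```

In normal operation Prometheus scrapes this endpoint itself and alertmanager acts on `probe_success`; fetching `probe_url("http://localhost:9115", "bridge.example.org:443", "tcp_connect")` manually is mainly useful for debugging a new target.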

For Nagios, here are our monitoring targets:

Note that the strings that should be present in the respective pages are mere suggestions. Ultimately, we just need a test that guarantees that these pages are correctly serving content.
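As a rough idea of what such a test does, here is a minimal sketch of a content check. It is not the Nagios check itself; the function names are hypothetical and the URL and expected string would come from the target list.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10.0):
    """Fetch a page and return (status, body). Redirects are followed by urlopen."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.status, resp.read().decode(charset, errors="replace")


def page_ok(status, body, expected):
    """A page passes if it was served with 200 and contains the expected string."""
    return status == 200 and expected in body


def check(url, expected):
    """Return True if the page at url is reachable and serves the expected content."""
    try:
        status, body = fetch_page(url)
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False
    return page_ok(status, body, expected)
```

Matching on a string rather than only on the status code catches the case where the web server is up but serves an error page or an empty document.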

Admins: is there anything else that you need to move forward with this?

comment:6 Changed 11 months ago by anarcat

Our admins will handle our Nagios config and the anti-censorship team will handle Prometheus.

More specifically, the anti-censorship team will handle the exporter, not Prometheus, which stays in TPA's hands.

We should also grant you access to the Grafana dashboard so you can play with that. :)

comment:7 Changed 10 months ago by anarcat

i still feel this ticket is a duplicate of #31159, could we close one of them so i get a little peace of mind? :)

i might not have time to look into this before the holidays, but we'll see...

comment:8 in reply to:  7 Changed 10 months ago by phw

Resolution: duplicate
Status: new → closed

Replying to anarcat:

i still feel this ticket is a duplicate of #31159, could we close one of them so i get a little peace of mind? :)


Absolutely! I'll summarise what we discussed here in #31159.
