Opened 7 months ago

Last modified 4 weeks ago

#30023 assigned task

improve grafana authentication

Reported by: anarcat Owned by: anarcat
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: cohosh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

the grafana server is now set up (#29684), but there are still issues regarding authentication. we might want to grant access to users other than the admin one, for example.

the original idea was to do the same "anonymous authentication" setup as for Prometheus, except something came up during deployment that made me question that strategy. it was raised while considering the deployment of third-party exporters:

something regarding authentication came up through a third-party scraper deployment, in #29863. there were concerns the node exporter would leak information that could be exploited for side-channel attacks. the node exporter is firewalled, but all that data is then made available on the prometheus server, protected only by a trivial password. they will make an assessment of the exposed data and see if the additional authentication burden is worth the risk.

if we do not go with "anon" authentication, we could connect the Grafana server to LDAP, but then it might go down if the LDAP server crashes, which is obviously a problem for a monitoring server.
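For reference, Grafana's LDAP integration is configured through an ldap.toml file referenced from grafana.ini. A minimal sketch of what that would look like, with entirely hypothetical host and DN values:

```toml
# /etc/grafana/ldap.toml -- hypothetical values throughout
[[servers]]
host = "ldap.example.org"
port = 636
use_ssl = true
bind_dn = "cn=grafana,ou=services,dc=example,dc=org"
bind_password = "secret"
search_filter = "(uid=%s)"
search_base_dns = ["ou=users,dc=example,dc=org"]
```

This illustrates the dependency problem: every Grafana login would require a round-trip to that LDAP server.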

in any case, users need to be configured through Puppet, which they currently are not. this is partly related to secrets management and generation in Puppet, which is also discussed in #30009.

Child Tickets

Change History (11)

comment:1 Changed 6 months ago by anarcat

removed #29681 as a parent because we can consider this project complete without changing authentication, which requires a wider discussion on third-party access.

comment:2 Changed 6 months ago by anarcat

Parent ID: #29681

comment:3 Changed 6 months ago by cohosh

I don't know enough about LDAP to comment on that solution, but it seems plausible. My understanding is that we will eventually have alerts? That might make LDAP going offline less of an issue IIUC.

The way this would work is that we would give you an onion name and an auth cookie. You would put those in a HidServAuth line in your torrc:

HidServAuth xxxxxxxxxxxxxxxx.onion authcookieauthcookie

Then, instead of configuring prometheus to fetch from http://snowflake.bamsoftware.com:9100/, you configure it to fetch from http://xxxxxxxxxxxxxxxx.onion:9100/ with a proxy_url of socks5://127.0.0.1:9050/.
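Concretely, the scrape job on the Prometheus side might look like the following sketch; the job name and onion address are placeholders, and this assumes the Prometheus version in use accepts a socks5:// scheme in proxy_url:

```yaml
# prometheus.yml excerpt -- hypothetical job name and onion address
scrape_configs:
  - job_name: 'snowflake-node-exporter'
    proxy_url: 'socks5://127.0.0.1:9050'
    static_configs:
      - targets: ['xxxxxxxxxxxxxxxx.onion:9100']
```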

On the server side, we would add HiddenServiceAuthorizeClient to torrc:

HiddenServiceDir /var/lib/tor/prometheus_node_exporter
HiddenServicePort 9100 127.0.0.1:9100
HiddenServiceAuthorizeClient basic prometheus

and then get the auth cookie from /var/lib/tor/prometheus_node_exporter/hostname.
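With client authorization enabled, that hostname file should contain one line per authorized client, roughly of the form (address and cookie made up here):

```
xxxxxxxxxxxxxxxx.onion authcookieauthcookie # client: prometheus
```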

To pull from the conversation in #29863, how difficult would it be to go the Onion Service route?

comment:4 Changed 6 months ago by cohosh

Cc: cohosh added

comment:5 Changed 6 months ago by anarcat

i am not familiar with configuring prometheus to fetch metrics through a proxy. if I understand you correctly, there's a proxy_url setting that can be added to do that? I cannot confirm that, in any case.

but sure, that could be possible. i am not sure how that relates to this ticket, however, which is specifically about Grafana authentication, not about how Prometheus talks with its exporters - there is no ticket for the latter right now. I thought we had resolved that by selecting only a subset of metrics, but I haven't followed the entire conversation in #29863. What I saw on IRC was specifically this question:

In IRC it also sounded like there was little-to-no authentication on the server that displays these metrics after scraping. Is that the case?

So I thought we were talking about the "server that displays the metrics", ie. Grafana (or the Prometheus frontend), which is why I pointed you here.

But yes, we have a similar problem with the exporters: in theory, someone could do IP spoofing and bypass those firewalls to scrape the metrics. In practice, those stunts are actually quite hard to pull off because you need to do pretty hardcore stuff like BGP hijacking, because of the TCP negotiations (at least, as far as I understand it, correct me if I'm wrong). But I'm not sure the prize (metrics) would be worth the trouble (upsetting the internet's routing table).

As for the specific solution proposed here, I'd be tempted to simply add HTTPS and username/password authentication, as it's something I understand better and it doesn't require the Tor network to be operational. I always feel a little awkward using Tor to monitor internal infrastructure - I'm all for eating our own dogfood, but when it comes to monitoring, I feel a little less comfortable and prefer redundancy. ;)

That said, the idea of setting up a secondary server would be that other kinds of tricks could be tried by other teams, so I don't want to be blocking nice initiatives like this. This is just my grain of salt, which I hope will be useful!

comment:6 in reply to:  5 Changed 6 months ago by cohosh

Replying to anarcat:

i am not familiar with configuring prometheus to fetch metrics through a proxy. if I understand you correctly, there's a proxy_url setting that can be added to do that? I cannot confirm that, in any case.

but sure, that could be possible. i am not sure how that relates to this ticket, however, which is specifically about Grafana authentication, not about how Prometheus talks with its exporters - there is no ticket for the latter right now. I thought we had resolved that by selecting only a subset of metrics, but I haven't followed the entire conversation in #29863. What I saw on IRC was specifically this question:

In IRC it also sounded like there was little-to-no authentication on the server that displays these metrics after scraping. Is that the case?

So I thought we were talking about the "server that displays the metrics", ie. Grafana (or the Prometheus frontend), which is why I pointed you here.

But yes, we have a similar problem with the exporters: in theory, someone could do IP spoofing and bypass those firewalls to scrape the metrics. In practice, those stunts are actually quite hard to pull off because you need to do pretty hardcore stuff like BGP hijacking, because of the TCP negotiations (at least, as far as I understand it, correct me if I'm wrong). But I'm not sure the prize (metrics) would be worth the trouble (upsetting the internet's routing table).

Ah, you're right. I was conflating these two separate issues. We can leave this ticket for the grafana access issue only.

As for the specific solution proposed here, I'd be tempted to simply add HTTPS and username/password authentication, as it's something I understand better and it doesn't require the Tor network to be operational. I always feel a little awkward using Tor to monitor internal infrastructure - I'm all for eating our own dogfood, but when it comes to monitoring, I feel a little less comfortable and prefer redundancy. ;)

That said, the idea of setting up a secondary server would be that other kinds of tricks could be tried by other teams, so I don't want to be blocking nice initiatives like this. This is just my grain of salt, which I hope will be useful!

Okay, HTTPS + a stronger password to access the graphs sounds fine to me right now. I think especially with our current policy of only exporting a small amount of metrics we don't have much to risk at the moment.

comment:7 Changed 6 months ago by anarcat

Okay, HTTPS + a stronger password to access the graphs sounds fine to me right now. I think especially with our current policy of only exporting a small amount of metrics we don't have much to risk at the moment.

So if I understand you right, the blocker for snowflake being integrated into Prometheus monitoring is to make sure we have proper passwords protecting the metrics. This currently means locking down the Prometheus web interface itself, which is protected only by a trivial password to keep the bots away, and opening up Grafana to external authentication so that non-TSA folks can access it.

In other words, the current situation is:

  • Prometheus: trivial password, easy to guess; raw metrics and rough graphs accessible to the public
  • Grafana: single, strong, shared admin password, only accessible to TPA; full graphs and queries possible only for people with the shared password

The new situation would be:

  • Prometheus: strong shared password, accessible only to TPA for debugging purposes (or bound only to localhost)
  • Grafana: LDAP authentication or some other mechanism to grant people outside TPA access to the server

There are two problems with this:

  1. we're hesitant to set up LDAP authentication in Grafana, because we don't want monitoring to depend on LDAP to work, but also because we're hesitant to put more stuff in LDAP in general (the less stuff has access to it, the better)
  2. we might *want* to provide (semi-)public (with trivial password) access to those graphs, for transparency and ease-of-access reasons

What TPA is proposing now is to set up another server to monitor external resources. It would solve problem 2 above, but not problem 1. It would also conflate two distinct problems:

  • "external resources should not be monitored alongside internal resources"
  • "some metrics should stay private" problem

Because, of course, maybe some internal resources should stay private and some external resources should be public, and vice versa. I'm not sure how to resolve this conundrum.

I don't have a clear view of what goes where, to be honest. There were a few requests already for external monitoring and I must admit I somehow have trouble keeping track of things. :) There are at least two concurrent requests from the anti-censorship team, one of which was resolved internally (#30006) so I hope you understand I can get a little confused... It's also unclear to me what the endgame is with snowflake: there was talk of migrating it inside TPO, for example...

comment:8 Changed 4 weeks ago by anarcat

i think the final decision here was to move possibly sensitive metrics to another host, which was built in #29863. so we're free to implement the authentication we want here, and I will set up a guest account in grafana early next week to give people access.

comment:9 Changed 4 weeks ago by anarcat

started this deployment, using the auth proxy configuration (.htaccess style). disabled puppet because it can't write properly hashed .htpasswd files.
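For the record, the ".htaccess style" setup boils down to an Apache fragment along these lines; this is a sketch with assumed paths, and it assumes Grafana listening on its default port 3000 behind the proxy:

```apache
# hypothetical Apache vhost fragment protecting the Grafana proxy
<Location "/">
    AuthType Basic
    AuthName "Grafana"
    # file generated by hand with, e.g.: htpasswd -c -B /etc/apache2/grafana.htpasswd tor-guest
    AuthUserFile "/etc/apache2/grafana.htpasswd"
    Require valid-user
    ProxyPass "http://127.0.0.1:3000/"
    ProxyPassReverse "http://127.0.0.1:3000/"
</Location>
```

The htpasswd step is the part Puppet could not do, hence generating the hashed file outside of Puppet.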

considering those as solutions:

comment:10 Changed 4 weeks ago by anarcat

Owner: changed from tpa to anarcat
Status: new → assigned

comment:11 Changed 4 weeks ago by anarcat

grafana now has a standard 'basic access authentication' layer from the Apache proxy, using the common 'tor-guest' account. configuration has been done with trocla as per #30009, so i think this is all done...

... *except* that we still have that pesky third-party prometheus/grafana server out there that *does* have those requirements. so the trocla() stuff will need some refactoring to cover that use case. maybe we'll need some hiera integration here?
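As a sketch of what that refactoring could build on, trocla exposes a Puppet function that fetches (or generates on first use) a secret by key; the key name here is made up:

```puppet
# hypothetical: look up, or generate on first use, the shared guest password
$guest_password = trocla('grafana_tor_guest', 'plain')
```

Per-host variations of that key (for the third-party server) is presumably where the hiera integration would come in.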

Note: See TracTickets for help on using tickets.