Opened 5 months ago

Last modified 6 hours ago

#30020 accepted project

switch from our custom YAML implementation to Hiera

Reported by: anarcat Owned by: anarcat
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: #29387 Points:
Reviewer: Sponsor:

Description (last modified by anarcat)

We currently use a custom-made YAML database for assigning roles to servers and other metadata. I started using Hiera for some hosts and it seems to be working well.

Hiera is officially supported in Puppet and shipped by default in Puppet 5 and later. It's the standard way of specifying metadata and class parameters for hosts. I suspect it covers most of our needs in terms of metadata and should cover most if not all of what we're currently doing with the YAML stuff in Puppet.

We should therefore switch to using Hiera instead of our homegrown solution.

This involves converting:

  • if has_role('foo') { include foo } into classes: [ 'foo' ] in hiera
  • hardcoded macros in the ferm module's me.conf.erb into exported resources
  • templates looping over allnodeinfo into exported resources
  • the $roles array into Hiera
  • the $localinfo into Hiera (assuming all the data is there)
  • the $nodeinfo and $allnodeinfo arrays into Hiera (assuming we can switch from LDAP for host inventory)
  • basically any other stuff of the kind, including those files:
    ./modules/torproject_org/misc/hoster.yaml
    ./modules/torproject_org/misc/local.yaml
    ./modules/ipsec/misc/config.yaml
    ./modules/roles/misc/static-components.yaml
    ./modules/roles/files/spec/spec-redirects.yaml
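
The first conversion in the list above could look roughly like the following. This is a minimal sketch, not the actual site.pp: it assumes a top-level `classes` key in the Hiera data and a per-node file under hiera/nodes/, and uses Puppet's `lookup` function to include whatever that key returns.

```puppet
# Before: role membership is tested in Puppet code with the custom function.
#   if has_role('foo') { include foo }
#
# After: the node's Hiera data lists its classes, for example in
# hiera/nodes/<fqdn>.yaml (assumed layout):
#   classes:
#     - foo
#
# site.pp then only needs to include whatever Hiera returns.
node default {
  # 'unique' merges the arrays found at every hierarchy level;
  # the [] default keeps hosts without a classes key compiling.
  lookup('classes', Array[String], 'unique', []).include
}
```

With this in place, assigning a class to a host is a data change in Hiera rather than a code change in site.pp.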
    

Ideally, all YAML data should end up in the hiera/ directory somehow. This is the first step in making our repository public (#29387), but also in using Hiera as a more elaborate inventory system (#30273).

The idea of switching from LDAP to Hiera for host inventory will definitely need to be evaluated more thoroughly before going ahead with that part of the conversion, but YAML stuff in Puppet should definitely be converted.

The general goal of this is both to allow for a better inventory system and to make it easier for people to get onboarded with Puppet. By using community standards like Hiera, we make it easier for new people to get familiar with the Puppet infrastructure and do things meaningfully.

Change History (9)

comment:1 Changed 4 months ago by anarcat

this has started. most of site.pp has been emptied, with the easy stuff first. the hard stuff are hosts where the $roles function is actually relevant. for example, the following ferm macros are probably actually in use:

HOST_ROLE_PUPPETMASTER HOST_ROLE_DIP HOST_ROLE_JENKINS HOST_ROLE_NAGIOSMASTER

... and probably more, namely bacula. other classes will refer to the $roles or nodeinfo lists explicitly as well and will need to be broken up into separate classes that then get properly included. but it's a great start and so far no breakage that i know of.

i documented the impact of the change in site.pp, but it might be good to add something to the wiki docs as well.

comment:2 Changed 4 months ago by anarcat

some more progress, but this time harder stuff: I converted the DNS servers to Hiera. this involved splitting some classes and exporting resources. in my travels, those are the important HOST_ROLE_ ferm rules that I found might be problematic:

HOST_ROLE_BACULA_DIRECTOR
HOST_ROLE_BACULA_STORAGE
HOST_ROLE_DIP
HOST_ROLE_DNS_SECONDARY
HOST_ROLE_JENKINS
HOST_ROLE_NAGIOSMASTER
HOST_ROLE_PUPPETMASTER

I also found HOST_NETNOD but I think that might be a static definition.

HOST_ROLE_DNS_SECONDARY is now gone, and replaced by exported ferm::rule constructs. This works well, but @weasel was somehow worried about security issues with exported resources, which I am not sure are relevant in this case.

Another problem is that the ferm module is set up to realize the virtual ferm::rule stuff defined everywhere. This implies that the exported resources are also realized locally. That's fairly harmless, because the host allows itself access to itself, but it's noisy and annoying.

I don't know why ferm::rule entries are virtual everywhere, so that's something I'd like to explore as well in the future.
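
To make the exported-rule pattern concrete, here is a minimal sketch of the export and collect sides. The class names, the tag, the rule text and the `description` and `rule` parameters of ferm::rule are assumptions for illustration, not the code that was actually deployed.

```puppet
# Exporting side, e.g. a class included on each DNS secondary.
# The class name, tag and rule text are hypothetical.
class profile::dns_secondary_firewall {
  @@ferm::rule { "dns-secondary-${::fqdn}":
    description => "allow DNS traffic from ${::fqdn}",
    rule        => "saddr ${::ipaddress} proto (tcp udp) dport 53 ACCEPT",
    tag         => 'dns-secondary',
  }
}

# Collecting side, e.g. on the DNS primary: only rules with this tag.
class profile::dns_primary_firewall {
  Ferm::Rule <<| tag == 'dns-secondary' |>>
}

# By contrast, a catch-all local collector such as
#   Ferm::Rule <| |>
# also realizes the node's own exported (@@) rules, which is the
# noisy local realization described in this comment.
```

Filtering the collector on a tag keeps the collecting host from pulling in unrelated exported rules.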

Another problem I found when working on the DNS stuff is that the DNS primary does checks on the DNS secondaries, seemingly through NRPE, because it is in the allowed_hosts list in the NRPE config. This makes it impossible to remove the dns_primary role from local.yaml for now and I'm not sure how to work around that without creating a global variable for the DNS primary host, which would be an unfortunate regression.

So two pending questions:

  1. what is the security issue with exported resources? is the current pattern used in the bind module and prometheus profile acceptable?
  2. why are ferm::rule entries virtual?
  3. how can we export arbitrary IPs in configuration files in Hiera? specifically, how do we construct NRPE's allowed_hosts list of IPs from other hosts?

My tentative guesses on this are:

  1. impact minor, even if security issue (possibility to manipulate firewall rules between nodes)
  2. probably just an oversight?
  3. i feel dirty saying it, but a fancy sed Exec exported resource?

comment:3 Changed 4 months ago by anarcat

Another possible solution is to move from LDAP to Hiera for host metadata. That is where, after all, Puppet is getting some of those IP addresses from and it would be possible to simply do lookups in Hiera for those, if it was properly loaded and ordered.

Another case I found is roles::weblog_sink which constructs SSH keys from the YAML data. This could be generated from exported resources as well, for example with the ssh_authorized_key builtin type.
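
A sketch of what that could look like with Puppet's ssh_authorized_key type is below. The class names, the `weblogsync` user, the tag and the `$pubkey` parameter are hypothetical; the real roles::weblog_sink code and key source are not shown in this ticket.

```puppet
# Exporting side: each weblog source publishes its key.
class profile::weblog_source (
  String $pubkey,  # base64 key body only, without the "ssh-rsa" prefix or comment
) {
  @@ssh_authorized_key { "weblogsync-${::fqdn}":
    ensure => present,
    user   => 'weblogsync',
    type   => 'ssh-rsa',
    key    => $pubkey,
    tag    => 'weblog_sink',
  }
}

# Collecting side: the sink installs every exported key for that user.
class profile::weblog_sink {
  Ssh_authorized_key <<| tag == 'weblog_sink' |>>
}
```

Note that the type wants the key type and the key body as separate attributes, so a fact that exports a whole authorized_keys line would need to be split first.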

So in other words, I think this project is doable, but it will require refactoring and lots of work.

In the end, though, we would have one YAML file per host in hiera/nodes/$FQDN.yaml. This could be made fairly human-readable if we make a good template, and be the single source of truth for all information about a host including hosting provider, cost and so on, solving our inventory problem, (partly) described in #29816.

I think this is worth it and will make it easier to get people involved in Puppet work.

comment:4 Changed 4 months ago by anarcat

site.pp is now mostly empty. all the has_role constructs are gone from there.

those two are gone as well:

HOST_ROLE_BACULA_DIRECTOR
HOST_ROLE_BACULA_STORAGE

the trickiest part, surprisingly, was the little warning added to the motd. i've hacked something together using update-motd.d but i'm actually quite unhappy about it, because it doesn't display the same way that it did before. if the machines were all running buster, this wouldn't be a problem anymore because there's /etc/motd.d there, but we're probably stuck in stretch for a while.

since this is only for *three* machines, I think we can afford the little ugliness for now.

Linux build-arm-02 4.19.0-0.bpo.4-arm64 #1 SMP Debian 4.19.28-2~bpo9+1 (2019-03-27) aarch64

 Note that this host is _NOT_ being backed up.  If you care about your
 data, run your own backups.


This device is for authorized users only.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Welcome to build-arm-02.torproject.org, used for the following services:
	buildbox
	porterbox

 If you use this as a porter/buildbox, you might find
 https://dsa.debian.org/doc/schroot/ helpful.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Last login: Fri Apr 19 20:44:31 2019 from 95.216.141.241
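
For reference, the update-motd.d approach mentioned above can be sketched as a single file resource. The class name and file name are hypothetical and this is not the actual hack that was deployed; it only shows the general shape of dropping a script that prints the warning at login.

```puppet
class profile::motd_no_backup {
  # Scripts in /etc/update-motd.d are run at login and their output
  # is appended to the message of the day.
  file { '/etc/update-motd.d/20-no-backup':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0755',
    content => "#!/bin/sh\necho ' Note that this host is _NOT_ being backed up.'\n",
  }
}
```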

I have also found HOST_TPO which is basically a list of the public IP of all TPO hosts, as taken from LDAP (modules/puppetmaster/lib/puppet/parser/functions/allnodeinfo.rb). So we can keep that macro for now until we decide about the overlap between LDAP and Hiera. The motd is similarly extracted mostly from stuff in LDAP and would benefit from such a refactoring as well.

Anyways. Next up is the roles file, which has tons more fun stuff like this to clear out. :)

Note that I've had answers to my earlier questions, somehow:

  1. I don't think there are any serious security issues with exported resources, the way they're set up. At worst a host might be able to push different firewall holes than expected. If we want to fix that issue, we can make new defines with hardcoded definitions that, when collected on hosts, will only poke the holes that are expected.
  2. it's just a copy-paste historical error, that I've made myself on other occasions
  3. no solution to the NRPE allowed_hosts problem just yet, but I'm tempted to just use a hardcoded variable for now. this is what is used for bacula::bacula_director_address for example: it's hardcoded to dictyotum.torproject.org so there's prior art to hardcoding stuff like that. of course it would be hardcoded into hiera, not the class name, ideally...
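
The "hardcoded into hiera" idea from the last point could be sketched with a class parameter that Puppet fills in through automatic parameter lookup. The class name, parameter name, file path and IP address are all assumptions for illustration; `join` comes from stdlib on older Puppet versions.

```puppet
# Assumed Hiera data, e.g. in a common or per-node YAML file:
#   profile::nrpe_allowed_hosts::allowed_hosts:
#     - '192.0.2.10'   # placeholder address for the DNS primary
class profile::nrpe_allowed_hosts (
  Array[String] $allowed_hosts = [],
) {
  # Always allow localhost, then append whatever Hiera provides.
  $all_hosts = ['127.0.0.1'] + $allowed_hosts
  $line = join($all_hosts, ',')

  file { '/etc/nagios/nrpe.d/allowed_hosts.cfg':
    ensure  => file,
    content => "allowed_hosts=${line}\n",
  }
}
```

This keeps the address out of the manifests, at the cost of still being a static value rather than something derived from the other hosts.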

comment:5 Changed 4 months ago by anarcat

Description: modified (diff)

i did more work here. the following macros have now been safely removed:

HOST_STATIC
HOST_ROLE_PEOPLE
HOST_ROLE_METRICSBOT
HOST_ROLE_JABBER_SERVER
HOST_ROLE_WEBLOG_SOURCE
HOST_ROLE_WEBLOG_SINK

This also led to the removal of a custom SSH keys generation template (modules/roles/templates/weblog_sink/webstats-authorized_keys.erb), although it hasn't been converted to the native ssh_authorized_key type because of the format difference between the custom fact we use to export the ssh keys and the one expected by the type. This could be fixed in another refactoring at some other time.

Now, I'm working on the static_* stuff, which is like weblog_* but a little more complicated because the config files are not (yet) built with config::fragment. The SSH firewall configuration was a little more complicated but it's been migrated already. Next up is the authorized_keys which should follow the same pattern as the weblog stuff and then the config::fragment conversion. There are also corner cases with more sub-roles for that one that will need to be taken into account, but those can hopefully be converted into class parameters.

There are now 36 roles left in the roles class. There were about 50 roles, split between site.pp and the roles class, when I started this, about a week ago, so i think it would be fair to assume this first part of the conversion will be done in a week or two.

comment:6 Changed 4 months ago by anarcat

i got a little tired of battling this, so I took a small break. I still migrated a few roles:

civicrm_ext_2018
civicrm_int_2018
civicrm_ext
civicrm_int
public_git
rt
svn
metrics
exonerator
bridges
trac
mandos_server

many of those were easy marks: the ssl::service stuff was just a lot of copy-paste, which might have been better implemented by having a parametrized class with the node-specific parameters in hiera, something like:

class profile::ssl_web($service_name, $onion = false) {
   # $name is a reserved parameter name in Puppet, hence $service_name
   ssl::service { $service_name: notify => Exec['service apache2 reload'], key => true, onion => $onion }
}

And in (say) eugeni.torproject.org.yaml, you would have:

profile::ssl_web::service_name: "lists.torproject.org"
profile::ssl_web::onion: true
classes:
  - profile::ssl_web

... but I didn't want to overthink this just yet. plus we might want to manage those services more closely in Puppet eventually and such a class would just make it difficult. Besides, i suspect this would belong in the Apache module, not in a profile. And we should have a role in Hiera instead of a profile, so we would end up creating the equivalent of the profile I ended up making anyways:

class profile::lists {
  ssl::service { 'lists.torproject.org':
    notify => Exec['service apache2 reload'],
    key    => true,
  }
}

So I think it's the right conversion for now. I'm not converting the entire hierarchy to R/P/M just yet anyways, just switching to Hiera is enough work as it is.

There are now 22 has_role calls left in the main roles class, down from around 50. Unfortunately, there are actually more roles in the local.yaml file (33) that I haven't considered or noticed, so we haven't crossed the magic halfway point just yet.

comment:7 Changed 3 months ago by anarcat

down to 6 has_role (down from ~50) in the main roles class, thanks to the help of hiro who joined in the effort. there are also still 18 roles (down from 57) left in local.yaml, which i'll try to tackle next. there are some leftovers of the static-* roles there that I seem to have skipped over. they are bound to SSH key propagation and internal class parameters, so it was likely deliberate.

but we have definitely crossed the halfway point, and I'd say we're getting close to the finish line, at least with regards to the custom has_role stuff. there's naturally more stuff that could move to Hiera and other YAML files strewn around the codebase, but this is a huge chunk that will be done shortly.

Those are the files I am currently aware of that would benefit from being transitioned into Hiera:

./modules/torproject_org/misc/hoster.yaml
./modules/torproject_org/misc/local.yaml
./modules/ipsec/misc/config.yaml
./modules/roles/misc/static-components.yaml
./modules/roles/files/spec/spec-redirects.yaml

But I suspect many of those will be easier than the wide-ranging has_role transition, as each one of those files touches only one or a few modules, as opposed to the local.yaml file which touched *everything*.

So, good progress, even if slow.

comment:8 Changed 7 days ago by anarcat

Description: modified (diff)
Status: assigned → accepted

we now have:

  • 3 has_role references
  • 4 roles left (haproxy, mail_processing, natted, no_hw_clock)
  • 2 localinfo references (in postfix, related to mail_processing)
  • 13 allnodeinfo references
  • 26 nodeinfo references

That's on the stuff that I started working on at all. The hoster.yaml stuff, in particular, is a whole other ball game. It's less work than the larger local.yaml, but still impacts a lot of things, which are mostly visible in the nodeinfo calls:

anarcat@curie:tor-puppet(master)$ git grep -c nodeinfo
manifests/site.pp:2
modules/bind/templates/named.conf.puppet-shared-keys.erb:1
modules/ferm/templates/defs.conf.erb:6
modules/hosts/templates/etc-hosts.erb:2
modules/motd/templates/motd.erb:11
modules/ntp/templates/ntp.conf:1
modules/postfix/templates/main.cf.erb:2
modules/postgres/manifests/backup_server/register_backup_clienthost.pp:1
modules/puppetmaster/lib/puppet/parser/functions/allnodeinfo.rb:2
modules/puppetmaster/lib/puppet/parser/functions/nodeinfo.rb:22
modules/resolv/templates/resolv.conf.erb:3
modules/roles/manifests/onionoo_backend.pp:2
modules/syslog_ng/templates/syslog-ng.conf.erb:1
modules/torproject_org/manifests/init.pp:2
modules/unbound/manifests/init.pp:4
modules/unbound/templates/unbound.conf.erb:4

Similarly, the allnodeinfo construct imports a lot of stuff from LDAP into Puppet, which we might want to move into Hiera. That, however, could be left for a second phase as it would significantly disrupt the current host lifetime workflow.

The status of the YAML file conversion is as follows:

  • ./modules/torproject_org/misc/hoster.yaml: not started
  • ./modules/torproject_org/misc/local.yaml: 53/57 roles done! almost finished, see below for the status of the remaining 4
  • ./modules/ipsec/misc/config.yaml: will be phased out in favor of the new exported resource system built for the new networks on fsn-node-*
  • ./modules/roles/misc/static-components.yaml: maybe easier to keep as such for now, or rewrite the static backend to read the file directly?
  • ./modules/roles/files/spec/spec-redirects.yaml: unsure

The remaining roles are:

  • haproxy: required for syslog-ng configuration, switching to rsyslog would make this easier
  • mail_processing: requires a refactoring of the postfix module
  • natted: small refactoring of the hosts module, ignore the nodeinfo stuff, it's not used anywhere according to weasel
  • no_hw_clock: small refactoring of the NTP and torproject_org modules

The bulk of the work will be with mail_processing and, obviously, with the syslog transition if we go that route.

comment:9 Changed 7 hours ago by anarcat

natted, mail_processing and no_hw_clock were completed this week.

only ONE role left! whoohoo!

i also removed the has_role function, and the $roles and $localinfo variables as they were not used anywhere. (well, the roles variable was used in ferm, but that was only for the $HOST_ROLE_HAPROXY macro, and *that* wasn't used anywhere, so it was safe to remove).

we still have a handful of other $HOST_ macro references, for what it's worth. all of them are firewall related (i.e. grant access to all for backups, ssh, syslog, puppet and, strangely, stunnel; grant access to primary to netnod).

Last edited 6 hours ago by anarcat