Opened 16 months ago

Last modified 11 months ago

#30020 accepted project

switch from our custom YAML implementation to Hiera

Reported by: anarcat Owned by: anarcat
Priority: Medium Milestone:
Component: Internal Services/Tor Sysadmin Team Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: #29387 Points:
Reviewer: Sponsor:

Description (last modified by anarcat)

We currently use a custom-made YAML database for assigning roles to servers and other metadata. I started using Hiera for some hosts and it seems to be working well.

Hiera is officially supported in Puppet and shipped by default in Puppet 5 and later. It's the standard way of specifying metadata and class parameters for hosts. I suspect it covers most of our needs in terms of metadata and should cover most if not all of what we're currently doing with the YAML stuff in Puppet.

We should therefore switch to using Hiera instead of our homegrown solution.

This involves converting:

  • if has_role('foo') { include foo } into classes: [ 'foo' ] in hiera (DONE!)
  • hardcoded macros in the ferm module's me.conf.erb into exported resources (DONE, except for HOST_TPO)
  • templates looping over allnodeinfo into exported resources
  • the $roles array into Hiera (DONE!)
  • the $localinfo into Hiera (assuming all the data is there) (DONE!)
  • the $nodeinfo and $allnodeinfo arrays into Hiera (assuming we can switch from LDAP for host inventory)
  • basically any other stuff of the kind, including those files:
    ./modules/torproject_org/misc/local.yaml <- DONE!

Ideally, all YAML data should end up in the hiera/ directory somehow. This is the first step towards making our repository public (#29387), and also towards using Hiera as a more elaborate inventory system (#30273).

The idea of switching from LDAP to Hiera for host inventory will definitely need to be evaluated more thoroughly before going ahead with that part of the conversion, but YAML stuff in Puppet should definitely be converted.

The general goal of this is both to allow for a better inventory system and to make it easier for people to get onboarded with Puppet. By using community standards like Hiera, we make it easier for new people to get familiar with the Puppet infrastructure and do things meaningfully.

Update: get_roles(), has_role(), yamlinfo() and local.yaml are *all* gone! The main chunks remaining are now nodeinfo(), allnodeinfo(), $nodeinfo and hoster.yaml. A plan has been laid out for that replacement below. Obviously, the ipsec, static components and redirects YAML files could use a transition into Hiera as well, but those are lower priority.

Change History (13)

comment:1 Changed 16 months ago by anarcat

this has started. most of site.pp has been emptied, with the easy stuff first. the hard stuff are hosts where the $roles function is actually relevant. for example, the following ferm macros are probably actually in use:


... and probably more, namely bacula. other classes will refer to the $roles or nodeinfo lists explicitly as well and will need to be broken up in separate classes that then get properly included. but it's a great start and so far no breakage that i know of.

i documented the impact of the change in site.pp, but it might be good to add something to the wiki docs as well.

comment:2 Changed 16 months ago by anarcat

some more progress, but this time harder stuff: I converted the DNS servers to Hiera. this involved splitting some classes and exporting resources. in my travels, those are the important HOST_ROLE_ ferm rules that I found might be problematic:


I also found HOST_NETNOD but I think that might be a static definition.

HOST_ROLE_DNS_SECONDARY is now gone, and replaced by exported ferm::rule constructs. This works well, but @weasel was somehow worried about security issues with exported resources, which I am not sure are relevant in this case.

Another problem is that the ferm module is set up to realize the virtual ferm::rule stuff defined everywhere. This implies that the exported resources are also realized locally. That's fairly harmless, because the host allows itself access to itself, but it's noisy and annoying.

I don't know why ferm::rule entries are virtual everywhere, so that's something I'd like to explore as well in the future.

Another problem I found when working on the DNS stuff is that the DNS primary does checks on the DNS secondaries, seemingly through NRPE, because it is in the allowed_hosts list in the NRPE config. This makes it impossible to remove the dns_primary role from local.yaml for now and I'm not sure how to work around that without creating a global variable for the DNS primary host, which would be an unfortunate regression.

So three pending questions:

  1. what is the security issue with exported resources? is the current pattern used in the bind module and prometheus profile acceptable?
  2. why are ferm::rule entries virtual?
  3. how can we export arbitrary IPs in configuration files in Hiera? specifically, how do we construct NRPE's allowed_hosts list of IPs from other hosts?

My tentative guesses on this are:

  1. impact minor, even if security issue (possibility to manipulate firewall rules between nodes)
  2. probably just an oversight?
  3. i feel dirty saying it, but a fancy sed Exec exported resource?

comment:3 Changed 16 months ago by anarcat

Another possible solution is to move from LDAP to Hiera for host metadata. That is where, after all, Puppet is getting some of those IP addresses from and it would be possible to simply do lookups in Hiera for those, if it was properly loaded and ordered.

Another case I found is roles::weblog_sink which constructs SSH keys from the YAML data. This could be generated from exported resources as well, for example with the ssh_authorized_key builtin type.

So in other words, I think this project is doable, but it will require refactoring and lots of work.

In the end, though, we would have one YAML file per host in hiera/nodes/$FQDN.yaml. This could be made fairly human-readable if we make a good template, and be the single source of truth for all information about a host including hosting provider, cost and so on, solving our inventory problem, (partly) described in #29816.

I think this is worth it and will make it easier to get people involved in Puppet work.

comment:4 Changed 16 months ago by anarcat

site.pp is now mostly empty. all the has_role constructs are gone from there.

those two are gone as well:


the trickiest part, surprisingly, was the little warning added to the motd. i've hacked something together using update-motd.d but i'm actually quite unhappy about it, because it doesn't display the same way that it did before. if the machines were all running buster, this wouldn't be a problem anymore because there's /etc/motd.d there, but we're probably stuck in stretch for a while.

since this is only for *three* machines, I think we can afford the little ugliness for now.

Linux build-arm-02 4.19.0-0.bpo.4-arm64 #1 SMP Debian 4.19.28-2~bpo9+1 (2019-03-27) aarch64

 Note that this host is _NOT_ being backed up.  If you care about your
 data, run your own backups.

This device is for authorized users only.


Welcome to, used for the following services:

 If you use this as a porter/buildbox, you might find helpful.


Last login: Fri Apr 19 20:44:31 2019 from

I have also found HOST_TPO which is basically a list of the public IP of all TPO hosts, as taken from LDAP (modules/puppetmaster/lib/puppet/parser/functions/allnodeinfo.rb). So we can keep that macro for now until we decide about the overlap between LDAP and Hiera. The motd is similarly extracted mostly from stuff in LDAP and would benefit from such a refactoring as well.

Anyways. Next up is the roles file, which has tons more fun stuff like this to clear out. :)

Note that I've had answers to my earlier questions, somehow:

  1. I don't think there are any serious security issues with exported resources, the way they're set up. At worst a host might be able to push different firewall holes than expected. If we want to fix that issue, we can make new defines with hardcoded definitions that, when collected on hosts, will only poke the holes that are expected.
  2. it's just a copy-paste historical error, that I've made myself on other occasions
  3. no solution to the NRPE allowed_hosts problem just yet, but I'm tempted to just use a hardcoded variable for now. this is what is used for bacula::bacula_director_address for example: it's hardcoded to so there's prior art to hardcoding stuff like that. of course it would be hardcoded into hiera, not the class name, ideally...

comment:5 Changed 16 months ago by anarcat

Description: modified (diff)

i did more work here. the following macros have now been safely removed:


This also led to the removal of a custom SSH keys generation template (modules/roles/templates/weblog_sink/webstats-authorized_keys.erb), although it hasn't been converted to the native ssh_authorized_key type because of the format difference between the custom fact we use to export the ssh keys and the one expected by the type. This could be fixed in another refactoring at some other time.
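
For what it's worth, the format gap could probably be bridged with a small helper. The following is only a sketch: it assumes the custom fact exports a regular OpenSSH public key line ("type base64 comment"), which is a guess, and parse_ssh_pubkey is a hypothetical name, not something that exists in the repository. It splits such a line into the separate type, key and name attributes that the native type takes:

```ruby
# Split an OpenSSH public key line ("type base64 [comment]") into the
# separate attributes used by Puppet's ssh_authorized_key type.
# ASSUMPTION: the custom fact exports keys in this one-line format.
def parse_ssh_pubkey(line)
  # Limit the split to 3 fields so a comment containing spaces stays whole.
  type, key, comment = line.strip.split(/\s+/, 3)
  raise ArgumentError, "not a public key line: #{line.inspect}" if key.nil?

  { 'type' => type, 'key' => key, 'name' => comment }
end
```

The resulting hash could then feed an exported resource, with the user and target still to be supplied by the collecting class.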

Now, I'm working on the static_* stuff, which is like weblog_* but a little more complicated because the config files are not (yet) built with config::fragment. The SSH firewall configuration was a little more complicated but it's been migrated already. Next up is the authorized_keys which should follow the same pattern as the weblog stuff and then the config::fragment conversion. There are also corner cases with more sub-roles for that one that will need to be taken into account, but those can hopefully be converted into class parameters.

There are now 36 roles left in the roles class. There were about 50 roles, split between site.pp and the roles class, when I started this, about a week ago, so i think it would be fair to assume this first part of the conversion will be done in a week or two.

comment:6 Changed 15 months ago by anarcat

i got a little tired of battling this, so I took a small break. I still migrated a few roles:


many of those were easy marks: the ssl::service stuff was just a lot of copy-paste, which might have been better implemented by having a parametrized class with the node-specific parameters in hiera, something like:

class profile::ssl_web($service_name, $onion = false) {
   # $name is reserved in Puppet, so the parameter is called $service_name here
   ssl::service { $service_name: notify => Exec['service apache2 reload'], key => true, onion => $onion }
}

And in (say), you would have:

profile::ssl_web::service_name: ""
profile::ssl_web::onion: true
classes:
  - profile::ssl_web

... but I didn't want to overthink this just yet. plus we might want to manage those services more closely in Puppet eventually and such a class would just make it difficult. Besides, i suspect this would belong in the Apache module, not in a profile. And we should have a role in Hiera instead of a profile, so we would end up creating the equivalent of the profile I ended up making anyways:

class profile::lists {
  ssl::service { '':
    notify => Exec['service apache2 reload'],
    key    => true,
  }
}

So I think it's the right conversion for now. I'm not converting the entire hierarchy to R/P/M just yet anyways, just switching to Hiera is enough work as it is.

There are now 22 has_role calls left in the main roles class, down from around 50. Unfortunately, there are actually more roles in the local.yaml file (33) that I haven't considered or noticed, so we haven't crossed the magic halfway point just yet.

comment:7 Changed 14 months ago by anarcat

down to 6 has_role (down from ~50) in the main roles class, thanks to the help of hiro who joined in the effort. there are also still 18 roles (down from 57) left in local.yaml, which i'll try to tackle next. there are some leftovers of the static-* roles there that I seem to have skipped over. they are bound to SSH key propagation and internal class parameters, so it was likely deliberate.

but we have definitely crossed the halfway point, and I'd say we're getting close to the finish line, at least with regards to the custom has_role stuff. there's naturally more stuff that could move to Hiera and other YAML files strewn around the codebase, but this is a huge chunk that will be done shortly.

Those are the files I am currently aware of that would benefit from being transitioned into Hiera:


But I suspect many of those will be easier than the wide-ranging has_role transition, as each of those files touches only one or a few modules, as opposed to the local.yaml file which touched *everything*.

So, good progress, even if slow.

comment:8 Changed 12 months ago by anarcat

Description: modified (diff)
Status: assigned → accepted

we now have:

  • 3 has_role references
  • 4 roles left (haproxy, mail_processing, natted, no_hw_clock)
  • 2 localinfo references (in postfix, related to mail_processing)
  • 13 allnodeinfo references
  • 26 nodeinfo references

That only covers the parts I have started working on. The hoster.yaml stuff, in particular, is a whole other ball game. It's less work than the larger local.yaml, but still impacts a lot of things, which are mostly visible in the nodeinfo calls:

anarcat@curie:tor-puppet(master)$ git grep -c nodeinfo

Similarly, the allnodeinfo construct imports a lot of stuff from LDAP into Puppet, which we might want to move into Hiera. That, however, could be left for a second phase as it would significantly disrupt the current host lifetime workflow.

The status of the YAML file conversion is as follows:

  • ./modules/torproject_org/misc/hoster.yaml: not started
  • ./modules/torproject_org/misc/local.yaml: 53/57 roles done! almost finished, see below for the status of the remaining 4
  • ./modules/ipsec/misc/config.yaml: will be phased out in favor of the new exported resource system built for the new networks on fsn-node-*
  • ./modules/roles/misc/static-components.yaml: maybe easier to keep as such for now, or rewrite the static backend to read the file directly?
  • ./modules/roles/files/spec/spec-redirects.yaml: unsure

The remaining roles are:

  • haproxy: required for syslog-ng configuration, switching to rsyslog would make this easier
  • mail_processing: requires a refactoring of the postfix module
  • natted: small refactoring of the hosts module, ignore the nodeinfo stuff, it's not used anywhere according to weasel
  • no_hw_clock: small refactoring of the NTP and torproject_org modules

The bulk of the work will be with mail_processing and, obviously, with the syslog transition if we go that route.

comment:9 Changed 12 months ago by anarcat

natted, mail_processing and no_hw_clock were completed this week.

only ONE role left! whoohoo!

i also removed the has_role function, and the $roles and $localinfo variables as they were not used anywhere. (well, the roles variable was used in ferm, but that was only for the $HOST_ROLE_HAPROXY macro, and *that* wasn't used anywhere, so it was safe to remove).

we still have a handful of other $HOST_ macro references, for what it's worth. all of them are firewall related (ie. grant access to all for backups, ssh, syslog, puppet and, strangely, stunnel, grant access to primary to netnod).

Last edited 12 months ago by anarcat (previous) (diff)

comment:10 Changed 11 months ago by anarcat

grand milestone today: local.yaml was removed from the repository, along with get_role and yamlinfo, which are all now useless.


Next step: hoster.yaml

the next chunk we need to convert would be, i think, ./modules/torproject_org/misc/hoster.yaml, which specifies those things:

  • netrange: used to create the TPO_NET macro in ferm (unused?) and determine in which hoster a given host is (through whohosts.rb, which does IP range calculations from the host's IP as seen from LDAP)
  • mirror-debian: used in torproject_org class to define the APT mirror for this host
  • mirror-debian-security: unused?
  • nameservers: used to configure upstream forwarders in unbound on each host
  • nameservers_break_dnssec: used to disable unbound forwarding in case of broken upstream DNS, unused
  • allow_dns_query: used to tell unbound to allow other network ranges (ie. generally on this site) to use *this* node as recursive DNS server (if misc.resolver-recursive is true, which is the case when the LDAP ip of the host is listed in the hoster's nameservers list)

d.o has hooked hosters.yaml into hiera, and the way they did it is to have one .yaml file per hoster, for example:

Unfortunately, the hoster.yaml is still present on d.o:

there we have the same code as on tpo:

    yamlfile = Puppet::Parser::Files.find_file('debian_org/misc/hoster.yaml', compiler.environment)

Here is the file, which contains more than just ip ranges:

So their transition isn't complete, but it matches some of the ideas I had (namely to have one YAML file per hoster).
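
To make the one-file-per-hoster idea a bit more concrete, here is a rough sketch of how a combined file could be split up. It makes a few assumptions that are not confirmed anywhere in this ticket: that the top-level keys of the combined file are hoster names, that each value is a flat hash of settings, and that the settings end up as parameters of a (hypothetical) profile::hoster class. split_hosters is a made-up helper name.

```ruby
require 'yaml'
require 'fileutils'

# Split one combined hoster file into one Hiera data file per hoster.
# ASSUMPTIONS: top-level keys are hoster names, values are flat hashes of
# settings, and "profile::hoster" is a hypothetical class to bind them to.
def split_hosters(source_yaml, target_dir, prefix: 'profile::hoster')
  hosters = YAML.safe_load(source_yaml)
  FileUtils.mkdir_p(target_dir)
  hosters.map do |name, settings|
    # Namespace each key so Hiera can bind it to a class parameter, e.g.
    # "mirror-debian" becomes "profile::hoster::mirror_debian".
    data = settings.transform_keys { |key| "#{prefix}::#{key.tr('-', '_')}" }
    path = File.join(target_dir, "#{name}.yaml")
    File.write(path, data.to_yaml)
    path
  end
end
```

Dashes are turned into underscores because Puppet parameter names cannot contain dashes. The function returns the list of files it wrote, which makes it easy to review the result before committing anything.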

So what I would suggest we do to get rid of the hoster.yaml file is this:

  1. convert all the aforementioned variables used in hoster.yaml into class variables, defaulting to the values loaded from hoster.yaml (ie. $nodeinfo)
  2. test the variables by overriding them in (e.g.) hiera/nodes/
  3. break up the hoster.yaml file into multiple smaller files in hiera/hoster/%{hoster}.yaml
  4. add that path to hiera.yaml
  5. test that a host can load its variables from the hoster search path by hardcoding a value by hand in facter
  6. create a new YAML variable that gives us an IP range -> hoster mapping
  7. create a function that looks through those to guess the hoster for a given IP address
  8. use that function to create a fact (through a template, but with a variable defined in the base class) that defines the $hoster variable that hiera will use to load the right YAML
  9. remove hoster.yaml
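
Steps 6 and 7 could look roughly like the sketch below. This is not the existing whohosts.rb code: the hoster names and the ranges (documentation-only prefixes) are placeholders, and guess_hoster is a hypothetical name. It only relies on the IPAddr class from the Ruby standard library.

```ruby
require 'ipaddr'

# Hypothetical IP range -> hoster mapping (step 6). The hoster names and the
# documentation-only ranges are placeholders, not real inventory data.
HOSTER_RANGES = {
  'hoster-a' => ['192.0.2.0/24', '2001:db8:a::/48'],
  'hoster-b' => ['198.51.100.0/24'],
}.freeze

# Return the name of the first hoster with a range containing the address,
# or nil when no range matches (step 7).
def guess_hoster(address, ranges = HOSTER_RANGES)
  ip = IPAddr.new(address)
  match = ranges.find do |_hoster, cidrs|
    # include? is false when the address families differ, so IPv4 and IPv6
    # ranges can be mixed in the same list.
    cidrs.any? { |cidr| IPAddr.new(cidr).include?(ip) }
  end
  match&.first
end
```

Returning nil for an unknown address leaves it to the caller to decide whether a host without a hoster is an error or should just fall back to the common Hiera data.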

That's a first step. At that stage, hoster.yaml is gone, but $nodeinfo remains and still might contain host-specific configuration. Those should be extracted *out* of the $nodeinfo construct and into manifest business logic. And *then* the nodeinfo.rb code can be ripped out.

There might be a better way to define a hoster per node than guess it with its IP address and drop it as a fact, but I can't think of anything right now.

comment:11 Changed 11 months ago by anarcat

Description: modified (diff)

comment:12 Changed 11 months ago by weasel

hoster.yaml now only has the networks that define which hoster a node is at. It's still used for the whohosts function and for ferm to make up what we consider tor networks.

We ship the hoster name as a fact to the node, and then include hieradata based on that fact, defining things like the debian mirror.

comment:13 Changed 11 months ago by anarcat

awesome. so we're at step 6:

  6. create a new YAML variable that gives us an IP range -> hoster mapping
  7. create a function that looks through those to guess the hoster for a given IP address (probably just fixing whohosts?)
  8. use that function to create a fact (through a template, but with a variable defined in the base class) that defines the $hoster variable that hiera will use to load the right YAML (DONE, right?)
  9. remove hoster.yaml