Bus factor session

Date: 13/10 2017. Host: Shari

We start out by looking into what we are trying to identify: is it technical, social, org related, or all of them? We decide to focus on technical assets for now.

Roger starts out by trying to define the bigger picture:

  • Servers rented from Hetzner run the website and various services. They are maintained by qbi and weasel.
  • Six volunteers run our website servers. Weasel knows the people who run them. SUNet runs one of them, and it is hosted by Linus(?).
  • We have a corporate SVN service. It is unknown where this service is hosted. We believe we have backups, but we are unsure where these backups are.

Jens (qbi) is joining the session to help us figure out the technical details of our setup. The group introduces the session topic to Jens.

  • The backup setup is currently at Cymru. We discuss geographical distribution and the problems of having everything at Hetzner: what happens if the company goes bankrupt, or if the entire network is down in multiple data centers? Cymru is in a different location from Hetzner, so the backups are geographically distributed.

Nick mentions that there is a central file listing our hosts, and that the Puppet (a system management tool for configuring and maintaining multiple servers) configuration is kept in a Git repository.

Roger moves on to talk about the directory authority servers: we have 9 or 10 - there is one more upcoming, but we are not sure what the status is. They are currently chosen based on being "friends of the other directory authorities". We should focus on geographical distribution - Roger thinks we are doing great on this compared to many others.

Only a subset of the 9-10 directory authorities run bandwidth authorities, which is problematic. Nick says we are going to work on this code over the next year and it has been discussed during this meeting (Tor Dev Montreal 2017).

The Tor Browser team has sponsored hosting from Fastly for the browser's update files. This is worth $20,000 to $30,000 per month, which is high. The files are uploaded to the distribution website that Tor hosts, and Fastly acts as a CDN for this site. The group discusses what happens if Fastly goes away: the volume of requests would most likely take down dist.torproject.org, given the capacity needed.

Nick mentions a password database for things like PayPal passwords. It is PGP-encrypted to Nick, Roger, Shari, and Wendy, and the file is stored in the corporate SVN repository. Jon, Tommy, and Sue have access to PayPal, but none of them has access to the SVN repository, so the passwords might have been rotated without the larger group's knowledge.

Jens mentions that he found out that we do have off-site backup: Cymru. Rob or Sina might be our contact for this.

PGP Keys

Roger moves over to talking about PGP keys.

  • Tor Browser is signed by a key; if that key is lost, we would not be able to automatically update Tor Browser.
  • Tor Browser is signed by two people(?).

We need to identify whether the PGP keys are pinned in Tor Browser.

External Services

Nick mentions that there are accounts on different services, for example the Apple App Store account, which is registered to execdir@ but which multiple people can access via different accounts.

LDAP requires that we create a ticket so that we have a history of each account's creation. Nick or Roger has to approve it. We never delete accounts right now - should this change?

Mailman: Nick, Damian, Roger, Jens, and Weasel have access to this. We need some documentation about how mailman is used.

Nick mentions that we could do an "audit" where we go over our puppet configuration, list our virtual machines, do port scans, etc.
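
The audit Nick describes could start from a small script that compares what is documented against what is actually reachable. The sketch below is only an illustration under assumptions: the `expected` inventory, the host name, and the candidate port list are hypothetical and would in practice be derived from the Puppet configuration; it does not describe any existing Tor tooling.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def audit(
    expected: Dict[str, Set[int]],
    candidate_ports: Iterable[int],
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> Dict[str, Dict[str, List[int]]]:
    """Compare documented ports per host with the ports that answer a probe.

    `expected` maps a host name to the ports we believe it should expose
    (hypothetical inventory format, e.g. exported from configuration management).
    """
    ports = list(candidate_ports)
    findings: Dict[str, Dict[str, List[int]]] = {}
    for host, documented in expected.items():
        observed = {port for port in ports if probe(host, port)}
        findings[host] = {
            # Open but not documented: a service nobody has written down.
            "undocumented": sorted(observed - documented),
            # Documented but not answering: stale documentation or an outage.
            "missing": sorted(documented - observed),
        }
    return findings
```

Passing the probe as a parameter keeps the comparison logic testable without touching the network, and real scans should only be run against hosts we administer.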

We talk about the difference between service admins and system admins. Jens and Weasel are system admins, and hiro is a service admin.

grants.gov and nsf.gov each have a set of users who receive emails. Some might not be involved with Tor anymore? Roger mentions that the financial "stuff" (ahf assumes this means 503 finances) might include sites like grants.gov that Brad might have access to.

Software Development

We have projects with a single maintainer who signs releases, which is problematic if that person leaves for some reason.

Nick mentions that if Isa goes away for a longer period of time, we are in trouble. Roger asks who would then be in contact with OTF, for example.

Shari mentions that with staff we should be sure to have redundancy - it is also mentioned that everyone should be able to go on vacation.

Trust Bottlenecks

Roger gives an example: if Nick is the only one who announces Tor releases and somebody else made a release, would anybody believe that the release is real?

Weasel is a trust bottleneck for administration.

We discuss redundancy around financial matters: for example, what happens if Sue goes on vacation and we need to contact the auditors?

Physical Security

Nick mentions that it is good that, with Git as part of the software development model, everyone has the entire history of the repository, including all commits.

Personal Contacts

How do we ensure that people know the different people that we depend upon? We should make sure that at least two people know who to reach out to for each of the things that we depend upon. This is focused mostly on funders.
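
The "at least two people" rule can be checked mechanically once the dependencies are written down. A minimal sketch, assuming a hand-maintained mapping from each dependency to the people who know the contact; the function name and all entries below are placeholders, not real Tor data:

```python
from typing import Dict, List, Set


def under_covered(contacts: Dict[str, Set[str]], minimum: int = 2) -> List[str]:
    """Return the dependencies known to fewer than `minimum` people, sorted by name."""
    return sorted(name for name, people in contacts.items() if len(people) < minimum)


# Hypothetical example data: dependency -> people who know whom to contact.
contacts = {
    "funder-a": {"alice", "bob"},
    "funder-b": {"alice"},
    "hosting-sponsor": set(),
}

print(under_covered(contacts))  # prints ['funder-b', 'hosting-sponsor']
```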

Other groups that are related:

  • IFF (meeting in Valencia).

Shari mentions that these things are not as problematic because we can reach out to them again as friends. Shari mentions that Steph is taking the first steps for this to ensure that we know when events are taking place and where.

What happens if something bad happens to Hetzner?

  • IRC would be up.
  • Development would work: full repositories are distributed.
  • $services would be down.

Generally things would be "OK" for most people (not the sysadmins!).

What happens if someone attacks the directory authorities?

We currently need at least 5 of them online to work. How well are these monitored? It sounds like people identify very quickly if a directory authority is down.
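
The figure of 5 is consistent with a simple majority of 9 authorities. A minimal sketch, assuming the rule is "more than half of the configured authorities"; the notes only state the number 5, so the rule itself and the function names are assumptions made for illustration:

```python
def majority_threshold(total_authorities: int) -> int:
    """Smallest count that is strictly more than half of the total (assumed rule)."""
    return total_authorities // 2 + 1


def enough_online(total_authorities: int, online: int) -> bool:
    """True if the number of online authorities meets the assumed majority threshold."""
    return online >= majority_threshold(total_authorities)
```

Under this assumption, a set of 9 authorities tolerates four being offline at once; a fifth failure would drop it below the threshold.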

Nick mentions that it was not until the Montreal hackfest (in 2016?) that the network team found out it is possible to (re)start a network via testnet, though with a lot of work.

Directory authorities should have their keys offline. This is something the directory authority people should talk about.

History in the org

We haven't documented very well when things happened historically to TPO.

We have an unwritten list of people that we are looking out for; a lot of people keep it in mind in case those people pop up.

Taylor mentions that there are historians who are interested in technical "companies" and in documenting their history. This is focused especially on oral stories that are well known to Paul, Nick, and Roger.

Shari asks whether writing down the oral history of Tor would be an interesting summer internship opportunity. It is mentioned that the person shouldn't be too "journalist'y". Benjamin Mako Hill might know someone here?

Metrics single points of failure

The metrics team used to have some cron jobs that were troublesome.

What happens if Roger goes away (for some reason)?

We don't know all the people that Roger is in contact with - and it is a lot.

Roger is the only person who can interpret NSF. Matt Blaze might be able to help here.

Roger mentions that some of our contracts name a key employee: if that person leaves, the funder might cancel the contract. This gives the funder an opportunity to get out of the contract.

Collaborators on projects

This should be possible to find out by going over Tommy's list.

Hiring Tor sysadmin

This is a problem in that we cannot just put out an open call for applicants and then give the new hire root on everything.

Does hiro have access to what she needs? Does she need lower-level access to the systems?

The donation infrastructure

The donation infrastructure is independent of normal infrastructure (run by people outside of Tor). We are unsure about the administration of this. Giant Rabbit is running the service.

Board bottlenecks

We do not believe we have any board bottlenecks.

What happens if the ED leaves

We would go to Brad and ewyatt and ask them what to do?

What about relationships?

Who knows where a given x amount of USD is now, if it is not in our bank account?

Social bottlenecks

We need to be sure that if a key employee leaves, their knowledge and responsibilities are passed on to the rest of the team.

We should go over the vegas team and see how much $stuff they have and what knowledge they have that might not be shared.

Torservers.net

If Moritz disappears, what happens here? Juris and qbi are the backup persons. Colin is helping out.

We discuss the structure and mechanisms around torservers.net, including how money flows to country-based NFP orgs that run relays.

How do we handle an attack?

  • Ensure we have two people who are able to do the work.
  • Ensure that the two independent people are able to verify the work at different times.

Physical documentation storage

Nick mentions that it is possible to print documents and store them behind a physical lock.

Things that we want to store, but never want to look at, are excellent candidates for storing on paper.

Action items

  • We need a better password management solution than the one we have in corporate SVN right now.
  • We should check whether the passwords in this database should be rotated.
  • Figure out if the passwords for PayPal have been rotated by Jon et al. and ensure that they are put in the password database. We should also look into the "PayPal dongle" or 2-step authentication?
  • Figure out if weasel can attend the meeting in Rome or if we can meet weasel somewhere to talk about sysadmin stuff.
  • Look into 2-step authentication recovery for phones that could potentially be problematic.
  • Look into account recovery questions: "what is your first pet name?"

Last modified on Oct 15, 2017, 8:39:50 PM