Better communication for authority operators, core developers in emergency situations

added component::company parent::2664 priority::medium severity::normal status::new type::task labels

Trac:
Description: When in danger or in doubt, run in circles, scream and shout!

traditional motto, possibly naval.

When the bug behind #2664 (moved) happened, it took us a few hours to notice. That was bad, and #2666 (moved) is about trying to notice such situations faster. But another problem is that even after we noticed, it still took a while to sort out who knew how best to contact which operators. Probably developers should get contacted to in the

We should figure out, for each authority operator and core developer[*], the best two or three ways to contact them in the case of an emergency. If these ways are not something we want to publish (e.g., phone numbers), a few people should know them, and all Tor people should know who those people are and how to contact them in a hurry.

We should have some emergency-response mechanisms in place. If communications are security-sensitive, we should have a way to deal with it in place, rather than the current approach of "send gpg-encrypted email to those people whose keys you happen to have" or "immediately go dark, use OTR to talk pairwise to people you know". Those approaches scale badly; we can probably do better.

We should also have planned responses for emergency events like "A key server looks like it might have been compromised"; "somebody has reported a vulnerability"; "somebody has disclosed a vulnerability"; "one or more authorities have gone down strangely;" "looks like the network is crashing;" and so on.

[*] "core developer" is here defined as "a developer who is likely to be needed urgently when something breaks."

to

When in danger or in doubt,

run in circles, scream and shout!

traditional motto, possibly naval.

When the bug behind #2664 (moved) happened, it took us a few hours to notice. That was bad, and #2666 (moved) is about trying to notice such situations faster. But another problem is that even after we noticed, it still took a while to sort out who knew how best to contact which operators. Probably developers should get contacted too, so they can be available to deal with bad/urgent bugs.

We should figure out, for each authority operator and core developer[*], the best two or three ways to contact them in the case of an emergency. If these ways are not something we want to publish (e.g., phone numbers), a few people should know them, and all Tor people should know who those people are and how to contact them in a hurry.

We should have some emergency-response mechanisms in place. If communications are security-sensitive, we should have a way to deal with it in place, rather than the current approach of "send gpg-encrypted email to those people whose keys you happen to have" or "immediately go dark, use OTR to talk pairwise to people you know". Those approaches scale badly; we can probably do better.

We should also have planned responses for emergency events like "A key server looks like it might have been compromised"; "somebody has reported a vulnerability"; "somebody has disclosed a vulnerability"; "one or more authorities have gone down strangely;" "looks like the network is crashing;" and so on.

[*] "core developer" is here defined as "a developer who is likely to be needed urgently when something breaks."

Lots of topics here jumbled together.

I think one of our main issues is in being able to contact people who are offline.

Part of the challenge there is that Tor people have many things they do at once, and sometimes that means being offline, out of the country, etc. I don't think people would enjoy having pagers, and even if they did the pagers wouldn't work well for plenty of the locations Andrew, Jake, and I go.

So one longterm way to improve things is to make it so there are fewer bottleneck people, that is, there are several people who can actually help to solve an issue, not just several people who can contact the one person who can fix it. Easier said than done of course.

My preference would be to handle more of our "emergency" issues transparently in the open. In my opinion many of the security things we've dealt with over the past year did not need to be done secretly with pairwise OTR conversations, or even with sekrit lists of pgp-encrypted mails. They are issues, we can solve them relatively quickly, the odds that somebody will lurk around waiting to find a vulnerability and then leap on the opportunity are low. By being more open we will involve more of the community, and create more people who can help out in future cases. Talking amongst a small closed community doesn't scale as you say, and worse it doesn't fix the scaling problem. Plus it takes more energy and coordination amongst those trying to keep the secret, and we don't have enough people to waste time on that.

I don't mean to say that no event is so serious that it needs to be kept private until after it's resolved. But I think we're being too conservative on too many issues, and it's impacting both our productivity and our community growth.

wrt offline people, I am not proposing that everybody commit to being always available and findable. I'm proposing instead that we make it easier to do a best-effort attempt to find everybody who is offline, without having to waste time figuring out who knows the best current phone number for whom.

So to be clear, I agree that it would be a bad thing to require developers and operators to answer their phones 24/7. But for stuff like "the network will go down if we do not fix it", it is reasonable to call everybody and see who is available to help when the issue comes up.

Replying to nickm:

We should also have planned responses for emergency events like [...] "one or more authorities have gone down strangely;" "looks like the network is crashing;" and so on.

I think the best approach for these types of emergencies is to change the system so fewer of these issues are dire emergencies. #2681 (moved) is a start at that.

More generally, we have a habit of adding features that seem to add some amount of security, but add a lot of ongoing process and hassle, and we need to reevaluate that habit going forward. For example, if I had it to do over again I would argue against periodic v3 signing key expiration. It seems like a fine idea until you actually have to get N people to deal with it on an ongoing basis.

Replying to arma:

Replying to nickm:

We should also have planned responses for emergency events like [...] "one or more authorities have gone down strangely;" "looks like the network is crashing;" and so on.

I think the best approach for these types of emergencies is to change the system so fewer of these issues are dire emergencies. #2681 (moved) is a start at that.

Agreed, but it can't be a comprehensive answer. No matter what we do on #2664 (moved), we will not be able to banish the very possibility that urgent things will sometimes happen. Given that they will sometimes happen, we should be able to deal with them quickly without thrashing so much.

More generally, we have a habit of adding features that seem to add some amount of security, but add a lot of ongoing process and hassle, and we need to reevaluate that habit going forward. For example, if I had it to do over again I would argue against periodic v3 signing key expiration. It seems like a fine idea until you actually have to get N people to deal with it on an ongoing basis.

I'm going to agree with the principle but disagree with the example. I'm not going to argue with this here, unless you think it's on-topic.

Replying to nickm:

wrt offline people, I am not proposing that everybody commit to being always available and findable. I'm proposing instead that we make it easier to do a best-effort attempt to find everybody who is offline, without having to waste time figuring out who knows the best current phone number for whom.

Perhaps a good use of the "internal svn" repository we've been meaning to set up recently? I'm not particularly enthusiastic to have my cell phone number written in an svn file somewhere, but it's probably no worse than having it written in an email that gets sent to a dozen people. Also, a centralized place is easier to keep up to date.
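For illustration only, here is a rough sketch of what such a centralized contact file could feed into: each person lists their contact methods best-first, and a helper produces a best-effort call order (everyone's first choice, then everyone's second, and so on). The `Contact` record, the `call_list` helper, and all names, roles, and addresses below are hypothetical, not anything we have set up.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    role: str  # e.g. "authority operator" or "core developer"
    methods: list = field(default_factory=list)  # ordered, best method first


def call_list(contacts, role=None):
    """Return (name, method) pairs in the order a responder should try them.

    Everybody's best method comes first, then everybody's second-best,
    so one unreachable person does not hold up contacting the rest.
    """
    chosen = [c for c in contacts if role is None or c.role == role]
    depth = max((len(c.methods) for c in chosen), default=0)
    return [
        (c.name, c.methods[i])
        for i in range(depth)
        for c in chosen
        if i < len(c.methods)
    ]


# Hypothetical entries; real ones would live in the private repository.
people = [
    Contact("alice", "authority operator", ["phone: +1-555-0100", "irc: alice"]),
    Contact("bob", "core developer", ["email: bob@example.invalid"]),
]

for name, method in call_list(people):
    print(name, method)
```

Keeping the methods ordered per person means nobody has to remember who prefers a phone call over IRC when something is on fire.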

Replying to arma:

My preference would be to handle more of our "emergency" issues transparently in the open. In my opinion many of the security things we've dealt with over the past year did not need to be done secretly with pairwise OTR conversations, or even with sekrit lists of pgp-encrypted mails. They are issues, we can solve them relatively quickly, the odds that somebody will lurk around waiting to find a vulnerability and then leap on the opportunity are low. By being more open we will involve more of the community, and create more people who can help out in future cases. Talking amongst a small closed community doesn't scale as you say, and worse it doesn't fix the scaling problem. Plus it takes more energy and coordination amongst those trying to keep the secret, and we don't have enough people to waste time on that.

I don't mean to say that no event is so serious that it needs to be kept private until after it's resolved. But I think we're being too conservative on too many issues, and it's impacting both our productivity and our community growth.

I agree that we're being too conservative; I'd guess at least 60% of the encrypted email I get never actually needed to be encrypted.

In my opinion, it would actually help us be more transparent if we came up with some rough guidelines here. A description of how to handle what is not only a guideline for what is too sensitive to divulge before it's fixed, but also a guideline for what is not that sensitive, and therefore good to do in public. If, as you think, we are being too conservative, then coming to a good agreement about the boundaries here will make us less so. Let's talk about that, perhaps on one of the more public mailing lists.

But sometimes, honestly, there will be stuff that we ought not to disclose until it's fixed. And sometimes, there will be stuff that we need to triage to make sure it is safe to disclose before it's fixed. When that happens-- and it will from time to time-- having a good means to talk about it will help us triage faster and fix stuff faster, thereby actually moving us out of the "ninjas and superspies" phase even faster.

So I take your point as implying that we should not take a better means of secure communication as license to do more things in private. And I agree! But that doesn't mean that secure communication is needless, and it doesn't mean we shouldn't do it better-- and I don't think you mean that, either.

Trac:
Component: Tor Relay to Company
Owner: N/A to phobos

Trac:
Owner: phobos to N/A
Status: new to assigned

So for handling vulnerabilities, let me summarize what I think after talking with arma today, and hearing about what some other projects do.

Let's have a broad security team comprising Tor developers that Tor pays and volunteers whom we trust who seem to be helpful with security.
Let's have that team, and that team only, have access to a separate svn repository for discussing and sharing work on undisclosed vulnerabilities.
There should be a GPG key that only a couple people have that is the official way for people without access to the svn repo to report new vulnerabilities.
We should make sure that when people report stuff, we stay in touch with them to let them know our progress. Else they tend to get angry and disillusioned, I hear.
This SVN repository should send a minimal email to the team only on commits: either encrypted to a pgp key, or giving only a notification that there were commits (maybe a filename?)
If we want to have discussions about stuff, we can do it via the svn repo or via email. Email should cc the entire security team, using gpg. We can have a regularly rotated key that the security team shares (if we're lazy) or a carefully cross-signed set of keys that everybody remembers to encrypt every message to before sending it to the list (if we're brave.)
ALL DISCUSSIONS OF EACH ISSUE SHOULD BE MADE PUBLIC WHEN WE PATCH AND ANNOUNCE. We should use this as a means to become more transparent in how we handle vulnerability reports.

Thoughts?

Trac:
Owner: N/A to nickm

Replying to nickm:

ALL DISCUSSIONS OF EACH ISSUE SHOULD BE MADE PUBLIC WHEN WE PATCH AND ANNOUNCE. We should use this as a means to become more transparent in how we handle vulnerability reports.

Trac is the wrong tool for publishing an archive of previous discussions. We need a tor-misc-archive directory on archive.tpo to hold other files that we need to keep around and have no other appropriate place for (including Git bundles of the mothballed metrics.git repo and the last state of polipo.git on our Git server), and we can put archived security discussions there, too.

Replying to rransom:

Replying to nickm:

ALL DISCUSSIONS OF EACH ISSUE SHOULD BE MADE PUBLIC WHEN WE PATCH AND ANNOUNCE. We should use this as a means to become more transparent in how we handle vulnerability reports.

Trac is the wrong tool for publishing an archive of previous discussions.

Right; I wasn't meaning to suggest trac.

Replying to nickm:

Replying to rransom:

Replying to nickm:

ALL DISCUSSIONS OF EACH ISSUE SHOULD BE MADE PUBLIC WHEN WE PATCH AND ANNOUNCE. We should use this as a means to become more transparent in how we handle vulnerability reports.

Trac is the wrong tool for publishing an archive of previous discussions.

Right; I wasn't meaning to suggest trac.

If our goal is to publish the svn directory afterwards, that means either a) the discussion should happen in svn, not in email; or b) we should stick the email thread into svn as one of our last acts.

On the theory of "don't introduce extra steps that might require thinking, especially if the steps would be done after the issue is finished", I think 'a' will be more robust.

Revised plan:

Here's the part that's basically done:

Let's have a broad security team comprising Tor developers that Tor pays and volunteers whom we trust who seem to be helpful with security.
To be on the secteam, Nick and Roger must agree that you should be on the secteam. You need to agree to practice basic data hygiene, follow responsible-disclosure practices with all Tor-related vulnerabilities you find, and help with resolving security issues. For now we are only taking volunteers whom one of us has met, and who have worked on fixing security issues in Tor in the past. Once we get up to speed we might expand this.
Let's have that team, and that team only, have access to a separate Git repository for discussing and sharing work on undisclosed vulnerabilities.
ALL DISCUSSIONS OF EACH ISSUE SHOULD BE MADE PUBLIC WHEN WE PATCH AND ANNOUNCE. We should use this as a means to become more transparent in how we handle vulnerability reports.
There should be a GPG key that only a couple people have that is the official way for people without access to the git repo to report new vulnerabilities, and an official email address for it.
We should make sure that when people report stuff, we stay in touch with them to let them know our progress. Else they tend to get angry and disillusioned, I hear.

We have not decided about:

This git repository should probably notify team members of new commits somehow. It should either use pgp-encrypted mail, or give a notification only saying "There was a commit by personname". (Branch names and file names are not a great thing to leak.)
Whether there should be some kind of encrypted mailing list for the whole team. I am leaning to no.
How best to actually do stuff in the repo
Where to publish resolved issues, on what schedule.
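
As a minimal sketch of the "There was a commit by personname" option from the undecided list above: the function below builds a notification that names only the committers and leaves out branch names, file names, and commit messages. The function name, the subject line, and the `secteam@example.invalid` address are assumptions for illustration; how the committer names get extracted in a real hook, and how the mail gets sent, is left open.

```python
from email.message import EmailMessage


def build_commit_notice(committers, team_address):
    """Build a notification that says only who committed.

    Branch names, file names, and commit messages are deliberately
    omitted, since they could leak details of an undisclosed issue.
    """
    names = sorted(set(committers))  # de-duplicate, stable order
    msg = EmailMessage()
    msg["To"] = team_address
    msg["Subject"] = "New commits in the security repository"
    if names:
        body = "".join(f"There was a commit by {name}.\n" for name in names)
    else:
        body = "There was a commit.\n"
    msg.set_content(body)
    return msg


# Hypothetical committers and address, for illustration only.
notice = build_commit_notice(["alice", "bob", "alice"], "secteam@example.invalid")
print(notice.get_content())
```

The message is built separately from any sending step so that the "what do we leak" question can be reviewed on its own.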

Set all open tickets without a severity to "Normal"

Trac:
Severity: N/A to Normal

I am not actually working on these tickets, so they shouldn't be assigned to me.

Trac:
Owner: nickm to N/A

Change tickets that are assigned to nobody to "new".

Trac:
Status: assigned to new
