DoS and failure resistance improvements
We had a near-catastrophe today when an IPv6 relay descriptor took out all of the Tor directory authorities. It took us about 10 hours to correct the issue; the maximum we had before the network would break for everyone was 28 hours. We need to consider implementing procedures that both reduce the turnaround time needed to diagnose and fix cases like this and improve the network's ability to keep functioning if we can't bring the authorities back online within 28 hours.
This is the parent ticket for a series of child tickets, created to remind us to write actual proposals and procedures.