Network Health Team

About us

Welcome to the Network Health page! There are several people in the Tor community taking care of the network's health.

The five areas that we focus on are:

(1) track community standards about what makes a good relay

  • publish up-to-date expectations for relay operators
  • set best practices for how to declare relay families
  • detect and resolve bad relays
    • exitmap, sybil detection, hsdir traps
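One concrete family check can be sketched as follows. Tor treats two relays as one family only when each lists the other, so a one-sided declaration is a common misconfiguration. This is a minimal illustration, not one of the team's tools: the function name `find_one_sided_family_links`, the input shape, and the placeholder relay names are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def find_one_sided_family_links(declared: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (declarer, target) pairs where target does not list declarer back.

    `declared` maps a relay identifier to the set of identifiers it lists
    as family members (hypothetical input shape for this sketch).
    """
    one_sided = []
    for relay, family in sorted(declared.items()):
        for member in sorted(family):
            if member == relay:
                continue  # a relay listing itself is harmless
            # The link is mutual only if the target lists the declarer too.
            if relay not in declared.get(member, set()):
                one_sided.append((relay, member))
    return one_sided


# Placeholder identifiers, not real relay fingerprints.
declared = {
    "RELAY_A": {"RELAY_B", "RELAY_C"},
    "RELAY_B": {"RELAY_A"},
    "RELAY_C": set(),
}
print(find_one_sided_family_links(declared))
# → [('RELAY_A', 'RELAY_C')]
```

Here RELAY_A lists RELAY_C, but RELAY_C does not list RELAY_A back, so that pair is reported.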

(2) anomaly analysis / network health engineer [with network team]

  • establish baselines of expected network behavior
  • look for and resolve denial of service issues
  • track connectivity issues between relays
  • look for relays hitting resource limits
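As a rough illustration of what establishing a baseline and flagging deviations from it could look like, the sketch below compares recent measurements against the mean and standard deviation of a historical window. The z-score approach, the threshold of 3.0, and the sample numbers are assumptions chosen for the example. They do not describe the team's actual method or data.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_anomalies(history: Sequence[float],
                   recent: Sequence[float],
                   threshold: float = 3.0) -> List[int]:
    """Return indexes in `recent` that deviate strongly from the baseline.

    The baseline is the mean of `history`; a value is flagged when it lies
    more than `threshold` sample standard deviations away from that mean.
    """
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical values
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return [i for i, value in enumerate(recent) if value != baseline]
    return [i for i, value in enumerate(recent)
            if abs(value - baseline) / spread > threshold]


# Made-up measurements (for instance, some daily network-wide metric).
history = [100.0, 102.0, 98.0, 101.0, 99.0]
recent = [100.5, 140.0, 97.0]
print(flag_anomalies(history, recent))
# → [1]
```

A fixed threshold like this is only a starting point. Real network metrics often have weekly cycles and long-term trends that a flat mean does not capture.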

(3) make sure usage/growth stats are collected and accurate

  • track network performance, relay diversity by various metrics
  • count users [with network team and metrics team]
  • monitor bridge growth and usage [with censorship team]

(4) relay advocacy [with community team]

  • maintain docs for setting up and running relays and bridges
  • grow a cohesive community of relay operators so they have peers
    • keep relays on the right tor versions
  • relaunch a gamification / badge system for lauding good relay progress
  • strengthen relationships with non-profit orgs that run relays
  • help companies that want to offset their tor network load

(5) maintain the components of the network

  • maintain directory authority relationships
  • keep bandwidth authorities working (including setting the right balance between speed and location diversity)
  • have enough tor browser default bridges, and keep them running smoothly [with censorship team]
  • update the fallbackdirs list

Communication Channels

We have a public, archived mailing list: https://lists.torproject.org/cgi-bin/mailman/listinfo/network-health

Resources

PRIORITIES

  1. detect and resolve bad relays
    • exitmap, sybil detection, hsdir traps
  2. anomaly analysis / network health engineer [with network team]
    • establish baselines of expected network behavior
    • monitor network disruption or problems
  3. relay advocacy [with community team]
    • strengthen relationships with non-profit orgs that run relays
    • maintain docs for setting up and running relays and bridges
  4. make sure usage/growth stats are collected and accurate [with metrics team]
    • track network performance, relay diversity by various metrics
  5. maintain the components of the network to keep it healthy
    • keep bandwidth authorities working (including setting the right balance between speed and location diversity)

PRIORITIES FOR 2020

As our capacity has been reduced, in 2020 we are going to focus on maintaining essential services.

  1. Get all critical sbws bugs fixed so we can replace Torflow.
  2. Run the "bad hsdir" hunter scripts and other exitmap scripts.
  3. Perform unplanned 'anomaly analysis' on the network as needed.
  4. Keep moderating and answering the tor-relays mailing list.
  5. Maintain the relay operation documentation.
  6. Maintain the list of fallbackdirs.
  7. Maintain the set of default bridges in Tor Browser.

Tickets

So far there is no dedicated network health component in our Trac system, and we likely won't create one because we are about to migrate to our own GitLab instance. For now, we use the network-health keyword on tickets that should be on the radar of the people who care about Tor's network health.
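A keyword-based listing grouped by owner can be expressed as a Trac custom query URL. The sketch below builds one with the standard query parameters, where `~` means "contains" and `!` negates a value. The base URL `https://trac.example.org` is a placeholder, and the `status=!closed` filter is an assumption. This page does not state the exact query behind its listing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_keyword_query(base_url: str, keyword: str) -> str:
    """Build a Trac custom query URL for open tickets carrying `keyword`."""
    params = [
        ("keywords", "~" + keyword),  # "~" = keywords field contains the value
        ("status", "!closed"),        # "!" = negation; assumed filter
        ("group", "owner"),           # group results by ticket owner
    ]
    return base_url.rstrip("/") + "/query?" + urlencode(params)


# Placeholder base URL, not the real Trac instance.
url = build_keyword_query("https://trac.example.org", "network-health")
print(url)

# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```

`urlencode` percent-encodes characters such as `!`. The server decodes them again, so the query keeps its meaning.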

Owner: ahf (2 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33411 Make DirCache default to 0 on Windows relays, if we can't fix the mmap issues assigned ahf Medium Normal 5 months ago
#24857 Tor uses 100% CPU when accessing the cache directory on Windows assigned ahf High Normal 5 months ago

Owner: asn (1 match)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33844 Do next iteration of proposal by folding in comments from dgoulet/mike assigned asn Medium Normal 7 months ago

Owner: cohosh (1 match)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#32545 Perform measurements to concretely understand snowflake throughput and network health assigned cohosh Medium Normal 6 months ago

Owner: dgoulet (3 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33072 When under load, give 503 aggressively for dirport requests without compression needs_revision dgoulet teor Medium Normal 5 months ago
#33843 Write detailed priority queue scheduler design on the proposal assigned dgoulet Medium Normal 7 months ago
#33361 relay: Warn about the lack of ContactInfo and the consequence merge_ready dgoulet gk Medium Normal 8 months ago

Owner: ggus (5 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#34009 update legacy TorRelayGuide and Exit Notice HTML page w/r/t DNSEL changes new ggus Medium Normal 6 months ago
#33499 Create contact to Tor-friendly folks at Akamai new ggus Medium Normal 7 months ago
#33695 Setup process for repeatedly check for exits that can't handle DNS queries new ggus Medium Normal 7 months ago
#32915 Cloudflare alt-svc failures cause spurious "DNS resolution error" in Tor Browser new ggus Medium Normal 7 months ago
#33174 We want to have some automation to detect relay problems (both malicious and accidental) new ggus Medium Normal 9 months ago

Owner: gk (11 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33500 Figure out what we want to learn about the Akamai interface for Tor exit discrimination assigned gk Medium Normal 7 months ago
#33067 DocTor should fetch microdesc consensus assigned gk Medium Normal 7 months ago
#33180 Fix issues with bad relay scanners assigned gk Medium Normal 7 months ago
#33181 We should look over all our bad relay scanners we have and document them assigned gk Medium Normal 7 months ago
#33696 Integrate badexiting into the badconf-entry.py script assigned gk Medium Normal 7 months ago
#33758 Fix exitmap related bad relay tests assigned gk Medium Normal 7 months ago
#33699 Create an exitmap module for DNS exit checks assigned gk Medium Normal 7 months ago
#33466 Create contact list of Tor-friendly people at large sites assigned gk Medium Normal 8 months ago
#33179 Make a more fine-grained test for Arthur's exit scanning for detecting DNSSEC issues assigned gk Medium Normal 8 months ago
#33158 Make DocTor Python 3 compatible new gk Medium Normal 9 months ago
#20969 Detect relays that don't update their onion keys every 7 days. assigned gk Medium Normal 10 months ago

Owner: metrics-team (15 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33010 Monitor cloudflare captcha rate: do a periodic onionperf-like query to a cloudflare-hosted static site new metrics-team Medium Normal 5 months ago
#33178 Figure out specific baselines we are interested in from a network health perspective new metrics-team Medium Normal 7 months ago
#33176 Check whether all of our growth stats we want are collected and accurate new metrics-team Medium Normal 7 months ago
#33663 Check checktest.py related errors shown by our network-health scanners new metrics-team Medium Normal 7 months ago
#5830 Write tool to automate web queries to Tor; and use Stem to track stream/circ allocation and results assigned metrics-team Medium Normal 9 months ago
#29343 Run arthur's DNS timeout scanner, archive it in CollecTor, and add it to Onionoo new metrics-team Medium Normal 9 months ago
#23509 Implement family-level pages showing aggregated graphs assigned metrics-team Medium Normal 9 months ago
#26124 Bring back Tor Weather new metrics-team Medium Normal 10 months ago
#12131 Measure connectivity patterns between relays assigned metrics-team Medium Normal 10 months ago
#26089 collect and archive DNS resolver data of tor exits new metrics-team Medium Normal 10 months ago
#29344 Consider heartbeat frequency, logging and extra-info statistics new metrics-team Very High Normal 10 months ago
#27235 add route_origin_rpki_validity field new metrics-team Medium Normal 10 months ago
#27155 Include BGP prefix information in details documents new metrics-team Medium Normal 10 months ago
#26585 improve AS number and name coverage (switch maxmind to RIPE Stat) new metrics-team Medium Normal 10 months ago
#28529 Confirm that the strange onionoo flood is resolved new metrics-team Medium Normal 17 months ago

Owner: neel (1 match)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#32672 Reject 0.2.9 and 0.4.0 in dirserv_rejects_tor_version() merge_ready neel teor Medium Normal 8 months ago

Owner: tbb-team (2 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33457 Twitter shows "Something went wrong." with a "Try again" button new tbb-team Medium Normal 7 months ago
#19119 Repurpose block-malicious-sites-checkbox on TLS error page in Tor Browser new tbb-team Medium Normal 8 months ago

Owner: tom (1 match)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#33649 Show progress towards flags on consensus health new tom Medium Normal 7 months ago

Owner: (none) (28 matches)

Ticket Summary Status Owner Reviewer Priority Severity Modified
#7193 Tor's sybil protection doesn't consider IPv6 needs_revision nickm Medium Normal 5 months ago
#8163 It is no longer deterministic which Sybils we omit new Medium Normal 8 months ago
#5565 MyFamily should provide an alternate non-idhex subscription mechanism reopened Medium Normal 10 months ago
#33530 Dir auths should notice relays with wrong clocks and act somehow (BadClock flag, withhold Guard) new Medium Normal 5 months ago
#33350 Is sbws weighting some relays too high? new Medium Normal 5 months ago
#33712 Design a PoW scheme for HS DoS defence new Medium Normal 7 months ago
#31223 Research approaches for improving the availability of services under DoS new Medium Normal 7 months ago
#26769 We should make HSv3 desc upload less frequent needs_information asn Medium Normal 8 months ago
#11207 Sybil selection should be trickier to game new High Normal 8 months ago
#25884 add support for exitmap requirements new Medium Normal 8 months ago
#26094 increase minimal bandwidth requirements, update the manpage, relay guide and FAQ new arma Medium Normal 8 months ago
#33175 Build a roadmap/brainstorm all the future things we might automate measuring new Medium Normal 8 months ago
#33351 Are bandwidth authorities concentrating too much bandwidth in one area? new Medium Normal 8 months ago
#33150 Allow to connect to an external control port new Medium Normal 9 months ago
#15060 Decide the fate of MyFamily new Medium Normal 10 months ago
#28860 Increased DNS failure rate when using ServerDNSResolvConfFile with tor 0.3.4.9 (as opposed to 0.3.3.x) new Medium Normal 10 months ago
#26691 add 'working DNS' to the list of mandatory requirements for the 'exit' flag new Medium Normal 10 months ago
#24014 Make exits check DNS periodically, and disable exit traffic if it fails new Medium Normal 10 months ago
#12389 Should we warn when exit nodes are using opendns or google dns? needs_revision High Normal 10 months ago
#20055 Remove relays that fail to rotate onion keys from the consensus new Medium Normal 10 months ago
#28969 Onion Service v3 connection status update event new Medium Normal 11 months ago
#28968 Onion Service v2 connection status update event new Medium Normal 11 months ago
#28967 Tor control command to connect to Onion Service new Medium Normal 11 months ago
#19068 Write and run a clique reachability test. new Medium Normal 12 months ago
#31291 non-public relay health metrics for operators new Medium Normal 14 months ago
#31290 provide DNS health metrics for tor exit relay operators new Medium Normal 15 months ago
#30487 dirmngr goes berserk making tor requests after gpg --recv-key attempt ends new Medium Normal 18 months ago
#30420 Should we recommend that relay operators turn on tcp bbr? new Medium Normal 18 months ago

Last modified on May 5, 2020, 6:26:42 PM