
Monitoring Infrastructure

Overall Design

  • What will alarm scripts look like?
    • A bunch of Python scripts.
  • What sort of capabilities would this need?
    • <Insert-capabilities>
  • What would we not need?
    • Historical data
      • We'll want to keep our own state files.
      • We don't really need access to past descriptors.
      • Independent of the metrics database.
  • How are notifications done?
    • Email? IRC?
  • Where are you getting your data? (passive or active)
    • I think if active requests are in scope, then the scope is too broad. I'm worried that we fall into the kitchen-sink trap again. - Karsten
  • Library dependencies?
    • Stem
    • Metrics API: Onionoo?
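The "keep our own state files" idea can be sketched as a small Python alarm check. This is a minimal sketch, not an existing script: the JSON state layout and the names `load_state`, `save_state` and `check_new_fingerprints` are hypothetical, and the list of current relay fingerprints is assumed to come from a consensus fetched elsewhere (for example with Stem), which is left out so the sketch runs on its own.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state dict, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def check_new_fingerprints(state_path, current_fingerprints):
    """Hypothetical example check: report relays not seen on the previous run.

    Only our own state file is consulted; no past descriptors and no
    metrics database are needed.
    """
    state = load_state(state_path)
    current = set(current_fingerprints)
    if "fingerprints" in state:
        new = sorted(current - set(state["fingerprints"]))
    else:
        # First run: record a baseline instead of alerting on every relay.
        new = []
    state["fingerprints"] = sorted(current)
    save_state(state_path, state)
    return new
```

Renaming a fully written temp file means a crashed run cannot leave a half-written state file behind. A cron job could call checks like this and hand any non-empty result to whichever notification channel (email, IRC) gets chosen.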

Alarms

  • Geolocation
    • For instance, an alarm if there's a spike in the number of relays from a specific region (Syria and Iran come to mind...).
  • A congestion attack skewing the bandwidth authority heuristics to favor bad relays
  • Malicious guards rejecting circuits through non-malicious middle and exit hops. This would be a noisy attack: with 2k relays it would take very roughly (2k)² circuit shutdowns, making it more of a DoS (it's heuristic-based, but you get the idea). Still, it's something worth checking for.
  • Replace old scripts
    • Atagar's consensus tracker script: possible
    • Karsten's consensus health checker: possible, except for the parts that check whether a directory authority serves us a recent consensus
    • SoaT: no (SoaT is an active scanner)
    • I'd like to merge it with the consensus-health script and the script that checks whether a relay's bandwidth history timestamps are totally off - Karsten
  • Entropy of bandwidth authority weights, so we know when the authority heuristics radically change.
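Two of the alarms above, the regional spike and the bandwidth-weight entropy check, boil down to comparing consecutive consensuses. A minimal sketch, assuming the caller already has one country code per relay (geolocation itself is not shown) and the list of consensus weights. The entropy is H = -Σ pᵢ log₂ pᵢ with pᵢ = wᵢ / Σ w: it is largest when weight is spread evenly and falls as weight concentrates on a few relays. The thresholds `min_increase`, `ratio` and `threshold_bits` are made-up placeholders that would need tuning against real data.

```python
import math
from collections import Counter


def weight_entropy(weights):
    """Shannon entropy (in bits) of the normalized bandwidth weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


def entropy_changed(old_weights, new_weights, threshold_bits=0.5):
    """True if the entropy moved by more than threshold_bits (hypothetical threshold)."""
    return abs(weight_entropy(new_weights) - weight_entropy(old_weights)) > threshold_bits


def country_spikes(old_countries, new_countries, min_increase=10, ratio=1.5):
    """Return {country: (before, after)} for countries whose relay count jumped.

    old_countries / new_countries hold one country code per relay.
    Both an absolute increase and a relative ratio are required, so a
    small country going from 1 to 2 relays does not trigger the alarm.
    """
    old_counts = Counter(old_countries)
    new_counts = Counter(new_countries)
    spikes = {}
    for country, count in new_counts.items():
        before = old_counts.get(country, 0)
        if count - before >= min_increase and count >= ratio * before:
            spikes[country] = (before, count)
    return spikes
```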

  • Notice about especially large new relays.
  • Tor Weather notice for when operators have earned a T-shirt. Ideally we'd then reach out to them to find out how their experience as a relay operator has been going.
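The large-new-relay notice could work like the state-file check: compare against the fingerprints saved on the previous run and flag newcomers with a high share of the total consensus weight. Measuring "large" as a share of total weight, the name `new_large_relays` and the 1% default are all assumptions made for illustration.

```python
def new_large_relays(previous_fingerprints, current_weights, min_share=0.01):
    """Return (fingerprint, share) for new relays holding at least min_share
    of the total consensus weight, largest first.

    previous_fingerprints: set of fingerprints seen on the last run.
    current_weights: dict mapping fingerprint -> consensus weight.
    """
    total = sum(current_weights.values())
    if total <= 0:
        return []
    flagged = []
    for fingerprint, weight in current_weights.items():
        share = weight / total
        if fingerprint not in previous_fingerprints and share >= min_share:
            flagged.append((fingerprint, share))
    # Biggest newcomers first, so the notice leads with the most notable relay.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```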

  • Sybil attacks
    • We should also analyze past network statuses to see how many false positives we'd have and whether there might have been Sybil attacks in the past. Obviously, we won't detect all such attacks, in particular if we make the detection code public and smart attackers adapt to it. But we can make sure that the dumbest attacks don't go unnoticed. - Karsten
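One very simple Sybil heuristic, offered only as an illustration of the "catch the dumbest attacks" goal and not as a method this page settles on: flag groups of new relays that show up in the same IPv4 /24 at once. The function name, the prefix length and the `min_group` threshold are hypothetical; the relay list is assumed to be (fingerprint, address) pairs taken from the current consensus.

```python
import ipaddress
from collections import defaultdict


def sybil_candidates(previous_fingerprints, relays, prefix_len=24, min_group=5):
    """Group *new* relays by address prefix and flag suspiciously large groups.

    previous_fingerprints: set of fingerprints from the previous consensus.
    relays: iterable of (fingerprint, ipv4_address) pairs from the current one.
    Returns {prefix: [fingerprints]} for prefixes with >= min_group new relays.
    """
    groups = defaultdict(list)
    for fingerprint, address in relays:
        if fingerprint in previous_fingerprints:
            continue
        # strict=False masks off the host bits, e.g. 10.0.0.5/24 -> 10.0.0.0/24.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups[str(network)].append(fingerprint)
    return {net: sorted(fps) for net, fps in groups.items() if len(fps) >= min_group}
```

Running the same function over archived consensuses, one pair of consecutive statuses at a time, is one way to estimate the false-positive rate mentioned above before turning it into a live alarm.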
Last modified on Mar 23, 2012, 1:20:07 PM