Opened 13 months ago

Last modified 12 months ago

#31291 new enhancement

non-public relay health metrics for operators

Reported by: nusenu Owned by:
Priority: Medium Milestone:
Component: Core Tor/Tor Version:
Severity: Normal Keywords: network-health
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


Compared to other server daemons (web servers, DNS servers, ...),
tor provides little data for operators to detect operational issues
and anomalies.

I'd suggest providing the following stats via a Prometheus-compatible HTTP endpoint with authentication support
(most of the data is already written to log files by default):

  • total amount of memory used by the tor process
  • number of currently open circuits
  • circuit handshake stats (TAP / NTor)

DoS mitigation stats

  • number of circuits killed with too many cells
  • number of circuits rejected
  • number of marked addresses
  • number of connections closed
  • number of single hop clients refused
  • number of closed/failed circuits broken down by their reason value

  • number of closed/failed OR connections broken down by their reason value

If this causes a significant performance impact, this feature should be disabled
by default.
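
To make the proposal concrete, here is a minimal sketch of what a few of the stats above could look like in the Prometheus text exposition format. The metric names (tor_circuits_open, tor_circuits_closed_total) and the values are hypothetical; tor does not define them, and this only illustrates the output format, not how tor would collect or authenticate access to the data.

```python
def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(metrics):
    """Render metrics as Prometheus text exposition format.

    metrics: list of (name, type, help_text, samples), where samples is a
    list of (labels_dict, value) pairs. All names here are hypothetical.
    """
    lines = []
    for name, mtype, help_text, samples in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            if labels:
                body = ",".join(
                    f'{key}="{escape_label(val)}"'
                    for key, val in sorted(labels.items())
                )
                lines.append(f"{name}{{{body}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Made-up sample values, for illustration only.
EXAMPLE = [
    ("tor_circuits_open", "gauge", "Currently open circuits", [({}, 12)]),
    (
        "tor_circuits_closed_total",
        "counter",
        "Closed or failed circuits by reason",
        [({"reason": "timeout"}, 3), ({"reason": "destroyed"}, 7)],
    ),
]

if __name__ == "__main__":
    print(render_metrics(EXAMPLE), end="")
```

Breaking the closed/failed counters down by a "reason" label, rather than one metric per reason, matches the "broken down by their reason value" items in the list.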

cell stats

  • extra info cell stats

as defined in:

Change History (2)

comment:1 Changed 13 months ago by gk

Keywords: network-health added

comment:2 Changed 12 months ago by irl

I have argued against this before, and I will continue to do so. These metrics are not of use to individual relay operators; the average operator will not know what they mean. Currently they are written to log files only if logging is enabled, which, at least in Debian, it is not by default.

Providing network access to these values could allow for deanonymisation attacks, especially at the intervals at which Prometheus expects updates.

If you really wanted to do this, write something that parses Tor logs for the heartbeat messages and puts an HTTP endpoint on top of that, but it should not be something that is generally used, only in test networks or for short-term debugging.
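
A rough sketch of the log-parsing half of that idea follows. The heartbeat line format in the regular expression is an assumption based on typical tor notice-level log output and may differ between versions; the function names are made up for illustration. Serving the result over HTTP is left out, and per the caveats above this would only be appropriate in test networks or for short-term debugging.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed heartbeat format, e.g.:
#   ... [notice] Heartbeat: Tor's uptime is 6:00 hours, with 12 circuits open. ...
# This pattern is an assumption and may need adjusting for a given tor version.
HEARTBEAT_RE = re.compile(
    r"Heartbeat: Tor's uptime is (?P<uptime>.+?), "
    r"with (?P<circuits>\d+) circuits open\."
)


def parse_heartbeat(line: str) -> Optional[Dict[str, object]]:
    """Return uptime and open-circuit count from one log line, or None."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return {
        "uptime": match.group("uptime"),
        "circuits_open": int(match.group("circuits")),
    }


def latest_heartbeat(lines: Iterable[str]) -> Optional[Dict[str, object]]:
    """Scan log lines and keep only the most recent heartbeat seen."""
    latest = None
    for line in lines:
        parsed = parse_heartbeat(line)
        if parsed is not None:
            latest = parsed
    return latest
```

Keeping only the latest heartbeat in memory, rather than writing a history to disk, avoids creating an additional stored record beyond the log file itself.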

If you capture information and store it on disk, then that information is legally discoverable, which means you can be compelled to hand it over or face consequences for not handing it over. If there is an expectation that most operators collect this data, then you can cause problems for the operators that collect no logs, because it's harder to prove that you don't have something than that you have it.

Instead of monitoring individual relays, we monitor the wider network using collected metrics. Once PrivCount is deployed, we will have network-wide aggregates (safely) for all of the heartbeat metrics and be able to see anomalies there.

I would be interested in something that worked only on extra info stats though. We have already determined it is safe to collect those stats for individual relays and safe to publish them.
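
As an illustration of the extra-info-only variant, here is a small sketch that pulls the cell statistics lines out of an extra-info descriptor. The keyword names are the cell-stats keywords as I recall them from the directory specification and should be checked against it; the function name is hypothetical.

```python
from typing import Dict, List, Union

# Keywords assumed from the extra-info section of the directory spec.
LIST_KEYWORDS = (
    "cell-processed-cells",
    "cell-queued-cells",
    "cell-time-in-queue",
)


def parse_cell_stats(descriptor_text: str) -> Dict[str, Union[List[float], int]]:
    """Extract per-decile cell stats from an extra-info descriptor."""
    stats: Dict[str, Union[List[float], int]] = {}
    for line in descriptor_text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword in LIST_KEYWORDS:
            # Comma-separated values, one per circuit decile.
            stats[keyword] = [float(x) for x in rest.split(",") if x]
        elif keyword == "cell-circuits-per-decile":
            stats[keyword] = int(rest)
    return stats
```

Because these stats are already published in extra-info descriptors, a tool built on them exposes nothing beyond what is public.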
