Opened 5 years ago

Last modified 3 years ago

#18163 new enhancement

Consensus health doesn't track direct connection timings

Reported by: micah Owned by: tom
Priority: Medium Milestone:
Component: Metrics/Consensus Health Version:
Severity: Normal Keywords: consensus, faravahar
Cc: atagar, tjr, sina, metrics-team Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


The directory authority munin graphs (ygzf7uqcusp4ayjs.onion) that track direct download timeouts show a significant problem with Faravahar. The timeouts were so bad that fetching network documents directly from Faravahar was effectively impossible. It failed almost every time, which made the graphs useless.

Sina was notified about this problem and pointed out that connectivity was fine, because consensus-health shows Faravahar doing well there: it has no timeouts and is sometimes even faster than the other authorities.

It seems that consensus-health only uses client timings, where the client requests the consensus over a connection tunneled through a one-hop Tor circuit. For Faravahar this works fine, so consensus-health shows no issues with it.

The problem is that Faravahar is dying during direct connections. Direct connections, not tunneled ones, are what all Tor relays make.

consensus-health should track these direct connections in addition to the tunneled connections, so that network issues like this one are easier to spot.
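A minimal sketch of what tracking direct-connection timings alongside tunneled ones might look like, assuming the direct measurement is a plain HTTP GET of the consensus from an authority's DirPort at the dir-spec path `/tor/status-vote/current/consensus`. The names `time_direct_fetch`, `TimingTracker`, and the "direct"/"tunneled" labels are hypothetical and are not part of consensus-health; the authority's DirPort address is left as a placeholder.

```python
import http.client
import time
import urllib.request
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Path of the current consensus on a directory authority's DirPort (Tor dir-spec).
CONSENSUS_PATH = "/tor/status-vote/current/consensus"


def time_direct_fetch(base_url: str, timeout: float = 60.0) -> tuple[bool, float]:
    """Hypothetical helper: fetch the consensus over a direct HTTP connection
    (no Tor circuit involved) and return (succeeded, elapsed_seconds).

    Timeouts, refused connections, HTTP errors and truncated responses all
    count as failures, roughly what a relay fetching directly would see.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(base_url + CONSENSUS_PATH, timeout=timeout) as resp:
            resp.read()
        ok = True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


@dataclass
class TimingTracker:
    """Hypothetical per-authority record of fetch outcomes, split by method."""

    samples: dict[tuple[str, str], list[tuple[bool, float]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def record(self, authority: str, method: str, ok: bool, elapsed: float) -> None:
        self.samples[(authority, method)].append((ok, elapsed))

    def failure_rate(self, authority: str, method: str) -> Optional[float]:
        """Fraction of failed fetches, or None if nothing was recorded."""
        results = self.samples.get((authority, method), [])
        if not results:
            return None
        return sum(1 for ok, _ in results if not ok) / len(results)
```

Usage would be along the lines of `tracker.record("Faravahar", "direct", *time_direct_fetch("http://<DirPort address>"))`, with the existing one-hop client timings recorded under "tunneled". Keeping both methods per authority means a case like this one shows up as a large gap between the two failure rates instead of being hidden by the tunneled numbers.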


Change History (3)

comment:1 Changed 5 years ago by teor

Faravahar could be dying during direct connections because of the ISP's transparent proxying of requests on port 80.

I know Sina said it's been turned off, but I wonder if the proxy is still interposed in those connections somehow.

comment:2 Changed 3 years ago by dgoulet

Cc: dgoulet removed
Component: changed from Core Tor/DirAuth to Metrics/Consensus Health
Owner: set to tom

comment:3 Changed 3 years ago by irl

Cc: metrics-team added

Adding metrics-team to cc
