#28841 closed project (fixed)

Write tool for onion service health assessment

Reported by: asn
Owned by: dgoulet
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: tor-hs, reachability, research, network-health, network-team-roadmap-september
Cc: rl1987, s7r, metrics-team
Actual Points:
Parent ID: #30200
Points: 7
Reviewer:
Sponsor: Sponsor27-must


We've been getting lots of reports about bad reachability of onion services (e.g. #28730) and in particular the v3 ones.

We need a tool that we can use to evaluate and monitor the health of onion services. We should use it to verify how reachable and stable onions are, and also as a benchmark for how their stability changes over time.

A relevant ticket here is #13209 which we can leverage in the future.

One way to write such a tool is to provide it with an onion service: the tool fetches the service's descriptor from every HSDir, then introduces itself to all the intro points, and makes sure that rendezvous can occur. It then monitors this over time to find reachability issues.
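
As a rough illustration of the procedure described above, here is a minimal Python sketch. Everything in it is hypothetical: `check_onion`, `StepResult`, and the three probe callables (`fetch_desc`, `introduce`, `rendezvous`) are placeholders, not existing tor or stem APIs, and the lists of HSDirs and intro points are passed in rather than derived from the consensus or the descriptor. A probe signals failure by raising an exception.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StepResult:
    step: str         # "hsdir_fetch", "introduce" or "rendezvous"
    target: str       # the HSDir or intro point that was probed
    ok: bool
    detail: str = ""  # error text when the probe failed


def _run(step: str, target: str, probe: Callable[..., object], *args) -> StepResult:
    """Call one probe and turn any exception into a failed StepResult."""
    try:
        probe(*args)
        return StepResult(step, target, True)
    except Exception as exc:
        return StepResult(step, target, False, str(exc))


def check_onion(
    onion: str,
    hsdirs: Sequence[str],
    intro_points: Sequence[str],
    fetch_desc: Callable[[str, str], object],   # hypothetical probe
    introduce: Callable[[str, str], object],    # hypothetical probe
    rendezvous: Callable[[str, str], object],   # hypothetical probe
) -> List[StepResult]:
    results = []
    # Step 1: try to fetch the descriptor from every HSDir.
    for hsdir in hsdirs:
        results.append(_run("hsdir_fetch", hsdir, fetch_desc, onion, hsdir))
    # Step 2: introduce to every intro point; attempt rendezvous only
    # when the introduction itself succeeded.
    for ip in intro_points:
        intro = _run("introduce", ip, introduce, onion, ip)
        results.append(intro)
        if intro.ok:
            results.append(_run("rendezvous", ip, rendezvous, onion, ip))
    return results
```

Keeping the probes as plain callables means the same skeleton could be driven by the control port, by tracing, or by test doubles.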

Child Tickets

#25417 | closed | neel | HSFETCH support for v3 Hidden Services | Core Tor/Tor

Change History (16)

comment:1 Changed 23 months ago by meejah

FWIW, I've received a bug-report related to this feature too: https://github.com/meejah/txtorcon/issues/327

If there are experiments that I could run (from controller code) to help with understanding this, let me know (I can use the txtorcon documentation hidden-services -- there's a v2 and a v3 one -- as test-subjects)

comment:2 Changed 23 months ago by arma

Yes, I want this tool: we need it as a building block for debugging and nagios-style monitoring.

I'd suggest one slight change to the original description by asn: step one is to make the tool that does the checking, and then step two, separately, would be to run the tool on some sort of schedule. By separating these, anybody who says "why isn't my onion service working" can grab the tool and find out. Heck, somebody might even set up a web page that runs it for people.
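
The separation suggested above could look roughly like the following Python sketch. The names `run_on_schedule` and `check` are hypothetical; `check` stands for whatever the one-shot checking tool ends up being, and the scheduler only decides when to call it.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_on_schedule(
    check: Callable[[], T],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], object] = time.sleep,
) -> List[T]:
    """Call the one-shot check `rounds` times, pausing between calls."""
    history = []
    for i in range(rounds):
        history.append(check())
        # No pause is needed after the final round.
        if i < rounds - 1:
            sleep(interval_seconds)
    return history
```

Someone debugging their own onion service would call `check()` once directly; a monitoring setup would wrap the same function in `run_on_schedule` or in cron.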

comment:3 Changed 23 months ago by meejah

I assume using HSFETCH would be at the core of such a tool, but HSFETCH doesn't work with v3 onions. (That would be: https://trac.torproject.org/projects/tor/ticket/25417)

comment:4 Changed 23 months ago by rl1987

Cc: rl1987 added

comment:5 Changed 22 months ago by rl1987

Status: changed from new to needs_information
Type: changed from defect to project

Some questions/thoughts about this:

  1. There are two parts of Onion Service reachability: a) Ability of the Tor network to communicate its HSDesc reliably and do the introduction/rendezvous procedure when a user tries to reach the Onion Service and b) Ability of the server software on the Onion Service side to properly listen for incoming connections and respond to requests. We care very much about the former, but do we care about the latter? I think not really, as that is the responsibility of whoever is running the Onion Service and they can use tools like Nagios to monitor things on their side. Also it is trivial to just torify curl .... Any comments on this?
  2. Part a) from above can be further split into: a1) Ability of HSDirs to reliably inform the user about the (latest) HSDesc and a2) Ability of the Tor network to establish the final circuit between the user and the Onion Service. I suppose we want to measure both of these, and log some metrics about them? That would be timing information, as well as success/failure for each try. We also want to detect cases of the Tor network failing to perform any of the connection establishment steps.
  3. Do we want this to be based on stem? Can we currently do introduction/rendezvous stuff with Tor Control Port and get progress information that is fine-grained enough for this tool? Are there things we need to implement for Tor Control interface (beyond making HSFETCH support v3 descriptors) to make it ready?
  4. What would be the UI/API of such a tool? Do we want JSON output for easier integration with other stuff? Do we want some API over HTTP?
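
To make points 2 and 4 more concrete, here is a small Python sketch of what per-step metrics with JSON output might look like. The field names (`step`, `outcome`, `error`, `elapsed_seconds`) and the functions `timed_step` and `report` are assumptions for illustration only; nothing here is an agreed format.

```python
import json
import time
from typing import Callable, Dict, List


def timed_step(step: str, probe: Callable[[], object]) -> Dict[str, object]:
    """Run one probe and record success/failure plus how long it took."""
    start = time.monotonic()
    try:
        probe()
        outcome, error = "success", None
    except Exception as exc:
        # Keep the exception type so failures can be told apart later.
        outcome, error = "failure", f"{type(exc).__name__}: {exc}"
    return {
        "step": step,
        "outcome": outcome,
        "error": error,
        "elapsed_seconds": round(time.monotonic() - start, 3),
    }


def report(onion: str, steps: List[Dict[str, object]]) -> str:
    """Serialize one measurement run as JSON for other tools to consume."""
    return json.dumps({"onion": onion, "steps": steps}, indent=2)
```

A JSON document per run would be easy to feed into existing monitoring or to serve over HTTP later, if that turns out to be wanted.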

comment:6 Changed 22 months ago by arma

Keywords: network-health added
Summary: changed from "Write tool for onion service health assesment" to "Write tool for onion service health assessment"
  1. Agreed, this health assessment tool should be entirely about the "within the Tor protocol" side of things. People can use nagios or whatever to make sure that their service is running well -- but only if the onion protocols are reliable and consistent.
  2. Yes. It's not just timing, and not just success/failure, but another piece is trying to identify what went wrong if one of the steps went wrong.
  3. Basing it on stem is fine with me. I think the answer might be "no" for whether all of this stuff is exposed properly in the control protocol though. I think we've had tickets about extending the control protocol in that direction open for a very long time. Or maybe nobody even made the tickets.
  4. I imagine the first way of using the tool would be that we, the developers of the tool, run it consistently against some known-stable onion service. The goal would be to look for patterns in the failures. So the better the tool can be at identifying where the failure is and why the failure is, the more useful it will be. And then the second use of the tool would be when people say their onion service isn't working right -- we can tell them to run the tool and see what it says. Then I could imagine a third way, which is somebody sets up a web interface to run the tool on behalf of anybody who interacts with the website. Then it would become the sort of thing all sorts of people could easily run. But, one step at a time -- let's start with that 'first way'. :)
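
Looking for patterns in the failures across repeated runs could start as simply as the Python sketch below. It assumes, purely for illustration, that each run is a list of records with `step`, `outcome`, and `error` keys; the function name `failure_patterns` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def failure_patterns(runs: Iterable[List[Record]]) -> List[Tuple[Tuple[str, str], int]]:
    """Count failures by (step, error) across many runs, most frequent first."""
    counts: Counter = Counter()
    for run in runs:
        for rec in run:
            if rec["outcome"] == "failure":
                # Group by where it failed and why it failed.
                counts[(rec["step"], rec["error"] or "unknown")] += 1
    return counts.most_common()
```

The more precisely each record says where and why a step failed, the more informative this kind of tally becomes.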

comment:7 Changed 22 months ago by s7r

Cc: s7r added

I run two v3 onion services that fulfill the criteria and could give us relevant statistics for our questions. Commenting here so I get the updates on this ticket; I would be thrilled to use a tool like this to monitor their health and report back here.

I have no other way to know about their health: all I can do is check the Tor daemon for relevant messages in the log file or try connecting to them myself. But this "solution" is obviously orders of magnitude less reliable and exact than a tool that will check the health against all HSDirs and try introduction to all IPs.

comment:8 Changed 20 months ago by irl

Cc: metrics-team added

comment:9 Changed 19 months ago by asn

Sponsor: set to Sponsor27-can

comment:10 Changed 19 months ago by asn

Points: set to 23
Sponsor: changed from Sponsor27-can to Sponsor27-must

comment:11 Changed 19 months ago by asn

Parent ID: set to #29995

comment:12 Changed 18 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword for tickets in the network team roadmap.

comment:13 Changed 16 months ago by gaba

Keywords: network-team-roadmap-september added; network-team-roadmap-2019-Q1Q2 removed

comment:14 Changed 15 months ago by dgoulet

Owner: set to dgoulet
Points: changed from 23 to 7
Status: changed from needs_information to assigned

Points changed at the Stockholm meeting.

comment:15 Changed 12 months ago by dgoulet

Parent ID: changed from #29995 to #30200

This has nothing to do with s27 O1A1.1. It is instrumental but the activity should not depend on this.

I did build that tool based on tor HS tracing but it is not upstream nor ready for upstream. Heck, it might live its life outside of mainline tor, who knows.

comment:16 Changed 11 months ago by dgoulet

Resolution: set to fixed
Status: changed from assigned to closed

At this point, the general idea of stable tracepoints in tor is being discussed, so merging this tool upstream depends on the decisions coming out of the discussions with the network team.

For now, this lives outside of tor and hopefully one day, the tracing part will be put upstream. In the meantime, the rest is out of tree.

For reference, tor tracing is here: https://gitweb.torproject.org/user/dgoulet/tor.git/?h=lttng-hs

The scripts to analyze the traces and output useful data are here until we find a better place:


Closing this as "Done" since the work has been done but upstream merge requires more discussions. But for the sponsored work, it is considered done.
