Opened 10 months ago

Last modified 5 months ago

#32910 needs_revision enhancement

trace: Add tracepoints and userspace tracer support

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tracing 044-can
Cc: Actual Points:
Parent ID: Points: 3
Reviewer: nickm Sponsor:

Description

Tor-dev email: https://lists.torproject.org/pipermail/tor-dev/2019-December/014111.html

These are what needs to be done but they sorta need to be together to make sense thus this one ticket:

  1. Add a series of tracepoints in tor code base. I propose to start with circuit and cell level tracepoints for now.
  1. Add USDT (User Statically-Defined Tracing) probes support which is used by SystemTap, DTrace and perf.
  1. Add LTTng support which if enable also emits USDT.
  1. Integrate all this to our build system.

About(d), the consensus among the network team is that it should NEVER be enabled in production and should be a configure switch.

I believe if we add on top a torrc option, it might not be that useful in the end considering the configure switch but mainly it will degrade performance since the check needs to be at runtime for every tracepoint.

Child Tickets

Change History (18)

comment:1 Changed 9 months ago by dgoulet

Status: assignedneeds_review

Branch: ticket32910_044_01
PR: https://github.com/torproject/tor/pull/1664

comment:2 Changed 9 months ago by dgoulet

Reviewer: nickm

comment:3 Changed 9 months ago by nickm

Quick note: CI is failing because of gnu variadic macros. Instead we favor using C99 variadic macros, which are a little different.

Now for the rest of the review :)

comment:4 Changed 9 months ago by nickm

Status: needs_reviewneeds_revision

I've added some comments to the PR.

comment:5 Changed 9 months ago by dgoulet

Status: needs_revisionneeds_review

Round 2! I've addressed everything in the PR. The big change is that the tracing probes are within the subsystem instead of libtor-trace.a. Only the tracing API (tor_trace()) is in the library now.

Branch: ticket32910_044_02
PR: https://github.com/torproject/tor/pull/1721

comment:6 Changed 9 months ago by teor

Looks like ISO C doesn't like empty files:

ISO C requires a translation unit to contain at least one declaration [-Werror,-Wempty-translation-unit]

We can do the same thing as the dirauth and relay modules, and avoid compiling the file at all?

comment:7 Changed 8 months ago by nickm

Status: needs_reviewneeds_revision

Looks like CI is failing in a few ways. Let's get those fixed up before review?

One of the failures seems to be that coccinelle doesn't understand the lttng stuff. The simplest fix there might be to wrap those sections in #ifndef COCCI.

comment:8 Changed 8 months ago by dgoulet

So CI is only failing because of an issue with make check-spaces:

     GUARD:./src/core/or/trace_probes_circuit.h:19: Header guard macro mismatch.

The reason for this is because of the unusual guard macro that LTTng-UST requires:

#if !defined(TOR_TRACE_PROBES_CIRCUIT_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define TOR_TRACE_PROBES_CIRCUIT_H

I think we need to tell checkSpace.pl to allow the #if !defined( style and not only the #ifndef.

This specific macro style is required because LTTng-UST, for internal probes creation, re-include multiple time the header file and thus overrides it with TRACEPOINT_HEADER_MULTI_READ.

I've tried to patch ./scripts/maint/checkSpace.pl to accept it but I'm unable to make it work... my perl skills are very poor. Nickm, advice?

comment:9 Changed 8 months ago by nickm

Can you explain why we need TRACEPOINT_HEADER_MULTI_READ ?

comment:10 in reply to:  9 Changed 8 months ago by dgoulet

Replying to nickm:

Can you explain why we need TRACEPOINT_HEADER_MULTI_READ ?

That is LTTng specific. LTTng, at compile time, create the tracing probes (basically the C code to handle tracepoints) and uses that include file to get their declaration and type for parameters.

However, LTTng has a lot of trickery to create those probes, that are hidden from us, except this one in particular which is a way to override the header file guard macro so the header can be included multiple times for probe creation.

comment:11 Changed 8 months ago by dgoulet

Ok, per discussion with nickm, moved the LTTng specific probe code into a .inc file which should fix our problems and fix the CI.

comment:12 Changed 8 months ago by dgoulet

Status: needs_revisionneeds_review

comment:13 Changed 8 months ago by nickm

Issues we talked about on IRC:

  • Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffices, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.
  • Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.
  • Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.
  • Safety. In order to make sure this feature stays safe:
    • Tor should log loudly when it is enabled.
    • There should be clear safety guidelines about how the data it generates if used on the public network should be considered NOT safe to share, should NOT be backed up, and should be deleted.
    • We should make sure we tell researchers explicitly that if they try to use this for science on the public network, they will get data that they cannot ethically share, and they should probably look for another approach if they're doing science.
    • We should talk with Roger and the research safety board, looking for more suggestions, if they have any.

Smaller issues:

  • This needs a changes file.

comment:14 Changed 8 months ago by nickm

Status: needs_reviewneeds_revision

comment:15 Changed 8 months ago by teor

Just to be clear here:

  • it's totally ethical for researchers to use tracing to profile behaviour of their own tor clients (for example, path building, or timing behaviour)
  • it is unethical for anyone to release raw, detailed tracing data of other users' activity (including relays that handle other users' traffic)
  • for most things researchers want, PrivCount or another privacy-preserving system is their best bet
Last edited 8 months ago by teor (previous) (diff)

comment:16 Changed 8 months ago by dgoulet

Status: needs_revisionneeds_review

(New rebased on latest master branch. Nothing has changed in the commit from _02 branch, just new commits have been added.)

Branch: ticket32910_044_03
PR: https://github.com/torproject/tor/pull/1790

Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffices, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.

See commit cc1dd1bea88065a4

Safety. In order to make sure this feature stays safe:

See commit a311da45004ab093 for the log warning
See commit 973044b0334f9eac for safety guidelines.

Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.

I have a question on that one. Should I make a central file (maybe in Tracing.md) containing all tracepoints and a description? I'm suggesting that because of the different possible instrumentation (USDT, Lttng,...), there is not central point where a tracepoint is defined. And thus, we might want an "index" file explaining them all ?

Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.

The only way I see for now is adding a build matrix in our CI that builds all instrumentation at once.

comment:17 Changed 6 months ago by nickm

Status: needs_reviewneeds_revision

This code is looking a lot better now. I think that all we need is the documentation, testing, and some quick feedback from the research safety folks. I've also left a couple of cosmetic comments on gihub.

For testing: We may need a CI option that builds with instrumentation turned on, but it would also be necessary to have tests that make sure that the trace events are generated. I didn't see those tests here yet.

For documentation, I don't feel too strongly about _where_ we put it -- I had been thinking that it might go in the files where we define the events -- but some kind of independent document would be fine too. If we do a separate document, however, we should think about how to make sure that it stays in sync with the trace events.

comment:18 Changed 5 months ago by nickm

Keywords: 044-can added

(if revisions are done on this by the deadline, it can go in. else it should wait for 0.4.5)

Note: See TracTickets for help on using tickets.