trace: Add tracepoints and userspace tracer support

changed milestone to %Tor: 0.4.4.x-final

added 044-can component::core tor/tor milestone::Tor: 0.4.4.x-final owner::dgoulet points::3 priority::medium reviewer::nickm severity::normal status::needs-revision tracing type::enhancement labels

Branch: ticket32910_044_01 PR: https://github.com/torproject/tor/pull/1664

Trac:
Status: assigned to needs_review

Trac:
Reviewer: N/A to nickm

Quick note: CI is failing because of GNU variadic macros. We prefer C99 variadic macros, which are slightly different.
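For reference, here is a minimal sketch of the difference between the two macro styles. The macro name `demo_trace` is hypothetical and is not the macro used in the branch; it only illustrates the C99 form.

```c
#include <stdio.h>

/* GNU extension: a named variadic parameter. Compilers in pedantic C99
 * mode warn about this form, so it is shown only inside this comment:
 *
 *   #define demo_trace(buf, len, fmt, args...) snprintf(buf, len, fmt, args)
 */

/* C99 form: an unnamed "..." that is expanded through __VA_ARGS__.
 * Folding the format string into "..." means the variadic part is never
 * empty, so the GNU-only ", ## __VA_ARGS__" trick is not needed.
 * (demo_trace is a hypothetical name used for illustration.) */
#define demo_trace(buf, len, ...) \
  snprintf((buf), (len), __VA_ARGS__)
```

With this form, both `demo_trace(buf, sizeof(buf), "no args")` and `demo_trace(buf, sizeof(buf), "circ %d", 7)` expand to valid `snprintf()` calls.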

Now for the rest of the review :)

I've added some comments to the PR.

Trac:
Status: needs_review to needs_revision

Round 2! I've addressed everything in the PR. The big change is that the tracing probes are within the subsystem instead of libtor-trace.a. Only the tracing API (tor_trace()) is in the library now.

Branch: ticket32910_044_02 PR: https://github.com/torproject/tor/pull/1721

Trac:
Status: needs_revision to needs_review

Looks like ISO C doesn't like empty files:

ISO C requires a translation unit to contain at least one declaration [-Werror,-Wempty-translation-unit]

We can do the same thing as the dirauth and relay modules, and avoid compiling the file at all?

Looks like CI is failing in a few ways. Let's get those fixed up before review?

One of the failures seems to be that coccinelle doesn't understand the lttng stuff. The simplest fix there might be to wrap those sections in #ifndef COCCI.
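A rough sketch of what that wrapping could look like. It assumes `COCCI` is defined only when coccinelle parses the tree, and `DEMO_DEFINE_PROBE` is a hypothetical stand-in for the tracer macros, not the real LTTng ones.

```c
#ifndef COCCI
/* Hypothetical stand-in for tracer macros that coccinelle cannot parse.
 * The real LTTng macros are not reproduced here. */
#define DEMO_DEFINE_PROBE(name) \
  static const char *demo_probe_##name(void) { return #name; }

/* Expands to a small function definition; hidden from coccinelle. */
DEMO_DEFINE_PROBE(circuit_opened)
#endif /* !defined(COCCI) */
```

A normal compiler build sees the probe definition, while a coccinelle run that defines `COCCI` skips the whole section.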

Trac:
Status: needs_review to needs_revision

So CI is only failing because of an issue with make check-spaces:

     GUARD:./src/core/or/trace_probes_circuit.h:19: Header guard macro mismatch.

The reason for this is because of the unusual guard macro that LTTng-UST requires:

#if !defined(TOR_TRACE_PROBES_CIRCUIT_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define TOR_TRACE_PROBES_CIRCUIT_H

I think we need to tell checkSpace.pl to allow the #if !defined( style and not only the #ifndef.

This specific macro style is required because LTTng-UST, when creating its internal probes, re-includes the header file multiple times and bypasses the guard by defining TRACEPOINT_HEADER_MULTI_READ.

I've tried to patch ./scripts/maint/checkSpace.pl to accept it but I'm unable to make it work... my perl skills are very poor. Nickm, advice?

Can you explain why we need TRACEPOINT_HEADER_MULTI_READ ?

Replying to nickm:

Can you explain why we need TRACEPOINT_HEADER_MULTI_READ ?

That is LTTng specific. LTTng, at compile time, creates the tracing probes (basically the C code to handle tracepoints) and uses that include file to get their declarations and parameter types.

However, LTTng uses a lot of trickery to create those probes, most of which is hidden from us. The exception is this one in particular: a way to override the header guard macro so the header can be included multiple times for probe creation.
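To illustrate why the same declarations need to be read more than once, here is a single-file analogy using the X-macro technique. This is not LTTng-UST's actual mechanism, and every `DEMO_` name is hypothetical; it only shows one list of events being expanded in several passes, each time with a different macro definition.

```c
#include <stddef.h>

/* Hypothetical event list. In LTTng-UST, the probe header plays this role
 * and is re-included once per pass instead of being expanded in place. */
#define DEMO_EVENTS(X) \
  X(circuit_opened)    \
  X(circuit_closed)

/* Pass 1: each entry becomes an enum constant. */
#define DEMO_AS_ENUM(name) DEMO_EV_##name,
enum demo_event { DEMO_EVENTS(DEMO_AS_ENUM) DEMO_EV_COUNT };

/* Pass 2: the same entries become printable names. */
#define DEMO_AS_STRING(name) #name,
static const char *const demo_event_names[] = {
  DEMO_EVENTS(DEMO_AS_STRING)
};
```

A plain `#ifndef` guard would make the second read of the header a no-op, which is why the guard has to be bypassable for the multi-pass approach to work.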

Ok, per discussion with nickm, moved the LTTng specific probe code into a .inc file which should fix our problems and fix the CI.

Trac:
Status: needs_revision to needs_review

Issues we talked about on IRC:

Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffixes, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.
Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.
Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.
Safety. In order to make sure this feature stays safe:
- Tor should log loudly when it is enabled.
- There should be clear safety guidelines stating that data generated on the public network should be considered NOT safe to share, should NOT be backed up, and should be deleted.
- We should make sure we tell researchers explicitly that if they try to use this for science on the public network, they will get data that they cannot ethically share, and they should probably look for another approach if they're doing science.
- We should talk with Roger and the research safety board, looking for more suggestions, if they have any.

Smaller issues:

This needs a changes file.

Trac:
Status: needs_review to needs_revision

Just to be clear here:

- it's totally ethical for researchers to use tracing to profile behaviour of their own tor clients (for example, path building, or timing behaviour)
- it is unethical for anyone to release raw, detailed tracing data of other users' activity (including relays that handle other users' traffic)
- for most things researchers want, PrivCount or another privacy-preserving system is their best bet

(Newly rebased on the latest master branch. Nothing has changed in the commits from the _02 branch; only new commits have been added.)

Branch: ticket32910_044_03 PR: https://github.com/torproject/tor/pull/1790

Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffixes, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.

See commit cc1dd1bea88065a4

Safety. In order to make sure this feature stays safe:

See commit a311da45004ab093 for the log warning. See commit 973044b0334f9eac for the safety guidelines.

Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.

I have a question on that one. Should I make a central file (maybe in Tracing.md) containing all tracepoints and a description? I'm suggesting that because, with the different possible instrumentation (USDT, LTTng, ...), there is no central point where a tracepoint is defined. Thus, we might want an "index" file explaining them all?

Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.

The only way I see for now is adding a build matrix in our CI that builds all instrumentation at once.

Trac:
Status: needs_revision to needs_review

This code is looking a lot better now. I think that all we need is the documentation, testing, and some quick feedback from the research safety folks. I've also left a couple of cosmetic comments on GitHub.

For testing: We may need a CI option that builds with instrumentation turned on, but it would also be necessary to have tests that make sure that the trace events are generated. I didn't see those tests here yet.
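One possible shape for such a test, as a sketch only: a test-only backend that records event names so a unit test can assert that a code path fired its tracepoint. All `demo_` names are hypothetical, and nothing here is taken from the branch.

```c
#include <string.h>

#define DEMO_MAX_EVENTS 8

/* Names of the events recorded so far (test-only state). */
static const char *demo_seen[DEMO_MAX_EVENTS];
static int demo_n_seen = 0;

/* Record one event name; silently drops events once the buffer is full. */
static void demo_record(const char *name)
{
  if (demo_n_seen < DEMO_MAX_EVENTS)
    demo_seen[demo_n_seen++] = name;
}

/* Hypothetical test backend: instead of handing the event to a tracer,
 * record "subsystem:event" so that a test can inspect it later. */
#define demo_test_trace(subsystem, event) \
  demo_record(#subsystem ":" #event)

/* Return 1 if an event with this name was recorded, 0 otherwise. */
static int demo_event_was_seen(const char *name)
{
  for (int i = 0; i < demo_n_seen; i++) {
    if (strcmp(demo_seen[i], name) == 0)
      return 1;
  }
  return 0;
}

/* Stand-in for instrumented code under test. */
static void demo_open_circuit(void)
{
  demo_test_trace(circuit, opened);
}
```

A test would call `demo_open_circuit()` and then check `demo_event_was_seen("circuit:opened")`. Verifying that a real tracer actually records the event would still need a separate, tracer-specific check.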

For documentation, I don't feel too strongly about where we put it -- I had been thinking that it might go in the files where we define the events -- but some kind of independent document would be fine too. If we do a separate document, however, we should think about how to make sure that it stays in sync with the trace events.

Trac:
Status: needs_review to needs_revision

(if revisions are done on this by the deadline, it can go in. else it should wait for 0.4.5)

Trac:
Keywords: tracing deleted, tracing 044-can added

changed time estimate to 24h

moved to tpo/core/tor#32910 (closed)
