These are what needs to be done but they sorta need to be together to make sense thus this one ticket:
a. Add a series of tracepoints in tor code base. I propose to start with circuit and cell level tracepoints for now.
b. Add USDT (User Statically-Defined Tracing) probes support which is used by SystemTap, DTrace and perf.
c. Add LTTng support which if enable also emits USDT.
d. Integrate all this to our build system.
About(d), the consensus among the network team is that it should NEVER be enabled in production and should be a configure switch.
I believe if we add on top a torrc option, it might not be that useful in the end considering the configure switch but mainly it will degrade performance since the check needs to be at runtime for every tracepoint.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Round 2! I've addressed everything in the PR. The big change is that the tracing probes are within the subsystem instead of libtor-trace.a. Only the tracing API (tor_trace()) is in the library now.
Looks like CI is failing in a few ways. Let's get those fixed up before review?
One of the failures seems to be that coccinelle doesn't understand the lttng stuff. The simplest fix there might be to wrap those sections in #ifndef COCCI.
I think we need to tell checkSpace.pl to allow the #if !defined( style and not only the #ifndef.
This specific macro style is required because LTTng-UST, for internal probes creation, re-include multiple time the header file and thus overrides it with TRACEPOINT_HEADER_MULTI_READ.
I've tried to patch ./scripts/maint/checkSpace.pl to accept it but I'm unable to make it work... my perl skills are very poor. Nickm, advice?
Can you explain why we need TRACEPOINT_HEADER_MULTI_READ ?
That is LTTng specific. LTTng, at compile time, create the tracing probes (basically the C code to handle tracepoints) and uses that include file to get their declaration and type for parameters.
However, LTTng has a lot of trickery to create those probes, that are hidden from us, except this one in particular which is a way to override the header file guard macro so the header can be included multiple times for probe creation.
Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffices, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.
Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.
Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.
Safety. In order to make sure this feature stays safe:
Tor should log loudly when it is enabled.
There should be clear safety guidelines about how the data it generates if used on the public network should be considered NOT safe to share, should NOT be backed up, and should be deleted.
We should make sure we tell researchers explicitly that if they try to use this for science on the public network, they will get data that they cannot ethically share, and they should probably look for another approach if they're doing science.
We should talk with Roger and the research safety board, looking for more suggestions, if they have any.
Maybe we should disambiguate identifiers used as domains and event names. Prefixes, suffices, InitialCaps, and camelCase were all suggested as possibilities. So were TOR_TRACE, macros to create a single macro per event, and so on.
See commit cc1dd1bea88065a4
Safety. In order to make sure this feature stays safe:
See commit a311da45004ab093 for the log warning
See commit 973044b0334f9eac for safety guidelines.
Documentation. We should make sure that you don't need to know the tor source code intimately to understand the event descriptions.
I have a question on that one. Should I make a central file (maybe in Tracing.md) containing all tracepoints and a description? I'm suggesting that because of the different possible instrumentation (USDT, Lttng,...), there is not central point where a tracepoint is defined. And thus, we might want an "index" file explaining them all ?
Testing. How do we verify that this is working and we haven't broken it? We need some kind of test that actually makes sure events are happening and getting recorded.
The only way I see for now is adding a build matrix in our CI that builds all instrumentation at once.
This code is looking a lot better now. I think that all we need is the documentation, testing, and some quick feedback from the research safety folks. I've also left a couple of cosmetic comments on gihub.
For testing: We may need a CI option that builds with instrumentation turned on, but it would also be necessary to have tests that make sure that the trace events are generated. I didn't see those tests here yet.
For documentation, I don't feel too strongly about where we put it -- I had been thinking that it might go in the files where we define the events -- but some kind of independent document would be fine too. If we do a separate document, however, we should think about how to make sure that it stays in sync with the trace events.