Opened 3 years ago

Closed 2 months ago

#13802 closed enhancement (implemented)

Add general trace-event logging instrumentation to Tor

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-03-unspecified-201612, review-group-15
Cc: robgjansen Actual Points:
Parent ID: #13792 Points: 3
Reviewer: nickm Sponsor: SponsorR-can

Description (last modified by dgoulet)

Relevant to #13792, we (nickm and I) discussed what would be the way to collect and gather those statistics and measurements in a private network for performance analysis and debugging.

This would be completely deactivated at compile time meaning tracepoints would be NOP. You would have to explicitly enable that feature during the configure process.

We think the best way to go with that, in terms of code, is to go in a header with something that could look like this:

#ifdef TOR_USE_LTTNG
tracepoint_aname(arg1, arg2, [...]) tracepoint(aname, arg1, arg2, ...)
#else
tracepoint_aname(arg1, arg2, [...])
#endif

The really good thing about this approach is that 1) everything is centralized in one place, 2) you NOP the call if not configured thus no performance issue, 3) We can support lots of different tracers as well as shadow. It provides a simple way to hook any other tool in the trace event facility of the code.

Child Tickets

Change History (54)

comment:1 Changed 3 years ago by dgoulet

  • Parent ID set to #13792

comment:2 Changed 3 years ago by nickm

  • Parent ID #13792 deleted

I've talked to David a bit about this, and I do think it's a good way to run performance testing on private networks. It could also IMO make Shadow output easier to analyze, if the two facilities can be made to cooperate.

Preliminaries:

  • We should do a little research to make sure that there isn't some other important userspace instrumentation API that we should keep in mind for maintaining compatibility with. So far, it doesn't look like there is.
  • This should be, as noted above, controlled by a compile-time and a runtime option; both should be off by default.
  • We should make sure to document that this is for researchers and developers, not something that you should be running on a live relay.

comment:3 Changed 3 years ago by nickm

  • Keywords 026-deferrable added
  • Milestone changed from Tor: 0.2.??? to Tor: 0.2.6.x-final

Tenatively moving into 026

comment:4 Changed 3 years ago by dgoulet

  • Parent ID set to #13792

comment:5 Changed 3 years ago by robgjansen

  • Cc rob.g.jansen@… added

comment:6 Changed 2 years ago by dgoulet

  • Keywords 026-deferrable removed
  • Milestone changed from Tor: 0.2.6.x-final to Tor: 0.2.7.x-final

As of now, I have 17 *very useful* tracepoints mostly in the hidden service subsystem. I don't think they are ready to be merged right now since we have yet to prove that all of them are useful to our performance measurement and that the data they record is actually what we really need (or any missing).

I suggest we move that to 0.2.7 because I would like to see that upstream soonish but with more eyes on the usefulness of the tracepoints. I think next tor dev. meeting, we should take a couple of minutes to discuss this in person with real use case I can demonstrate.

comment:7 Changed 2 years ago by nickm

  • Status changed from new to assigned

comment:8 Changed 2 years ago by nickm

  • Keywords 027-triaged-1-deferrable added

Marking tickets as deferrable from 0.2.7 triage round-1

comment:9 Changed 2 years ago by dgoulet

  • Cc robgjansen added; rob.g.jansen@… removed
  • Description modified (diff)
  • Summary changed from Add LTTng instrumentation to tor to Add instrumentation to tor

comment:10 Changed 22 months ago by nickm

  • Milestone changed from Tor: 0.2.7.x-final to Tor: 0.2.8.x-final

comment:11 Changed 21 months ago by nickm

  • Keywords SponsorR removed
  • Sponsor set to SponsorR

Bulk-replace SponsorR keyword with SponsorR sponsor field in Tor component.

comment:12 Changed 21 months ago by dgoulet

  • Keywords 027-triaged-1-deferrable removed
  • Points set to medium

comment:13 follow-up: Changed 20 months ago by nickm

  • Severity set to Normal

The crypto-instrumentation part of this is in for Nov if we can do it.

comment:14 Changed 20 months ago by nickm

  • Keywords TorCoreTeam201511 added

comment:15 in reply to: ↑ 13 Changed 20 months ago by dgoulet

Replying to nickm:

The crypto-instrumentation part of this is in for Nov if we can do it.

ACK. I have a branch that needs cleanup which adds the basics (build system integration and templates) to add any kind of low level instrumentation that we can then hook a tracer, shadow or any thing we would like that collect the data. I'll make sure to have it available soon for the crypto work to start.

For the crypto instrumentation, I'll need a Yawning or nickm for this. Basically, if you guys can give me callsite you want to instrument that is precise location in the code and payload to record, I can start adding them to the branch so you guys can see the mechanism and design involved.

Let me know if you would like to proceed differently. FYI, I plan in mid-November to work on this with Rob to instrument the connection buffers for his experiment and for that to be upstream also.

comment:16 follow-up: Changed 20 months ago by clePYew

This would be a step that would be impossible to undo, and would be potentially very negative.
I see LTTng has pretty documentation, though. Why don't we use their documentation formatter?

comment:17 in reply to: ↑ 16 Changed 20 months ago by dgoulet

Replying to clePYew:

This would be a step that would be impossible to undo, and would be potentially very negative.

Low-level instrumentation is indeed something that once added it has to be considered long live thus quite stable. I don't see that as negative because it's useful and as long as we think correctly about which event we add, it should be fine. The pros are beating the cons imo. It will become an external "ABI/API" like our control port.

I see LTTng has pretty documentation, though. Why don't we use their documentation formatter?

Not sure to understand here. LTTng has indeed pretty documentation on their website but it's first and foremost a tracer that can be hooked to those instrumentation event we'll add, not baked into tor.

comment:18 Changed 20 months ago by nickm

  • Owner set to dgoulet

comment:19 Changed 19 months ago by nickm

  • Keywords TorCoreTeam201512 201511-deferred added; TorCoreTeam201511 removed

Bulk-move uncompleted items to december. :/

comment:20 Changed 18 months ago by nickm

  • Keywords TorCoreTeam201601 201512-deferred added; TorCoreTeam201512 removed

Perhaps in January?

comment:21 Changed 18 months ago by dgoulet

Status update:

I'm working with Rob (through #17598) on adding bunch of instrumentation points and a shadow component to hook on those so Rob can conduct KIST experiment (with and without shadow).

The goal is to submit the instrumentation and a tracing framework with some meat around the bone so we can directly start using it once merged and other developers have a concrete example on how to add more in the future.

To look at the work or follow it, see branch ticket13802_028_01 which is still in heavy development (needs cleanup) but at least can give an idea of the direction and structure of things.

Depending on Rob's time to test and review, optimistic ETA is by the end of the month.

comment:22 Changed 17 months ago by nickm

Bulk-modify: It is February 2016, and no longer possible that anything else will get done in January 2016. Time's arrow and all that.

comment:23 Changed 17 months ago by nickm

  • Keywords TorCoreTeam201602 added; TorCoreTeam201601 removed

comment:24 Changed 16 months ago by nickm

  • Milestone changed from Tor: 0.2.8.x-final to Tor: 0.2.9.x-final

These seem like features, or like other stuff unlikely to be possible this month. Bumping them to 0.2.9

comment:25 Changed 15 months ago by dgoulet

  • Sponsor changed from SponsorR to SponsorR-must

comment:26 Changed 15 months ago by nickm

  • Keywords TorCoreTeam201602 removed

comment:27 Changed 13 months ago by isabela

  • Points changed from medium to 3

comment:28 Changed 13 months ago by dgoulet

  • Keywords 201511-deferred 201512-deferred removed
  • Milestone changed from Tor: 0.2.9.x-final to Tor: 0.2.???
  • Sponsor changed from SponsorR-must to SponsorR-can

comment:29 follow-up: Changed 10 months ago by arma

Poor ticket title. Instrumentation of what?

comment:30 Changed 10 months ago by pastly

I've done some work with dgoulet's branch and have pushed it publicly here.

https://github.com/pastly/public-tor/tree/ticket13802_028_01

It is branched off of a slightly outdated release-0.2.8.

Differences from dgoulet's work

  • remove all/most of the included shadow traces
  • add shadow trace on inbuf
  • add shadow trace when cell is written to outbuf
  • add shadow trace when outbuf is flushed some

The shadow traces have been extensively tested as I'm using them. dgoulet's generalized code works wonderfully, and I don't think I really changed any of it.

IMO the code can be merged (after review) as my testing shows it's working. It's possible, though unlikely, that I'll need to change the shadow-specific traces.

comment:31 in reply to: ↑ 29 Changed 10 months ago by pastly

  • Summary changed from Add instrumentation to tor to Add general trace-event logging instrumentation to Tor

Replying to arma:

Poor ticket title. Instrumentation of what?

comment:32 Changed 10 months ago by cypherpunks

This doesn't belong in release version of Tor.

comment:33 Changed 8 months ago by teor

  • Milestone changed from Tor: 0.2.??? to Tor: 0.3.???

Milestone renamed

comment:34 Changed 6 months ago by nickm

  • Keywords tor-03-unspecified-201612 added
  • Milestone changed from Tor: 0.3.??? to Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:35 Changed 6 months ago by nickm

  • Milestone changed from Tor: unspecified to Tor: 0.3.0.x-final
  • Status changed from assigned to needs_review

Hey, there's a patch for this!

comment:36 Changed 6 months ago by nickm

  • Status changed from needs_review to needs_revision

comment:37 Changed 6 months ago by pastly

  • Status changed from needs_revision to needs_review

New public branch. Ready for review.

https://github.com/pastly/public-tor/tree/ticket13802

comment:38 Changed 6 months ago by dgoulet

  • Owner changed from dgoulet to pastly
  • Status changed from needs_review to assigned

Assigning this one to pastly so I can monitor what I need to review and what is mine waiting for review :).

comment:39 Changed 6 months ago by dgoulet

  • Status changed from assigned to needs_review

comment:40 Changed 6 months ago by dgoulet

  • Reviewer set to dgoulet

comment:41 Changed 5 months ago by nickm

  • Keywords review-group-15 added

comment:42 Changed 5 months ago by dgoulet

  • Status changed from needs_review to merge_ready

Ok so I've cleaned up a bit the branch that is squashed all the commit into one and rebased it on master. I've also improved the comments and minor fixes.

I think this is ready for merge, it is working fine. Note that this is just the "tracing infrastructure". It is not used in the code base anywhere and you need to specifically enable it at compile time for the "tracepoint to log_debug()" framework (--enable-tracing-debug).

The point of having this skeleton is that once it's in, we can start adding tracepoint to subsystems that we want to trace using a specific framework (here we only have the log_debug() framework and even there it's very basic). For instance, KIST work at NRL are using this infrastructure to route their tracepoint to log_debug() but with some extra code around it. See doc/HACKING/Tracing.md for more information.

@nickm, if you prefer having it in use in the codebase before merging, I'm fine with it that means we differ this until we actually have tracepoint upstream. I think our friend at NRL can survive without it upstream yet I think because most of their stuff is a "new framework" plugged in this infrastructure.

Branch ticket13802_030_01.

comment:43 Changed 5 months ago by nickm

  • Milestone changed from Tor: 0.3.0.x-final to Tor: 0.3.1.x-final

Let's take this early in the 031 cycle?

comment:44 Changed 4 months ago by nickm

  • Keywords review-group-16 added

comment:45 Changed 4 months ago by nickm

  • Status changed from merge_ready to needs_review

Please rename 'tracing' to 'event tracing' everywhere. Otherwise, somebody will see the word "tracing" and freak out.

(I'm sure somebody will see the word "tracing" and freak out anyway, but let's at least give them a better chance to realize that it's silly to complain about debugging code.)

comment:46 Changed 4 months ago by nickm

(dgoulet said he'd take care of the renaming)

comment:47 Changed 3 months ago by dgoulet

  • Status changed from needs_review to needs_revision

comment:48 Changed 3 months ago by nickm

  • Keywords review-group-16 removed

comment:49 Changed 3 months ago by dgoulet

  • Owner changed from pastly to dgoulet
  • Status changed from needs_revision to accepted

Taking over the ticket.

comment:50 Changed 2 months ago by dgoulet

  • Reviewer changed from dgoulet to nickm
  • Status changed from accepted to needs_review

Renamed done, see fixup commit on this 031 branch: ticket13802_031_01

(Just in case you would prefer the non master rebased branch from above with the fixup commit: ticket13802_030_01)

comment:51 Changed 2 months ago by nickm

  • Resolution set to implemented
  • Status changed from needs_review to closed

pfffew, exellent. Merged at last, with some slight documentation tweaks.

comment:52 Changed 2 months ago by ahf

  • Resolution implemented deleted
  • Status changed from closed to reopened

While testing some of the compression code on OS X I found that this patch broke tor compilation on OS X.

OS X's ar(1) utility doesn't support creating an archive from an empty list of objects.

I've added a patch in https://gitlab.com/ahf/tor/merge_requests/4/commits that fixes this by creating a small stub function in an object.

comment:53 Changed 2 months ago by ahf

  • Status changed from reopened to needs_review

comment:54 Changed 2 months ago by nickm

  • Resolution set to implemented
  • Status changed from needs_review to closed

Merged!

Note: See TracTickets for help on using tickets.