Opened 5 weeks ago

Last modified 4 days ago

#30901 new enhancement

Add control port trace logging to tor

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: 0.4.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-ci-fail-sometimes network-team-roadmap-august
Cc: nickm, catalyst, teor Actual Points:
Parent ID: #29437 Points: 1
Reviewer: Sponsor: Sponsor31-can

Description

To diagnose #30235, we need to be able to see what tor is sending and receiving on the control port.

I don't think that tor has any logging code for control port messages: I had a quick look, and there wasn't anything obvious.

Showing each message at info level could be good, but it might also be risky?
Maybe we should safe_string_client() the actual message text.

nickm and catalyst are refactoring the controller code right now, so they might know where I should add the logging statements.

Child Tickets

TicketTypeStatusOwnerSummary
#30700defectreopenedatagarTor's Travis stem job shows debug logs from 10 minutes after the hang

Change History (7)

comment:1 Changed 5 weeks ago by nickm

This sounds like debug to me, not info. We could use the LD_CONTROL domain to select these messages if we need to.

Intercepting incoming commands should be pretty easy -- the right place to do that is probably somewhere around handle_control_command.

For outgoing responses and events, I'd like catalyst's POV: they are actively refactoring that code right now, and probably have a better idea of how to hook it than I do.

comment:2 in reply to:  1 ; Changed 3 weeks ago by teor

Hi catalyst,

I'd like to make progress on this ticket today or tomorrow:

Replying to nickm:

For outgoing responses and events, I'd like catalyst's POV: they are actively refactoring that code right now, and probably have a better idea of how to hook it than I do.

Here's one way we could move forward:

I write a patch, and then you refactor it so that it fits in the new control code.

Let me know if you'd like me to do it another way.

comment:3 in reply to:  2 ; Changed 3 weeks ago by catalyst

Replying to teor:

Hi catalyst,

I'd like to make progress on this ticket today or tomorrow:

Replying to nickm:

For outgoing responses and events, I'd like catalyst's POV: they are actively refactoring that code right now, and probably have a better idea of how to hook it than I do.

Here's one way we could move forward:

I write a patch, and then you refactor it so that it fits in the new control code.

Let me know if you'd like me to do it another way.

If you're OK with hooking control_write_reply() and missing the set of control protocol writers that don't yet funnel through there, that might be best. I think after #30889 is merged, the remaining control protocol writers that don't go through that are the async control events, GETINFO, and GETCONF. (I'd have to check again to be more sure.)

comment:4 in reply to:  3 Changed 3 weeks ago by catalyst

Replying to catalyst:

Replying to teor:

Hi catalyst,

I'd like to make progress on this ticket today or tomorrow:

Replying to nickm:

For outgoing responses and events, I'd like catalyst's POV: they are actively refactoring that code right now, and probably have a better idea of how to hook it than I do.

Here's one way we could move forward:

I write a patch, and then you refactor it so that it fits in the new control code.

Let me know if you'd like me to do it another way.

If you're OK with hooking control_write_reply() and missing the set of control protocol writers that don't yet funnel through there, that might be best. I think after #30889 is merged, the remaining control protocol writers that don't go through that are the async control events, GETINFO, and GETCONF. (I'd have to check again to be more sure.)

Looking at my notes, it seems to be GETCONF (not GETINFO, sorry for misremembering), MAPADDRESS, and the async control events that use connection_add_buf() directly. I guess you could wrap specifically the connection_add_buf() that are related to the control protocol.

The control reply refactor work that I'm working on is #30984. I think we can minimize conflicts in that code if you stick to wrapping the callers of connection_add_buf() that talk to the control protocol. Or we can talk more about coordinating more invasive changes if you think that's better.

comment:5 Changed 3 weeks ago by catalyst

It occurs to me that if we capture async control event output in these log messages, we will have to exclude log events that are sent as async control events, to prevent loops.

comment:6 in reply to:  5 Changed 3 weeks ago by teor

Replying to catalyst:

It occurs to me that if we capture async control event output in these log messages, we will have to exclude log events that are sent as async control events, to prevent loops.

Since we want to debug a controller hang, I'd like to avoid modifying the control messages entirely.
Because additional log messages could break or modify the deadlock.

There's also a module include layering issue: the logging code calls the async control code, so the async control code can't include the log functions.

I'd like a set of signal safe logs at debug level, like the error sigsafe logs from https://github.com/torproject/tor/blob/master/src/lib/err/torerr.c
But I also need a set of domains for each of those fds.
At this stage, I don't think we'll need any levels other than debug and error, so it's not worth making the code fully generic.

comment:7 Changed 4 days ago by gaba

Cc: teor added
Keywords: network-team-roadmap-august added
Note: See TracTickets for help on using tickets.