Opened 9 months ago

Closed 6 months ago

#28179 closed enhancement (implemented)

Handle output from PT processes with the event loop

Reported by: ahf Owned by: ahf
Priority: Medium Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: tor-pt, 040-roadmap-subtask
Cc: Actual Points:
Parent ID: #25502 Points: 31
Reviewer: nickm Sponsor: Sponsor19-must

Description

Currently the output from stdout/stderr of a PT process is only read during the startup of the process. The reading process uses read() on a non-blocking socket, which currently seems to work, but have proved to be flaky.

We should ensure that PT processes' output can be read all the time.

On Windows we cannot attach the pipes to the main loop because of limitations of the select() API, so we have to do something slightly worse such as reading from the stdout/stderr handle via a timer as long as the processes are alive.

Child Tickets

TicketTypeStatusOwnerSummary
#26360defectclosedTransport plugins deadlock if they write too much to stderr
#28846enhancementclosedUse K/V parser for handling PT LOG and STATUS messages
#28847enhancementclosedUse new subsystems API for initializing the process subsystem

Attachments (1)

bug28179-client.go (3.8 KB) - added by dcf 8 months ago.
Client PT program that demonstrates a deadlock when writing too much to stdout/stderr.

Download all attachments as: .zip

Change History (32)

comment:1 Changed 9 months ago by ahf

Owner: set to ahf
Status: newassigned

comment:2 Changed 9 months ago by dgoulet

Keywords: tor-pt 036-roadmap-subtask added

comment:3 Changed 9 months ago by dcf

Related to this is #26360. tor should read from transport plugins' stderr, if only to prevent processes from deadlocking when they write to it too much.

comment:4 Changed 9 months ago by nickm

Milestone: Tor: 0.3.6.x-finalTor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:5 Changed 8 months ago by teor

Keywords: 040-roadmap-subtask added; 036-roadmap-subtask removed

0.3.6 is now 0.4.0: changing keywords

comment:6 Changed 8 months ago by ahf

comment:7 Changed 8 months ago by ahf

Status: assignedneeds_review

This is already pretty big, it's not entirely done yet, but let's start having some review of the design and code:

Known issues with the current code:

  • The Python file is missing its actual test implementations in test_unix/test_win32. Need to talk with Nick about that.
  • Missing doc strings for a lot of functions.
  • Missing process termination detection.
  • Some FIXME's still around.

comment:9 Changed 8 months ago by dgoulet

Reviewer: dgoulet

comment:10 Changed 8 months ago by dgoulet

Status: needs_reviewneeds_revision

Did a pass on it. No show stopper for now but what I would need also is some documentation on how that API is used since right now it is very "abstract" to the process.c and not used outside of it. One thing that could be done there is to do some documentation on how to use the API at the start of process.c?

comment:11 Changed 8 months ago by ahf

Status: needs_revisionneeds_review

comment:12 Changed 8 months ago by dgoulet

Status: needs_reviewneeds_revision

comment:13 Changed 8 months ago by ahf

Status: needs_revisionneeds_review

Added fixup commits.

comment:14 Changed 8 months ago by dcf

https://github.com/torproject/tor/pull/537/files#diff-08bd8435ecfd8690789652091567ff47R1151

  /* The format is 'LOG <transport> <message>'. We accept empty messages. */

It looks like this comment is out of date; I thought the <transport> part was removed. Also "We accept empty messages" seems to contradict the "Managed proxy sent us a LOG line with missing arguments" check above: I suppose that LOG (with a space) is acceptable but LOG is not, not sure if that's what's intended.

https://github.com/torproject/tor/pull/537/files#diff-6fdba7a022e591369dcf7054e5f2700aR7396

/** A pluggable transport called <b>transport_name</b> has emitted a log
 * message found in <b>message</b>. */
void
control_event_transport_log(const char *transport_name, const char *message)
{
  send_control_event(EVENT_TRANSPORT_LOG,
                     "650 TRANSPORT_LOG %s %s\r\n",
                     transport_name,
                     message);
}

Where does <transport_name> come from? Is it the path the executable? In general, tor won't know a specific transport name corresponding to any LOG message. I don't see where control_event_transport_log is called other than in tests.

comment:15 Changed 8 months ago by dcf

Cross-referencing #9957, an older ticket about reading and logging PTs' stderr.

Changed 8 months ago by dcf

Attachment: bug28179-client.go added

Client PT program that demonstrates a deadlock when writing too much to stdout/stderr.

comment:16 Changed 8 months ago by dcf

This program may be useful for testing: attachment:bug28179-client.go

It's a modified version of dummy-client from goptlib. The differences are that after PT initialization, it starts writing to stdout and stderr at 4 KB/s. Also, in its core proxying loop, it tries to write a line to stdout/stderr every time it downloads a chunk of data. Eventually the stdout and stderr buffers fill up, and the proxy loop halts because it cannot write its line. The program copies everything it writes to stdout/stderr to a file called mirror.log, so you can see how much was written before it deadlocks.

  1. Download and put in a directory called bug28179-client.
  2. go get and go build.
  3. Put in torrc:
    DataDirectory datadir
    UseBridges 1
    ClientTransportPlugin dummy exec bug28179-client
    Bridge dummy 128.31.0.61:443
    
  4. tor -f torrc SOCKSPort 10000
  5. In another terminal, tail -F mirror.log. You will see a mixture of hello tor world and received XXX bytes lines.
  6. For me, the system deadlocks after 8 seconds; apparently the stdout/stderr buffers are 64 KB.
    $ ls -l mirror.log
    -rw-r--r-- 1 david david 65425 Nov 27 14:09 mirror.log
    
    If tor was in the middle of bootstrapping, it will stop here. If tor finished bootstrapping, you can verify that it stopped working with curl -x socks5h://127.0.0.1:10000/ https://example.com/.

comment:18 Changed 8 months ago by dgoulet

Status: needs_reviewmerge_ready

PR in previous comment. Branch is in ahf:bugs/28179_pr.

@nickm, this is rather large branch. It had a lots of back and forth and testing with me and ahf so it is in merge_ready for your consideration and upstream merge.

The spec changes of this branch are in #28180.

comment:19 Changed 8 months ago by nickm

Status: merge_readyneeds_revision

Hi! I've got some requests in the branch. I don't know if the windows code works or not, but if it's well-tested both manually and with automated tests, I'll believe it.

I had thought that we'd be doing multiple threads here to make the windows code work. I think that might be our only way around a timer. Having timers that bypass the periodic timer system risks undoing a bunch of our work for wakeup reduction, unless we are super careful.

For the spec: I think we need to request more structure from these log messages, or we won't be able to actually do anything automatic based on them. Maybe we could define severity, keyword, suggested-action stuff?

comment:20 Changed 7 months ago by ahf

Status: needs_revisionneeds_review

Fixed all pending comments on the code. Moving to needs_review.

I've postponed integrating any code that needs to make use of the K/V parser (that is the PT stdout handler) for the LOG and STATUS messages. If we think these should go in first, I'll need some help figuring out what the best way to do this with a merge/rebase without making it difficult for the reviewers to review this again.

comment:21 Changed 7 months ago by dgoulet

Sponsor: Sponsor8Sponsor8-must

comment:22 Changed 7 months ago by nickm

Status: needs_reviewmerge_ready

Marked as merge_ready, but made a squashed-and-merged PR as https://github.com/torproject/tor/pull/603 so that CI can take one last peek at this branch before I merge it.

comment:23 Changed 7 months ago by nickm

Just merged. Woo!

comment:24 Changed 7 months ago by ahf

Status: merge_readyaccepted

Let's move this to being a tracker ticket for its childs.

comment:25 Changed 7 months ago by ahf

Status: acceptedneeds_review

Did an experiment on a VM over the night with running a lot of tests. PR in:

https://github.com/torproject/tor/pull/608

comment:26 Changed 7 months ago by ahf

Reviewer: dgouletnickm

Changing reviewer to nickm as per discussion on IRC.

comment:27 Changed 7 months ago by nickm

Merged that branch; please change ticket status as appropriate?

comment:28 Changed 7 months ago by ahf

Status: needs_reviewassigned

comment:29 Changed 6 months ago by gaba

Sponsor: Sponsor8-mustSponsor19-must

Will close this ticket when children #28845 and #28849 are closed.

comment:30 Changed 6 months ago by ahf

Points: 31

comment:31 Changed 6 months ago by ahf

Resolution: implemented
Status: assignedclosed
Note: See TracTickets for help on using tickets.