Opened 3 weeks ago

Last modified 3 days ago

#30591 needs_revision defect

Make tor's Travis stem job log controller messages from stem and tor

Reported by: teor Owned by: teor
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: 0.3.5.1-alpha
Severity: Normal Keywords: tor-ci-fail-sometimes, stem, 041-should
Cc: Actual Points: 0.8
Parent ID: #29437 Points: 0.1
Reviewer: catalyst Sponsor: Sponsor31-can

Description

Diagnostics for #29437.

Child Tickets

TicketStatusOwnerSummaryComponent
#30592closedatagarHow do I make stem log info-level to test/data/tor_log?Core Tor/Stem
#30675closedatagarAdd an argument that makes stem TRACE output log to a fileCore Tor/Stem
#30694closedRestrict the tor CI stem job to tests that actually use torCore Tor/Tor

Change History (26)

comment:1 Changed 3 weeks ago by teor

Actual Points: 0.1
Status: newneeds_review
Version: Tor: 0.2.4.8-alphaTor: 0.3.5.1-alpha

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

comment:2 in reply to:  1 Changed 3 weeks ago by teor

Status: needs_reviewneeds_information

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

Ok, that didn't work. --log info just makes stem log info-level errors to the terminal. It doesn't change test/data/tor_log.

I'll open a child ticket to sort this out.

comment:3 Changed 3 weeks ago by teor

Component: Core Tor/StemCore Tor/Tor

comment:4 Changed 3 weeks ago by teor

Status: needs_informationneeds_revision

We'll need to revise this after #30592 merges into stem.

comment:5 Changed 3 weeks ago by teor

We need to add --log TRACE to stem's command line to get controller messages.
atagar recently changed stem to launch tor with logging at debug level, which should show us tor's control messages in the logs.

comment:6 in reply to:  1 ; Changed 3 weeks ago by teor

Actual Points: 0.10.3
Keywords: stem added
Status: needs_revisionneeds_review
Summary: Make Travis stem log at info level, and tail the logMake tor's Travis stem job log controller messages from stem and tor

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

comment:7 Changed 3 weeks ago by dgoulet

Reviewer: catalyst

comment:8 Changed 3 weeks ago by nickm

FWIW, it appears that these are running into stem failures in travis as it stands -- which is exactly what they are supposed to help us diagnose, so those stem failures are not blockers here. I see Stem spamming these messages over and over:

05/28/2019 01:04:45 [INFO] Error while receiving a control message (SocketClosed): empty socket content
05/28/2019 01:04:45 [INFO] Error while receiving a control message (SocketClosed): received exception "I/O operation on closed file."

If we have not done so already, let's open a stem ticket to improve the output here: if the socket is closed, we probably shouldn't be retrying over and over.

comment:9 Changed 3 weeks ago by nickm

Keywords: 041-should added

comment:10 Changed 3 weeks ago by atagar

Hi Nick. For what it's worth those errors generally mean that a prior test crashed or otherwise buggered up its tor instance, causing future tests to be unable to authenticate.

What kind of output would you like to have? If it would be helpful I could make stem abort its test run when authentication begins failing. That would cut down on this kind of 'spam' but when this occurrences the proper next step are to look at the test and tor logs.

comment:11 in reply to:  10 Changed 3 weeks ago by nickm

Replying to atagar:

Hi Nick. For what it's worth those errors generally mean that a prior test crashed or otherwise buggered up its tor instance, causing future tests to be unable to authenticate.

What kind of output would you like to have? If it would be helpful I could make stem abort its test run when authentication begins failing. That would cut down on this kind of 'spam' but when this occurrences the proper next step are to look at the test and tor logs.

I think that the right response might be to abort whatever loop is spinning here rather than retrying it over and over? I wouldn't mind a few messages of this kind per test when something is broken, but spamming them like this tends to fill up CI logs. Anything that would make the output _shorter_ in this case would be cool with me.

comment:12 Changed 3 weeks ago by nickm

If anybody want to have a look at the failing case that I was talking about, let's see if there's any info we can get from https://travis-ci.org/torproject/tor/jobs/538005761 ? The log-spamming starts around line 4550

comment:13 Changed 3 weeks ago by catalyst

Status: needs_reviewneeds_information

The pull requests (other than the deliberate failure) seem to exceed the Travis max log length limitation. I'm not sure whether we need to try a different approach or if we should wait for stem to fix the log spamming on control connection close.

comment:14 Changed 3 weeks ago by atagar

As discussed over IRC the verbosity you're seeing are from trace logging. It's expected when you go to that level of logging that the output will get pretty big. After all, that's kinda the point. ;P

None the less I agree this went overboard. I'll look into disabling logging during the test that's causing that (I've done that elsewhere when a test causes too much trace logs). That said, the primary issue is that Stem lacks an option to log to a file rather than stdout. I'll add that per #30675.

comment:15 Changed 3 weeks ago by teor

Status: needs_informationnew

We can also reduce TRACE output by reducing the number of stem tests (#30694).

comment:16 Changed 3 weeks ago by teor

Sponsor: Sponsor31-can

Setting as sponsor 31 can, because we use these jobs to make sure our refactoring works.

comment:17 in reply to:  6 ; Changed 3 weeks ago by teor

Actual Points: 0.30.4
Status: newneeds_review

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

I have updated these branches so they only use the stem tests that run tor (#30694), and force-pushed.

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

comment:18 Changed 2 weeks ago by atagar

Hi teor. If you're looking for a hack to cut down on the TRACE output then maybe pipe the test output to a 'grep -v SocketClosed'? Only a couple messages account for 90% of the output.

I'll probably add the debug file capability over the weekend. When I do I also plan to adjust whatever test is causing such voluminous logs.

comment:19 in reply to:  17 ; Changed 13 days ago by catalyst

Status: needs_reviewneeds_revision

Replying to teor:

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

I have updated these branches so they only use the stem tests that run tor (#30694), and force-pushed.

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

It looks like the repetitive log messages are still quite numerous, and might obscure useful information. Maybe we want to do the grep -v for now?

comment:20 in reply to:  19 Changed 3 days ago by teor

Actual Points: 0.40.6
Status: needs_revisionneeds_review

Replying to catalyst:

Replying to teor:

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

...

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

It looks like the repetitive log messages are still quite numerous, and might obscure useful information. Maybe we want to do the grep -v for now?

See my pull request:

And test requests:

And a deliberate fail:

Let's deal with other logging fixes in other tickets.

comment:21 Changed 3 days ago by teor

Hmm, the output is good, but the | grep -v SocketClosed hides the exit status.

I added a commit that sets pipefail.

I'm not going to do a whole bunch of branches all over again.

Here are the updated ones:

0.3.5: ​https://github.com/torproject/tor/pull/1044

And a deliberate fail:

master: ​https://github.com/torproject/tor/pull/1112

comment:22 Changed 3 days ago by teor

Actual Points: 0.60.8

comment:23 Changed 3 days ago by teor

This change is going to conflict with #29280, let's try to merge #29280 first, then rebase #30591 on #29280's 0.3.5 branch.

(#29280 is more complex than #30591, which is a simple patch on 0.3.5, and a clean merge forward.)

comment:24 Changed 3 days ago by teor

Owner: changed from atagar to teor
Status: needs_reviewassigned

Fix owner.

comment:25 Changed 3 days ago by teor

Status: assignedneeds_review

comment:26 Changed 3 days ago by teor

Status: needs_reviewneeds_revision

Nope, still not failing CI when it should.

Note: See TracTickets for help on using tickets.