Opened 7 months ago

Closed 4 months ago

#30591 closed defect (fixed)

Make tor's Travis stem job log controller messages from stem and tor

Reported by: teor
Owned by: teor
Priority: Medium
Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor
Version: Tor: 0.3.5.1-alpha
Severity: Normal
Keywords: teor-merge, tor-ci-fail-sometimes, stem, 041-should, 041-backport
Cc:
Actual Points: 0.8
Parent ID: #29437
Points: 0.1
Reviewer: catalyst
Sponsor: Sponsor31-can

Description

Diagnostics for #29437.

Child Tickets

Ticket   Status  Owner   Summary                                                       Component
#30592   closed  atagar  How do I make stem log info-level to test/data/tor_log?      Core Tor/Stem
#30675   closed  atagar  Add an argument that makes stem TRACE output log to a file   Core Tor/Stem
#30694   closed          Restrict the tor CI stem job to tests that actually use tor  Core Tor/Tor

Change History (40)

comment:1 Changed 7 months ago by teor

Actual Points: 0.1
Status: new → needs_review
Version: Tor: 0.2.4.8-alpha → Tor: 0.3.5.1-alpha

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

comment:2 in reply to:  1 Changed 7 months ago by teor

Status: needs_review → needs_information

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

OK, that didn't work. --log info just makes stem log info-level messages to the terminal; it doesn't change test/data/tor_log.

I'll open a child ticket to sort this out.

comment:3 Changed 7 months ago by teor

Component: Core Tor/Stem → Core Tor/Tor

comment:4 Changed 6 months ago by teor

Status: needs_information → needs_revision

We'll need to revise this after #30592 merges into stem.

comment:5 Changed 6 months ago by teor

We need to add --log TRACE to stem's command line to get controller messages.
atagar recently changed stem to launch tor with logging at debug level, which should show us tor's control messages in the logs.
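
For illustration, an invocation along those lines might look like the sketch below. It is a sketch only: the exact flags tor's CI passes to stem's run_tests.py, and the tor binary path, are assumptions here, not taken from the branches.

# Hypothetical sketch: run stem's integration tests against a locally
# built tor binary, with trace-level logging so controller messages
# are visible.
python run_tests.py --integ --tor ../tor/src/app/tor --log TRACE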

comment:6 in reply to: 1 Changed 6 months ago by teor

Actual Points: 0.1 → 0.3
Keywords: stem added
Status: needs_revision → needs_review
Summary: Make Travis stem log at info level, and tail the log → Make tor's Travis stem job log controller messages from stem and tor

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

comment:7 Changed 6 months ago by dgoulet

Reviewer: catalyst

comment:8 Changed 6 months ago by nickm

FWIW, it appears that these are running into stem failures in Travis as it stands -- which is exactly what they are supposed to help us diagnose, so those stem failures are not blockers here. I see Stem spamming these messages over and over:

05/28/2019 01:04:45 [INFO] Error while receiving a control message (SocketClosed): empty socket content
05/28/2019 01:04:45 [INFO] Error while receiving a control message (SocketClosed): received exception "I/O operation on closed file."

If we have not done so already, let's open a stem ticket to improve the output here: if the socket is closed, we probably shouldn't be retrying over and over.

comment:9 Changed 6 months ago by nickm

Keywords: 041-should added

comment:10 Changed 6 months ago by atagar

Hi Nick. For what it's worth those errors generally mean that a prior test crashed or otherwise buggered up its tor instance, causing future tests to be unable to authenticate.

What kind of output would you like to have? If it would be helpful I could make stem abort its test run when authentication begins failing. That would cut down on this kind of 'spam', but when this occurs the proper next step is to look at the test and tor logs.

comment:11 in reply to:  10 Changed 6 months ago by nickm

Replying to atagar:

Hi Nick. For what it's worth those errors generally mean that a prior test crashed or otherwise buggered up its tor instance, causing future tests to be unable to authenticate.

What kind of output would you like to have? If it would be helpful I could make stem abort its test run when authentication begins failing. That would cut down on this kind of 'spam', but when this occurs the proper next step is to look at the test and tor logs.

I think that the right response might be to abort whatever loop is spinning here rather than retrying it over and over? I wouldn't mind a few messages of this kind per test when something is broken, but spamming them like this tends to fill up CI logs. Anything that would make the output _shorter_ in this case would be cool with me.

comment:12 Changed 6 months ago by nickm

If anybody wants to have a look at the failing case I was talking about, let's see if there's any info we can get from https://travis-ci.org/torproject/tor/jobs/538005761. The log-spamming starts around line 4550.

comment:13 Changed 6 months ago by catalyst

Status: needs_review → needs_information

The pull requests (other than the deliberate failure) seem to exceed Travis's maximum log length. I'm not sure whether we need to try a different approach, or whether we should wait for stem to fix the log spamming on control connection close.

comment:14 Changed 6 months ago by atagar

As discussed over IRC, the verbosity you're seeing is from trace logging. When you go to that level of logging, it's expected that the output will get pretty big. After all, that's kinda the point. ;P

Nonetheless, I agree this went overboard. I'll look into disabling logging during the test that's causing it (I've done that elsewhere when a test produces too many trace logs). That said, the primary issue is that Stem lacks an option to log to a file rather than stdout. I'll add that per #30675.

comment:15 Changed 6 months ago by teor

Status: needs_information → new

We can also reduce TRACE output by reducing the number of stem tests (#30694).

comment:16 Changed 6 months ago by teor

Sponsor: Sponsor31-can

Setting as sponsor 31 can, because we use these jobs to make sure our refactoring works.

comment:17 in reply to: 6 Changed 6 months ago by teor

Actual Points: 0.3 → 0.4
Status: new → needs_review

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

I have updated these branches so they only use the stem tests that run tor (#30694), and force-pushed.

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

comment:18 Changed 6 months ago by atagar

Hi teor. If you're looking for a hack to cut down on the TRACE output, then maybe pipe the test output through 'grep -v SocketClosed'? Only a couple of messages account for 90% of the output.
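
As a rough sketch of that hack (the test-runner flags are assumptions, as in the sketch above):

# Filter the two noisy SocketClosed messages out of the test output.
# Caveat: with a plain pipe, the shell reports grep's exit status,
# not the test runner's, so failures can be masked.
python run_tests.py --integ --log TRACE 2>&1 | grep -v SocketClosed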

I'll probably add the debug file capability over the weekend. When I do I also plan to adjust whatever test is causing such voluminous logs.

comment:19 in reply to: 17 Changed 6 months ago by catalyst

Status: needs_review → needs_revision

Replying to teor:

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

See my pull requests:

I have updated these branches with TRACE, and force-pushed.

I have updated these branches so they only use the stem tests that run tor (#30694), and force-pushed.

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

It looks like the repetitive log messages are still quite numerous, and might obscure useful information. Maybe we want to do the grep -v for now?

comment:20 in reply to: 19 Changed 6 months ago by teor

Actual Points: 0.4 → 0.6
Status: needs_revision → needs_review

Replying to catalyst:

Replying to teor:

Replying to teor:

Replying to teor:

I based these branches on #30234, because it hasn't been backported to 0.3.5 yet.

...

This should reduce the TRACE output, but maybe not by enough (we might need to tail the output once #30675 is implemented).

It looks like the repetitive log messages are still quite numerous, and might obscure useful information. Maybe we want to do the grep -v for now?

See my pull request:

And test requests:

And a deliberate fail:

Let's deal with other logging fixes in other tickets.

comment:21 Changed 6 months ago by teor

Hmm, the output is good, but the | grep -v SocketClosed hides the exit status.

I added a commit that sets pipefail.

I'm not going to do a whole bunch of branches all over again.

Here are the updated ones:

0.3.5: https://github.com/torproject/tor/pull/1044

And a deliberate fail:

master: https://github.com/torproject/tor/pull/1112
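
A minimal sketch of the pipefail idea, assuming the CI script runs under bash (plain sh may not support it, as noted further below):

#!/bin/bash
# With pipefail, the pipeline exits with the status of the last
# command to fail, so a test failure is no longer masked by grep.
set -o pipefail
python run_tests.py --integ --log TRACE 2>&1 | grep -v SocketClosed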

comment:22 Changed 6 months ago by teor

Actual Points: 0.6 → 0.8

comment:23 Changed 6 months ago by teor

This change is going to conflict with #29280, so let's try to merge #29280 first, then rebase #30591 on #29280's 0.3.5 branch.

(#29280 is more complex than #30591, which is a simple patch on 0.3.5, and a clean merge forward.)

comment:24 Changed 6 months ago by teor

Owner: changed from atagar to teor
Status: needs_review → assigned

Fix owner.

comment:25 Changed 6 months ago by teor

Status: assigned → needs_review

comment:26 Changed 6 months ago by teor

Status: needs_review → needs_revision

Nope, still not failing CI when it should.

comment:27 Changed 5 months ago by catalyst

Here's another one that exceeded the Travis max log length:
https://api.travis-ci.org/v3/job/551341879/log.txt

comment:28 in reply to: 21 Changed 5 months ago by teor

I tried a fix, but the grep hid the exit status.
And I tried pipefail, but that didn't work.

I don't know what else to try, do you have any ideas, catalyst?

Replying to teor:

Hmm, the output is good, but the | grep -v SocketClosed hides the exit status.

I added a commit that sets pipefail.

I'm not going to do a whole bunch of branches all over again.

Here are the updated ones:

0.3.5: https://github.com/torproject/tor/pull/1044

And a deliberate fail:

master: https://github.com/torproject/tor/pull/1112

comment:29 in reply to:  28 Changed 5 months ago by catalyst

Replying to teor:

I tried a fix, but the grep hid the exit status.
And I tried pipefail, but that didn't work.

If /bin/sh is dash, it might not have set -o pipefail, which seems to be a bash extension.

I don't know what else to try, do you have any ideas, catalyst?

I think there's some useful incantation that threads echo $? through a bunch of file descriptors across several layers of subshells, though.

e.g.,

$ ( ( ( (echo foo; echo bar; echo baz; false); echo $? >&6 ) | grep -v bar >&7 ) 6>&1 | exit `cat` ) 7>&1 || echo fail
foo
baz
fail

but I'm trying to figure out if there's a simpler way to accomplish that file descriptor plumbing.
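
One simpler route, if bash is guaranteed, is PIPESTATUS instead of the file descriptor plumbing above. This is a sketch of the idea, not what the branches ended up using:

#!/bin/bash
# Run the noisy pipeline, then recover the first command's exit status
# from bash's PIPESTATUS array and propagate it.
python run_tests.py --integ --log TRACE 2>&1 | grep -v SocketClosed
exit "${PIPESTATUS[0]}"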

comment:30 Changed 5 months ago by atagar

Hi teor. Why are you trying to use grep for this? We added a --log-file argument specifically to avoid this.

https://trac.torproject.org/projects/tor/ticket/30591
https://gitweb.torproject.org/stem.git/commit/?id=d92936c

comment:31 in reply to: 20 Changed 5 months ago by teor

Status: needs_revision → needs_review

Replying to atagar:

Hi teor. Why are you trying to use grep for this? We added a --log-file argument specifically to avoid this.

https://trac.torproject.org/projects/tor/ticket/30591
https://gitweb.torproject.org/stem.git/commit/?id=d92936c

I was trying to keep logging progress during the tests.

But you're right, we should log to a file, then use grep on the log file.
(Because there's a limit to how much output Travis will store, even on failure.)

I updated all these pull requests:

And test requests:

And a deliberate fail:
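
A sketch of the log-to-file approach described in this comment (the log file name and tail length are illustrative; --log-file is the stem argument added in #30675):

# Keep the full TRACE log in a file so the filtered view stays under
# Travis's log-length limit, and preserve the test runner's exit status.
python run_tests.py --integ --log TRACE --log-file stem.log
status=$?
grep -v SocketClosed stem.log | tail -n 1000
exit "$status"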

comment:32 Changed 5 months ago by teor

The log output for this job looks fine, and it fails as expected:
https://github.com/torproject/tor/pull/1112

We'll still need to work out how to find the logs from the hang 10 minutes ago, but let's do that once we've put useful information in the logs in #30901.

comment:33 in reply to:  31 Changed 4 months ago by catalyst

Status: needs_review → merge_ready

Replying to teor:

Replying to atagar:

Hi teor. Why are you trying to use grep for this? We added a --log-file argument specifically to avoid this.

https://trac.torproject.org/projects/tor/ticket/30591
https://gitweb.torproject.org/stem.git/commit/?id=d92936c

I was trying to keep logging progress during the tests.

But you're right, we should log to a file, then use grep on the log file.
(Because there's a limit to how much output Travis will store, even on failure.)

I updated all these pull requests:

And test requests:

And a deliberate fail:

Thanks! These look good. I think you're right that we might have to wait until we add the logging in #30901 to get useful information on the hangs.

comment:34 Changed 4 months ago by catalyst

Also the 0.4.1 and master pull requests apparently have merge conflicts now, but they don't look too bad.

comment:35 Changed 4 months ago by nickm

This looks okay to me. We may want further refinements on how we show the log, but it won't hurt anybody to merge this to 0.4.1. Please let me know whether you would prefer to merge now or wait until #30901 has landed.

comment:36 in reply to:  35 Changed 4 months ago by teor

Replying to nickm:

This looks okay to me. We may want further refinements on how we show the log, but it won't hurt anybody to merge this to 0.4.1. Please let me know whether you would prefer to merge now or wait until #30901 has landed.

Let's merge now, because the stem job currently fails with:

The job exceeded the maximum log length, and has been terminated.

https://travis-ci.org/torproject/tor/jobs/566642646#L30259

comment:37 Changed 4 months ago by teor

Keywords: 041-backport added

comment:38 Changed 4 months ago by teor

Keywords: nickm-merge dgoulet-merge asn-merge added

Please merge this ticket to master and 0.4.1 to fix the stem CI job.

comment:39 Changed 4 months ago by teor

Keywords: teor-merge added; nickm-merge dgoulet-merge asn-merge removed

Actually, I'm going to do the merge, because I'm allowed to do that when CI is broken and blocking other tasks.

comment:40 Changed 4 months ago by teor

Resolution: fixed
Status: merge_ready → closed

Backported to 0.3.5 and later, because I can merge to master to fix CI.
(Also, broken CI blocks me from doing backports.)

Merged #30213, #29280, and #30591 together.
