control.base_controller... success (1.23s)
control.controller...

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated
Hi rl1987. I've seen that from time to time too, but haven't dug in since it's pretty infrequent. The lack of a consistent repro is going to make this troublesome to solve.
The first step, if someone cares to dig in, is to run our tests with '--verbose --log TRACE' until we get a hang, then attach the output for the controller tests.
Our CI has a 10,000 line log limit, and we're already around 5,000.
Normally, we would send the logs to a file, then tail them on error. But we can't do that if the process hangs, because the tail command would never run.
We may be able to trigger the tail with a sleep command after the process has hung for, say, 50 minutes?
(Our builds normally take 5-10 minutes.)
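One way to sketch that watchdog (a minimal sketch; the helper name, timeout, and log path are my assumptions, not our actual CI config): run the tests with a hard timeout, and only read the log tail when the run hangs or fails.

```python
import subprocess

def run_and_tail_on_hang(cmd, log_path, timeout, tail_lines=200):
    """Run cmd, killing it if it exceeds timeout; on a hang or a
    failing exit code, return the last tail_lines lines of log_path."""
    try:
        proc = subprocess.run(cmd, timeout=timeout)
        if proc.returncode == 0:
            return True, []  # finished cleanly, nothing to show
    except subprocess.TimeoutExpired:
        pass  # the build hung; fall through and read the tail
    with open(log_path) as log:
        return False, log.readlines()[-tail_lines:]
```

Unlike piping through tail, the timeout fires even when the test process never exits, so the tail still reaches the CI log.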
I find that this (or something similar) happens for me on my desktop: I get long timeouts on control.controller....
On closer investigation, though, it seems to happen every time I'm running with Python >= 3.6, and never when I'm running with Python <= 3.5. I don't know whether this pattern will hold for others, though. Please let me know if there's anything I can do to help diagnose what's actually going on with these stalls.
I've investigated this for a while; here's what I found.
I investigated the stall in test.integ.client.connection, which I see with Python >= 3.6 but not Python <= 3.5. I used a hacked up version of stem and a hacked up version of tor that logged a bunch of extra information. I found that the buggy behavior, from Stem's point of view, is that Tor accepted a VERSIONS cell but never replied.
From Tor's point of view, however, Stem negotiated a v1 link connection with it, and so there is no such thing as a variable-length cell or a VERSIONS cell.
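For context, a VERSIONS cell is a variable-length cell that only exists in the v2+ link protocols. A minimal sketch of its wire encoding, following tor-spec.txt (the helper name is mine):

```python
import struct

VERSIONS_CMD = 7  # variable-length cell command from tor-spec.txt

def versions_cell(versions):
    """Encode a VERSIONS cell: 2-byte circuit id (always 0), 1-byte
    command, 2-byte payload length, then one big-endian 2-byte word
    per supported link protocol version."""
    payload = b"".join(struct.pack(">H", v) for v in versions)
    return struct.pack(">HBH", 0, VERSIONS_CMD, len(payload)) + payload
```

A peer that believes the v1 link protocol is in use will never parse this framing, which matches the silent non-reply stem observes.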
Why does Tor think that stem negotiated a v1 link connection? It decides this in tor_tls_server_info_callback(), when it first calls tor_tls_client_is_using_v2_ciphers(). For some reason, SSL_get_client_ciphers() returns NULL at this point, meaning "the client has declared no ciphers" and convincing Tor that the v1 link protocol is in use. The result of this cipher scan is cached across subsequent calls. If I remove the code in Tor that caches the result, the stem test passes, since later calls to SSL_get_client_ciphers() do in fact return non-NULL.
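To illustrate the caching behavior (a toy Python model, not Tor's actual C code): once the first classification runs while no ciphers are visible, every later call returns the stale v1 answer.

```python
class CipherClassifier:
    """Toy model of the one-shot classification around
    tor_tls_client_is_using_v2_ciphers(): the first answer is cached,
    so classifying while SSL_get_client_ciphers() would return NULL
    pins the connection to the v1 link protocol."""
    def __init__(self):
        self._cached = None

    def classify(self, client_ciphers):
        if self._cached is None:
            # client_ciphers=None models SSL_get_client_ciphers() == NULL
            self._cached = "v2+" if client_ciphers else "v1"
        return self._cached
```

Removing the cache corresponds to re-running the `if` body on every call, which is why the test passes with that code deleted.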
Why does SSL_get_client_ciphers() return NULL for Python 3.6, but not Python 3.5?
When Python 3.5 is in use, the session_secret_cb function is invoked before tor_tls_server_info_callback() triggers cipher classification, and at that point the ciphers are already set. When Python 3.6 is in use, tor_tls_server_info_callback() is invoked early enough to trigger cipher classification twice: once before the ciphers are set, and again later once they are set.
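A toy replay of those two orderings (my simplification of the sequence above, not actual OpenSSL behavior): the first info callback fixes the link protocol based on whether the ciphers are set yet, and the decision sticks.

```python
def negotiated_protocol(events):
    """Replay a list of "set_ciphers" / "info_callback" events; the
    first info_callback decides v1 vs v2+ and that decision is final."""
    ciphers_set = False
    decision = None
    for event in events:
        if event == "set_ciphers":
            ciphers_set = True
        elif event == "info_callback" and decision is None:
            decision = "v2+" if ciphers_set else "v1"
    return decision
```

With the Python 3.5 ordering the callback sees the ciphers and picks v2+; with the Python 3.6 ordering the early callback locks in v1 before the ciphers arrive.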
I have a fix for this issue that works for me in #30021 (moved). However, I do not completely understand why the issue occurs with Python 3.6's use of OpenSSL.