To be clearer: integ.control.base_controller stops and doesn't proceed (as arma pointed out, "hangs up" means "disconnects", which isn't the case here).
Great, though I'm having difficulty doing the same. When running the integ tests against tor commit ef0bc7f (the one that you're using in the other ticket) I get an error from the base_controller integ, but it doesn't hang. I've fixed the error - mind seeing if it runs for you now?
It still hangs on the latest master :( Should I try running against a different Tor?
Nope. We can now either try to figure out how our systems differ or try to do debugging on yours. Personally I favor the latter - the first step of that is to narrow this down to a single test. Add "self.skipTest("blah")" to all of the base_controller integ tests, and then try removing them until you've figured out which test is choking. This'll narrow the haystack quite a bit.
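For reference, the skip looks something like the following (the class name and surrounding bits here are just placeholders rather than the test's literal code)...

{{{
# Minimal sketch of temporarily disabling an integ test with skipTest(). The
# class name and body are placeholders, not stem's actual test code.

import unittest

class TestBaseController(unittest.TestCase):
  def test_msg_repeatedly(self):
    self.skipTest("blah")  # remove this line to re-enable the test
    # the original test body would follow, but is never reached
}}}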
Add "self.skipTest("blah")" to all of the base_controller integ tests, and then try removing them until you've figured out which test is choking. This'll narrow the haystack quite a bit.
test_msg, test_msg_invalid, test_msg_invalid_getinfo, test_msg_repeatedly, test_status_notifications are the ones that choke.
Yikes. OK, let's just focus on the simplest of those tests (test_msg). That does three things...
constructs a BaseController around a control socket
sends it 'GETINFO config-file', checking its result
closes the controller
Skip all tests but test_msg and try adding print statements to figure out which of those points it's hanging on. Ideally try to narrow it down to the actual line that it's getting stuck on. My guess is that there's some call (like send()) that I didn't think would be blocking but is for some reason on your system.
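For example, something along these lines (a rough sketch - the controller calls are guesses based on the three steps above rather than the test's literal body)...

{{{
# Rough sketch of test_msg() with print statements added to show where it
# blocks. The controller calls are assumptions based on the three steps above.

import unittest

import stem.control
import test.runner

class TestBaseController(unittest.TestCase):
  def test_msg(self):
    print("getting a control socket from the runner")

    with test.runner.get_runner().get_tor_socket() as control_socket:
      print("constructing a BaseController")
      controller = stem.control.BaseController(control_socket)

      print("sending 'GETINFO config-file'")
      response = controller.msg("GETINFO config-file")
      print("response received: %s" % response)

      print("closing the controller")
      controller.close()
      print("done")
}}}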
> /Users/sathya/Documents/stem/test/integ/control/base_controller.py(151)test_asynchronous_event_handling()
-> class ControlledListener(stem.control.BaseController):
(Pdb) c
> /Users/sathya/Documents/stem/test/integ/control/base_controller.py(70)test_connect_repeatedly()
-> with test.runner.get_runner().get_tor_socket() as control_socket:
(Pdb) c
> /Users/sathya/Documents/stem/test/integ/control/base_controller.py(45)test_from_port()
-> if test.runner.Torrc.PORT in test.runner.get_runner().get_options():
(Pdb) c
> /Users/sathya/Documents/stem/test/integ/control/base_controller.py(57)test_from_socket_file()
-> if test.runner.Torrc.SOCKET in test.runner.get_runner().get_options():
(Pdb) c
> /Users/sathya/Documents/stem/test/integ/control/base_controller.py(83)test_msg()
-> with test.runner.get_runner().get_tor_socket() as control_socket:
(Pdb) n
That's interesting. I assume that this is the default target, which means that stem is trying to establish a socket connection to tor (which it has already done dozens of times before) but somehow getting stuck. Could you please dig into get_tor_socket() to figure out if it's getting stuck on the connection or the authentication? It would be helpful to figure out the last line in the stem codebase where we get stuck. One possibility is that we're sending a PROTOCOLINFO and never receiving a response from tor, which would be... bad.
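Roughly speaking get_tor_socket() does two things: establish the socket connection, then authenticate it. Here's a hedged sketch with prints to tell those steps apart (the address, port, and password below are placeholders, not what the runner actually uses)...

{{{
# Hedged sketch of the two steps get_tor_socket() performs, with prints to
# tell a stuck connection apart from stuck authentication. The address, port,
# and password are placeholders rather than the runner's actual values.

import stem.connection
import stem.socket

print("connecting to tor's control port")
control_socket = stem.socket.ControlPort("127.0.0.1", 1111)

print("connected, authenticating (this sends a PROTOCOLINFO request)")
stem.connection.authenticate(control_socket, "placeholder_password")

print("authenticated, socket ready for use")
}}}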
It's weird that the bug only manifests when multiple tests are run. We have a with clause wrapping all of the ControlSockets here, so there shouldn't be bleeding between the tests... I'm puzzled.
I've reproed this on a mac and spent the last four hours unsuccessfully trying to figure it out. Here's what I know so far...
The issue always occurs in test_status_notifications(), when we send garbage data to the socket prior to calling AUTHENTICATE (which causes tor to terminate the socket connection). We hang on the readline() call when trying to receive the response, which should not block (it should either read content or error).
This issue only manifests if we also run a couple other base_controller integ tests that put a heavy load on the socket prior to the problem. From what I can tell nothing should be bleeding between the tests, so my guess is that the prior activity must be putting the socket into some problematic state.
This occurs with both port and file based sockets.
I'm using a socket based file object (derived from socket.makefile()) to have an efficient method for reading newline delimited responses. I haven't a clue why it's only blocking on macs, and only after encountering a heavy load (python bug?).
Guess that tomorrow I'll try removing our usage of a socket file and making raw recv() calls instead, to see if that works.
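To spell that out, the two approaches look roughly like the following (illustrative helpers, not stem's actual internals)...

{{{
# Illustrative comparison of the two read strategies, not stem's actual
# internals: the buffered file object we presently use versus a raw recv()
# loop that assembles a newline terminated reply itself.

import socket

def readline_via_file(control_socket):
  # current approach: socket.makefile() gives us a buffered file object whose
  # readline() handles the newline delimiting for us
  control_file = control_socket.makefile()
  return control_file.readline()

def readline_via_recv(control_socket):
  # alternative: pull bytes with recv() until we see the newline ourselves
  # (python 2 era, so recv() hands back a str)
  reply = ""

  while not reply.endswith("\n"):
    data = control_socket.recv(1)

    if not data:
      raise socket.error("socket closed before we got a complete line")

    reply += data

  return reply
}}}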
Bad news is that the problem isn't that the readline() simply becomes blocking. Somehow our ability to talk with tor's control socket breaks, and all further calls fail. Even with new sockets...
Traceback:
File "/Users/atagar/stem/test/integ/control/base_controller.py", line 95, in test_msg_repeatedly
with test.runner.get_runner().get_tor_socket() as control_socket:
File "/Users/atagar/stem/test/runner.py", line 497, in get_tor_socket
stem.connection.authenticate(control_socket, CONTROL_PASSWORD, self.get_chroot())
File "/Users/atagar/stem/stem/connection.py", line 291, in authenticate
raise AuthenticationFailure("socket connection failed" % exc)
AuthenticationFailure: socket connection failed
Naturally all further tests fail. The tor instance itself is fine, I can still use telnet to connect and authenticate. This is... extremely weird.
At this point I'm completely stuck. The control socket is going into CLOSE_WAIT, and after creating roughly three hundred connections we're unable to make any more. I've scoured forums and the python bugtracker for hours without finding a solution.
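For context, CLOSE_WAIT means tor has closed its end of the connection but our process is still holding its descriptor open. A hedged sketch of the cleanup that would release those descriptors, assuming both a socket and a makefile() derived file object are involved (which is just a guess about where the leak is)...

{{{
# Hedged sketch of cleanup that releases both descriptors, assuming a socket
# plus a makefile() derived file object are in play (a guess about the leak).

def close_control_connection(control_socket, control_file):
  # the file object and the underlying socket each hold a file descriptor, so
  # both need close() for the connection to leave CLOSE_WAIT
  try:
    control_file.close()
  finally:
    control_socket.close()
}}}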
Tested on a mac and the tests now pass without freezing. It doesn't look like this is a stem bug, so I'm resolving in favor of #6235. Feel free to reopen if you still get freezing issues or if I'm wrong about it not being a stem bug.
Trac: Status: new to closed
Resolution: N/A to duplicate