Opened 6 years ago

Closed 6 years ago

#2812 closed defect (fixed)

TorCtl Zombie Connections

Reported by: atagar Owned by: mikeperry
Priority: Medium Milestone:
Component: Torctl Version:
Severity: Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Hi, when TorCtl fails to authenticate it leaves a connection to the ControlPort. To repro...

  1. Start TBB with it configured to use cookie authentication. TBB uses relative paths for cookies so TorCtl is gonna fail to connect (separate issue, related to https://trac.torproject.org/projects/tor/ticket/1101). On a side note, arm has a somewhat platform specific workaround for this (no windows) if you're interested.
  1. Check initial connections. You'll see one for Vidalia.

atagar@fenrir:~/Desktop/arm/src$ netstat -np | grep "127.0.0.1:9051.*ESTABLISHED 5267/tor"
tcp 0 0 127.0.0.1:9051 127.0.0.1:36863 ESTABLISHED 5267/tor

  1. Have TorCtl try (and fail) to connect a few times:

from TorCtl import TorCtl
TorCtl.connect()

Failed to read authentication cookie (file doesn't exist): ./Data/Tor/control_auth_cookie

TorCtl.connect()

Failed to read authentication cookie (file doesn't exist): ./Data/Tor/control_auth_cookie

  1. Check again with netstat. You'll see a new connection for each attempt you made:

atagar@fenrir:~/Desktop/arm/src$ netstat -np | grep "127.0.0.1:9051.*ESTABLISHED 5267/tor"
tcp 0 0 127.0.0.1:9051 127.0.0.1:45175 ESTABLISHED 5267/tor
tcp 0 0 127.0.0.1:9051 127.0.0.1:36863 ESTABLISHED 5267/tor
tcp 0 0 127.0.0.1:9051 127.0.0.1:45177 ESTABLISHED 5267/tor

Note that calling close on the connections or sockets is *not* sufficient (nor is most other things I've tried this last hour). The only thing I've found that does the trick is calling "s.shutdown(socket.SHUT_RDWR)" on the failed socket (see attached patch).

Cheers! -Damian

Child Tickets

Attachments (1)

torCtlFix_2812.txt (845 bytes) - added by atagar 6 years ago.
Terminates failed connections, forcing the socket to release.

Download all attachments as: .zip

Change History (11)

Changed 6 years ago by atagar

Terminates failed connections, forcing the socket to release.

comment:1 Changed 6 years ago by atagar

  • Component changed from - Select a component to Torctl

comment:2 Changed 6 years ago by atagar

  • Status changed from new to needs_review

comment:3 Changed 6 years ago by atagar

Ack, this is worse than I thought. There's a general failure with the mechanism by which TorCtl connections are released. Even when successfully connected the close() method does not cause it to release from the control port so the following:

from TorCtl import TorCtl
conn = TorCtl.connect()
conn.close()

INFO[Tue Mar 29 08:43:32 2011]:Event loop received close message.

conn = TorCtl.connect()
conn.close()

INFO[Tue Mar 29 08:43:35 2011]:Event loop received close message.

Results in two connections remaining to the control port for the life of the python process. I'll attach an updated patch to address this.

comment:4 Changed 6 years ago by atagar

Pesky. The proper solution is to add 'self._s.shutdown(socket.SHUT_RDWR)' prior to the close in TorUtil.BufSocket. That is to say...

Index: src/TorCtl/TorUtil.py
===================================================================
--- src/TorCtl/TorUtil.py (revision 24405)
+++ src/TorCtl/TorUtil.py (working copy)
@@ -221,6 +221,7 @@

self._s.send(s)


def close(self):

+ self._s.shutdown(socket.SHUT_RDWR)

self._s.close()


# SocketServer.TCPServer is nuts..

Index: src/TorCtl/TorCtl.py
===================================================================
--- src/TorCtl/TorCtl.py (revision 24499)
+++ src/TorCtl/TorCtl.py (working copy)
@@ -109,7 +109,7 @@

than prompting the user)

"""


  • s = None

+ conn = None

try:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((controlAddr, controlPort))

@@ -133,10 +133,10 @@

print "Connection refused. Is the ControlPort enabled?"

else: print "Failed to establish socket: %s" % exc


  • if s: s.shutdown(socket.SHUT_RDWR)

+ if conn: conn.close()

return None

except Exception, exc:

  • if s: s.shutdown(socket.SHUT_RDWR)

+ if conn: conn.close()

if passphrase and str(exc) == "Unable to authenticate: password incorrect":

# provide a warning that the provided password didn't work, then try

===================================================================

In general this works fine, except that when I try to join on TorCtl.Connection._thread after the close() it runs into deadlock (evidently this change means that _loop never terminates, my guess is that _read_reply() is blocking).

I'm not quite sure of the proper workaround for this. The earlier fix addresses the issue I'm primarily concerned about with arm so I'll leave it with that for now. -Damian

comment:5 Changed 6 years ago by atagar

Extra gotcha, the call throws socket.error if the socket doesn't exist so should be wrapped in a try/catch:

try: s.shutdown(socket.SHUT_RDWR)
except socket.error: pass

comment:6 Changed 6 years ago by mikeperry

Hrmm. Are you eating exceptions somewhere in your TorCtl? Why is your TorCtl even giving you a Connection object back when auth fails? Mine throws the exception up to the python prompt when I try to reproduce this, making me unable to even call close. This makes sense that there's a leaked socket in that case. We need to catch the auth error and close it in the connect() call.

You'll also need to merge your other comments into a single diff for me to more easily understand them. They seem orthogonal to this particular issue, but pasted diffs-as-comments are not really easy to understand, so perhaps I'm missing some additional issues.

comment:7 Changed 6 years ago by atagar

  • Owner set to mikeperry
  • Status changed from needs_review to assigned

Mike and I discussed this on irc. This should be simple to fix though there's still some mystery around why adding the shutdown effects the _thread. I might take another pass at this if Mike doesn't beat me to it.

17:32 < atagar> mikeperry: are you confusing the connect with the pathsupport connect?
17:32 < atagar> in torctl it doesn't raise an exception: https://gitweb.torproject.org/pytorctl.git/blob/HEAD:/TorCtl.py
17:33 < atagar> er, meant line 129
17:33 < mikeperry> it doesn't but authenticate does
17:34 < mikeperry> ah but that is caught
17:34 < atagar> right
17:34 < mikeperry> ah, I was just confused by your example then
17:34 < atagar> (in retrospect we should change that)
17:34 < mikeperry> conn is none for me there, but you didn't indicate that
17:35 < atagar> I didn't show the return result in the example (nore is it presented if none). That was copy-and-paste from the interpretor.
17:35 < mikeperry> I think just closing the socket is not enough, we should also be closing the thread
17:36 < atagar> you mean for the deadlock issue discussed at the end?
17:37 < mikeperry> yeah
17:43 < atagar> Sounds good. Would you still like another patch for the fixes-this-but-causes-deadlock against the current git master?
17:45 < atagar> In looking at the arm close function this sounds perfect (http://pastebin.com/TVCU6Ntk). The only reason I'm joining on _thread is to prevent the syshook error from an orphaned thread during shutdown.
17:46 < atagar> (though I'm still not sure why shutting down the socket causes _thread to never terminate as it did previously)
17:54 < mikeperry> did you conn.close() there without the socket shutdown?
17:57 < atagar> yes - is it expected that the caller is doing that?
17:58 < mikeperry> no. I think it should be part of the fix if we're going to return none from connect
17:59 < mikeperry> that should handle thread shutdown.. though I wonder if it is because of the socket.error issue that it doesn't
17:59 < mikeperry> there should be no need to directly close the socket if conn.close() is properly called
17:59 < atagar> There's two issues here - the connect, when it fails, should be closing the socket. But even when it succeeds calling close on the TorCtl object should be shutting down the socket - no?
18:01 < atagar> (and, ideally, terminating _thread)
18:01 < mikeperry> close() should be terminating _thread
18:01 < atagar> s/terminating/joining on
18:10 < atagar> mikeperry: Anything you want from me? I can take another stab at the deadlock issue in a bit if you want (I wasn't sure if there was a reason for _thread being left alone, hence leaving this to you).
18:21 < mikeperry> atagar: _thread should terminate on its own, printing "Closed control connection. Exiting thread."
18:21 < mikeperry> it is set to be a daemon thread, so I *think* this means it is a detatched pthread underneath and does not need joining
18:22 < mikeperry> but nickm wrote the initial threading model here
18:22 < atagar> hm, I'm suspecting that even daemon threads need to be terminated before the python process is shut down to avoid syshook errors
18:23 < mikeperry> that is possible, but the python docs certainly seem to make out like they are meant to be allowed to be running at shutdown
18:25 < atagar> Yup, sounds like. I'm not quite sure what was up with that syshook error then. :/

comment:9 Changed 6 years ago by atagar

  • Status changed from assigned to needs_review

Fix checked into my 'bug2812' branch which shuts down closed sockets and cleans up after failed connection attempts:
https://gitweb.torproject.org/atagar/pytorctl.git/commit/477c1bcde44e92243bf061f565d9403fa8e31276

comment:10 Changed 6 years ago by mikeperry

  • Resolution set to fixed
  • Status changed from needs_review to closed

Looks good. Merged.

Note: See TracTickets for help on using tickets.