Opened 11 years ago

Last modified 7 years ago

#939 closed defect (Fixed)

Our socket count is below zero

Reported by: arma
Priority: Low
Component: Core Tor/Tor
Version: 0.2.1.12-alpha
Cc: arma, nickm

Description

I ran moria1 on 0.2.1.12-alpha-dev (r18691) for a few weeks. Then
I ^C'ed it to move to 0.2.1.13-alpha, and its last words were:

Mar 09 17:22:50.253 [warn] tor_close_socket(): Bug: Our socket count is below zero: -1. Please submit a bug report.

[Automatically added by flyspray2trac: Operating System: All]


Change History (6)

comment:1 Changed 11 years ago by nickm

So, Tor keeps a count of the total number of sockets it has open, so that it can notice when it is about to run
out of resources. This count is in the variable n_sockets_open in compat.c. The function tor_close_socket()
decrements the count; the functions tor_open_socket(), tor_accept_socket(), and tor_socketpair() increment
it. (You can ignore the mark_sockets_open and DEBUG_SOCKET_COUNTING parts when you review this; they are debugging
code that is off by default. You can also ignore the WIN32 code; this report was from moria, which runs Linux.)

If the count gets negative, the possibilities seem to be:

  • We are using tor_close_socket() to close a socket that we did not open with tor_open_socket(), tor_accept_socket(), or tor_socketpair().
  • We are closing some socket twice, and the code that's already trying to catch this case in tor_close_socket() is broken somehow. (If the code is right, it should notice that close() has failed in this case with EBADF.)
  • There's a race condition in the code somewhere. I don't think we close or open sockets from any subthreads, but if we do, that could conceivably cause this, _and_ explain why it only happens rarely. [Ignore any closes that only happen if TOR_IS_MULTITHREADED is not defined; we always build with threads on Linux.]
  • Something else I haven't thought of!

The secret of bugfixing is to remember that if everything were working the way it was supposed to be working,
there would be no bug. Therefore, do not assume that anything is working right.

ETA: As a corollary to "the secret of bugfixing", the possibility that I'm wrong about something above should not be
overlooked.

comment:2 Changed 10 years ago by nickm

Oh, say. There is a possible race condition in the tor_close_socket() call at the end of cpuworker_main() that could
conceivably mess up the socket count.

comment:3 Changed 10 years ago by nickm

So, assuming this is a race condition, option one is to add locking to the socket count code.

Option two is to not count the second half of the multithreaded cpuworker's socketpair against the socket total,
so that we don't need to decrement that total when the socketpair closes. This just creates another bug, though:
we could run into our socket limit without noticing if there were a lot of cpuworkers.

Option three is, if we're multithreaded, for the main thread to close the second half of the socketpair when the
cpuworker thread exits.

Option one seems simplest. Thoughts? I've hacked it up against maint-0.2.1; you can see it in my public repo
git://git.torproject.org/~nickm/git/tor.git in the branch bug939_lock_socket_count.

comment:4 Changed 10 years ago by nickm

Merged and closed.

comment:5 Changed 10 years ago by nickm

flyspray2trac: bug closed.

comment:6 Changed 7 years ago by nickm

Component: changed from Tor Relay to Tor