wiki:doc/WindowsBufferProblems

Problem overview

Feb 06 02:47:39.469 [err] do_main_loop(): select failed: No buffer space available [WSAENOBUFS ] [10055]

If your Tor server logs "[WSAENOBUFS] [10055]" error messages while running, you are hitting Flyspray Bug 98. This is a well-known, and apparently common, bug when running Tor servers on non-server versions of Microsoft Windows: 98, ME, 2000, and XP.

The official Microsoft description for WSAENOBUFS is:

WSAENOBUFS
10055

No buffer space available.
    An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

The WSAENOBUFS error relates to a buffer that holds data before and after it traverses the TCP/IP stack. As far as we can tell, those who experience this problem share no common hardware or software platform.

Running a Tor server on a vanilla XP install does not (easily) trigger the problem. But it can be consistently reproduced if you also run TCP/IP intensive applications such as P2P clients (BitTorrent, eDonkey, eMule, etc).

The result is that the activity overloads the TCP/IP stack. Since network drivers share the same buffers, often the whole network on the computer ceases to work, and it requires a reboot to fix.

Things that are not the problem

This error is entirely unrelated to the WSAENOCONN error WinXP Home and Pro users commonly experience. The error messages are different: WSAENOCONN causes Event Log entries such as "EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts". TCPIP.SYS in XP is hardcoded to a limit of 10 half-open connections per second. A sufficiently high bandwidth Tor exit server WILL experience this error, but this does not cause Tor to crash (though it does cause some outbound connections to fail, and eventually we should build some workarounds for this). SpeedGuide.net provides a more detailed explanation.

So what IS the problem?

We're not totally sure. But we have a theory.

First, some background. One of the ways Windows does networking with lots of connections at once is with an approach called "overlapped IO". Basically you hand it a socket, a length, and a buffer, and tell it to either read or write, and Windows will take it from there and let you know when it's done.

Quoting from http://www.codeproject.com/internet/IOCP_Server_client.asp?msg=1187159:

With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.

But Tor doesn't use overlapped IO: it uses the select() system call to learn when sockets are available for reading or writing, and then uses non-blocking writes and reads to send and receive data.
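The select()-plus-non-blocking pattern described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Tor's actual networking loop; the names pump_once, demo, and outbox are hypothetical, and a local socketpair stands in for real network connections.

```python
import select
import socket


def pump_once(readers, writers, outbox, timeout=1.0):
    """Run one select() pass and return {socket: bytes_received}."""
    received = {}
    # Ask the OS which sockets are ready; nothing is read or written yet.
    readable, writable, _ = select.select(readers, writers, [], timeout)
    for sock in writable:
        data = outbox.get(sock, b"")
        if data:
            try:
                sent = sock.send(data)  # non-blocking: may accept only part
            except BlockingIOError:
                sent = 0  # not ready after all; retry on a later pass
            outbox[sock] = data[sent:]
    for sock in readable:
        try:
            received[sock] = sock.recv(4096)
        except BlockingIOError:
            continue
    return received


def demo():
    """Send b"hello" from one end of a socket pair to the other."""
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    outbox = {a: b"hello"}  # pending outgoing bytes per socket
    got = b""
    try:
        for _ in range(10):
            writers = [s for s, d in outbox.items() if d]
            result = pump_once([b], writers, outbox)
            got += result.get(b, b"")
            if got == b"hello":
                break
    finally:
        a.close()
        b.close()
    return got
```

Calling demo() should return the bytes that were queued on the first socket. The point is that the application only learns about readiness from select(); the actual copying happens in send() and recv().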

So our theory is that when we send() and recv(), Windows copies the contents of the buffer into a kernel buffer. If we send or recv too much at once, Windows runs out of kernel buffer space.
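To make the kernel-buffer theory concrete, here is a rough Python sketch that writes into a socket nobody reads from and counts how many bytes the operating system accepts before it refuses more. The function name and the 64MB safety limit are assumptions made for illustration. On most systems the refusal shows up as a "would block" condition; the sketch does not try to reproduce WSAENOBUFS itself.

```python
import socket


def bytes_accepted_before_blocking(chunk_size=4096, limit=64 * 1024 * 1024):
    """Send into an unread socket; return how many bytes the kernel took."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while total < limit:  # safety limit (assumed) so the loop always ends
            try:
                # Each successful send() means the kernel copied these bytes
                # into its own buffers, since nobody has read them yet.
                total += writer.send(chunk)
            except BlockingIOError:
                break  # the kernel-side buffers are full
    finally:
        writer.close()
        reader.close()
    return total
```

The number returned varies by operating system and configuration. Whatever it is, it is memory held by the kernel on behalf of one connection, which is the resource the theory says runs out when many busy sockets are open at once.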

Our current plan is to abandon select() on Windows in favor of overlapped IO. This involves three steps. Step one is to add overlapped IO support to libevent. (Libevent already has a notion of a buffer API, so we could extend that.) Step two is to change the way Tor calls OpenSSL, so that it operates on local buffers rather than interacting with the network itself (presumably using recv and send). Step three is to change Tor's networking loop to use libevent's buffer API rather than the socket API. If you'd like to help with any of these steps, let us know!
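Step two, running TLS against local buffers instead of a socket, can be illustrated with Python's ssl module, which offers in-memory BIO objects on top of OpenSSL. This is a sketch of the idea only, not how Tor calls OpenSSL; the function name and the hostname "relay.example" are placeholders invented for the example.

```python
import ssl


def first_tls_flight(hostname="relay.example"):
    """Start a TLS handshake with no socket; return the bytes TLS wants sent."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # bytes received from the network go in here
    outgoing = ssl.MemoryBIO()  # bytes TLS wants transmitted come out here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: TLS has produced its first message and now needs the
        # peer's reply, which the caller would feed in via incoming.write().
        pass
    return outgoing.read()
```

The returned bytes are the opening handshake message. The caller is free to move them with whatever IO mechanism it likes (select, overlapped IO, a buffer API), because the TLS layer never touches the network directly.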

Another guess is that the loop around select() is buggy in the Windows libevent implementation. Tor is the only high-performance user of libevent on Windows as far as we know, so this is quite possible. Check out the code here: http://cvs.sourceforge.net/viewcvs.py/levent/libevent/WIN32-Code/win32.c?view=auto

How to make it break less quickly

You can try increasing the priority of Tor, Privoxy, and Vidalia in Task Manager: hit CTRL-ALT-DEL, go to the Processes tab, right-click each process, and change its priority to "Above normal". You can use Prio to make this automatic every time you start Tor.

You can also screw with the registry:

The following registry entries have been shown to mitigate the buffer issues with varying degrees of success. As always, if you do not understand the Windows Registry and RegEdit, do not attempt these modifications. Your mileage may vary.

At least one user has reported success by following the instructions from http://web.ircsystems.net/codemastr/bufspace.html:

To do this go to Start, Run and type regedit. In the left pane navigate to 
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters once 
there, you must create the entry TcpNumConnections. To do this, right click in 
the right pane and choose new from the menu and select DWORD Value. Give it the 
name TcpNumConnections. Then right click it and select modify and enter a value 
of 800. Then restart your computer.

There are a few TCP-related registry entries that potentially affect the internal buffer size available for data passing through the TCP stack. Setting GlobalMaxTcpWindowSize and TcpWindowSize under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to 0xfaf00 (1027840) seemed to increase the time to failure when running Tor and BitTorrent.

Configuring HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts="3" also seemed to help the exit server last longer. Setting this to "1" is another option: it enables window scaling only, so it does not add the 12-byte timestamp option to every TCP header. However, Tor seems to have lots of odd packet problems on an exit server (Ethereal shows many retransmits, lost ACKs, etc.), and the "3" setting seemed to quiet these down. (Only packet headers were captured during the tests, not actual data.)

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\SackOpts="1" is another helpful setting.
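For convenience, the registry values mentioned above can be collected into a single .reg file. The Python sketch below just renders that file as text; it does not touch the registry. It assumes all five entries are DWORD values (the quoted instructions only say so explicitly for TcpNumConnections), and the names TCPIP_PARAMETERS, SETTINGS, and render_reg_file are made up for the example. Review the output before importing it, and only apply the entries you actually want to try.

```python
TCPIP_PARAMETERS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)

# Values taken from the text above; treating each as a DWORD is an assumption.
SETTINGS = {
    "TcpNumConnections": 800,
    "GlobalMaxTcpWindowSize": 0xFAF00,  # 1027840
    "TcpWindowSize": 0xFAF00,           # 1027840
    "Tcp1323Opts": 3,
    "SackOpts": 1,
}


def render_reg_file(key=TCPIP_PARAMETERS, settings=SETTINGS):
    """Return the text of a .reg file that sets the given DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in settings.items():
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"
```

Writing the result of render_reg_file() to a file such as tor-tcp.reg (a hypothetical name) gives something that can be inspected in a text editor first. As with the manual steps, a restart is needed before the changes take effect.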

An experimental feature recently added to Tor that constrains the send and receive socket buffer sizes may also reduce or alleviate this problem. If your Tor version supports it, try the following option in your configuration:

ConstrainedSockets 1
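Constraining socket buffers generally means asking the kernel for smaller per-socket send and receive buffers. The Python sketch below shows that general mechanism with the standard SO_SNDBUF and SO_RCVBUF socket options; it is not Tor's code, the function name is hypothetical, and the 8192-byte size is an assumption picked for illustration.

```python
import socket


def constrain_socket_buffers(sock, size=8192):
    """Request smaller kernel buffers; return the sizes actually in effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # The kernel treats the request as a hint and may round or adjust it,
    # so read the values back rather than assuming they were applied.
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )
```

Smaller buffers mean less kernel memory is reserved per connection, at some cost in throughput on fast or high-latency links. That trade-off is why this is worth trying mainly on machines that already hit the buffer limit.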

Some more data points

It appears that a system with 384MB of RAM or more, a fresh install of Win XP Home, fully patched via Windows Update, and running nothing but a Tor exit server does not experience these problems. This is true for both the 0.1.0.16-stable and 0.1.1.12-alpha versions of Tor. The Tor configuration is a simple exit server with no bandwidth limits, burst restrictions, or hibernation.

We continue to debug this issue. Recent tests show that the total RAM available at boot time correlates with the occurrence of the [WSAENOBUFS] error. The amount of memory available to the system was limited via the /MAXMEM=### option in C:\boot.ini. The results are as follows:

* At /MAXMEM=128, simply starting up the tor server was enough to create a [WSAENOBUFS] error.
* At /MAXMEM=256, the tor server did create a [WSAENOBUFS] error. Time varied from 2-5 hours.
* At /MAXMEM=384, the tor server did not create a [WSAENOBUFS] error after 6 hours.
* At /MAXMEM=512, the tor server did not create a [WSAENOBUFS] error after 6 hours. Further investigation is needed at this memory level.
* At /MAXMEM=1024, the tor server did not create a [WSAENOBUFS] error after 48 hours.

We've learned that Windows does allocate large chunks of memory per socket on connect. See this graphic of Non-Paged Pool Behavior in Win XP. It appears we are running up against a hard limit that cannot be configured through registry settings. MSDN articles refer to a hardcoded algorithm in non-server editions of Windows that determines the non-paged pool size at boot. At this time, limited memory combined with heavy network usage appears to be the cause of the [WSAENOBUFS] error.

Alternative solutions

Virtualization doesn't help solve the underlying problem, but perhaps helps build the installed base. For lateral thinkers, VMWare Player (available at no cost) can be used by Windows users to run Tor on Linux. In particular the Browser Appliance available here might be a good starting point for a web client. There are many other VMWare Appliances which may also be easily modified to use Tor.

The JanusVM appliance provides a transparent proxy using Tor on Linux inside VMWare. It could also act as a server after the usual configuration, applied with the following steps at the console:

* Configure the JanusVM to use a static IP address instead of DHCP using the menu.
* Edit the /etc/tor/torrc file with desired Tor Server settings.
* Modify firewall rules in /etc/init.d/janus.routing-with-tor to accept incoming requests on the ports required.
* Reboot the virtual machine.

Please consider the Operational Security requirements of running a Tor server before deploying on a VM just as you would for any other type of host.

Last modified on Jun 11, 2011 3:23:09 PM