Opened 9 years ago

Last modified 7 months ago

#98 assigned defect (None)

WSAENOBUFS: Running out of buffer space on Windows

Reported by: spy1 Owned by:
Priority: major Milestone: Tor: unspecified
Component: Tor Version: 0.0.9.4
Keywords: tor-relay Cc: spy1, ePokruphos, nickm, arma, Sebastian, phobos
Actual Points: Parent ID: #1753
Points:

Description (last modified by phobos)

Feb 06 02:47:39.469 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Feb 06 12:51:31.380 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Feb 06 14:25:25.031 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Feb 07 02:15:17.138 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

[Automatically added by flyspray2trac: Operating System: Windows 2k/XP]


Attachments (2)

b8zs_tor.PNG (14.1 KB) - added by phobos 9 years ago.
b8zs memory when tor fails
VidaliaLog-09.21.2013.txt (3.8 KB) - added by vmon 7 months ago.


Change History (89)

comment:1 Changed 9 years ago by nickm

I did some googling, and I think that isn't a Tor bug.
hang on, where's that web page?
Read http://www.jsiinc.com/subb/tip0900/rh0914.htm

<spy1> WHICH web page?

> Or http://faq.proxyplus.de/art10002.htm
If the advice there works, let me know.
If it doesn't work, let me know.

<goodell> nickm2: eek. Does this mean that Tor will have to start frobbing Windows registries just to be able to get more than 5000 ephemeral TCP ports?

> Either that, or Tor server operators will. Apparently, Windows likes to spell 'ulimit -n 10000' funny.

<goodell> Yeah, my thoughts exactly. But at least that answers my question of whether Windows has an analogous problem.

<spy1> Well. THAT should keep me busy for the rest of the day. Thank you for the links.

> The problem really isn't the number of sockets, though. It's the fact that sockets remain allocated (in the TCP TIME_WAIT state) for 240 seconds after the application closes them!
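The 240-second point can be made concrete with Little's law: in steady state, the number of sockets parked in TIME_WAIT is roughly the rate at which connections are closed times the TIME_WAIT duration. The sketch below assumes the usual Windows XP defaults (outbound ephemeral ports 1025 through 5000, TcpTimedWaitDelay of 240 seconds) and assumes Tor's side is the one that closes first; the figures and helper names are illustrative assumptions, not measurements from this ticket.

```python
"""Back-of-envelope TIME_WAIT arithmetic via Little's law (L = lambda * W)."""

# Assumed Windows XP defaults, not values measured in this ticket.
FIRST_EPHEMERAL_PORT = 1025
DEFAULT_MAX_USER_PORT = 5000   # MaxUserPort default
DEFAULT_TIME_WAIT_S = 240      # TcpTimedWaitDelay default, in seconds


def ephemeral_port_count(max_user_port: int,
                         first_port: int = FIRST_EPHEMERAL_PORT) -> int:
    """Local ports available for outbound connections (inclusive range)."""
    return max_user_port - first_port + 1


def sockets_in_time_wait(closes_per_second: float, time_wait_s: float) -> float:
    """Steady-state number of closed sockets still holding a port."""
    return closes_per_second * time_wait_s


def max_sustainable_close_rate(ports: int, time_wait_s: float) -> float:
    """Close rate at which every ephemeral port ends up parked in TIME_WAIT."""
    return ports / time_wait_s


if __name__ == "__main__":
    for max_port, tw in [(DEFAULT_MAX_USER_PORT, DEFAULT_TIME_WAIT_S), (15000, 60)]:
        ports = ephemeral_port_count(max_port)
        rate = max_sustainable_close_rate(ports, tw)
        print(f"MaxUserPort={max_port}, TIME_WAIT={tw}s: "
              f"{ports} ports, about {rate:.1f} closes/s before ports run out")
```

With the defaults this works out to 3976 ports and about 16.6 closes per second; the tuned pair (MaxUserPort=15000, 60 s) gives 13976 ports and about 232.9 closes per second.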

comment:2 Changed 9 years ago by nickm

So the next step seems to be: see whether the suggested fix on those pages works. If it does,
consider whether Tor should 1) alter the registry entries automatically and/or
2) be more robust against WSAENOBUFS.
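Option 2) could mean treating WSAENOBUFS as a transient condition and backing off, instead of letting the main loop exit. The following is a hypothetical Python sketch of that retry idea, not Tor's (C) main loop; the function name, retry count, and delays are assumptions. Whether backing off helps when the shortage persists until a reboot is an open question.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# errno.ENOBUFS is the POSIX code; 10055 is WSAENOBUFS as reported by Winsock.
TRANSIENT_BUFFER_ERRORS = {errno.ENOBUFS, 10055}


def call_with_enobufs_backoff(op: Callable[[], T],
                              max_retries: int = 5,
                              initial_delay_s: float = 0.1,
                              sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op(); on a buffer-exhaustion error, wait and retry instead of exiting.

    Any other OSError, or running out of retries, is re-raised unchanged.
    """
    delay = initial_delay_s
    for attempt in range(max_retries + 1):
        try:
            return op()
        except OSError as exc:
            # On Windows the Winsock code is in .winerror; elsewhere use .errno.
            code = getattr(exc, "winerror", None) or exc.errno
            if code not in TRANSIENT_BUFFER_ERRORS or attempt == max_retries:
                raise
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
    raise AssertionError("unreachable")
```

In a poll-based loop, `op` would wrap the single poll/select call, so one failed poll no longer takes the whole process down.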

comment:3 Changed 9 years ago by spy1

Feb 06 00:50:31.002 [notice] circuit_log_path(): circ (length 3, exit rootdown): dizum(open) tongatest(open) rootdown(open)
Feb 06 00:58:47.967 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 06 00:59:00.155 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 06 01:53:31.018 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 06 01:53:31.018 [notice] circuit_log_path(): circ (length 3, exit rodos): countach(open) wuschelpuschel(open) rodos(open)
Feb 06 02:02:19.958 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 06 02:02:36.612 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 06 02:02:37.013 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 06 02:02:37.013 [notice] circuit_log_path(): circ (length 3, exit masquerade): gamma(open) tongatest(open) masquerade(open)
Feb 06 02:47:39.469 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Feb 08 01:31:33.002 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (61 sec old).
Feb 08 01:34:51.727 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:34:52.498 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:34:52.568 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:36:03.761 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:36:06.174 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:36:08.688 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 01:53:41.001 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 08 01:53:41.001 [notice] circuit_log_path(): circ (length 3, exit simonthesourcerer): gamma(open) Tonga(open) simonthesourcerer(open)
Feb 08 01:54:23.001 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (60 sec old).
Feb 08 01:59:05.307 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 02:17:29.455 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 02:17:29.746 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 02:17:40.691 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 02:44:45.007 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (60 sec old).
Feb 08 02:52:18.479 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Feb 08 08:43:20.217 [notice] Tor 0.0.9.4 opening new log file.
Feb 08 08:43:27.106 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 08 08:55:00.002 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 08 08:55:00.002 [notice] circuit_log_path(): circ (length 3, exit bmwanon): tor26(open) Tonga(open) bmwanon(open)
Feb 08 08:59:37.004 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 08 08:59:37.004 [notice] circuit_log_path(): circ (length 3, exit athenathegodess): masquerade(open) Omega(open) athenathegodess(open)
Feb 08 09:07:22.003 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 08 09:07:22.003 [notice] circuit_log_path(): circ (length 3, exit Omega): datenhalde(open) c3po(open) Omega(open)
Feb 08 09:22:39.339 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

comment:4 Changed 9 years ago by spy1

Feb 08 12:08:18.796 [notice] Tor 0.0.9.4 opening new log file.
Feb 08 12:08:28.530 [notice] directory_handle_command_get(): Client asked for the mirrored directory, but we don't have a good one yet. Sending 503 Dir not available.
Feb 08 12:08:39.025 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 08 12:08:44.062 [notice] directory_handle_command_get(): Client asked for the mirrored directory, but we don't have a good one yet. Sending 503 Dir not available.
Feb 08 12:10:11.117 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 08 12:15:14.333 [notice] directory_handle_command_get(): Client asked for the mirrored directory, but we don't have a good one yet. Sending 503 Dir not available.
Feb 08 12:15:55.743 [notice] directory_handle_command_get(): Client asked for the mirrored directory, but we don't have a good one yet. Sending 503 Dir not available.
Feb 08 12:17:07.366 [notice] directory_handle_command_get(): Client asked for the mirrored directory, but we don't have a good one yet. Sending 503 Dir not available.
Feb 08 12:18:16.866 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

Well, I don't know what else to do. I'm going to leave it shut down for a while unless something new develops. Pete

comment:5 Changed 9 years ago by spy1

I just installed those nine critical updates for WinXP from M$ Update. Tor
seems to be running fine, and I'll let you know if they have any effect, good or bad,
on the Tor shut-down problem I'm having. Pete

comment:6 Changed 9 years ago by nickm

To be clear, you still got those errors after the registry fixes, but installing the critical updates
made stuff better?

comment:7 Changed 9 years ago by spy1

Nick - I don't know yet. I was going to leave the server running when
I went to work, but that didn't work out. I've started it up again and
I'll check it in the morning. Pete

comment:8 Changed 9 years ago by spy1

Well, it ran for about 4 hours or so last night before shutting down:

Feb 10 02:12:45.400 [notice] Tor 0.0.9.4 opening new log file.
Feb 10 02:13:07.942 [notice] conn_close_if_marked(): Conn (addr 80.133.132.148, fd 592, type Dir, state 1) is being closed, but there are still 73 bytes we can't write. (Marked at main.c:277)
Feb 10 02:13:11.668 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 10 02:26:46.039 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:26:50.295 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:27:05.958 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:27:06.028 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:27:06.038 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:27:06.278 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 02:48:58.004 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 02:48:58.004 [notice] circuit_log_path(): circ (length 3, exit dizum): chaoscomputerclub(open) anorien(open) dizum(open)
Feb 10 02:51:42.010 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (60 sec old).
Feb 10 03:13:44.001 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 03:13:44.001 [notice] circuit_log_path(): circ (length 3, exit chaoscomputerclub): r30(open) rodos(open) chaoscomputerclub(open)
Feb 10 03:35:16.189 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:35:27.005 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:35:38.651 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:35:41.465 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:38:14.005 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 03:38:14.005 [notice] circuit_log_path(): circ (length 3, exit dizum): toffolandia(open) balance(open) dizum(open)
Feb 10 03:52:30.586 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:52:30.656 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:53:10.334 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 03:56:39.004 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 03:56:39.004 [notice] circuit_log_path(): circ (length 3, exit blagtor): nirvana(open) Tonga(open) blagtor(open)
Feb 10 03:56:43.009 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (60 sec old).
Feb 10 04:03:22.243 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

SO, this morning I didn't restart the computer; I just re-opened Tor, which closed almost immediately:

Feb 10 09:49:22.265 [notice] Tor 0.0.9.4 opening new log file.
Feb 10 09:49:28.163 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 10 10:14:29.192 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

which means, I think, that whatever resource it used up can only be recovered by rebooting. Interesting problem here. Pete
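Comparing runs like these is easier when the uptimes are computed rather than eyeballed. The following is a small, hypothetical Python helper (not part of Tor) that pairs each "opening new log file" line with the next WSAENOBUFS error. It assumes the log format shown in this ticket; because that format omits the year, the parser lets it default to 1900, which is fine for differences within a few days.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

# Log prefix as it appears above, e.g. "Feb 10 09:49:22.265 [notice] ..."
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (.*)$")


def _ts(stamp: str) -> datetime:
    # No year in the log; strptime defaults to 1900 (a "Feb 29" stamp would fail).
    return datetime.strptime(stamp, "%b %d %H:%M:%S.%f")


def uptimes_before_enobufs(log_text: str) -> List[timedelta]:
    """Time from each 'opening new log file' start to the next WSAENOBUFS exit."""
    results: List[timedelta] = []
    started: Optional[datetime] = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines and anything not in the expected format
        stamp, level, msg = m.groups()
        if "opening new log file" in msg:
            started = _ts(stamp)
        elif level == "err" and "WSAENOBUFS" in msg and started is not None:
            results.append(_ts(stamp) - started)
            started = None
    return results
```

Fed the three lines from this morning's run, it reports a single run of 25 minutes and about 7 seconds.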

comment:9 Changed 9 years ago by spy1

I made the two registry changes (actually one, since one was already done) that
nickm suggested, on the same day he suggested them. AFAICS, they've gained me about an
hour's worth of "uptime" when running the server (it now stays up for approximately
two hours rather than just one), but the irritating thing is that, instead of just
being able to restart the server and get another two hours, I actually have to reboot
to release whatever resource the Tor server is exhausting. If I don't, the server only
stays up a very short time (sometimes only minutes) before it shuts down again.

I'm wide-open for additional things to try here. Pete

comment:10 Changed 9 years ago by spy1

I also tried increasing the thread priority of tor.exe to "Realtime: 24" with
Sysinternals Process Explorer, but that didn't help (at least not without a re-start):

Feb 10 11:07:04.618 [notice] Tor 0.0.9.4 opening new log file.
Feb 10 11:07:07.162 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 10 11:07:26.790 [notice] conn_close_if_marked(): Conn (addr 195.64.88.140, fd 552, type Dir, state 1) is being closed, but there are still 72 bytes we can't write. (Marked at main.c:277)
Feb 10 11:15:47.009 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 11:15:47.009 [notice] circuit_log_path(): circ (length 3, exit serifos): cyanid(open) dali(open) serifos(open)
Feb 10 12:38:04.379 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:04.719 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:04.719 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:04.799 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:04.799 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:04.809 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 12:38:15.455 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]
Feb 10 13:02:04.510 [notice] Tor 0.0.9.4 opening new log file.
Feb 10 13:02:17.188 [notice] circuit_send_next_onion_skin(): Tor has successfully opened a circuit. Looks like it's working.
Feb 10 13:09:19.215 [warn] connection_about_to_close_connection(): Harmless bug: Edge connection hasn't sent end yet?
Feb 10 13:15:50.007 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 13:15:50.007 [notice] circuit_log_path(): circ (length 3, exit serifos): nirvana(open) masquerade(open) serifos(open)
Feb 10 13:24:56.002 [notice] connection_ap_expire_beginning(): Stream is 15 seconds late on address 'www.mynetwatchman.com'. Retrying.
Feb 10 13:24:56.002 [notice] circuit_log_path(): circ (length 3, exit nirvana): gamma(open) datenhalde(open) nirvana(open)
Feb 10 13:27:33.007 [notice] connection_ap_handshake_attach_circuit(): Giving up on unattached conn (60 sec old).
Feb 10 13:39:21.306 [err] do_main_loop(): poll failed: No buffer space available [WSAENOBUFS ] [10055]

I tried running this little utility: http://www.speedguide.net/read_articles.php?id=1497 ,
but it tells me that I already have 50 half-open connections available, so it doesn't
LOOK like I'd want to reduce it to 10.

Am I the only one I'm driving nutso here? Pete

comment:11 Changed 9 years ago by phobos

Have you tried upgrading to 0.1.0.10? Even so, I think XP will still run into the same issues,
as tcpip.sys hardcodes how many connects/sec it'll allow. I'm assuming this is XP Home/Pro and not a server edition.
I believe Home/Pro are limited on purpose, to keep people from using them as servers instead of W2K3 Server.

comment:12 Changed 9 years ago by phobos

Is this still a problem with 0.1.0.14 or 0.1.1.5-alpha? I've been unable to replicate this bug.
There are a few things I can think of:
1) tcpip.sys limits
2) security policies limiting access to system resources (which ones, I don't know yet, still investigating)
3) a highly asymmetric connection causing one direction to overload the other (fast downstream, slow upstream, for instance)
4) network card driver having issues with connections/traffic flow (what network card do you have?)
5) ghosts in the machine
6) winsock.dll or heap resources running real low

comment:13 Changed 9 years ago by phobos

The official Microsoft description is:
WSAENOBUFS
10055

No buffer space available.

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

Changed 9 years ago by phobos

b8zs memory when tor fails

comment:14 Changed 9 years ago by phobos

Someone named b8zs reported this error in 0.1.1.7-alpha under WinXP Pro, running as a server.
It occurs roughly 2 hours after starting the process.
The attached PNG shows that memory use looks normal.

comment:15 Changed 9 years ago by unix_worship

I get this same problem on 0117 alpha, on XP SP2. I have shut down my
Tor server for the time being, because it opens too many ports, crashes,
and apparently leaves residual services running which kill my CPU.

comment:16 Changed 9 years ago by nickm

It's been suggested that some people are seeing a problem like this because
SP2 introduces a limit on the number of simultaneous half-open connections, a limit that can be
patched around using the "patcher" tool at http://www.lvllord.de/ . I have not verified this,
or confirmed that this tool is benign.

SP2's behavior seems to be designed to prevent worms from probing multiple systems, and to
annoy decent programmers like us.

See also
http://www.speedguide.net/read_articles.php?id=1497&print=friendly

I suspect that one reason this bug has lasted for so long is that, so far as I can tell, nearly
all win32 versions limit TCP sockets, but all of them seem to do it in weird and different ways.
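Instead of patching tcpip.sys, an application can pace its own connection attempts so that it never has more than a fixed number of handshakes pending. The following is a minimal asyncio sketch. It assumes a cap of 10 pending handshakes, the figure commonly reported for SP2; treat that number as an assumption. This is an illustration of the technique, not how Tor schedules connections.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

# Assumed cap: SP2 is commonly reported to allow about 10 half-open outbound connections.
HALF_OPEN_LIMIT = 10


async def connect_all(connectors: Iterable[Callable[[], Awaitable[T]]],
                      limit: int = HALF_OPEN_LIMIT) -> List[T]:
    """Run connection attempts, keeping at most `limit` handshakes in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(connect: Callable[[], Awaitable[T]]) -> T:
        # The slot is held only while connect() is pending, i.e. while the
        # connection is half-open; it is released once the handshake finishes.
        async with gate:
            return await connect()

    return await asyncio.gather(*(guarded(c) for c in connectors))
```

In real use, each connector could be `functools.partial(asyncio.open_connection, host, port)`. Because the slot is released as soon as the handshake completes, established connections do not count against the cap; only pending ones do.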

comment:17 Changed 9 years ago by phobos

This isn't exactly the problem. The problem is the sheer number of
connections active in the winsock layer, each of which consumes 10KB of memory.

These connection buffers are allocated out of non-paged kernel memory,
which means it's a hardcoded limit in WinXP/2K, I believe loosely tied
to the amount of memory available in the system.

I'm looking at registry entries which may be able to change this allocated
non-paged kernel memory. Another option is to simply limit the total
number of connections in Tor on non-server Windows.
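If the ~10 KB-per-connection figure is taken at face value, the arithmetic for "limit the total number of connections" is straightforward. The following is a Python sketch. The per-connection cost is the estimate quoted above. The pool budget, the 50% headroom, and the `ConnectionLimiter` class are hypothetical illustrations, not Tor options or measured Windows limits.

```python
# Estimate from the comment above: roughly 10 KB of nonpaged pool per connection.
BYTES_PER_CONNECTION = 10 * 1024


def estimated_nonpaged_bytes(connections: int,
                             per_conn: int = BYTES_PER_CONNECTION) -> int:
    """Nonpaged pool consumed by `connections` sockets under the estimate."""
    return connections * per_conn


def connection_cap(pool_budget_bytes: int,
                   per_conn: int = BYTES_PER_CONNECTION,
                   headroom: float = 0.5) -> int:
    """Largest connection count that uses at most `headroom` of an assumed pool budget."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return int(pool_budget_bytes * headroom) // per_conn


class ConnectionLimiter:
    """Hypothetical gate that refuses new connections beyond a fixed cap."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.open = 0

    def try_open(self) -> bool:
        """Reserve a slot; returns False if the cap is already reached."""
        if self.open >= self.cap:
            return False
        self.open += 1
        return True

    def close(self) -> None:
        """Release a slot when a connection is closed."""
        if self.open > 0:
            self.open -= 1
```

For example, 512 connections come to 5 MB (5,242,880 bytes) under this estimate, and a hypothetical 64 MB budget with 50% headroom yields a cap of 3276 connections.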

A good debugging step: when running Tor, look in the Event Log
under System. You'll see messages similar to "TCPIP.SYS Limit reached".

This is the 10 connections per second per unique IP limit, not the WSAENOBUFS error.

comment:18 Changed 9 years ago by phobos

netstat -abn would be another good thing to check when this occurs. Count up how many open sockets tor.exe owns.
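Counting by hand gets tedious with thousands of sockets. The following is a hypothetical Python helper that tallies saved netstat output by PID and TCP state, and maps PIDs to the `[exe]` names that `-b` prints. It assumes that each TCP/UDP row ends with the owning PID, which adding `-o` guarantees (e.g. `netstat -abno > netstat.txt`). It also assumes that the bracketed executable name follows its connection row. Layouts differ between Windows versions, so check the output against your own before trusting the counts.

```python
from collections import Counter
from typing import Dict, Tuple


def tally_netstat(output: str) -> Tuple[Dict[str, Counter], Dict[str, str]]:
    """Count sockets per PID by state, and map PIDs to the [exe] names netstat -b prints."""
    states_by_pid: Dict[str, Counter] = {}
    exe_by_pid: Dict[str, str] = {}
    last_pid = None
    for raw in output.splitlines():
        line = raw.strip()
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "TCP" and len(fields) >= 5:
            state, pid = fields[3], fields[-1]   # Proto Local Foreign State PID
        elif fields[0] == "UDP" and len(fields) >= 4:
            state, pid = "UDP", fields[-1]       # UDP rows have no state column
        else:
            # An "[exe]" line names the owner of the most recent connection row;
            # headers and DLL/component lines are ignored.
            if line.startswith("[") and line.endswith("]") and last_pid is not None:
                exe_by_pid.setdefault(last_pid, line[1:-1])
            continue
        states_by_pid.setdefault(pid, Counter())[state] += 1
        last_pid = pid
    return states_by_pid, exe_by_pid


def sockets_for(exe: str, states_by_pid: Dict[str, Counter],
                exe_by_pid: Dict[str, str]) -> Counter:
    """Sum the per-state counts for every PID whose executable matches `exe`."""
    total: Counter = Counter()
    for pid, name in exe_by_pid.items():
        if name.lower() == exe.lower():
            total.update(states_by_pid.get(pid, Counter()))
    return total
```

Sockets in TIME_WAIT typically show PID 0, so they are tallied under "0" rather than against tor.exe. That count is worth watching alongside the per-process totals.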

Setting up perfmon with the following may also elucidate the problem:
1) Remove all current counters
2) Add counters as follows:
2a) Performance Object: Process
2b) Select "All Counters"
2c) Under "Select Instances from List" choose "tor"
3) In particular, watch "Pool Nonpaged Bytes" and correlate the WSAENOBUFS error with the value of that counter.

"Pool Nonpaged Bytes is the size, in bytes, of the nonpaged pool, an area of
system memory (physical memory used by the operating system) for objects that
cannot be written to disk, but must remain in physical memory as long as they
are allocated. Memory\Pool Nonpaged Bytes is calculated differently than
Process\Pool Nonpaged Bytes, so it might not equal Process\Pool Nonpaged Bytes\_Total.
This counter displays the last observed value only; it is not an average."

Correlating this to total TCP connections may help figure out where the limit
is in non-Server versions of Windows.
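Once the counter is being logged, matching it against Tor's [err] timestamp is a simple lookup. The following is a hypothetical Python sketch. It assumes the counter was saved as a perfmon/typeperf-style CSV: a header row, then rows of `"MM/DD/YYYY HH:MM:SS.fff","value"`. Verify that layout against your own log file. Tor's log lines omit the year, so you must supply it when building the error time.

```python
import bisect
import csv
import io
from datetime import datetime
from typing import List, Optional, Tuple

Sample = Tuple[datetime, float]

# Assumed CSV layout: header row, then "MM/DD/YYYY HH:MM:SS.fff","value" rows.
TIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def read_counter_csv(text: str) -> List[Sample]:
    """Parse a single-counter perfmon/typeperf-style CSV into sorted (time, value) pairs."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)                      # skip the header row
    samples: List[Sample] = []
    for row in rows:
        if len(row) < 2 or not row[1].strip():
            continue                      # skip blank rows and missed samples
        samples.append((datetime.strptime(row[0], TIME_FORMAT), float(row[1])))
    samples.sort()
    return samples


def value_at(samples: List[Sample], when: datetime) -> Optional[float]:
    """Last sampled value at or before `when`, or None if logging started later."""
    times = [t for t, _ in samples]
    i = bisect.bisect_right(times, when)
    return samples[i - 1][1] if i else None


def peak(samples: List[Sample]) -> float:
    """Largest value seen in the log."""
    return max(v for _, v in samples)
```

Comparing `value_at(...)` at each failure with `peak(...)` over a healthy run shows whether the failures cluster at a similar Pool Nonpaged Bytes level.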

comment:19 Changed 9 years ago by ePokruphos

You know what? I found this article on MSDN just now and it appears to be strikingly apropos to this conversation (especially that note at the bottom):

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/randz/protocol/tcp_time-wait_delay.asp

Thoughts?

comment:20 Changed 8 years ago by arma

This article looks great, except it is only for Windows Server 2003, which already handles Tor just fine.

Do the recommendations also work for various versions of Win XP, or do they not let you edit those
registry pieces?

comment:21 Changed 8 years ago by Ruhe

I'm trying to run a Tor 0.1.0.15 server with libevent 1.1a on Windows XP x64 Prof. SP1.

The server reproducibly crashes with

[err] do_main_loop(): libevent poll with win32 failed: No buffer space available [WSAENOBUFS ] [10055]

It crashes even with TcpTimedWaitDelay=60 and MaxUserPort=15000
(HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters).

comment:22 Changed 8 years ago by Ruhe

Tor 0.1.1.9-alpha without registry tweaks (see above) seems to work fine.

With 0.1.0.15 try this:

KeepalivePeriod 180

If Tor still aborts with WSAENOBUFS, set KeepalivePeriod to 180 AND MaxConn to 512.
This could fix the problem. That's my experience at least, but it's not 100% verified.

comment:23 Changed 8 years ago by Ruhe

Unfortunately 0.1.1.9 crashes too ([WSAENOBUFS ] [10055]). But I think I've found a reason for it: My anti-virus scanner, NOD32 v2.51.12 x64, with enabled HTTP scanning.

If HTTP scanning on ports 80, 8080 and 3128 and "autodetect HTTP communication on other ports" are enabled, both versions will crash.

After I disabled the IMON (Internet Monitor) and EMON (MS Outlook E-Mail Monitor) features in NOD32 at least 0.1.1.9 runs and runs...

Take a look at http://www.eset.com/products/nt.htm for some info about both modules.

comment:24 Changed 8 years ago by spy1

Mine seems to be staying up now (InfoNest). As far as NOD32 goes, my settings are as follows:

From the "Control Center", click on IMON/Setup/ - the POP3 tab has "Enable IMON email checking" checked (with port 110 entered); Compatibility Setup/Setup set to Maximum Compatibility on slider

On the HTTP tab, "Enable HTTP checking" is checked with ports 80, 8080, 3128, 9001, 9030 entered.

"Automatically detect HTTP communications on other ports" is also checked.

On the "Miscellaneous" tab, Exclusion/Edit, I have entered C:\Program Files\Tor\tor.exe

(I also have both the Tor folder - C:\Program Files\Tor and C:\Documents and Settings\spy1\Application Data\Tor excluded in AMON/Setup/Exclusions).

We'll see, I guess. Pete

comment:25 Changed 8 years ago by spy1

BTW, I should also confirm that I made the following additions to the registry:

Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, I used "Add Value" to create
MaxUserPort as a REG_DWORD and set it to 30000 (Hexadecimal), and I created

TcpTimedWaitDelay as a REG_DWORD with a value of 100 (Hexadecimal).

These are SOLID additions and figures - I just re-checked them and they're THERE. Pete

comment:26 Changed 8 years ago by unknownhost

Does anyone know what magic combination of the following registry settings (or are there more tweaks?) makes Tor work on Windows XP Pro SP2?

Even with the following settings (ranges I've tried), my Tor server goes down after about 90 minutes...

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
KeepalivePeriod -> 120-240
MaxUserPort -> 20000 - 50000
TcpTimedWaitDelay -> 60 - 120
TcpNumConnections -> 200 - 500

  • Unknownhost

comment:27 Changed 8 years ago by phobos

I've set up a test machine running 0.1.1.2-alpha with WinXP Home. The stock install appears to have locked up fairly quickly.
I then installed SpeedGuide's TCP Optimizer (http://www.speedguide.net/tcpoptimizer.php) and applied its idiot-proof "optimize all network interfaces" setting.
Thus far, the Tor server has run for 24h+ without incident. I have a 4MB debug log as a result.
I ran this with BandwidthRate 20KB. This is a middleman server to start.
If things work out, I'll try an exit server.

I'm now running the same setup with no BandwidthRate at all.

comment:28 Changed 8 years ago by phobos

I have completed numerous tests on 0.1.0.16 and 0.1.1.12-alpha with a fresh WinXP Home machine.
I was unable to replicate this problem. I tried a number of combinations:
1) 12-alpha as middleman with bandwidth rate
2) 12-alpha as middleman with no bandwidth rate
3) 12-alpha as exit server with non-optimized tcp stack
4) 12-alpha as exit server with optimized tcp stack
5) 16-stable as exit server with non-optimized tcp stack
6) 16-stable as exit server with optimized tcp stack

I used the default exit policy in all cases for numbers 3-6.

I could not recreate the no buffer error at all, in any config. I'm starting to wonder if this error
is caused by people setting up a tor server, and using the machine as their desktop.

In theory, this could overload the TCP buffers on a non-optimized TCP stack, especially when running BitTorrent or
some other P2P client.

Surprisingly, I'm able to host popular ISOs via BitTorrent without issue. I'll regularly see 200-400 connections
to my torrent client uploading these ISOs.

comment:29 Changed 8 years ago by phobos

I believe the problems with this issue are tied to many programs trying to access the IP stack simultaneously. I can
recreate the issue when running an exit server and sharing Fedora ISOs via BitTorrent from the same machine. If I
run a purely MS machine with nothing but a fully patched base OS and Tor, I cannot recreate the problem. During this
test, I was able to sustain 2Mb/s throughput as an exit server alone (not running BT). Lack of traffic was not the issue.

There are a few TCP-related registry entries which may control the internal buffer size available for data
passing through the TCP stack. Setting HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize
and TcpWindowSize to 0xfaf00 (1027840) seemed to increase how long Tor and BitTorrent could coexist.

Configuring HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts="3" also seemed to help the exit server last longer.
Setting this to "1" is another option, as it doesn't spend 12 bytes of every header on the timestamp option.
However, Tor seems to have lots of odd packet problems on an exit server (as shown by Ethereal: lots of retransmits,
lost ACKs, etc.), and the "3" setting seemed to quiet these things down. Only packet headers were captured during the tests,
not actual data.
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\SackOpts="1" is another helpful setting.

Messing with the registry on installation may provide relief from this problem, however it may cause other unknown
problems as well. I am unaware of any tcp layer problems with my test machine whilst running these settings.

comment:30 Changed 8 years ago by phobos

We've created a wiki page to summarize everything we know about this problem. As of today, indications are that it is
memory-related more than anything else. Remember, correlation is not causality. The summary wiki page is

http://wiki.noreply.org/noreply/TheOnionRouter/WindowsBufferProblems

comment:31 Changed 8 years ago by spy1

Just to update you, the "SpeedGuide" fix didn't keep the problem from happening here by itself. I also made the two reg
changes Andrew suggested and THAT did not solve the buffer problem.

I had to go back through all my settings in NOD32 and make them the way I indicated in my previous post (I think this is
an IMPORTANT consideration for anyone using NOD32 and trying to run a Tor server on a WinXP Pro box who's using that
computer for everything!!!!) - and I had to re-make the registry changes that I also indicated there.

Since doing so (at about 11:35 a.m. Eastern time) this morning, the server hasn't gone down again yet.

BUT, I also haven't opened Firefox with twenty+ windows going yet, either. I won't be able to check that out again
until tomorrow (I want to make sure that the server stays up as is before introducing any other variables). The last time
I fixed this like that, the multiple Firefox windows didn't make any difference (IOW, the server stayed up then).

Just my observations from here and I hope some of it helps. Pete

comment:32 Changed 8 years ago by phobos

I've found the SpeedGuide fixes and reghacks just allow the server to run longer, rather than actually fixing the problem.
We continue to focus on the use of select() vs. Microsoft's preferred overlapped I/O functions. It could be that libevent
or OpenSSL is also a culprit. I can test out your specific reg changes in the next few days to see how long the server
stays online.

comment:33 Changed 8 years ago by spy1

Thanks Andrew. The two reg changes I made should read "Decimal" instead
of "Hexadecimal", BTW (in my December 1, 2005 post above). Pete
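The hex/decimal slip matters more than it looks, because regedit's DWORD dialog interprets the same digits differently depending on the selected base. 30000 entered as hexadecimal is 196608, which is above the largest possible port number, and 100 hex is 256 seconds. The following hypothetical Python helper shows the difference and checks values against the ranges I believe Microsoft documents for XP; verify these ranges before relying on them. It also writes a .reg file, whose `dword:` data is always hex, so there is no base to get wrong.

```python
from typing import Dict, Tuple

# Ranges as I understand Microsoft's XP/2003 documentation for Tcpip\Parameters;
# these are assumptions to double-check for your Windows version.
VALID_RANGES: Dict[str, Tuple[int, int]] = {
    "MaxUserPort": (5000, 65534),
    "TcpTimedWaitDelay": (30, 240),   # seconds
}

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def parse_dword(text: str, base: int) -> int:
    """Interpret digits typed into regedit's Edit DWORD box in the chosen base."""
    value = int(text, base)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit DWORD")
    return value


def in_range(name: str, value: int) -> bool:
    """True if value lies inside the assumed documented range for name."""
    low, high = VALID_RANGES[name]
    return low <= value <= high


def reg_file(values: Dict[str, int]) -> str:
    """Render a .reg file; .reg dword data is always written as 8 hex digits."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    lines += [f'"{name}"=dword:{value:08x}' for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"
```

For example, `reg_file({"MaxUserPort": 30000, "TcpTimedWaitDelay": 100})` emits `"MaxUserPort"=dword:00007530` and `"TcpTimedWaitDelay"=dword:00000064`, i.e. the decimal values that were intended.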

comment:34 Changed 8 years ago by spy1

The computer kept the server up all through last night (plus an "Eraser" run of free disk space, 1 pass).
I had to do a restart this morning due to a M$ update that needed a reboot to install, but the server
ran fine all day after that, even with 20+ windows open in Firefox (the M$ update didn't mess anything
up either, thank God). Just thought you'd like to know. (My server's name is "InfoNest".) Pete

comment:35 Changed 8 years ago by spy1

It's been a full week, now, and the server running here is NOT experiencing any problems. The ONLY time
it goes down is when I myself do a re-start. (Love that band-width graph).

If I decide to go to the latest alpha version (I'm running 0.1.1.13-alpha now) -
can I simply load it over-the-top of .13? Does that work? Or do I have to do a
complete un-install/re-install? At the point I'm at here (server functioning well) -
should I even DO the update? Pete

comment:37 Changed 8 years ago by phobos

Are things still working on 18-rc with or without the registry entry changes?

comment:38 Changed 8 years ago by phobos

Are things still working with 19-rc with or without the registry changes?

comment:39 Changed 8 years ago by spy1

I've had Tor uninstalled for quite a while now, ever since my ISP threatened to cut me off.

comment:40 Changed 8 years ago by spy1

Installed Vidalia 0.0.7, Tor 0.1.1.23, Qt 4.1.0 package.
Running as a non-exit server.
Had to make the same changes outlined above again (the server wouldn't stay up).

Just got it done at 11:30 Eastern, will let you know if it stays up. Pete

comment:41 Changed 8 years ago by spy1

Stays up fine, although I have a problem with freezes on my browser
(it doesn't run through Tor - the server runs alone). Probably my fault,
I applied one of the other things you had on the "problem page", and I
shouldn't have. The problem manifests if I don't immediately open my browser
and start using it at start-up (when Tor starts). It almost seems as
though Tor's using ALL my processor (or ALL my band-width, although I know
it's not). Anyway, hope that answers your question - the latest build DOESN'T
allow the server to run without making the changes manually. Not here, anyway.
Pete

comment:42 Changed 8 years ago by spy1

Strike the "browser freeze-up" part of that last post, it all seems to be
working fine now.
The only thing I changed was to NOT let Vidalia/Tor start with Windows. I
start it manually after everything else is already up and running. Pete

comment:44 Changed 7 years ago by ePokruphos

So it's been 5 months since anybody's commented about this bug, which still plagues the Windows builds to this day.
Any new insights these days? IIRC, I believe I saw Nick discussing IOCP in libevent-users recently.

comment:45 Changed 7 years ago by nickm

We've got a GSoC student who's going to try to get IOCP to work nicely with libevent.

Also, some people have reported that Vista doesn't have this problem... which would be nice,
if everybody were migrating to Vista, but people aren't.

comment:46 Changed 7 years ago by coderman

Nick recently committed an experimental "ConstrainedSockets 1" feature. This should help reduce occurrences of this problem by limiting the send and receive buffer sizes on sockets created by Tor. Please let me know if you try this and the version of OS used; as of now it has only been confirmed on virtual servers where a similar system wide TCP memory buffer contention is encountered.
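ConstrainedSockets shrinks each socket's kernel buffers. On POSIX systems that is done with setsockopt() and the SO_SNDBUF and SO_RCVBUF options. The C sketch below is not Tor's code. The helper names and the 8192-byte size are assumptions chosen for illustration. On Windows, Winsock's setsockopt() accepts the same option names but takes the value as a const char *, so a cast is needed there.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed buffer size for illustration only, not a Tor default. */
#define CONSTRAINED_BUF_BYTES 8192

/* Cap both kernel buffers of socket fd at `bytes`.
   Returns 0 on success, -1 (with errno set) on failure. */
static int constrain_socket_buffers(int fd, int bytes)
{
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0)
        return -1;
    return 0;
}

/* Read back the size the kernel applied (optname is SO_SNDBUF or SO_RCVBUF).
   Linux reports about twice the requested value because it counts its own
   bookkeeping overhead. Returns -1 on failure. */
static int socket_buffer_size(int fd, int optname)
{
    int size = 0;
    socklen_t len = sizeof size;
    if (getsockopt(fd, SOL_SOCKET, optname, &size, &len) != 0)
        return -1;
    return size;
}

/* Open a TCP socket with constrained buffers; returns the fd or -1. */
static int open_constrained_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (constrain_socket_buffers(fd, CONSTRAINED_BUF_BYTES) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The trade-off: smaller buffers bound the memory each connection can hold. They also bound the TCP window, which can lower throughput on high-latency links.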

comment:47 Changed 7 years ago by knappo

First, I am using NOD32 (even if I have no proof, I strongly believe that this software is connected with the problem).

I ran into the socket problem some time ago, but unfortunately I'm unable to pin it on one particular change in my system. I am using WinXP SP2 German with 1 GB RAM, and yesterday my server stopped after 2 minutes with WSAENOBUFS.

I changed my torrc to include "ConstrainedSockets 1" and set NOD32 IMON to exclude tor.exe, tor-resolve.exe and vidalia.exe. Since then, the server has had no more problems and is still up (24h so far).

Here are relevant settings from my torrc:
RelayBandwidthRate 92160
RelayBandwidthBurst 102400
ConstrainedSockets 1
DirPort 80
ORPort 443

And these are parameters regarding my TCP/IP settings from the registry:
GlobalMaxTcpWindowSize 1036288 (dec.)
TcpWindowSize 1036288 (dec.)
Tcp1323Opts 1 (dec.)
TcpMaxDupAcks 2 (dec.)
DefaultTTL 64 (dec.)

I hope this information can help others to avoid this problem.

comment:48 Changed 7 years ago by ghazel

Not sure what the status of this bug is, but we ran into it in the BitTorrent codebase a long time ago and fixed it.

We did it in two ways:

  • IOCP for platforms that support it; it does not seem to exhibit the 10055 case
  • WSAEventSelect + WSAEnumNetworkEvents + WSAWaitForMultipleEvents for platforms without IOCP support

By overlapping sockets into event slots (and using WSAEnumNetworkEvents to determine which socket had an event), we reduced the chance of a 10055 to the point where I never saw one again. If you do run into one, though, just limit the number of sockets to N-1 and delay opening further sockets until the count drops back below that limit.
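ghazel's fallback caps open sockets at N-1 and delays the rest. That amounts to a counting limiter with a FIFO queue of deferred requests. The portable C sketch below rests on these assumptions:

- N is the socket count at which WSAENOBUFS was observed.
- Connection requests are identified by non-negative int ids.

The type and function names are hypothetical and are not taken from BitTorrent or Tor.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64   /* hypothetical bound on deferred requests */

typedef enum { ADMIT_NOW, ADMIT_DEFERRED, ADMIT_REJECTED } admit_result;

typedef struct {
    size_t max_open;              /* N-1: never hold more sockets than this */
    size_t open;                  /* sockets currently open */
    int pending[QUEUE_CAPACITY];  /* deferred request ids, FIFO ring buffer */
    size_t head;                  /* index of the oldest deferred request */
    size_t count;                 /* number of deferred requests */
} conn_limiter;

/* failure_point is N, the socket count at which WSAENOBUFS was seen. */
static void limiter_init(conn_limiter *lim, size_t failure_point)
{
    assert(failure_point >= 2);
    lim->max_open = failure_point - 1;
    lim->open = 0;
    lim->head = 0;
    lim->count = 0;
}

/* Ask to open a socket for request_id (assumed non-negative). */
static admit_result limiter_request(conn_limiter *lim, int request_id)
{
    if (lim->open < lim->max_open) {
        lim->open++;
        return ADMIT_NOW;
    }
    if (lim->count == QUEUE_CAPACITY)
        return ADMIT_REJECTED;    /* queue full: caller must refuse or retry later */
    lim->pending[(lim->head + lim->count) % QUEUE_CAPACITY] = request_id;
    lim->count++;
    return ADMIT_DEFERRED;
}

/* Call when a socket closes. Returns the id of a deferred request that now
   owns the freed slot, or -1 if nothing was waiting. */
static int limiter_release(conn_limiter *lim)
{
    assert(lim->open > 0);
    if (lim->count > 0) {
        int next = lim->pending[lim->head];
        lim->head = (lim->head + 1) % QUEUE_CAPACITY;
        lim->count--;
        return next;              /* slot handed over, so open stays the same */
    }
    lim->open--;
    return -1;
}
```

When limiter_release() returns a queued id, the caller should open that connection right away. The freed slot has already been handed over to it.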

comment:49 Changed 7 years ago by spy1

Okay, I tried out 0.2.0.8-alpha (r11898) last night, FIRST without making any of the changes I outlined in my previous posts. The server went down with the WSAENOBUFS error in about an hour and ten minutes.

Went back in and made all the changes again, and it's been up since a little after midnight last night. HTH Pete

comment:50 Changed 6 years ago by phobos

Is this still occurring? We seem to have a number of Win32 nodes online at any point in time.

comment:51 Changed 6 years ago by spy1

Haven't tried it since my last post, Andrew, but if I have time, I'll give it a shot again this weekend. Pete

comment:52 Changed 5 years ago by phobos

So this has turned into something that needs a really good Windows developer to look at and fix.
Should we keep this bug open or close it as deferred, given it's a major item to tackle in the next year?

comment:53 Changed 5 years ago by nickm

We'll leave it open. Just because it isn't easy to fix doesn't mean it's not a bug.

comment:54 Changed 4 years ago by rmarquardt

TCP/IP and NBT configuration parameters for Windows XP: http://support.microsoft.com/kb/314053/

comment:55 Changed 4 years ago by Rob

Hi,

I have investigated this problem a bit and found some interesting info.
The book Network Programming for Microsoft Windows, Second edition
(ISBN 0-7356-1579-9) has a chapter "scalable winsock applications"
with some measurements on different server architectures.
It turns out that the select() architecture uses a large amount of
non-paged memory. This architecture also has very high CPU usage.

select() architecture
=====================

Attempted / connected   Memory (KB)   Non-paged (KB)   CPU       Threads   Throughput (send / receive)
7000 / 4011             4208          135123           95-100%   1         0 / 0
12000 / 5779            5224          156260           95-100%   1         0 / 0

overlapped with completion port (the best)
==========================================

Attempted / connected   Memory (KB)   Non-paged (KB)   CPU       Threads   Throughput (send / receive)
7000 / 7000             36160         31128            40-50%    2         6.2 MB / 3.9 MB (?)
12000 / 12000           59256         38862            40-50%    2         5 MB / 5 MB
50000 / 49997           242272        148192           55-65%    2         4.3 MB / 4.3 MB

The server was a 1.7 GHz Xeon with 768 MB of memory.
They used 3 client machines on a 100 Mb network.
All the machines ran Windows XP.
For the select() server they used the FD_ macros,
which is clearly not optimal.
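Normalizing the book's figures helps show why select() hit the wall first. Dividing non-paged memory by the number of sockets actually connected gives a per-socket cost. This is back-of-the-envelope arithmetic on the numbers above, not a figure from the book, and it assumes "Kb" means kilobytes:

```latex
\frac{\text{non-paged KB}}{\text{connected sockets}}:\qquad
\begin{aligned}
\texttt{select()}:&\quad \frac{135123}{4011} \approx 33.7,\qquad \frac{156260}{5779} \approx 27.0\\
\text{IOCP}:&\quad \frac{31128}{7000} \approx 4.45,\qquad \frac{38862}{12000} \approx 3.24,\qquad \frac{148192}{49997} \approx 2.96
\end{aligned}
```

At comparable attempted loads, each select() connection held roughly 7.5 to 8.5 times as much non-paged memory as an IOCP connection, even though the select() server moved no data. Treat this as indicative only, since the runs differ in throughput and connection counts.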

On the WindowsBufferProblems page there was a link to a graphic of an
attempt to make 25000 socket connections. This failed with WSAENOBUFS
at connection 16205.

Seeing this, I thought that maybe the problem is not a Windows problem
alone. The reason GNU/Linux doesn't run into problems is that it puts
a limit on the number of file descriptors a program can use. On my
openSUSE system the soft limit is 1024 and the hard limit is 8192.
This is far below the number of sockets a Windows program is allowed
to use.

Maybe the solution is simple: don't allow Tor to use more sockets on
Windows than on GNU/Linux?
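Rob's suggestion could be sketched by deriving a connection cap from the per-process descriptor limit he mentions. On POSIX systems that limit is read with getrlimit(RLIMIT_NOFILE). Windows has no such call, so a Windows port would need a configured constant instead. The headroom and fallback values below are hypothetical, as are the function names:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>

#define RESERVED_FDS 32L    /* hypothetical headroom for logs, control port, etc. */
#define FALLBACK_CAP 1024L  /* hypothetical cap when the limit is unlimited or unreadable */

/* Turn a soft descriptor limit into a cap on simultaneous connections. */
static long cap_from_soft_limit(rlim_t soft)
{
    if (soft == RLIM_INFINITY || soft > (rlim_t)LONG_MAX)
        return FALLBACK_CAP;
    long cap = (long)soft - RESERVED_FDS;
    return cap < 1 ? 1 : cap;     /* always allow at least one connection */
}

/* Read this process's RLIMIT_NOFILE (Rob's openSUSE box: soft 1024,
   hard 8192) and derive a connection cap from the soft limit. */
static long connection_cap(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return FALLBACK_CAP;
    }
    assert(rl.rlim_cur <= rl.rlim_max);   /* the kernel never lets soft exceed hard */
    return cap_from_soft_limit(rl.rlim_cur);
}
```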

comment:56 follow-up: Changed 4 years ago by phobos

  • Description modified (diff)

We're going to begin testing the latest -alpha releases on Windows to see if this problem is still as persistent as it used to be, and possibly start testing bufferevents to see if they help resolve the issue.

comment:57 in reply to: ↑ 56 Changed 4 years ago by phobos

Replying to phobos:

We're going to begin testing the latest -alpha releases on Windows to see if this problem is still as persistent as it used to be, and possibly start testing bufferevents to see if they help resolve the issue.

I should clarify that as "testing as a busy relay..."

comment:58 Changed 4 years ago by phobos

  • Owner set to phobos
  • Status changed from assigned to accepted

comment:59 Changed 4 years ago by phobos

Testing of a 0.2.2.14-alpha exit relay on WinXP Pro has begun.

comment:60 Changed 4 years ago by nickm

  • Milestone set to Deliverable-Sep2010
  • Parent ID set to #1753

comment:61 Changed 4 years ago by phobos

The current state is that the XP exit relay has run for 3 days and runs into Event ID 4226 before anything else. Tor is still functional, as is the rest of the system; it's just limited to a fixed number of concurrent half-open TCP connection attempts. This article, http://www.speedguide.net/read_articles.php?id=1497, describes the problem and possible solutions better than the MSDN article on event 4226.

I guess in some way, this means the original problem is solved. XP relays may just be unfeasible given their tcp/ip restrictions.

comment:62 Changed 4 years ago by nickm

A UDP transport would solve the problem for XP relays, though running a busy exit would remain tricky. XP bridges will probably stay just fine.

Reportedly, people who want to do bittorrent on these versions of windows have been known to do binary patching to raise or remove the limit. We might want to suggest this to busy Windows relays, but I don't think we should even imagine doing it automatically.

Do we know to what extent (if any) this half-open connection limit exists on later versions of Windows?

comment:63 follow-up: Changed 4 years ago by Sebastian

http://www.speedguide.net/read_articles.php?id=2744 claims that the problem exists for Windows Vista, too

comment:64 in reply to: ↑ 63 Changed 4 years ago by Sebastian

Replying to Sebastian:

http://www.speedguide.net/read_articles.php?id=2744 claims that the problem exists for Windows Vista, too

Up to Service Pack 2 only, apparently, after which the restriction was lifted.

comment:65 Changed 4 years ago by phobos

I can confirm this WSAENOBUFS error occurs on a busy exit relay in a fully up to date WinXP system as of today.

comment:66 follow-up: Changed 4 years ago by phobos

I need to acquire a Win7 non-server system and start testing.

comment:67 Changed 4 years ago by nickm

Right, progress chugs along. Once the regular bufferevent stuff gets more testing, we can turn on the IOCP side of it, which should help here some, for some people.

Soon, all of our bugs will have 3 or more digits--really!

comment:68 in reply to: ↑ 66 Changed 4 years ago by phobos

Replying to phobos:

I need to acquire a Win7 non-server system and start testing.

I have a test binary based on tor master, waiting for win7 system to appear.

comment:69 follow-up: Changed 4 years ago by phobos

It seems Windows 7 running 0.2.2.17-alpha as published on our website is working fine.  None of these errors appear after 20h of running an exit relay.

comment:70 in reply to: ↑ 69 Changed 3 years ago by phobos

Replying to phobos:

It seems Windows 7 running 0.2.2.17-alpha as published on our website is working fine.  None of these errors appear after 20h of running an exit relay.

And this continues to work successfully as an exit relay. Pushing 1.516 Mbps since October 19th.

comment:71 Changed 3 years ago by Joschka

I'm running Windows 7 64-bit, and Tor lasts a little more than an hour before crashing with this buffer failure message.

I hope this update helps!
Jim Kay

Dec 20 09:33:44.602 [Notice] Tor v0.2.1.27 (re57cb6b9762a2f94). This is experimental software. Do not rely on it for strong anonymity. (Running on Very recent version of Windows [major=6,minor=1] [workstation] {terminal services, single user})
Dec 20 09:33:44.603 [Notice] Initialized libevent version 1.4.14-stable using method win32. Good.
[snip]
Dec 20 10:46:59.029 [Error] libevent call with win32 failed: No buffer space available [WSAENOBUFS ] [10055]

comment:72 Changed 3 years ago by phobos

  • Owner phobos deleted
  • Status changed from accepted to assigned

comment:73 Changed 3 years ago by herbalist

  • Cc herbalist added

Using vidalia-bundle-0.2.1.30-0.2.10. Running 98SE, modified with KernelEX 4.5 final and Revolutions Pack 9.7.2. The PC is a 2.4 GHz Pentium 4 with 1 GB RAM, used for general purposes in addition to being a Tor relay.

Tor is configured as a relay/exit node, with a 512 DSL upload speed. I changed MSTCP\MaxConnections to 512 and set TcpTimedWaitDelay to 30.

The PC is on its 4th day as a relay/exit node. No error messages. No connection problems. Available memory and resources stable. The improved stability is most likely from changes made by Revolutions Pack, which has greatly improved 9X stability. Perhaps some of the changes it makes would help on XP?
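Little's law shows why TcpTimedWaitDelay matters. In steady state, the number of sockets sitting in TIME_WAIT equals the rate at which connections are closed times how long each one lingers. TIME_WAIT applies to the side that closes a connection first. The 240-second hold is the figure quoted earlier in this ticket. The close rate of 20 connections per second is a hypothetical value, not a measurement from this relay:

```latex
N_{\mathrm{TW}} = \lambda \, T_{\mathrm{TW}}, \qquad \lambda = 20\ \mathrm{s}^{-1}
\quad\Longrightarrow\quad
\begin{cases}
T_{\mathrm{TW}} = 240\ \mathrm{s}: & N_{\mathrm{TW}} = 20 \times 240 = 4800\\
T_{\mathrm{TW}} = 30\ \mathrm{s}: & N_{\mathrm{TW}} = 20 \times 30 = 600
\end{cases}
```

Cutting the delay from 240 to 30 seconds shrinks the lingering socket population eightfold at any given close rate. That is consistent with herbalist's setting, though it does not by itself prove that this change caused the improved stability.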

comment:74 Changed 3 years ago by herbalist

  • Cc herbalist removed

comment:75 follow-up: Changed 3 years ago by Joschka

Am I to understand from the previous update that 98SE is the target platform for running Tor?

I, at least, don't have any software that old nor any hardware I could run it on.

comment:76 in reply to: ↑ 75 Changed 3 years ago by Sebastian

Replying to Joschka:

Am I to understand from the previous update that 98SE is the target platform for running Tor?

No, and why would you make that assumption? herbalist notes that he is not having trouble on win98se, and adds a suspicion of why and how that might help winxp users.

I, at least, don't have any software that old nor any hardware I could run it on.

go you

comment:77 Changed 3 years ago by Joschka

Why would I make that assumption?

I am trying to interpret the usefulness of reporting on Win98SE (very obsolete) and offering help to XP users (very obsolete) saying nothing about Vista (obsolete) and of absolutely no value to Windows 7 users where the low level architecture is entirely different. Not only that, but I'm running a 64 bit system.

Perhaps you can tell me what use there is to that last posting?

comment:78 Changed 3 years ago by herbalist

Quoted from https://www.torproject.org/download/download.html.en
"Stable Expert Bundle works with Windows 98SE, ME, Windows 7, Vista, XP, 2000, 2003 Server"

98 is a supported platform for Tor.

From https://trac.torproject.org/projects/tor/wiki/TheOnionRouter/WindowsBufferProblems

"If your Tor server is experiencing a problem with "[WSAENOBUFS] [10055]" error messages while running Tor, you are experiencing Flyspray Bug 98 (https://bugs.torproject.org/flyspray/index.php?do=details&id=98). This is a well known, and apparently commonly experienced, bug with running Tor servers on non-server versions of Microsoft Windows 98, ME, 2000, and XP."

These systems all share this bug. The changes made to my system appear to have mitigated this problem. If the changes made by those upgrades can be applied to XP, which is also a supported platform for Tor (and the most common OS on the planet) more of them could function effectively as stable relays, which are badly needed.

From https://trac.torproject.org/projects/tor/wiki/TheOnionRouter/TorFAQ#ServerOS

"It would be great if more people with Windows experience help out, so we can improve Tor's usability and stability in Windows."

They specifically asked for input from Windows users. There is no reason or need for prejudice against all but the most current version of Windows.

comment:79 follow-up: Changed 3 years ago by Joschka

Nor is there any reason for prejudice against Windows 7 64bit which is listed in your posting as 'supported.'

But Tor most definitely does NOT work on Windows 7 64bit.

So, if I'm willing to run Tor and provide a service to other people, must I also run 98SE or XP which are otherwise of no use to me? Should I come back after Microsoft releases two more versions of Windows and then see if Windows 7 64bit is finally supported; after I no longer have Windows 7?

comment:80 in reply to: ↑ 79 Changed 3 years ago by arma

Replying to Joschka:

But Tor most definitely does NOT work on Windows 7 64bit.

Doesn't work at all, or doesn't work as a relay?

The expected answer to relays-on-versions-of-Windows-that-don't-have-server-in-their-name is to bundle the new libevent2 library, which should handle network use on Windows better. Stay tuned to #2007 for a test bundle, which apparently is waiting on #2001.

comment:81 Changed 3 years ago by Joschka

I didn't test it any further than the failure I reported above.

Once the error message appears, my impression is that it does nothing more in the way of operating.

comment:82 Changed 3 years ago by nickm

  • Milestone changed from Deliverable-Sep2010 to Tor: unspecified

comment:83 Changed 2 years ago by runa

A user with Windows Server 2008 R2 (running a Tor exit relay) reports that he enabled Windows XP compatibility and the error disappeared.

comment:84 Changed 19 months ago by nickm

  • Keywords tor-relay added

comment:85 Changed 19 months ago by nickm

  • Component changed from Tor Relay to Tor

Changed 7 months ago by vmon

comment:86 Changed 7 months ago by vmon

A user (RT Ticket #13958) reports the same problem, which stops them from connecting to the network. The only way out is to close Vidalia and re-run it when they get disconnected. This is what they get when attempting to reconnect without restarting Vidalia/Tor:

Sep 21 12:26:07.919 [Notice] New control connection opened.
Sep 21 12:26:07.919 [Notice] Bootstrapped 85%: Finishing handshake with first hop.
Sep 21 12:26:07.920 [Error] libevent call with win32 failed: No buffer space available [WSAENOBUFS ] [10055]

comment:87 Changed 7 months ago by vmon

arma's recommendations to solve the problem:
This happens because Windows can't actually support the networking system calls Tor uses. If there isn't enough physical memory in the machine, it fails sometimes.

The fix is "get more physical memory, or run fewer programs at once. or maybe get a newer windows, but we're not sure that helps." It's a crummy fix.

Maybe they should run Tails in a VM on their Windows machine. That's probably at least as good a fix.
