Opened 6 years ago

Closed 5 years ago

#9708 closed enhancement (implemented)

Clarify "please raise ulimit -n" message

Reported by: philip
Owned by: rl1987
Priority: Low
Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.4.16-rc
Keywords: tor-relay 024-backport log, ulimit, limits 025-triaged 025-deferrable lorax

Description

My logfiles often contained messages like:

  • Error creating network socket: Too many open files in system

This happens when socket layer calls return ENFILE or EMFILE. Raising kern.maxfiles and kern.maxfilesperproc makes these messages go away, as expected. However, the following message kept happening:

  • Failing because we have 11062 connections already. Please raise your ulimit -n. [8422 similar message(s) suppressed in last 21600 seconds]

I spent a long time looking into file descriptor limits because the value (11062) was suspiciously close to the maximum number of file descriptors per process (11095), which I had raised shortly before. Then I discovered #define ERRNO_IS_ACCEPT_RESOURCE_LIMIT(e) in src/common/compat.h and realized that I had to look for ENOMEM and ENOBUFS in addition to ENFILE and EMFILE.
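To illustrate why one message covers several distinct failures, here is a minimal Python sketch of the errno set described above. It is not Tor's code: the function names and the hint strings are hypothetical, and the mapping from each errno to a tunable is only an assumption drawn from this ticket.

```python
import errno

# The four errno values the ticket says ERRNO_IS_ACCEPT_RESOURCE_LIMIT covers.
# The hint strings are illustrative assumptions, not text from Tor.
ACCEPT_LIMIT_HINTS = {
    errno.EMFILE: "per-process descriptor limit (ulimit -n, kern.maxfilesperproc)",
    errno.ENFILE: "system-wide file table limit (kern.maxfiles)",
    errno.ENOBUFS: "kernel socket resources (kern.ipc.somaxconn, kern.ipc.maxsockets)",
    errno.ENOMEM: "kernel socket resources (kern.ipc.somaxconn, kern.ipc.maxsockets)",
}


def is_accept_resource_limit(err: int) -> bool:
    """Return True if err is one of the resource-limit errno values."""
    return err in ACCEPT_LIMIT_HINTS


def describe_accept_failure(err: int) -> str:
    """Name the errno and, if it is a resource limit, suggest where to look."""
    name = errno.errorcode.get(err, str(err))
    hint = ACCEPT_LIMIT_HINTS.get(err)
    if hint is None:
        return f"{name}: not a resource-limit error"
    return f"{name}: check {hint}"
```

Only the first two entries are about file descriptors, which is why advice that mentions ulimit -n alone can send the reader in the wrong direction.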

Reading through the socket code in the kernel, I found that FreeBSD has a default maximum accept() backlog of 128 connections. When more than that number of TCP connections are in the syncache, accept() will fail with one of ENOMEM or ENOBUFS. I haven't spent the time to figure out which (it's a maze of twisty passages between kern/uipc_socket and netinet/tcp_usrreq.c, and neither of those is in my active brain cache).

The actual "solution" to the problem (allow tor to accept more connections, and thus make the message go away) was to raise kern.ipc.somaxconn.

The instruction to raise ulimit -n is definitely wrong for FreeBSD. Or at least only part of the story, and certainly cause for confusion. Perhaps the message should point to some generic documentation suggesting system limits/knobs to twiddle, rather than assuming that all the world is Linux and ulimit -n will work in every case.

Change History (12)

comment:1 Changed 6 years ago by nickm

Keywords: tor-relay 024-backport added
Milestone: Tor: 0.2.5.x-final

Is the best we can do here to change the message on FreeBSD (or *BSD? or everywhere?) to something like "increase ulimit -n or kern.ipc.somaxconn"? Or is there some way to tell which resource-exhaustion condition is happening?

comment:2 Changed 6 years ago by philip

I'll dig a little deeper and see if I can figure out exactly which resource is being exhausted.

As I was typing this, I saw another one of these messages appear in my log (though only 2234 similar message(s) suppressed in last 21600 seconds, this time, which is quite an improvement - yesterday they were above 100000) and the "netstat -s -p tcp" counter I expected to increase (listen queue overflows) hasn't increased.

There's probably Yet Another knob to twiddle.

I'll report back here when I figure it out!

comment:3 Changed 6 years ago by philip

After a bit more fiddling, it turns out that the relevant tunables are:

  • kern.maxfiles: maximum file descriptors in the system
  • kern.maxfilesperproc: maximum file descriptors per process
  • kern.ipc.maxsockets: maximum number of sockets
  • kern.ipc.somaxconn: maximum number of sockets in the listen queue

With kern.maxfiles=30000, kern.maxfilesperproc=27000, kern.ipc.maxsockets=30000 and kern.ipc.somaxconn=4096, I'm able to keep Tor juggling about 21000 connections on average without any upsetting messages in the tor log. My kernel gets a little bit unhappy if I raise these any further, but ... that's not really Tor's fault.
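As a rough sketch, the settings quoted above could be rendered as name=value lines of the kind /etc/sysctl.conf uses. The helper name is hypothetical, the values are the ones from this comment and suited one particular FreeBSD relay, and whether each tunable can be changed at runtime or only at boot may vary by system and version.

```python
from typing import Mapping

# Values quoted in the comment above; a starting point, not a recommendation.
TUNABLES = {
    "kern.maxfiles": 30000,
    "kern.maxfilesperproc": 27000,
    "kern.ipc.maxsockets": 30000,
    "kern.ipc.somaxconn": 4096,
}


def render_sysctl_conf(tunables: Mapping[str, int]) -> str:
    """Render one name=value line per tunable, in insertion order."""
    return "".join(f"{name}={value}\n" for name, value in tunables.items())


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before being added to a config file.
    print(render_sysctl_conf(TUNABLES), end="")
```

Keeping the values in one mapping makes it easy to adjust them per host before writing anything to disk.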

I'm not sure if there's any point in Tor reporting what resource is being exhausted. The kernel will complain about the length of the listen queue. It'll also helpfully report the PID of the process that's eating all your file descriptors (that message was flooded out by an unrelated one yesterday, so I didn't spot it).

Given the number of tunables, I'd suggest amending the manpage with a section "performance tunables", pointing out 'ulimit -n' on Linux and the list above for FreeBSD (I imagine that other BSD tunables are broadly similar if not identical, but I've not checked).

comment:4 Changed 6 years ago by nickm

This looks like valuable stuff. Can anybody write it up for general consumption in a few system-specific paragraphs? I'd be happy to incorporate those in the manpage, or into a new doc/TUNING file, or something like that.

comment:5 Changed 6 years ago by nickm

Keywords: 025-triaged 025-deferrable added

Triage: this is just changing a warning and adding a manpage section. If somebody composes the manpage section, I'm happy to put it in the manpage and change the warning to tell people to look at it. But if the manpage section doesn't get written, we should defer.

comment:6 Changed 5 years ago by nickm

Keywords: lorax added
Milestone: Tor: 0.2.5.x-final → Tor: 0.2.???

Without knowing what the documentation should say, I can't make the documentation say that. I'd be happy to take a patch here whenever, but I can't generate the information myself.

comment:7 Changed 5 years ago by teor

For BSD network performance tuning, see also:
https://wiki.freebsd.org/NetworkPerformanceTuning
https://rerepi.wordpress.com/2008/04/19/tuning-freebsd-sysoev-rit/

These *BSD tunables apply to OS X as well, as its sysctl and kernel are *BSD-based-ish.

However, OS X is further complicated (isn't it always?)...

kern.ipc.maxsockets is dynamically determined with the set value (512) as a minimum, so this sysctl is read-only on OS X.

launchd, trying to be helpful, sets kern.maxfiles and/or kern.maxfilesperproc based on the NumberOfFiles key in launchd plists for system-wide processes. It's safer to set them identically in both /etc/sysctl.conf and tor.launchd.plist, although the launchd plist should be sufficient. (We have to set kern.ipc.somaxconn in /etc/sysctl.conf anyway.)

See https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man5/launchd.plist.5.html for all the gory details.
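For the launchd side, a minimal sketch of the relevant plist keys might look like the following. It uses Python's plistlib to emit the SoftResourceLimits and HardResourceLimits dictionaries with the NumberOfFiles key. The label org.example.tor and the value 27000 are placeholders, and a real job definition would also need the program arguments and other keys that are omitted here.

```python
import plistlib


def launchd_file_limits(label: str, max_files: int) -> bytes:
    """Build a partial launchd job plist that sets the open-file limits."""
    job = {
        "Label": label,  # hypothetical label, not the one Tor ships with
        "SoftResourceLimits": {"NumberOfFiles": max_files},
        "HardResourceLimits": {"NumberOfFiles": max_files},
    }
    # plistlib.dumps returns the XML plist as bytes.
    return plistlib.dumps(job)


if __name__ == "__main__":
    print(launchd_file_limits("org.example.tor", 27000).decode("utf-8"))
```

Setting the soft and hard limits to the same number mirrors the advice above to keep the values identical wherever they are configured.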

This probably means an associated update to https://trac.torproject.org/projects/tor/wiki/doc/MacRunOnBoot
See #13412

comment:8 Changed 5 years ago by teor

Also for OS X, mainly versions well before 10.9, the following advice is useful:
https://rolande.wordpress.com/2010/12/30/performance-tuning-the-network-stack-on-mac-osx-10-6/

From the OS X / OS X Server merge onwards, and with 64-bit kernels, processors, and apps, most of these parameters are already set at or above these levels and shouldn't be messed with.

comment:9 Changed 5 years ago by rl1987

Owner: set to rl1987
Status: new → accepted

comment:10 Changed 5 years ago by rl1987

Status: accepted → needs_review

comment:11 Changed 5 years ago by nickm

Milestone: Tor: 0.2.??? → Tor: 0.2.6.x-final

comment:12 Changed 5 years ago by nickm

Resolution: implemented
Status: needs_review → closed

Merged; thanks!
