Opened 19 months ago

Last modified 14 months ago

#24737 new defect

Recommend a MaxMemInQueues value in the Tor man page

Reported by: starlight
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: doc, tor-relay
Cc:
Actual Points:
Parent ID:
Points: 0.5
Reviewer:
Sponsor:

Description

Due to the recent DoS attacks, much incorrect advice has been tossed around on tor-relays regarding the application of MaxMemInQueues.

Many seem to believe that MaxMemInQueues should be set to 75-80% of available memory, but this is painfully (in the sense of OOM crashes) incorrect.

Proper advice is to set MaxMemInQueues to 45% of the physical memory available to the instance, assuming DisableAllSwap=1 is also in effect; 40% is a safer, more conservative value.
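For illustration, a minimal shell sketch of that arithmetic, assuming a single tor instance that gets essentially all of the machine's RAM (the 40% figure is the knob to adjust):

mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# 40% of physical memory, expressed in MB for the torrc line
echo "MaxMemInQueues $(( mem_kb * 40 / 100 / 1024 )) MB"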

One of my relays, configured with MaxMemInQueues=1024MB, recently emitted:

We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Removed 1063029792 bytes by killing 1 circuits; 21806 circuits remain alive. Also killed 0 non-linked directory connections.  

after which the tor daemon was observed to consume precisely 2GB, per /proc/<tor-pid>/status:VmRSS.
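For reference, one way to watch this on a running relay (assuming a single tor process, so pidof returns exactly one pid):

grep -E 'VmRSS|VmHWM' /proc/$(pidof tor)/status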

The aforementioned incorrect advice was followed in #22255, and the operator continues to experience OOM failures.

Another mitigation is to establish conservative Linux memory management with the sysctl settings

vm.overcommit_memory = 2
vm.overcommit_ratio = X

where X is set such that /proc/meminfo:CommitLimit is approximately 80% of physical memory (90% if 16GB or more is present).

These settings will prevent sparse-memory applications from running (e.g. ASAN-instrumented code), but they are appropriate for dedicated tor relay systems. They effectively disable the OOM killer and should result in graceful memory-exhaustion behavior, though I have not investigated how the tor daemon responds when malloc() fails and returns a NULL pointer.
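A rough sketch of picking X, assuming vm.overcommit_memory=2, under which the kernel computes CommitLimit as swap plus RAM scaled by overcommit_ratio (hugepages ignored here); the 80%/90% target above is a parameter:

target_pct=80   # use 90 on machines with 16GB or more
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
# solve CommitLimit = swap + ram * ratio/100 for ratio; assumes swap is
# smaller than the target, otherwise the result goes negative
echo "vm.overcommit_ratio = $(( (ram_kb * target_pct / 100 - swap_kb) * 100 / ram_kb ))"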

Child Tickets

Change History (15)

comment:1 in reply to:  description ; Changed 19 months ago by teor

Keywords: doc tor-relay added
Milestone: Tor: unspecified

I'm not sure what you want us to do in response to this ticket.
If you can write up a short wiki page with some advice, we could point to it rather than trying to guess the right setting.

Replying to starlight:

Due to the recent DoS attacks, much incorrect advice has been tossed around on tor-relays regarding the application of MaxMemInQueues.

Many seem to believe that MaxMemInQueues should be set to 75-80% of available memory, but this is painfully (in the sense of OOM crashes) incorrect.

Proper advice is to set MaxMemInQueues to 45% of the physical memory available to the instance, assuming DisableAllSwap=1 is also in effect; 40% is a safer, more conservative value.

I don't think percentages are helpful - I think creating a table with free RAM to MaxMemInQueues values would be more helpful. (See below.)

one of my relays configured with MaxMemInQueues=1024MB recently emitted

We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Removed 1063029792 bytes by killing 1 circuits; 21806 circuits remain alive. Also killed 0 non-linked directory connections.  

after which the tor daemon was observed to consume precisely 2GB per /proc/<tor-pid>/status:VmRSS

To be more precise: MaxMemInQueues doesn't track destroy queues, nor does it track various other Tor data structures,
so you have to set it at a level that allows space for a few hundred megabytes of Tor data, and then some destroy queues.

At 1024 MB per instance, this means 512 MB or less.

But with 10 GB per instance, it really is ok to allow 5-7 GB in queues.
(I have a relay that allows the default 8 GB in queues, and it's fine.)

the aforementioned incorrect advice was followed in #22255 and the operator continues to experience OOM failures

Are you the operator?
Have they tried 0.3.2.8-rc and reopened another ticket?


These settings will prevent sparse-memory applications from running (e.g. ASAN-instrumented code), but they are appropriate for dedicated tor relay systems. They effectively disable the OOM killer and should result in graceful memory-exhaustion behavior, though I have not investigated how the tor daemon responds when malloc() fails and returns a NULL pointer.

The tor daemon will assert and exit if malloc returns NULL.

comment:2 in reply to:  1 Changed 19 months ago by starlight

Replying to teor:

I'm not sure what you want us to do in response to this ticket.
If you can write up a short wiki page with some advice, we could point to it rather than trying to guess the right setting.

I suggest adding some verbiage to the Tor manual, which is where most people would look first when adjusting MaxMemInQueues.

I don't think percentages are helpful - I think creating a table with free RAM to MaxMemInQueues values would be more helpful. (See below.)
. . .
To be more precise: MaxMemInQueues doesn't track destroy queues, nor does it track various other Tor data structures,
so you have to set it at a level that allows space for a few hundred megabytes of Tor data, and then some destroy queues.

At 1024 MB per instance, this means 512 MB or less.

But with 10 GB per instance, it really is ok to allow 5-7 GB in queues.
(I have a relay that allows the default 8 GB in queues, and it's fine.)

My observation is that when MaxMemInQueues triggers a circuit kill, the daemon will have consumed approximately twice the setting value in physical memory. Of course YMMV on the precise amount, but this observational rule of thumb is far from the suggestion that only 120-130% of MaxMemInQueues will be used.

the aforementioned incorrect advice was followed in #22255 and the operator continues to experience OOM failures

Are you the operator?
Have they tried 0.3.2.8-rc and reopened another ticket?

I am not the operator on that ticket. It came up in a search, and it seems to me his MaxMemInQueues is too high relative to his RAM.

The tor daemon will assert and exit if malloc returns NULL.

Ah, well then vm.overcommit_memory=2 will cause the daemon to die sooner rather than later, instead of a more graceful response such as killing one circuit. Still better than allowing the Linux OOM killer to choose a victim.

Alternatively, my advice for hardy souls willing to expend the effort (a sketch follows this list):

1) leave the default vm.overcommit_memory=0 in effect
2) write a script to set /proc/<pid>/task/<tid>/oom_adj to -17 for every process in the system
3) have a script set oom_adj=0 for a process you would rather have die than the tor daemon
3b) if one sets -17 for every process, then Linux will suspend the memory requester until some memory becomes available; this could result in a hung system, a crashed system, or it could result in a semi-graceful recovery in the case where socket-buffer memory is freed as queues drain
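A sketch of steps 2 and 3 (run as root; this uses the legacy oom_adj interface named above, and $VICTIM_PID is a hypothetical placeholder for whatever process you would rather lose first):

# step 2: exempt every task from the OOM killer
for f in /proc/[0-9]*/task/[0-9]*/oom_adj; do
    echo -17 > "$f" 2>/dev/null
done
# step 3: make one expendable process the preferred victim again
echo 0 > /proc/$VICTIM_PID/oom_adj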

Additionally, one should set vm.min_free_kbytes=131072 or even 262144. By default Linux sets this value so low that a sudden surge of arriving network traffic will use up all free memory faster than the OOM killer and dirty-cache writeback can keep pace, and the system will oops (hard crash).

comment:3 Changed 19 months ago by starlight

have to add:

1) the observation that tor worst-case memory consumption is 2x MaxMemInQueues does not include kernel socket-buffer memory consumption; socket memory can be substantial and must be allowed for, which is how I arrived at setting MaxMemInQueues to 40% of the free memory available for a given instance; note that KIST lowers egress socket buffering, but ingress socket memory utilization is left to the kernel and to the behavior of remote peers

2) the inspiration for this ticket was the OOM kill of a daemon configured with MaxMemInQueues=2G running on a 4G machine, and subsequently the event mentioned in the description -- both apparently were "sniper attacks"

comment:4 Changed 19 months ago by teor

Milestone: Tor: unspecified -> Tor: 0.3.2.x-final
Points: 0.5
Summary: oft given MaxMemInQueues advice is wrong -> Recommend a MaxMemInQueues value in the Tor man page

Thanks!

If someone writes a few sentences, we can add them to the MaxMemInQueues man page entry.
We might want to have a different recommendation for older and newer Tor versions, as some of the bugs you mention were fixed in 0.3.2.8-rc.

The detailed steps you wrote for Linux would be more appropriate in doc/TUNING, or in a wiki page entry.

comment:5 Changed 19 months ago by teor

Moved conversation from #22255.

Replying to starlight:

Replying to teor:

I'd still like to see someone repeat this analysis with 0.3.2.8-rc, and post the results to #24737.
It's going to be hard for us to close that ticket without any idea of the effect of our changes.

I'm not willing to run a newer version till one is declared LTS, but can say that even when my relay is not under attack memory consumption goes to 1.5G with the 1G max queue setting. Seems to me the 2x max queues memory consumption is a function of the overheads associated with tor daemon queues and related processing, including malloc slack space.

Saying 2x is a useful guide, but I think we can do better. Because I see very different behaviour on systems with a lot more RAM.

This is how the overheads work on my 0.3.0 relay with 8 GB per tor instance, and a high MaxMemInQueues:

  • 512 MB per instance with no circuits
  • 256 - 512 MB extra per instance with relay circuits
  • 256 - 512 MB extra per instance with exit streams

The RAM usage will occasionally spike to a few gigabytes, but I've never seen it all used.

So I think we should document the following RAM usage and MaxMemInQueues settings:

  • Relays: minimum 768 MB, set MaxMemInQueues to (RAM per instance - 512 MB)*N
  • Exits: minimum 1GB, set MaxMemInQueues to (RAM per instance - 768 MB)*N

For all versions without the destroy cell patch (0.3.2.7-rc and all current versions as of 1 January 2018), N should be 0.5 or lower. It's reasonable to expect destroy cell queues and other objects to take up approximately the same amount of RAM as the queues.

For all versions with the destroy cell patch (0.3.2.8-rc and all versions released after 1 January 2018), N should be 0.75 or lower. It's reasonable to expect destroy cell queues and other objects to take up a third of the queue RAM.
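As a worked example of the formula above (assuming a single non-exit relay instance with 4 GB of RAM to itself):

without the destroy cell patch:  (4096 MB - 512 MB) * 0.5  = 1792 MB
with the destroy cell patch:     (4096 MB - 512 MB) * 0.75 = 2688 MB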

Now we just have to turn this into a man page patch and wiki entry.

Anyone running a busy relay on an older/slower system with MaxMemInQueues=1024MB can check /proc/<pid>/status to see how much memory is consumed. Be sure DisableAllSwap=1 is set and the queue limit is not set higher, since the point is to observe actual memory consumption relative to a limit likely to be approached under normal operation.

Another idea is to add an option to the daemon to preallocate queue memory. This would be a nice hardening feature, as it would reduce the malloc() calls issued under stress, and of course it would allow more accurate estimates of worst-case memory consumption. If OOM strikes with preallocated queues, that would indicate a memory leak.

Please open a ticket for this feature in 0.3.4.

comment:6 in reply to:  5 Changed 19 months ago by teor

Replying to teor:

Moved conversation from #22255.

Replying to starlight:

Replying to teor:

I'd still like to see someone repeat this analysis with 0.3.2.8-rc, and post the results to #24737.
It's going to be hard for us to close that ticket without any idea of the effect of our changes.

I'm not willing to run a newer version till one is declared LTS, but can say that even when my relay is not under attack memory consumption goes to 1.5G with the 1G max queue setting. Seems to me the 2x max queues memory consumption is a function of the overheads associated with tor daemon queues and related processing, including malloc slack space.

Saying 2x is a useful guide, but I think we can do better. Because I see very different behaviour on systems with a lot more RAM.

This is how the overheads work on my 0.3.0 relay with 8 GB per tor instance, and a high MaxMemInQueues:

  • 512 MB per instance with no circuits
  • 256 - 512 MB extra per instance with relay circuits
  • 256 - 512 MB extra per instance with exit streams

The RAM usage will occasionally spike to a few gigabytes, but I've never seen it all used.

So I think we should document the following RAM usage and MaxMemInQueues settings:

  • Relays: minimum 768 MB, set MaxMemInQueues to (RAM per instance - 512 MB)*N
  • Exits: minimum 1GB, set MaxMemInQueues to (RAM per instance - 768 MB)*N

For all versions without the destroy cell patch (0.3.2.7-rc and all current versions as of 1 January 2018), N should be 0.5 or lower. It's reasonable to expect destroy cell queues and other objects to take up approximately the same amount of RAM as the queues.

For all versions with the destroy cell patch (0.3.2.8-rc and all versions released after 1 January 2018), N should be 0.75 or lower. It's reasonable to expect destroy cell queues and other objects to take up a third of the queue RAM.

Now we just have to turn this into a man page patch and wiki entry.

Here's some advice I've just given some relay operators:

If you have 4 Tor Exits, a 1 Gbps connection, and this much RAM, use this setting:
8 GB RAM -> MaxMemInQueues 256 MB
16 GB RAM -> MaxMemInQueues 1 GB
32 GB RAM -> MaxMemInQueues 2 GB

I think this is the right level of detail for a man page.

We could probably afford 3 GB with 32 GB of RAM, but there are other issues:

  • do we really benefit from buffering more than a minute of traffic?
  • how much extra CPU load do we get if we set MaxMemInQueues too high?
  • how low does MaxMemInQueues need to be to resist a sniper attack?

I also opened #24782 so we can change the default in Tor itself.

comment:7 Changed 19 months ago by teor

Still working on the best advice to give.

Here's a MaxMemInQueues setting that's easier to understand:

Set MaxMemInQueues to half your available RAM per tor instance.
(It doesn't track all of Tor's memory usage.)

If your machine has one relay, if you have this much RAM, try this setting:
4 GB -> MaxMemInQueues 512 MB
8 GB -> MaxMemInQueues 2 GB
16 GB -> MaxMemInQueues 4 GB
32 GB -> MaxMemInQueues 8 GB

(If you have more than one relay on the machine, divide MaxMemInQueues by the
number of relays. If you still have RAM issues, take down one relay.)

Here's a list of other options relay operators can use for load tuning, probably appropriate for a wiki page:

https://lists.torproject.org/pipermail/tor-relays/2018-January/014014.html

comment:8 Changed 19 months ago by starlight

I think the suggestions in comments 6 and 7 are a bit conservative (but OK), and I still like my approximately 40% of available memory per instance. So on my 4G machine, allowing ~1G for the kernel, I set MaxMemInQueues=1024MB for one relay instance and have some room for other daemons. With this setting, tor daemon 0.2.9.14 goes to 1.5GB under heavy load (old slow CPU and medium-fast FiOS connection, YMMV), and when hit with a known sniper attack it went to 2GB and survived, with Tor's OOM logic killing a 1GB circuit (event log entries above). That leaves quite a bit of space for socket-buffer memory and about 500-700MB for other daemons. Note that I prefer DisableAllSwap=1 and recommend it strongly, so all tor daemon memory falls under the Unevictable/Mlocked accounting and _cannot_ be paged to disk (a detrimental behavior, no doubt).

Put another way: MMIQ=1G -> daemon 2GB (80%), plus a socket-buffer delta guessed at 500MB, for a 2.5GB total budget (100%) for the instance.

I see kernel SLAB around 900MB (buffer frees tuned lazily with ~7000 active TCP connections at the time of observation, peak around ~9000 connections).

On a 4G machine running just Tor and nothing else, I'd take 40% of 3G and get MMIQ=1228MB.

Don't forget sysctl.conf

vm.min_free_kbytes = 262144

which causes Linux to attempt to keep 1/4 GB of memory free. Linux will take aggressive action to page out idle memory and free cached files when this threshold is hit--it's not an absolute impediment to allocations. The idea is that a huge sudden burst of network traffic will rapidly chew up free memory for socket buffers, and if /proc/meminfo:MemFree hits zero and the kernel needs to allocate memory while servicing a network interrupt, the system will oops/crash. So one wants Linux to maintain a nice cushion against hard memory exhaustion. /proc/meminfo:Cached non-dirty memory is the easiest target for obtaining true free memory, but Cached pages cannot be converted to MemFree during interrupt service--it takes some time, i.e. a few hundred microseconds to a couple of milliseconds, depending on how busy the scheduler is.

On an 8GB machine I'd still take 1G for the kernel and then 40% of 7G for MaxMemInQueues=2800MB. With two daemons, MMIQ=1400MB each. On big-memory systems (16GB and up) I don't bother setting MMIQ higher than 4096MB (4G) per instance.

comment:9 Changed 19 months ago by starlight

bad news:

Memory leaks in tor are more severe than reported at the top of this ticket.

My relay became a HSDIR earlier today while also undergoing attack and the Tor daemon leaked memory all the way from 1.5GB total memory utilization to 2.4GB utilization and was killed.

0.2.9.14 is dead (so much for LTS) and I am forced to upgrade to 0.3.2.8-rc

comment:10 in reply to:  9 Changed 19 months ago by teor

Replying to starlight:

bad news:

Memory leaks in tor are more severe than reported at the top of this ticket.

My relay became a HSDIR earlier today while also undergoing attack and the Tor daemon leaked memory all the way from 1.5GB total memory utilization to 2.4GB utilization and was killed.

0.2.9.14 is dead (so much for LTS) and I am forced to upgrade to 0.3.2.8-rc

Please open a different ticket with this information, or we will lose track of it.

comment:11 Changed 19 months ago by starlight

ok, #24806

comment:12 Changed 14 months ago by teor

Milestone: Tor: 0.3.2.x-final -> Tor: unspecified

There is no patch in this ticket, moving it to unspecified until we have something to review.

comment:13 Changed 14 months ago by starlight

update from the trenches:

Situation is dramatically better and better understood.

A major caveat that certainly should be mentioned: all bets are off if either of

CellStatistics 1
ConnDirectionStatistics 1

is set. CellStatistics in particular results in gigabytes of excess memory consumption on busy relays.
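In other words, leave both at their defaults; a torrc fragment making that explicit (both options default to 0):

# torrc: keep per-cell and per-connection statistics collection disabled
CellStatistics 0
ConnDirectionStatistics 0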

0.3.4.1-alpha stats for two relays, now running for ten days;
neither relay is experiencing attack or abuse activity.

medium-size guard with slow CPU, excess BW capacity
===================================================
~11M self-measure
~4M Blutmagie BW avg

MaxMemInQueues 1024MB

/proc/meminfo

MemAvailTot:     3071560 kB
MemTotal:        4059768 kB
MemFree:         2808820 kB
Cached:           262740 kB
Mlocked:          698840 kB
SwapTotal:       2097144 kB
SwapFree:        2083096 kB
Dirty:                72 kB
Slab:             295564 kB
CommitLimit:     4127028 kB
Committed_AS:     594968 kB

/proc/$(pgrep tor)/status

VmPeak:   836876 kB
VmSize:   689024 kB
VmHWM:    777472 kB
VmRSS:    629680 kB
VmData:   551248 kB

somewhat fast Exit, consensus rank ~170, exit rank ~60
======================================================
~25M self-measure
~16M Blutmagie BW avg

MaxMemInQueues 2048MB

/proc/meminfo

MemAvailTot:     13784300 kB
MemTotal:       16457808 kB
MemFree:        13482956 kB
Cached:           301344 kB
Mlocked:          939808 kB
SwapTotal:       4194296 kB
SwapFree:        4194296 kB
Dirty:               236 kB
Slab:             754436 kB
CommitLimit:    12423200 kB
Committed_AS:    1506840 kB

/proc/$(pgrep tor)/status

VmPeak:  1181552 kB
VmSize:   988900 kB
VmHWM:   1129152 kB
VmRSS:    936500 kB
VmData:   855580 kB

Observed similar values in recent months running 0.3.3, including the final days
of last winter's overload attacks.

In light of the observations and the numerous improvements in OOM memory accounting,
reporting and mitigation, plus the new circuit queued-cell maximum logic, it appears
safe to recommend MaxMemInQueues values incorporating reasonable premiums that allow
for the usual OS-process overheads. Perhaps physical memory of 120% or 130% of
MaxMemInQueues per daemon instance? If Shadow-environment tests for simulating attacks
exist, it would be worth running them against 0.3.4 before arriving at final
recommendations.

comment:14 Changed 14 months ago by starlight

The KIST scheduler is effective at minimizing data queued in egress socket buffers, but ingress socket memory is determined by the TCP/IP stack and remote-peer behavior. Perhaps, then, 150% of MaxMemInQueues provides a better margin? A Shadow test simulating all-out botnet attack scenarios would greatly help in determining extreme worst-case memory consumption.

comment:15 Changed 14 months ago by starlight

The critical factor in socket memory allowance is

net.ipv4.tcp_mem = <min> <pressure> <max>

where <max> is the absolute maximum memory allocated to socket
buffers, in 4096-byte pages. Checking a couple of systems, the
default values vary from two thirds of a 3G virtual machine to 20% of
an 8GB physical machine. Not to be confused with the tcp_rmem
and tcp_wmem per-socket tuning parameters.
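To inspect the ceiling on a given system and convert pages to megabytes (assuming the usual 4096-byte page size):

awk '{printf "tcp_mem max: %d pages = %d MB\n", $3, $3 * 4096 / 1048576}' /proc/sys/net/ipv4/tcp_mem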

Correct advice for preventing OOM daemon crashes in
worst-case scenarios should probably be something like:

1) find out what tcp_mem <max> is and subtract that from physical
memory to arrive at memory available for the daemon; subtract an
additional 384-512MB for the kernel. Tune tcp_mem if you don't
like the defaults.

2) The remaining memory is allocated to one or more tor daemons
where each daemon is allocated 130% of MaxMemInQueues.
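A minimal sketch of steps 1 and 2, assuming a single tor instance and roughly 512MB reserved for the kernel (both are knobs to adjust):

N=1   # number of tor instances on the machine
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
tcp_max_kb=$(( $(awk '{print $3}' /proc/sys/net/ipv4/tcp_mem) * 4 ))  # pages -> kB
avail_kb=$(( ram_kb - tcp_max_kb - 512 * 1024 ))                      # step 1
echo "MaxMemInQueues $(( avail_kb * 100 / 130 / N / 1024 )) MB"       # step 2: 130% budget per daemon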

The above can be turned into a table indicating MaxMemInQueues
values for typical distro defaults easily enough, though
hopefully most operators are able to divide a number by 1.3.
