Opened 4 months ago

Closed 4 months ago

Last modified 6 weeks ago

#24687 closed defect (not a bug)

Tor eats all mbufs on FreeBSD

Reported by: AMDmi3 Owned by:
Priority: High Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version: Tor: 0.3.1.9
Severity: Major Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I'm running a Tor relay on FreeBSD 11.1, and not long ago the system started to occasionally stop responding to the network with

kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached

accompanied by

kernel: sonewconn: pcb 0xfffff80003c61570: Listen queue overflow: 193 already in queue awaiting acceptance (211 occurrences)

messages in the logs.

It first happened on Dec 13 and has repeated 3 times; the approximate lifetime of the relay is ~1 day.

It looks like a DoS attack that makes Tor open a lot of connections and eat all the mbuf space. I don't see any peaks on the traffic or pps graphs though, and there are no messages in the Tor log.

I'm currently trying to gather more information.

Child Tickets

Attachments (1)

mbufs-month.png (31.4 KB) - added by AMDmi3 6 weeks ago.
Graph illustrating that the issue is fixed


Change History (9)

comment:1 Changed 4 months ago by dgoulet

Component: - Select a component → Core Tor/Tor
Milestone: Tor: unspecified

Most likely the ongoing DDoS on the network is affecting your relay. The tor-relays@ mailing list archive has many discussions going on about this. I'm not very familiar with mbufs on FreeBSD, but if this is related to a connection limit, you can bump it higher.

The other option is to set MaxMemInQueues (see man tor) to a reasonable limit on the amount of RAM you want your relay to use; once that limit is reached, Tor triggers its OOM handler to remove bytes from memory, thus reducing memory pressure on the machine.
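For example, a minimal torrc sketch of this (the 512 MB figure is purely illustrative, not a recommendation; pick something that fits the machine's RAM):

MaxMemInQueues 512 MB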

Apart from that, we are currently working on fixes to make tor deal better with this DDoS. Please update the ticket with any more information you can gather on this.

Thanks!

comment:2 Changed 4 months ago by AMDmi3

Thanks for the pointers, I'll read the mailing list.

While I'm here, I need to clarify some bits. mbufs/mbuf clusters are the units of memory management in the FreeBSD kernel's IPC subsystem. In particular, socket buffers are stored in mbufs. There's a (tunable) system-wide limit on the maximum number of mbuf clusters, and when it's reached no more mbufs are allocated and incoming packets can no longer be processed, which leaves the network completely dead for the whole machine. This is what happens here.
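For reference, the mbuf/cluster situation can be watched with stock FreeBSD tools, e.g.:

netstat -m                    # summary of mbufs/clusters in use, cached and total
vmstat -z | grep -i mbuf      # per-zone allocator stats, including mbuf_cluster
sysctl kern.ipc.nmbclusters   # the system-wide cluster limit that is being hit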

Since it's related to kernel memory management, it's unlikely that it could be fixed by Tor's memory options. My guess is that the attack makes Tor open a lot of sockets and fill their buffers. In theory, on my FreeBSD setup this allows taking up to (sysctl net.inet.tcp.recvspace * min(sysctl kern.ipc.maxsockets, ulimit -n)) = (65536 * 31740) ~= 2G of kernel memory. The machine only has 1G, but the mbuf limit is hit before memory is exhausted anyway.
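Spelled out as a small shell sketch (the numbers in the comments are the ones quoted above for this particular machine):

recvspace=$(sysctl -n net.inet.tcp.recvspace)               # 65536 here
maxsock=$(sysctl -n kern.ipc.maxsockets)
maxfd=$(ulimit -n)                                          # the smaller of the two is 31740 here
echo $(( recvspace * (maxsock < maxfd ? maxsock : maxfd) )) # ~2080112640 bytes, i.e. ~2G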

This could be fixed by limiting the maximum number of open files for Tor to some low value, but my graphs show that it needs at least 6k sockets as it is, so any sane limit (around 10k) would still allow it to take a lot of memory. So it should probably be handled on the Tor side somehow, by limiting the number of connections which take a lot of memory (I assume normal connections don't consume this much) or by tuning socket buffer sizes.
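One way to apply such a cap without touching Tor itself would be something like the sketch below; the 10k figure is the "sane limit" mentioned above, and the paths are the FreeBSD port's defaults, so treat both as assumptions:

ulimit -n 10000 && /usr/local/bin/tor -f /usr/local/etc/tor/torrc   # cap descriptors in the launching shell

Even then, at 64 KB per receive buffer that still leaves over 600 MB of potential socket-buffer space, which matches the point above that the real fix has to come from the Tor side.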

comment:3 Changed 4 months ago by teor

Milestone: Tor: unspecified → Tor: 0.3.3.x-final

If kernel buffers are the issue, you probably want to use the ConnLimit or ConstrainedSockets options.
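For illustration, a hedged torrc sketch of those options (the values are examples only; see man tor for the exact semantics):

ConnLimit 8192              # minimum number of file descriptors Tor requires at startup
ConstrainedSockets 1        # ask the kernel to shrink per-socket send/receive buffers
ConstrainedSockSize 8192    # target buffer size in bytes (2048-262144, in 1024-byte steps)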

comment:4 Changed 4 months ago by nickm

I also wonder if one of the KISTLite bugs fixed in 0.3.2.8-rc might have been responsible here. What version of Tor were you running? If it was anything between 0.3.2.1-alpha and 0.3.2.7-rc, please upgrade to the latest version, and things might get a little better.

comment:5 Changed 4 months ago by AMDmi3

I've gathered some socket statistics, and they don't show any apparent anomalies (e.g. specific sockets with unusually large buffers). There are just a lot of sockets with moderately big receive buffers (up to 512KB), and these add up to take a lot of buffer space. The only anomaly is that the number of such sockets rises quite quickly at some point.
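(For completeness, one plausible way to collect such per-socket figures on FreeBSD is sketched below; this is an assumption about tooling, not necessarily what was actually used here:)

netstat -x            # extended socket view, including per-connection buffer usage
sockstat -46 -P tcp   # map the TCP sockets back to the tor process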

ConstrainedSockets seems to be just what I need, thanks. I don't want to limit the number of sockets for now, since there are no peaks in socket usage; it's stable at around ~6k.

I'm running tor 0.3.1.9.

comment:6 Changed 4 months ago by teor

Resolution: not a bug
Status: new → closed

It looks like the existing tor options are enough to resolve this issue.

comment:7 Changed 6 weeks ago by AMDmi3

For the record, it looks like the update to 0.3.3.3 has fixed the issue completely; I don't see any more spikes in mbuf usage.

Changed 6 weeks ago by AMDmi3

Attachment: mbufs-month.png added

Graph illustrating that the issue is fixed

comment:8 Changed 6 weeks ago by cypherpunks

It might also be the case that 0.3.3.3-alpha simply coincided with the moment the bad guys decided to leave the network:
https://metrics.torproject.org/userstats-relay-country.html?start=2017-12-14&end=2018-03-14&country=de&events=off

Last edited 6 weeks ago by cypherpunks