After upgrading from 3.3.7 to 3.4.8, I've noticed that memory usage is erratic. Please see the image below, which shows available memory in green and used memory in blue on a dedicated Tor system.
The upgrade to 3.4.8 took place on Saturday in week 37. The connection count / load was relatively stable:
(green inbound, blue outbound)
I was wondering if there were any changes from 3.3.7 to 3.4.8 which might impact memory behaviour on a middle-relay.
I'm also wondering what the "source" of this usage is.
We really need somebody who can run Tor as a middle relay using tcmalloc or a similar memory usage profiling tool. That would have the best chance to tell us what accounts for the increased memory usage.
I've been running 0.3.4.8 for 2 days now and there is still no sign of a memory leak whatsoever. The relay is at 23MB/s and at ~370M of RAM, which is normal. On Ubuntu Xenial here.
Sep 24 15:37:08.385 [notice] Tor 0.3.4.8 (git-da95b91355248ad8) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.2g, Zlib 1.2.8, Liblzma 5.1.0alpha, and Libzstd N/A.
It would be interesting to see the version line from the reporters as well, just to be sure it is not caused by an older or newer version of any of the libraries. For example, to compare the versions 0.3.3.9-1d90.stretch+1 and 0.3.4.8-1d90.stretch+1 mentioned in the reddit post.
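To make that comparison less error-prone, here is a minimal sketch that splits two startup notices into their library versions and lists the differences. The function names `parse_version_line` and `diff_libs` are made up for illustration, and the regular expression assumes the notice format shown in this thread.

```python
import re

def parse_version_line(line):
    """Return (tor_version, {library: version}) from a Tor startup notice."""
    m = re.search(r"Tor (\S+) \(git-[0-9a-f]+\) running on \S+ with (.*)", line.strip())
    if m is None:
        raise ValueError("not a Tor version line")
    libs = {}
    # Libraries are separated by commas; the last one is preceded by "and".
    for part in re.split(r",\s*(?:and\s+)?", m.group(2).rstrip(".")):
        name, _, version = part.strip().partition(" ")
        libs[name] = version
    return m.group(1), libs

def diff_libs(a, b):
    """Return {library: (version_a, version_b)} for libraries that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}

unaffected = ("Sep 24 15:37:08.385 [notice] Tor 0.3.4.8 (git-da95b91355248ad8) "
              "running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.2g, "
              "Zlib 1.2.8, Liblzma 5.1.0alpha, and Libzstd N/A.")
affected = ("Tor 0.3.4.8 (git-5da0e95e4871a0a1) running on Linux with "
            "Libevent 2.1.8-stable, OpenSSL 1.1.0g, Zlib 1.2.11, "
            "Liblzma 5.2.2, and Libzstd 1.3.3.")

_, libs_a = parse_version_line(unaffected)
_, libs_b = parse_version_line(affected)
for name, (old, new) in diff_libs(libs_a, libs_b).items():
    print(f"{name}: {old} vs {new}")
```

A difference in library versions does not prove anything by itself, but it narrows down what to look at first.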
it happens also here (tor relay configured as non-exit)
It takes about 8 hours to eat up 4GB of memory, then it starts eating swap, when it is done with swap it gets killed (oom-killer) and started again by systemd.
it happens pretty reliably (every day since upgrading from 0.3.3.x)
only observed on Ubuntu 18.04 (not on Debian 9 or FreeBSD also running 0.3.4.8)
the system is also running nyx
I can not help debugging since I've no access to the affected box (just saw the monitoring and logs)
We really need somebody who can run Tor as a middle relay using tcmalloc or a similar memory usage profiling tool. That would have the best chance to tell us what accounts for the increased memory usage.
can you write down the steps to do so, so affected people can collect the necessary information?
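Until proper profiling steps are written down, affected operators could at least record how fast the process grows. The following is a rough sketch, not a heap profiler: it only samples the `VmRSS` and `VmSwap` fields that Linux exposes in `/proc/<pid>/status`. The helper names, the sampling interval and the sample count are assumptions chosen for illustration.

```python
import time

def parse_status_kb(text, fields=("VmRSS", "VmSwap")):
    """Extract the selected kB fields from the text of /proc/<pid>/status."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            # Lines look like "VmRSS:     379000 kB".
            values[key] = int(rest.split()[0])
    return values

def sample(pid, interval_s=60, count=5):
    """Print a timestamped memory sample for the given pid every interval_s seconds."""
    for _ in range(count):
        with open(f"/proc/{pid}/status") as f:
            usage = parse_status_kb(f.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"), usage)
        time.sleep(interval_s)
```

Calling `sample(pid)` with the pid of the running tor process and attaching the output to a report would show whether the growth is steady or comes in bursts.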
I can also confirm the problem on Ubuntu 18.04, installed from deb.torproject.org:
Tor 0.3.4.8 (git-5da0e95e4871a0a1) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.0g, Zlib 1.2.11, Liblzma 5.2.2, and Libzstd 1.3.3.
System is on auto-update, problems started on Oct 5th, 10pm UTC. There do not seem to have been any updates during that time period.
The leak fills up memory roughly every 4h, after which the process is restarted. The relay is not an exit node, but a fallback directory. I'm happy to help investigate.
The same is happening to me with an Ubuntu 14.04 VPS. I start the service and within a few hours the service is down with the same errors as in the first post.
Tor version: 0.3.4.8, Kernel version: 3.13.0-160-generic, OpenSSL 1.1.0h
Version: Tor 0.3.4.8 (git-da95b91355248ad8) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.5.
However, on a 3rd node, running the same Tor version but an older Kernel (4.16.13-2-hardened) this does NOT happen.
What's funny about this memleak is that the memory usage "comes out of nowhere": in top it does not appear as VIRT, RES or even SHR. Only by restarting services, and with some luck, did I find out that Tor was causing this memleak. When the OOM killer was invoked, it also seemed unsure what was causing this, because it first killed Redis, PostgreSQL and Dovecot, and only then Tor, at which point the memory was freed again.
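If the memory is not visible in any per-process column, one possibility (an assumption, not something confirmed in this thread) is that it is held on the kernel side. A small sketch for checking that reads the kernel-side counters from `/proc/meminfo`; the function names are illustrative.

```python
def parse_meminfo_kb(text):
    """Parse the text of /proc/meminfo into {field: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            # Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
            info[key.strip()] = int(rest.split()[0])
    return info

def kernel_side_kb(info):
    """Return the counters that are not attributed to any single process."""
    fields = ("Slab", "SReclaimable", "SUnreclaim", "KernelStack", "PageTables")
    return {k: info[k] for k in fields if k in info}

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(kernel_side_kb(parse_meminfo_kb(f.read())))
```

Comparing these numbers shortly after a restart and again shortly before the OOM killer fires would show whether the growth sits in kernel memory rather than in the tor process itself.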
It seems that Tor doesn't like Valgrind's memory profiler:
Tor[22094]: connection_dir_finished_flushing(): Bug: Emptied a dirserv buffer, but it's still spooling! (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: connection_mark_for_close_internal_(): Bug: Duplicate call to connection_mark_for_close at src/or/directory.c:5201 (first at src/or/main.c:1210) (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: tor_bug_occurred_(): Bug: src/or/connection.c:841: connection_mark_for_close_internal_: This line should not have been reached. (Future instances of this warning will be silenced.) (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: Line unexpectedly reached at connection_mark_for_close_internal_ at src/or/connection.c:841. Stack trace: (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(log_backtrace+0x45) [0x2970b5] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(tor_bug_occurred_+0xbc) [0x2b26cc] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(connection_dir_finished_flushing+0xad) [0x24a33d] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(connection_handle_read+0xb00) [0x21f5b0] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(+0x4fbff) [0x157bff] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/lib/libevent-2.1.so.6(+0x220d8) [0x4c120d8] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/lib/libevent-2.1.so.6(event_base_loop+0x53f) [0x4c12b1f] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(do_main_loop+0x225) [0x15a0b5] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(tor_run_main+0x1125) [0x15c935] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(tor_main+0x3b) [0x15440b] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(main+0x1a) [0x15418a] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/lib/libc.so.6(__libc_start_main+0xf3) [0x5b81223] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: /usr/bin/tor(_start+0x2e) [0x1541ee] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Failing because we have 991 connections already. Please read doc/TUNING for guidance.