#27813 closed defect (fixed)

Tor 0.3.4.8 is leaking memory

Reported by: anong Owned by:
Priority: Very High Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version: Tor: 0.3.4.8
Severity: Critical Keywords: regression? memleak oom 034-backport tor-relay 035-must
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I upgraded to Tor 0.3.4.8 yesterday and today the process has been killed by the oom killer.

Out of memory: Kill process 1369 (tor) score 338 or sacrifice child.

It seems Tor is leaking memory until there is nothing left.

Child Tickets

Attachments (1)

tor-sockstats.png (212.9 KB) - added by torkit 12 months ago.
tor sockstats with proposed patch applied


Change History (38)

comment:1 Changed 13 months ago by anong

It seems this has been observed by other users as well: https://www.reddit.com/r/TOR/comments/9gts72/memory_leaks_since_034_on_debian/

comment:2 Changed 13 months ago by anong

Here are the latest log entries before the process got killed. Tor is running on a Raspberry Pi under Debian 9.4 (DietPi), with kernel 4.14.70-v7+.

Sep 21 12:37:03 DietPi Tor[1369]: Heartbeat: Tor's uptime is 1 day 12:00 hours, with 1365 circuits open. I've sent 69.43 GB and received 68.12 GB.
Sep 21 12:37:03 DietPi Tor[1369]: Circuit handshake stats since last time: 3340/3340 TAP, 33194/33194 NTor.
Sep 21 12:37:03 DietPi Tor[1369]: Since startup, we have initiated 0 v1 connections, 0 v2 connections, 0 v3 connections, and 32984 v4 connections; and received 2 v1 connect
Sep 21 12:37:03 DietPi Tor[1369]: DoS mitigation since startup: 0 circuits killed with too many cells. 0 circuits rejected, 0 marked addresses. 0 connections closed. 389 si
Sep 21 13:17:01 DietPi CRON[7483]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 21 13:17:01 DietPi CRON[7487]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 21 13:17:02 DietPi CRON[7483]: pam_unix(cron:session): session closed for user root
Sep 21 14:17:01 DietPi CRON[7615]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 21 14:17:01 DietPi CRON[7619]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 21 14:17:01 DietPi CRON[7615]: pam_unix(cron:session): session closed for user root
Sep 21 14:26:18 DietPi dropbear[7726]: Child connection from 192.168.100.11:51216
Sep 21 14:26:18 DietPi dropbear[7726]: Pubkey auth succeeded for 'root' with key md5 de:04:75:29:92:ad:c2:a0:9a:62:27:4a:37:53:92:fb from 192.168.100.11:51216
Sep 21 14:26:18 DietPi dropbear[7727]: lastlog_perform_login: Couldn't stat /var/log/lastlog: No such file or directory
Sep 21 14:26:18 DietPi dropbear[7727]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Sep 21 14:27:23 DietPi sudo[7880]:     root : TTY=pts/1 ; PWD=/root ; USER=debian-tor ; COMMAND=/usr/local/bin/nyx
Sep 21 14:27:23 DietPi sudo[7880]: pam_unix(sudo:session): session opened for user debian-tor by root(uid=0)
Sep 21 14:27:25 DietPi Tor[1369]: New control connection opened from 127.0.0.1.
Sep 21 14:28:30 DietPi sudo[7880]: pam_unix(sudo:session): session closed for user debian-tor
Sep 21 15:17:01 DietPi CRON[7943]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 21 15:17:01 DietPi CRON[7947]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 21 15:17:02 DietPi CRON[7943]: pam_unix(cron:session): session closed for user root
Sep 21 15:34:42 DietPi dropbear[7726]: Exit (root): Disconnect received
Sep 21 16:17:01 DietPi CRON[8072]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 21 16:17:01 DietPi CRON[8076]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 21 16:17:02 DietPi CRON[8072]: pam_unix(cron:session): session closed for user root
Sep 21 17:17:01 DietPi kernel: tor invoked oom-killer: gfp_mask=0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
Sep 21 17:17:02 DietPi kernel: tor cpuset=/ mems_allowed=0
Sep 21 17:17:02 DietPi kernel: CPU: 3 PID: 1376 Comm: tor Not tainted 4.14.70-v7+ #1144
Sep 21 17:17:02 DietPi kernel: Hardware name: BCM2835
Sep 21 17:17:02 DietPi kernel: [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
Sep 21 17:17:02 DietPi kernel: [<8010c240>] (show_stack) from [<80787b24>] (dump_stack+0xd4/0x118)
Sep 21 17:17:02 DietPi kernel: [<80787b24>] (dump_stack) from [<80224564>] (dump_header+0xac/0x208)
Sep 21 17:17:02 DietPi kernel: [<80224564>] (dump_header) from [<802238cc>] (oom_kill_process+0x478/0x584)
Sep 21 17:17:02 DietPi kernel: [<802238cc>] (oom_kill_process) from [<8022422c>] (out_of_memory+0x124/0x334)
Sep 21 17:17:02 DietPi kernel: [<8022422c>] (out_of_memory) from [<80229ce8>] (__alloc_pages_nodemask+0x1060/0x11d8)
Sep 21 17:17:02 DietPi kernel: [<80229ce8>] (__alloc_pages_nodemask) from [<802561e8>] (__pte_alloc+0x34/0x19c)
Sep 21 17:17:02 DietPi kernel: [<802561e8>] (__pte_alloc) from [<80259eb4>] (handle_mm_fault+0xb98/0xc20)
Sep 21 17:17:02 DietPi kernel: [<80259eb4>] (handle_mm_fault) from [<802543f0>] (__get_user_pages+0x108/0x470)
Sep 21 17:17:02 DietPi kernel: [<802543f0>] (__get_user_pages) from [<80254dc4>] (populate_vma_page_range+0x84/0x8c)
Sep 21 17:17:02 DietPi kernel: [<80254dc4>] (populate_vma_page_range) from [<80254e7c>] (__mm_populate+0xb0/0x148)
Sep 21 17:17:02 DietPi kernel: [<80254e7c>] (__mm_populate) from [<802429cc>] (vm_mmap_pgoff+0xc4/0xc8)
Sep 21 17:17:02 DietPi kernel: [<802429cc>] (vm_mmap_pgoff) from [<8025d3d0>] (SyS_mmap_pgoff+0x58/0xd0)
Sep 21 17:17:02 DietPi kernel: [<8025d3d0>] (SyS_mmap_pgoff) from [<80108060>] (ret_fast_syscall+0x0/0x28)
Sep 21 17:17:02 DietPi kernel: Mem-Info:
Sep 21 17:17:02 DietPi kernel: active_anon:7 inactive_anon:0 isolated_anon:0
                                active_file:19 inactive_file:0 isolated_file:0
                                unevictable:177347 dirty:6 writeback:0 unstable:0
                                slab_reclaimable:2269 slab_unreclaimable:6250
                                mapped:18266 shmem:20 pagetables:394 bounce:0
                                free:5786 free_pcp:42 free_cma:1700
Sep 21 17:17:02 DietPi kernel: Node 0 active_anon:28kB inactive_anon:0kB active_file:76kB inactive_file:0kB unevictable:709388kB isolated(anon):0kB isolated(file):0kB mappe
Sep 21 17:17:02 DietPi kernel: Normal free:23144kB min:16384kB low:20480kB high:24576kB active_anon:196kB inactive_anon:0kB active_file:44kB inactive_file:0kB unevictable:7
Sep 21 17:17:02 DietPi kernel: lowmem_reserve[]: 0 0
Sep 21 17:17:02 DietPi kernel: Normal: 198*4kB (UMEC) 229*8kB (UMEC) 325*16kB (UMEC) 69*32kB (UMEC) 14*64kB (UMEC) 6*128kB (UMC) 2*256kB (UC) 3*512kB (M) 3*1024kB (UM) 1*20
Sep 21 17:17:02 DietPi kernel: 18353 total pagecache pages
Sep 21 17:17:02 DietPi kernel: 11 pages in swap cache
Sep 21 17:17:02 DietPi kernel: Swap cache stats: add 4204, delete 4193, find 24/46
Sep 21 17:17:02 DietPi kernel: Free swap  = 1080828kB
Sep 21 17:17:02 DietPi kernel: Total swap = 1097724kB
Sep 21 17:17:02 DietPi kernel: 255488 pages RAM
Sep 21 17:17:02 DietPi kernel: 0 pages HighMem/MovableOnly
Sep 21 17:17:02 DietPi kernel: 5443 pages reserved
Sep 21 17:17:02 DietPi kernel: 2048 pages cma reserved
Sep 21 17:17:02 DietPi kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Sep 21 17:17:02 DietPi kernel: [   84]     0    84     6889      446      10       0       98             0 systemd-journal
Sep 21 17:17:02 DietPi kernel: [  142]     0   142     3492      435       8       0      126         -1000 systemd-udevd
Sep 21 17:17:02 DietPi kernel: [  276]     0   276      616      262       6       0       30             0 dropbear
Sep 21 17:17:02 DietPi kernel: [  814]     0   814     1049      258       5       0       35             0 agetty
Sep 21 17:17:02 DietPi kernel: [  953]     0   953     1324      403       6       0       43             0 cron
Sep 21 17:17:02 DietPi kernel: [ 1369]  1001  1369   182943   177124     358       0        0             0 tor
Sep 21 17:17:02 DietPi kernel: Out of memory: Kill process 1369 (tor) score 338 or sacrifice child
Sep 21 17:17:02 DietPi kernel: Killed process 1369 (tor) total-vm:731772kB, anon-rss:635436kB, file-rss:73060kB, shmem-rss:0kB
Sep 21 17:17:02 DietPi kernel: oom_reaper: reaped process 1369 (tor), now anon-rss:636364kB, file-rss:73048kB, shmem-rss:0kB
Last edited 13 months ago by anong (previous) (diff)

comment:3 Changed 13 months ago by medo

The same thing is happening on my server, Tor version 0.3.4.8 (git-5da0e95e4871a0a1) on Ubuntu Xenial. You can clearly see in this graph when I updated to 0.3.4.8: https://www.ganggarrison.com/filehost/sharex/chrome_2018-09-23_02-28-57.png

comment:4 Changed 13 months ago by nickm

Keywords: 034-backport added
Milestone: Tor: 0.3.5.x-final

Hm. Did previous versions have this problem? Which is the first version that behaves this way for you?

Are you able to try running Tor with some kind of memory debugging tool like tcmalloc to see where the RAM is going?

comment:5 Changed 13 months ago by traumschule

Keywords: regression? memleak tor-relay added

comment:6 Changed 13 months ago by anong

This first happened after the upgrade to 0.3.4.8; it never happened before. The process just crashed again:

Sep 23 21:03:40 DietPi kernel: [<80254dc4>] (populate_vma_page_range) from [<80254e7c>] (__mm_populate+0xb0/0x148)
Sep 23 21:03:40 DietPi kernel: [<80254e7c>] (__mm_populate) from [<802429cc>] (vm_mmap_pgoff+0xc4/0xc8)
Sep 23 21:03:40 DietPi kernel: [<802429cc>] (vm_mmap_pgoff) from [<8025d3d0>] (SyS_mmap_pgoff+0x58/0xd0)
Sep 23 21:03:40 DietPi kernel: [<8025d3d0>] (SyS_mmap_pgoff) from [<80108060>] (ret_fast_syscall+0x0/0x28)
Sep 23 21:03:40 DietPi kernel: Mem-Info:
Sep 23 21:03:40 DietPi kernel: active_anon:0 inactive_anon:0 isolated_anon:0
                                active_file:10 inactive_file:25 isolated_file:0
                                unevictable:180523 dirty:16 writeback:0 unstable:0
                                slab_reclaimable:2202 slab_unreclaimable:6519
                                mapped:20297 shmem:0 pagetables:444 bounce:0
                                free:4014 free_pcp:200 free_cma:0
Sep 23 21:03:40 DietPi kernel: Node 0 active_anon:0kB inactive_anon:0kB active_file:40kB inactive_file:100kB unevictable:722092kB isolated(anon):0kB isolated(file):0kB mapped:81188kB dirty:64kB writeback:0kB shmem:0kB writeback_tmp:0kB
Sep 23 21:03:40 DietPi kernel: Normal free:16056kB min:16384kB low:20480kB high:24576kB active_anon:0kB inactive_anon:0kB active_file:160kB inactive_file:44kB unevictable:722096kB writepending:64kB present:1021952kB managed:1000180kB ml
Sep 23 21:03:40 DietPi kernel: lowmem_reserve[]: 0 0
Sep 23 21:03:40 DietPi kernel: Normal: 4*4kB (UM) 382*8kB (UME) 365*16kB (UE) 110*32kB (UME) 55*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15952kB
Sep 23 21:03:40 DietPi kernel: 20347 total pagecache pages
Sep 23 21:03:40 DietPi kernel: 0 pages in swap cache
Sep 23 21:03:40 DietPi kernel: Swap cache stats: add 22160, delete 22160, find 2613/5056
Sep 23 21:03:40 DietPi kernel: Free swap  = 4159740kB
Sep 23 21:03:40 DietPi kernel: Total swap = 4194300kB
Sep 23 21:03:40 DietPi kernel: 255488 pages RAM
Sep 23 21:03:40 DietPi kernel: 0 pages HighMem/MovableOnly
Sep 23 21:03:40 DietPi kernel: 5443 pages reserved
Sep 23 21:03:40 DietPi kernel: 2048 pages cma reserved
Sep 23 21:03:40 DietPi kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Sep 23 21:03:40 DietPi kernel: [   84]     0    84     7373      446      11       0      100             0 systemd-journal
Sep 23 21:03:40 DietPi kernel: [  142]     0   142     3492      356       8       0      140         -1000 systemd-udevd
Sep 23 21:03:40 DietPi kernel: [  276]     0   276      616      293       6       0       35             0 dropbear
Sep 23 21:03:40 DietPi kernel: [  814]     0   814     1049      258       5       0       35             0 agetty
Sep 23 21:03:40 DietPi kernel: [ 9666]     0  9666     1324      434       7       0       61             0 cron
Sep 23 21:03:40 DietPi kernel: [10659]  1001 10659   182060   180276     367       0        0             0 tor
Sep 23 21:03:40 DietPi kernel: [17067]     0 17067      733      371       6       0       53             0 dropbear
Sep 23 21:03:40 DietPi kernel: [17068]     0 17068     1678      338       7       0      476             0 bash
Sep 23 21:03:40 DietPi kernel: [17624]     0 17624      733      371       6       0       47             0 dropbear
Sep 23 21:03:40 DietPi kernel: [17625]     0 17625     1678      362       8       0      476             0 bash
Sep 23 21:03:40 DietPi kernel: Out of memory: Kill process 10659 (tor) score 139 or sacrifice child
Sep 23 21:03:40 DietPi kernel: Killed process 10659 (tor) total-vm:728240kB, anon-rss:640012kB, file-rss:81092kB, shmem-rss:0kB

I am not familiar with memory debugging / tcmalloc. I have downgraded to 0.3.3.9 now.

Last edited 13 months ago by anong (previous) (diff)

comment:7 Changed 13 months ago by r1610091651@…

Similar story here.

After the upgrade from 3.3.7 -> 3.4.8, I've noticed that memory usage is erratic. Please consult the image below, showing memory available in green and memory used in blue on a system dedicated to Tor.

https://www.dropbox.com/s/5mshoftrnpslima/mem-month.png

The upgrade to 3.4.8 took place on Saturday in week 37. The connection count / load was relatively stable:

https://www.dropbox.com/s/ojq6dhwi0hcdis0/torconns-month.png
(green inbound, blue outbound)
I was wondering if there were any changes from 3.3.7 to 3.4.8 which might impact memory behaviour on a middle relay.
I'm also wondering what the "source" of this usage is.

Potentially relevant config entries (the remainder is ports/IPs, accounting, and exit policy (non-exit)):
MaxConsensusAgeForDiffs 4 hours
DirCache 0
MaxMemInQueues 64 MB (but "defaulting" to 256 MB now)

Os: Ubuntu 16.04.5 LTS

Adding version number:
Sep 29 06:25:01.000 [notice] Tor 0.3.4.8 (git-5da0e95e4871a0a1) opening new log file.

Last edited 13 months ago by r1610091651@… (previous) (diff)

comment:8 Changed 13 months ago by nickm

We really need somebody who can run Tor as a middle relay using tcmalloc or a similar memory usage profiling tool. That would have the best chance to tell us what accounts for the increased memory usage.

comment:9 Changed 13 months ago by dgoulet

I've been running 0.3.4.8 for 2 days now and there is still no sign of a memory leak whatsoever. The relay is pushing 23 MB/s and using ~370 MB of RAM, which is normal. On Ubuntu Xenial here.

Sep 24 15:37:08.385 [notice] Tor 0.3.4.8 (git-da95b91355248ad8) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.2g, Zlib 1.2.8, Liblzma 5.1.0alpha, and Libzstd N/A.

comment:10 Changed 13 months ago by traumschule

It would be interesting to see the version line from the reporters as well, just to be sure this is not caused by an older or newer version of any of the libraries. For example, to compare the versions 0.3.3.9-1~d90.stretch+1 and 0.3.4.8-1~d90.stretch+1 mentioned in the reddit post.

Last edited 13 months ago by traumschule (previous) (diff)

comment:11 Changed 13 months ago by anong

This was the version line from 0.3.4.8 where I experienced the issues:

Tor 0.3.4.8 (git-5da0e95e4871a0a1) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0f, Zlib 1.2.8, Liblzma 5.2.2, and Libzstd 1.1.2.

I have downgraded to 0.3.3.9 now which is running fine:

Tor 0.3.3.9 (git-ca1a436fa8e53a32) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0f, Zlib 1.2.8, Liblzma 5.2.2, and Libzstd 1.1.2.

comment:12 Changed 13 months ago by mem

It happens here as well (Tor relay configured as non-exit).

It takes about 8 hours to eat up 4 GB of memory, then it starts eating swap; once swap is exhausted, the process gets killed (oom-killer) and restarted by systemd.

It happens pretty reliably (every day since upgrading from 0.3.3.x).

It is only observed on Ubuntu 18.04 (not on Debian 9 or FreeBSD, which are also running 0.3.4.8).

The system is also running nyx.

I cannot help with debugging since I have no access to the affected box (I only saw the monitoring and logs).

We really need somebody who can run Tor as a middle relay using tcmalloc or a similar memory usage profiling tool. That would have the best chance to tell us what accounts for the increased memory usage.

can you write down the steps to do so, so affected people can collect the necessary information?
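
(For anyone who wants to try this, a rough sketch using gperftools' tcmalloc heap profiler; package names and the library path are assumptions that vary by distribution, and running a relay this way is noticeably slower:)

$ sudo apt install google-perftools libgoogle-perftools-dev
# stop the packaged service first, then start tor with the profiler preloaded;
# heap dumps land in /tmp/tor-heap.NNNN.heap as allocations grow
$ sudo -u debian-tor env \
      LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 \
      HEAPPROFILE=/tmp/tor-heap \
      /usr/bin/tor -f /etc/tor/torrc --RunAsDaemon 0
# afterwards, show which call sites hold the most memory
$ google-pprof --text /usr/bin/tor /tmp/tor-heap.0042.heap | head -n 20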

comment:13 Changed 13 months ago by nickm

Keywords: 035-must added

Tentatively assign the 035-must tag here, but we will have to defer this if we can't figure it out.

comment:14 Changed 13 months ago by nickm

Priority: High → Very High

Mark all 035-must tickets as "very high"

comment:15 Changed 12 months ago by torkit

I can also confirm the problem on Ubuntu 18.04, installed from deb.torproject.org:

Tor 0.3.4.8 (git-5da0e95e4871a0a1) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.0g, Zlib 1.2.11, Liblzma 5.2.2, and Libzstd 1.3.3.

The system is on auto-update; the problems started on Oct 5th, 10 pm UTC. There do not seem to have been any updates during that time period.

The leak fills up memory roughly every 4 hours, at which point the process gets restarted. The relay is not an exit node, but it is a fallback directory. I'm happy to help investigate.

Please contact me via the address provided for the relay: https://metrics.torproject.org/rs.html#details/B291D30517D23299AD7CEE3E60DFE60D0E3A4664

comment:16 Changed 12 months ago by cypherbits

The same is happening to me on an Ubuntu 14.04 VPS. I start the service and within a few hours it is down with the same errors as in the first post.

Tor version: 0.3.4.8, Kernel version: 3.13.0-160-generic, OpenSSL 1.1.0h

Last edited 12 months ago by cypherbits (previous) (diff)

comment:17 Changed 12 months ago by wrmsr

Happens to me on 2 nodes, running Tor 0.3.4.8 and Linux 4.18.12.a-1-hardened: https://imgur.com/a/h4bzyhy

Version: Tor 0.3.4.8 (git-da95b91355248ad8) running on Linux with Libevent 2.1.8-stable, OpenSSL 1.1.1, Zlib 1.2.11, Liblzma 5.2.4, and Libzstd 1.3.5.

However, on a 3rd node, running the same Tor version but an older kernel (4.16.13-2-hardened), this does NOT happen.

What's odd about this memleak is that the memory usage "comes out of nowhere": in top it does not show up as VIRT, RES or even SHR. Only by restarting services, with some luck, did I find out that Tor was causing the leak. When the OOM killer was summoned, it also seemed unsure about the culprit, because it first killed Redis, PostgreSQL and Dovecot, and only then Tor, at which point the memory was freed again.

Last edited 12 months ago by wrmsr (previous) (diff)

comment:18 Changed 12 months ago by wrmsr

It seems that Tor doesn't like Valgrind's memory profiler:

Tor[22094]: connection_dir_finished_flushing(): Bug: Emptied a dirserv buffer, but it's still spooling! (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: connection_mark_for_close_internal_(): Bug: Duplicate call to connection_mark_for_close at src/or/directory.c:5201 (first at src/or/main.c:1210) (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: tor_bug_occurred_(): Bug: src/or/connection.c:841: connection_mark_for_close_internal_: This line should not have been reached. (Future instances of this warning will be silenced.) (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug: Line unexpectedly reached at connection_mark_for_close_internal_ at src/or/connection.c:841. Stack trace: (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(log_backtrace+0x45) [0x2970b5] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(tor_bug_occurred_+0xbc) [0x2b26cc] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(connection_dir_finished_flushing+0xad) [0x24a33d] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(connection_handle_read+0xb00) [0x21f5b0] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(+0x4fbff) [0x157bff] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/lib/libevent-2.1.so.6(+0x220d8) [0x4c120d8] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/lib/libevent-2.1.so.6(event_base_loop+0x53f) [0x4c12b1f] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(do_main_loop+0x225) [0x15a0b5] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(tor_run_main+0x1125) [0x15c935] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(tor_main+0x3b) [0x15440b] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(main+0x1a) [0x15418a] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/lib/libc.so.6(__libc_start_main+0xf3) [0x5b81223] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Bug:     /usr/bin/tor(_start+0x2e) [0x1541ee] (on Tor 0.3.4.8 da95b91355248ad8)
Tor[22094]: Failing because we have 991 connections already. Please read doc/TUNING for guidance.

comment:19 Changed 12 months ago by Christian

Interesting. I stumbled across this report via tor-talk and appear to be experiencing a similar issue. Non-exit node running on openSUSE Tumbleweed in a VM with 1 GB RAM. Upgraded from tor-0.3.3.9 to tor-0.3.4.8 (git-da95b91355248ad8) on September 16. Ever since, tor gets killed twice a day by the OOM killer:

 OOMs  month
    2  2018-02
   23  2018-09
   26  2018-10

I can't really put more memory into the machine, so I tried to use zram and now zswap to save some memory, but the OOM killer still thinks tor is using too much memory. Yesterday's OOM kill happened at 15:27 local time, so let's look at the memory usage around that time:

$ sar -f /var/log/sa/sa20181015 -s 15:25 -e 15:30 -r -S
Linux 4.18.12-1-default               10/15/2018      _x86_64_        (2 CPU)

03:25:01 PM kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
03:26:01 PM     95876     21496    797884     79.11      6604     37264    380396     29.94     88728     87552         0
03:27:21 PM    865980    781032     53680      5.32      3748     21556    194388     15.30      9528     17088        36
03:28:01 PM    529876    560976    205424     20.37      5564    207108    333216     26.22    180396    184756     45472
03:29:01 PM    520572    560204    206152     20.44      6936    213964    333596     26.25    187636    187700       124
Average:       503076    480927    315785     31.31      5713    119973    310399     24.43    116572    119274     11408

03:25:01 PM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
03:26:01 PM     23112    239028     91.18       688      0.29
03:27:21 PM    203400     58740     22.41      1932      3.29
03:28:01 PM    204168     57972     22.11      2048      3.53
03:29:01 PM    205192     56948     21.72      2848      5.00
Average:       158968    103172     39.36      1879      1.82

(Yes, I'm running sadc every minute now because of this)

So, I assume %memused went from 79.11% to 100% in a few seconds until tor was killed at 15:27:19, according to syslog. Swap usage (%swpused) was dangerously high at 91.18% -- however, I only switched to zswap a few days ago and used zram earlier, where %swpused would always stay below 60%; maybe tor was allocating memory so quickly that zram could not keep up.

Tuned two more settings, but to no avail so far:

$ sysctl vm.swappiness vm.min_free_kbytes
vm.swappiness = 100
vm.min_free_kbytes = 65536

Are there any tweaks besides MaxMemInQueues to limit Tor's memory usage?

Last edited 12 months ago by Christian (previous) (diff)
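
(For what it's worth, and not something confirmed anywhere in this thread as a mitigation: torrc also has ConstrainedSockets / ConstrainedSockSize, which ask the kernel to shrink per-socket send/receive buffers. The values below are the documented defaults and purely illustrative:)

ConstrainedSockets  1
ConstrainedSockSize 8192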

comment:20 in reply to:  19 ; Changed 12 months ago by arma

Replying to Christian:

Interesting. I stumbled across this report via tor-talk and appear to experience a similar issue. Non-exit node running on openSUSE Tumbleweed in VM with 1 GB RAM. Upgraded from tor-0.3.3.9 to tor-0.3.4.8 (git-da95b91355248ad8) on September 16.

What's in your torrc? Can you simplify it to the point that it's super simple, yet you still experience this problem?

comment:21 in reply to:  18 Changed 12 months ago by arma

Replying to wrmsr:

It seems that Tor doesn't like Valgrind's memory profiler:

Your paste here looks ok. So maybe it doesn't work, but your output doesn't indicate that. :)

Tor[22094]: connection_dir_finished_flushing(): Bug: Emptied a dirserv buffer, but it's still spooling! (on Tor 0.3.4.8 da95b91355248ad8)

Looks like #26896 maybe.

Tor[22094]: Failing because we have 991 connections already. Please read doc/TUNING for guidance.

This part is important, and you need to raise your ulimit -n or Tor won't do well as a relay.
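
(A sketch of one way to raise the limit; the unit name is an assumption and depends on the packaging, e.g. tor.service or tor@default.service:)

$ sudo systemctl edit tor@default.service
    # add these lines in the drop-in that opens:
    [Service]
    LimitNOFILE=65535
$ sudo systemctl restart tor@default.service

# or, for a tor started by hand from a shell:
$ ulimit -n 65535
$ tor -f /etc/tor/torrc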

comment:22 Changed 12 months ago by rm

Is there any way to subscribe to tickets without commenting on them? I wanted to subscribe to this one anyway.

comment:23 Changed 12 months ago by r1610091651@…

Some additional info.

I downgraded to 3.3.7 this weekend, and the relay is stable again. I noticed the following, though:

https://www.dropbox.com/s/44hmbd4j3hkip5s/throughput-week.png

After the upgrade from 3.3.7 -> 3.4.8 I noticed a drop in throughput with a similar connection count & weight. Now, with the downgrade done, I see an increase in throughput again.

Could it be that stream compression was added somewhere between 3.3.7 and 3.4.8? That could explain the drop in network usage, and, if the resources used for that compression aren't released properly, also the memory leak, all with a similar connection count & weight.

Regards

comment:24 Changed 12 months ago by dgoulet

I'm 99.9% certain the issue is caused by two things: primarily #28089, with #28090 making it partly worse.

comment:25 Changed 12 months ago by rm

Could be! This is the only relay where I had RelayBandwidthRate limiting in place:

# free
             total       used       free     shared    buffers     cached
Mem:      16327228   16101720     225508      71524      82848    1467484
-/+ buffers/cache:   14551388    1775840
Swap:      1048572          0    1048572
# pkill -u debian-tor
# free
             total       used       free     shared    buffers     cached
Mem:      16327228    4209652   12117576      71516      82856    1472024
-/+ buffers/cache:    2654772   13672456
Swap:      1048572          0    1048572

Three other relays with no bandwidth limit in place are not affected.

comment:26 Changed 12 months ago by dgoulet

Status: new → needs_information

So far, testing shows that #28089 was the cause of this memory leak on the kernel side. We'll keep this open for a while to monitor the situation after 0.3.4.9 is released, but it is looking good!

comment:27 in reply to:  20 ; Changed 12 months ago by Christian

Replying to arma:

What's in your torrc? Can you simplify it to the point that it's super simple, yet you still experience this problem?

I haven't touched torrc in a long time, except now to add and tweak MaxMemInQueues:

$ grep ^[A-Z] /etc/tor/torrc
SOCKSPort                 0
Log                       notice file /var/log/tor/notices.log
RunAsDaemon               0             # SystemD needs this to be 0
DataDirectory             /var/lib/tor
ControlPort               9051
HashedControlPassword     xxxx
ORPort                    9001
Nickname                  foobar
RelayBandwidthRate        1024 KBytes
RelayBandwidthBurst       1536 KBytes
AccountingStart           week 1 00:00
ContactInfo               XXX
DirPort                   9030
MyFamily                  $xxxx, $xxxx
ExitPolicy                reject *:*
ExitRelay                 0
DisableDebuggerAttachment 0
MaxMemInQueues            256 MB

I haven't had an OOM kill in two days now, and I like to imagine that switching from zram to zswap magically helped, maybe. I tried running tor under Valgrind, but I don't know if I did it right, and it didn't report any leaks:

==1697== Command: /usr/bin/tor -f /etc/tor/torrc --pidfile /var/run/tor/tor.pid --log notice\ file\ /var/log/tor/tor.log --runasdaemon 1 --datadirectory /var/lib/tor --user tor
[...]
==1699== LEAK SUMMARY:
==1699==    definitely lost: 0 bytes in 0 blocks
==1699==    indirectly lost: 0 bytes in 0 blocks
==1699==      possibly lost: 864 bytes in 3 blocks
==1699==    still reachable: 34,118 bytes in 206 blocks
==1699==         suppressed: 0 bytes in 0 blocks
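
(For reference, a minimal way to run such a leak check; these are standard Valgrind options, and note that Valgrind only sees user-space allocations, which is consistent with the clean summary above given that the memory in this ticket was queued kernel-side. If the torrc enables the Sandbox option, it would need to be off:)

# --RunAsDaemon 0 keeps tor in the foreground so Valgrind stays attached
$ sudo -u tor valgrind --leak-check=full --show-leak-kinds=definite \
      --log-file=/tmp/tor-valgrind.%p.log \
      /usr/bin/tor -f /etc/tor/torrc --RunAsDaemon 0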

comment:28 in reply to:  27 Changed 12 months ago by arma

Replying to Christian:

I haven't had a OOM-killing in two days now and I like to imagine that switching from zram to zswap magically helped, maybe.

As dgoulet mentioned above, we believe we have found and fixed the bug: #28089.

That fix is in 0.3.5.3-alpha, and it will be in 0.3.4.9 when it comes out.

comment:29 Changed 12 months ago by starlight

Reading this ticket and #28089 a potential concern comes to mind. In my experience the Linux kernel is fastidious in releasing skb memory when sockets close, which leads to the question: is an additional bug present in KIST where sockets are leaked? For kernel memory consumption to grow unbounded while the relay daemon continues to function correctly, a socket leak or stranding would appear necessary. I do not know this code and someone familiar with it would be in a better position to consider the possibility.

comment:30 in reply to:  29 ; Changed 12 months ago by dgoulet

Replying to starlight:

Reading this ticket and #28089 a potential concern comes to mind. In my experience the Linux kernel is fastidious in releasing skb memory when sockets close, which leads to the question: is an additional bug present in KIST where sockets are leaked? For kernel memory consumption to grow unbounded while the relay daemon continues to function correctly, a socket leak or stranding would appear necessary. I do not know this code and someone familiar with it would be in a better position to consider the possibility.

KIST doesn't manage sockets. It is a scheduler that only relays cells, based on per-socket kernel information, while trying to be fair by considering all circuits.

The issue here is clearly that one side of the connection (relay to relay) wasn't reading from the socket, which caused the TCP window to shrink to 0 and thus the other side to queue cells in the kernel. That explains why nobody sees the memory leak in userspace by looking at tor's memory.

In theory KIST should prevent that, but not entirely. Some data can always end up stranded, as when flushing 1M of cells and only one goes through, leaving the rest in the kernel.

The bigger issue is that we still have cells bypassing the KIST scheduler and being written directly to the socket. We have several tickets open about these issues.
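
(A quick way to observe this kernel-side queuing on an affected relay; these are standard Linux tools, and the exact output layout varies by version:)

# total kernel TCP buffer memory; the "mem" field of the TCP line is in pages
$ cat /proc/net/sockstat
# per-connection Recv-Q / Send-Q plus per-socket skbuff memory (-m) for tor
$ sudo ss -tnmp | grep -A1 '"tor"'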

comment:31 in reply to:  30 Changed 12 months ago by starlight

Replying to dgoulet:

. . .
In theory KIST should prevent that, but not entirely. Some data can always end up stranded, as when flushing 1M of cells and only one goes through, leaving the rest in the kernel.

The bigger issue is that we still have cells bypassing the KIST scheduler and being written directly to the socket. We have several tickets open about these issues.

Thank you for the update, the details are informative and interesting.

After posting I realized that stuck sockets will hide an EOF condition: when the other side closes, the EOF is not detected, resulting in a socket leak, though one cured by the fix. A more careful read turns up that performance degradation did result.

Newer kernels account kernel SKBs to the owning process for correct cgroup resource control, and the graphs above show the leaked memory as "app" memory.

Changed 12 months ago by torkit

Attachment: tor-sockstats.png added

tor sockstats with proposed patch applied

comment:32 Changed 12 months ago by torkit

I've tested the proposed patch that has been merged into the maint-0.3.4 branch and I see the same behavior as before: tor keeps getting killed. I'll check 0.3.3 for a regression now.

https://trac.torproject.org/projects/tor/attachment/ticket/27813/tor-sockstats.png

Last edited 12 months ago by torkit (previous) (diff)

comment:33 in reply to:  32 ; Changed 12 months ago by dgoulet

Replying to torkit:

I've tested the proposed patch that has been merged into maint-0.3.4 branch and do have the same behavior as before. tor keeps getting killed. I'll check 0.3.3 for regression now.

So in theory, your relay should no longer have a Recv-Q that keeps filling up ... but on the Send-Q side, because the rest of the network running <= 0.3.4.8 still has that issue, it will keep filling up.

Until we roll out 0.3.4.9 and everyone updates, it won't fully go away :S.

comment:34 in reply to:  33 Changed 12 months ago by rm

Replying to dgoulet:

but on the Send-Q side, because the rest of the network running <= 0.3.4.8 still has that issue, it will keep filling up.

Until we roll out 0.3.4.9 and everyone updates, it won't fully go away :S.

What? You are saying remote peers can do something (no matter what) that causes you to leak memory and eventually get OOM-killed? I.e., basically Tor has a remote DoS vulnerability? And your solution is to kindly ask everyone to stop doing that (by upgrading versions)? You can't be serious.

comment:35 Changed 12 months ago by torkit

So, there are two problems on Ubuntu. One seems to have been the one this bug describes, while the other is pretty mundane: the tor.service file installed by the distribution does not properly recognize the started tor instance and restarts it after the default timeout of 300 seconds. Which is weird, because the old tor.service file worked properly with the binary from the debian tor package, and I only replaced the tor binary with the self-built one from the maintenance branches.

I've replaced it with a better-suited tor.service file, and maint-0.3.3 has worked fine for the past 12 hours. I will recheck the maint-0.3.4 branch now.
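
(One guess about the mundane half, not confirmed here: the packaged unit uses Type=notify, which relies on tor being built with --enable-systemd so that it sends READY=1; a self-built binary without systemd support never notifies, so systemd assumes the start failed and restarts the service after the start timeout. A minimal drop-in to sidestep that could be:)

$ sudo systemctl edit tor@default.service
    # in the drop-in that opens:
    [Service]
    Type=simple
    NotifyAccess=none
$ sudo systemctl restart tor@default.service

Rebuilding with ./configure --enable-systemd would be the cleaner fix.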

comment:36 Changed 12 months ago by torkit

I can confirm the patch works - the node is stable on maint-0.3.4, although it does not seem to properly use all available cores. But that's a different story.

comment:37 Changed 12 months ago by nickm

Resolution: fixed
Status: needs_information → closed

Tor 0.3.4.9 ships today, with this bugfix included.
