Opened 3 years ago

Closed 3 years ago

#19151 closed defect (worksforme)

Looks like a memory leak?

Reported by: t-3.net
Owned by:
Priority: Medium
Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.7.6
Severity: Major
Keywords: unsolved
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Exit relay named Libero (64.113.32.29) running on CentOS Linux, kernel 2.6.32-573.18.1.el6.x86_64, Tor 0.2.7.6.

I started seeing crashes around the beginning of May, like so:

May 10 02:09:59 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
May 10 02:09:59 Libero-CentOS kernel: [<ffffffff8112a9f2>] ? oom_kill_process+0x82/0x2a0
May 10 02:09:59 Libero-CentOS kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 10 02:09:59 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
May 10 02:09:59 Libero-CentOS kernel: [<ffffffff8112a9f2>] ? oom_kill_process+0x82/0x2a0
May 10 02:09:59 Libero-CentOS kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 11 14:57:58 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
May 11 14:57:58 Libero-CentOS kernel: [<ffffffff8112add2>] ? oom_kill_process+0x82/0x2a0
May 11 14:57:58 Libero-CentOS kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 11 14:57:58 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
May 11 14:57:58 Libero-CentOS kernel: [<ffffffff8112add2>] ? oom_kill_process+0x82/0x2a0
May 11 14:57:58 Libero-CentOS kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
May 18 21:11:50 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
May 18 21:11:50 Libero-CentOS kernel: [<ffffffff8112a9f2>] ? oom_kill_process+0x82/0x2a0
May 18 21:11:50 Libero-CentOS kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
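
For anyone triaging a similar log, here is a minimal sketch of pulling the distinct OOM-killer invocations out of syslog lines like the ones above. The function name `oom_events` and the default year are illustrative assumptions (syslog timestamps carry no year), and the pattern is only checked against the line format shown in this report.

```python
import re
from datetime import datetime

# Matches lines such as:
#   May 10 02:09:59 Libero-CentOS kernel: tor invoked oom-killer: gfp_mask=...
OOM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<proc>\S+) invoked oom-killer"
)


def oom_events(lines, year=2016):
    """Return sorted, de-duplicated (timestamp, process) pairs.

    `year` is an assumption supplied by the caller, because the
    syslog timestamp format does not include one.
    """
    seen = set()
    for line in lines:
        m = OOM_RE.match(line)
        if m:
            stamp = datetime.strptime(
                f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            seen.add((stamp, m.group("proc")))
    return sorted(seen)
```

Lines that share a timestamp and process name collapse into one event, so the paired `gfp_mask=0x200da` and `gfp_mask=0xd0` entries count once. Note that "invoked oom-killer" names the process that triggered the killer, which is not necessarily the process that was killed.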

I increased the memory of the virtual server to 4 gigs around May 11, and saw it crash again on May 18.

I set up graphing on the memory after that, and it's not looking good. I'm going to try to attach an .odf file (LibreOffice Draw) that shows what I'm seeing.
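
The graphing setup used here is not described, so the following is only a sketch of one way to collect the underlying numbers on Linux: read the `VmRSS` field from `/proc/<pid>/status`. The helper names `parse_vmrss_kib` and `read_rss_kib` are made up for illustration.

```python
import re


def parse_vmrss_kib(status_text):
    """Extract the VmRSS value (in kB as reported by the kernel) from
    the contents of /proc/<pid>/status, or None if the field is absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())
```

Calling `read_rss_kib` on the tor process at a fixed interval (from cron, say) and appending the result to a file with a timestamp would give a series that can be plotted next to the traffic graph.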

It may have started shortly after an update.

If it helps you, I can get you into the vserver if you can get me an ssh public key to install. You can email me at tor@… for that.

Attachments (7)

tor-problem-map23.odg (455.5 KB) - added by t-3.net 3 years ago.
odg file (openoffice draw) with memory graphs, etc.
libero-memory-may25.png (28.7 KB) - added by t-3.net 3 years ago.
Memory graph morning of May 25
libero-memory-may30.png (43.9 KB) - added by t-3.net 3 years ago.
Libero memory use, May 30, 2016
libero-june1.png (44.9 KB) - added by t-3.net 3 years ago.
Libero memory use, June 1, 2016
libero-june2.png (43.4 KB) - added by t-3.net 3 years ago.
Libero memory use, June 2, 2016 (this system has 1 gig of swap)
libero-june4.png (47.3 KB) - added by t-3.net 3 years ago.
Libero memory use, June 4, 2016
libero-normal-now.png (24.9 KB) - added by t-3.net 3 years ago.
This thing is normal now, and this is now without any memory management configuration. I guess this ticket can be closed?

Change History (17)

Changed 3 years ago by t-3.net

Attachment: tor-problem-map23.odg added

odg file (openoffice draw) with memory graphs, etc.

comment:1 Changed 3 years ago by nickm

Keywords: 029-proposed unsolved added
Milestone: Tor: 0.2.???

(Please think hard before actually giving us a login on a relay; it's a security risk.)

comment:2 Changed 3 years ago by t-3.net

Well, if you find that you're having trouble duplicating the problem and would like to do some debugging, the offer is there. I'm not too worried about the security aspect of tor dev accessing the relay using their ssh key. The vserver is not real bad on security, and it's not like I'd be throwing open the standard ssh port with password auth enabled.

If you wanted, I could set up a copy of the vserver that has had /var/lib/tor/keys/* shredded and then you can put whatever keys you like there, but after that kind of change I think it would take a while to get its traffic back. The problem may require the traffic amounts Libero gets now or in some other way be peculiar to the existing setup.

comment:3 Changed 3 years ago by nickm

First thing to try: set the MaxMemInQueues option to something on the order of 80% of the amount of memory you would like Tor to allocate? That's how to tell Tor to control its own memory usage. If Tor grows way beyond the value that you set, then we might have a memory leak bug.
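
As a rough illustration of the "80%" suggestion above, this sketch turns a memory budget into a torrc line. The function name and the choice to express the value in MB are assumptions made for the example; only the `MaxMemInQueues` option name comes from this ticket.

```python
def max_mem_in_queues_line(budget_mb, fraction=0.8):
    """Build a torrc line that sets MaxMemInQueues to `fraction` of the
    amount of memory (in MB) the operator wants Tor to use."""
    if budget_mb <= 0:
        raise ValueError("budget_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Round down so the configured limit never exceeds the target share.
    return f"MaxMemInQueues {int(budget_mb * fraction)} MB"


print(max_mem_in_queues_line(4096))
```

For a 4096 MB budget this prints `MaxMemInQueues 3276 MB`. The 0.8 default just encodes the "on the order of 80%" guidance and can be adjusted.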

comment:4 Changed 3 years ago by nickm

Keywords: 029-proposed removed
Milestone: Tor: 0.2.??? → Tor: 0.2.9.x-final

Calling these "yes" because they are bugfixes.

comment:5 Changed 3 years ago by t-3.net

I've set this, hopefully good:

MaxMemInQueues 3GB

(If this turns out to fix it, I never had to set anything like this in the past and this was running on like 2 gigs of ram or something before. I wonder what's different.)

comment:6 Changed 3 years ago by t-3.net

As of now, the memory use is on a slow, steady increase as before (and the memory use graph doesn't resemble the traffic graph).

If we're waiting to see if it hits a 3+ GB threshold and stops, it will take a number of days at this rate of growth, maybe 5+ days.
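
An estimate like "5+ days" can be reproduced by fitting a straight line to the memory samples and extrapolating to the threshold. The sketch below does that with an ordinary least-squares fit; the function name and the sample figures in the usage line are hypothetical and are not readings from Libero.

```python
def days_until(threshold_gb, samples):
    """Estimate days from the last sample until memory reaches threshold_gb.

    samples: list of (day, used_gb) pairs, in time order.
    Returns None if the fitted trend is flat or decreasing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope (GB per day) and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_gb - intercept) / slope
    return max(0.0, crossing_day - samples[-1][0])


# Hypothetical readings: 1.0 GB on day 0, growing by 0.4 GB per day.
print(days_until(3.0, [(0, 1.0), (1, 1.4), (2, 1.8)]))
```

A linear fit only makes sense while the growth really is steady, as described above; it says nothing about whether usage will level off once a limit takes effect.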

Changed 3 years ago by t-3.net

Attachment: libero-memory-may25.png added

Memory graph morning of May 25

comment:7 Changed 3 years ago by t-3.net

I added an attachment of what the memory graph looks like now (Memory graph morning of May 25). The config change was made in the middle of the day on the 23rd. The traffic and CPU graphs don't resemble it.

Changed 3 years ago by t-3.net

Attachment: libero-memory-may30.png added

Libero memory use, May 30, 2016

Changed 3 years ago by t-3.net

Attachment: libero-june1.png added

Libero memory use, June 1, 2016

Changed 3 years ago by t-3.net

Attachment: libero-june2.png added

Libero memory use, June 2, 2016 (this system has 1 gig of swap)

Changed 3 years ago by t-3.net

Attachment: libero-june4.png added

Libero memory use, June 4, 2016

comment:8 Changed 3 years ago by t-3.net

The system seems to have stabilized at 3.7x gigs of RAM with the 3gig setting, and Tor looks set to not crash OOM.

Kinda weird: Libero never used to need any configuration to keep it from steadily eating its memory. I wonder what it's stashing in there, in particular if no other relays are seeing OOM crashes. I hope nothing in Libero's been messed with somehow.

comment:9 Changed 3 years ago by t-3.net

Something rather strange has happened with this. I ran system updates on Libero and since that time, the odd memory behavior has stopped. I did set it for a 1 gig max in the torrc, but it's not getting close to that. It's hanging out around 430M - 470M.

Changed 3 years ago by t-3.net

Attachment: libero-normal-now.png added

This thing is normal now, and this is now without any memory management configuration. I guess this ticket can be closed?

comment:10 Changed 3 years ago by nickm

Resolution: worksforme
Status: new → closed

Okay, but please reopen if the problem comes back? This one was pretty confusing.
