Opened 7 years ago

Closed 4 years ago

#5337 closed defect (worksforme)

Memory fragmentation(?) issues on dirauths

Reported by: Sebastian Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Keywords: tor-auth
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


I wonder how we can better track down the extremely high memory usage on some dirauths. On gabelmoo, memory usage is certainly a lot higher than it used to be, and it seems to grow in rapid bursts separated by drawn-out, multiple-day "plateau" phases with virtually no increase in memory footprint. What debug mechanisms (beyond our default use of valgrind, which doesn't report any leaks) might I be able to use here?

This is a fairly big issue for me: gabelmoo makes the system swap and become unresponsive, so I have to restart it frequently.


Change History (19)

comment:1 Changed 7 years ago by nickm

The first thing to try is linking with a different malloc to confirm whether that makes a difference. The "openbsd malloc" option is solid but not very efficient. I also hear good things about tcmalloc.

In the past, the usage pattern that caused the most trouble with glibc malloc was allocating a lot of small things, then freeing most (but not all) of them. If we know about when this behavior started, we could look for new patterns like that.
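A toy model may help show why that pattern hurts. The sketch below rests on a simplifying assumption, not on how glibc malloc actually manages its arenas: small objects are packed back to back into 4096-byte pages, and a page can be returned to the OS only once every object on it has been freed. The names `pinned_pages` and `free_most` are hypothetical, chosen for illustration.

```python
# Toy model (an assumption for illustration, not glibc's actual allocator):
# small objects are packed back to back into fixed-size pages, and a page can
# be handed back to the OS only once every object on it has been freed.
PAGE_SIZE = 4096  # bytes; a common page size on x86-64


def pinned_pages(live, obj_size):
    """Return how many pages still hold at least one live object."""
    per_page = PAGE_SIZE // obj_size
    return sum(
        any(live[start:start + per_page])
        for start in range(0, len(live), per_page)
    )


def free_most(n_objects, keep_every):
    """Simulate freeing every object except one in every `keep_every`."""
    return [i % keep_every == 0 for i in range(n_objects)]


if __name__ == "__main__":
    n_objects, obj_size, keep_every = 100_000, 64, 50
    live = free_most(n_objects, keep_every)
    total_pages = -(-n_objects * obj_size // PAGE_SIZE)  # ceiling division
    print(f"live objects: {sum(live)} of {n_objects}")
    print(f"pages still pinned: {pinned_pages(live, obj_size)} of {total_pages}")
```

Running it prints `live objects: 2000 of 100000` and `pages still pinned: 1562 of 1563`: only 2% of the objects survive, yet they keep nearly every page resident. Real allocators can reuse the freed space inside pinned pages for later small allocations, so the model only illustrates why the footprint seen by the OS fails to shrink, not that the memory is lost.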

comment:2 Changed 7 years ago by Sebastian

Restarted gabelmoo on openbsd-malloc; we'll see what happens.

comment:3 Changed 7 years ago by Sebastian

Looks like openbsd-malloc helped: after 5 days of running I'm still under 350MB of memory usage.

comment:4 Changed 7 years ago by Sebastian

Except now, just 4 hours later, gabelmoo is using almost 2GB of memory. Something is quite wrong.

comment:5 Changed 7 years ago by nickm

tcmalloc (in google performance tools) has a heap-profiling tool that is supposed to be able to tell you what's eating all the RAM. Maybe it could help here?

comment:6 Changed 7 years ago by Sebastian

I'll try that. Though since my last comment here I haven't had to restart gabelmoo; it behaved quite well memory-wise, even through the big traffic spike we had a few days ago. So I assume some external event is triggering this.

comment:7 Changed 7 years ago by arma

Memory bloat happens, among other times, when you get a whole lot of connections (Tor allocates the memory, and then typically doesn't give it back). Linus was complaining about getting bombarded by connections. I wonder if those are related.

comment:8 Changed 7 years ago by arma

As a follow-up, I bet we could induce the memory bloat by octopusing a Tor relay. If we can make it reproducible, that would let us try out fixes too.

comment:9 Changed 7 years ago by Sebastian

My point was that during the past two weeks I was bombarded, yet no unusual memory usage occurred.

comment:10 Changed 7 years ago by nickm

Status: new → needs_information

comment:11 Changed 7 years ago by Sebastian

Earlier today, gabelmoo's memory usage again surged out of nowhere. At the same time, some other dirauths had problems too. I suspect some very specific event triggers this :/

comment:12 Changed 7 years ago by rransom

We had a report on tor-talk of RAM bloat in a Tor bridge running on Windows, so this probably isn't dirauth-specific.

comment:13 Changed 7 years ago by nickm

Has anybody tried tcmalloc as a possible fix here?

comment:14 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:15 Changed 7 years ago by nickm

Component: Tor Directory Authority → Tor

comment:16 Changed 7 years ago by arma

Should we close this one? See also #7019. This ticket seems to appear periodically, has the same issues, and gets closed after a while for the same reasons.

comment:17 Changed 6 years ago by nickm

Milestone: Tor: 0.2.3.x-final → Tor: unspecified

comment:18 Changed 4 years ago by arma

I continue to think we should close this.

(even though the issue, or at least a related one, is coming up again recently -- this ticket doesn't help us.)

comment:19 Changed 4 years ago by nickm

Resolution: worksforme
Status: needs_information → closed