Opened 3 years ago

Closed 3 years ago

#3406 closed defect (fixed)

High memory usage from parse_ns_body

Reported by: atagar Owned by:
Priority: normal Milestone:
Component: Torctl Version:
Keywords: Cc:
Actual Points: Parent ID:
Points:

Description

From earlier today...

09:16 < atagar> Figured out my memory issue. It's from calling TorCtl's get_network_status function which consumes 3.5 MB of memory at once.
09:16 < atagar> I immediately discard those results and the memory is freed, however because of how large objects are allocated in python 'ps' is never aware that the memory's been released:
09:16 < atagar> http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
09:16 < atagar> Fix: either ignore it (it's misreported memory usage) or implement my own less memory intensive version of get_network_status.
09:20 < mikeperry> atagar: does python *never* release said memory? in otherwords, do you experience this as a leak, or just a consistent 3.5MB of unneeded overhead?
09:20 < mikeperry> s/release/reuse
09:22 < atagar> I'm not quite sure how to tell the difference. If I call get_network_status and discard the results (ie, don't get a reference to the nslist object it returns) then ps reports that memory as being used for the life of the python process.
09:23 < atagar> though I'd imagine this memory would be reused by python for other activity when needed
09:25 < mikeperry> hopefully. there is definitely a leak in the bw authorities that is proving ridiculously hard to track down. we're down to it either being a python leak, an sqlite leak, or a python-sqlite bindings leak
09:26 < atagar> side note: the memory all comes from the NetworkStatus objects generated on line 1246 (ie, the problem is definitely because we have a list with all that memory allocated at one time, rather than the incremental results contributing to it)
09:27 < atagar> ie, if we were able to have an iterator pattern here then there would be no issue
09:28 < mikeperry> if you want to write an intermediate funtion that only gives you them one at a time, you can do that. just don't change the external behavior of the current function.
09:28 < mikeperry> I'd merge that
09:29 < atagar> Will do. I'm a little worried about a future change accidently making a get_network_status call, but I'll edit the comment to warn of this.
09:29 < atagar> btw, do you have any idea when you'll be able to look at the five torctl patches I sent on Monday?
09:30 * nickm is going to be semi- offline for most of today afternoon doing errands writing reviews and hacking. Hoping this is a good idea. :)
09:34 < atagar> mikeperry: I'll also fix update_consensus so it doesn't encounter this issue (no external change but will avoid new consensus events from triggering huge memory usage)

Child Tickets

Change History (6)

comment:1 Changed 3 years ago by atagar

  • Status changed from new to needs_review

comment:2 Changed 3 years ago by mikeperry

Any reason why you didn't also provide the getIterator option for get_consensus?

Incidentally, you didn't see the memory savings you expected in the ConsensusTracker because the list itself does not occupy a lot of memory. Think of a linked list of pointers: basically 4*length(list). The problem is that the ConsensusTracker actually keeps all of NetworkStatus objects around in memory in its ns_map, as well as all the Router objects corresponding to descriptors.

You may be able to get some memory savings by altering ConsensusTracker to optionally only track NetworkStatus documents in ns_map, and not store the sorted_r or routers structures there. If you decide to provide the ability to only store NetworkStatus objects, you should probably provide it by factoring out that code into a superclass rather than an option.

comment:3 Changed 3 years ago by mikeperry

FYI, all your other branches look good to merge, but I can't merge them easily because the commit for #3406 is in them for some reason.

comment:4 follow-up: Changed 3 years ago by atagar

Any reason why you didn't also provide the getIterator option for get_consensus?

Because I've never used it so didn't think of it.

The problem is that the ConsensusTracker actually keeps all of NetworkStatus objects

Gotcha. I was expecting for ns_map to contain a set of all NetworkStatus objects and for the following get_network_status to allocate a second set (so this change would halve the memory) but guess that's not the case - oh well.

I'm not personally concerned about the memory usage of the ConsensusTracker since I don't use it. I was just hoping that this would be a nice side effect.

FYI, all your other branches look good to merge, but I can't merge them easily because the commit for #3406 is in them for some reason.

Great. I'm not clear how you want this patch changed. It lowered the get_network_status memory usage dramatically so it's doing exactly what I'd intended.

Btw, the reason the patches are based on each other was to avoid merging headaches for you (and also let me test the end result over these last few days). Otherwise you'd be having conflicts with many of these patches.

If separate patches based on master is *really* what you want then you can still get this by cherry-picking the changes... and then doing the conflict resolution and retested you would have needed via that route. :P

comment:5 in reply to: ↑ 4 Changed 3 years ago by mikeperry

Replying to atagar:

FYI, all your other branches look good to merge, but I can't merge them easily because the commit for #3406 is in them for some reason.

Great. I'm not clear how you want this patch changed. It lowered the get_network_status memory usage dramatically so it's doing exactly what I'd intended.

Ok. I may just copy the change to get_consensus myself then.

Btw, the reason the patches are based on each other was to avoid merging headaches for you (and also let me test the end result over these last few days). Otherwise you'd be having conflicts with many of these patches.

If separate patches based on master is *really* what you want then you can still get this by cherry-picking the changes... and then doing the conflict resolution and retested you would have needed via that route. :P

Well, no, I don't want separate patches. But the one-branch-for-each-bug-but-also-several-bugs-in-each-branch madness is also causing conflicts now, because I merged some of them but not all, and I don't know which one to merge next.

Which branch contains all of them?

In the future, if fixes end up causing conflicts such that you have to each fix appear in multiple branches, just lable each commit with the bug number in the git shortdesc, and use only one branch.

comment:6 Changed 3 years ago by mikeperry

  • Resolution set to fixed
  • Status changed from needs_review to closed

Performed the changes to get_consensus, changed the param from getIterator to get_iterator and merged.

Note: See TracTickets for help on using tickets.