Opened 4 years ago

Closed 4 years ago

#13574 closed enhancement (fixed)

Tweak memory usage of hourly cronjob

Reported by: karsten
Owned by:
Priority: Medium
Milestone:
Component: Metrics/Onionoo
Version:
Severity:
Keywords:
Cc: iwakeh
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

While attempting to set up an Onionoo mirror I ran into memory problems with the hourly updater.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2367)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:480)
	at java.lang.StringBuffer.append(StringBuffer.java:309)
	at java.lang.StringBuffer.append(StringBuffer.java:300)
	at java.util.regex.Matcher.appendReplacement(Matcher.java:841)
	at java.util.regex.Matcher.replaceAll(Matcher.java:906)
	at java.lang.String.replaceAll(String.java:2162)
	at org.torproject.onionoo.docs.DocumentStore.storeDocumentFile(DocumentStore.java:278)
	at org.torproject.onionoo.docs.DocumentStore.store(DocumentStore.java:228)
	at org.torproject.onionoo.writer.DetailsDocumentWriter.updateRelayDetailsFiles(DetailsDocumentWriter.java:184)
	at org.torproject.onionoo.writer.DetailsDocumentWriter.writeDocuments(DetailsDocumentWriter.java:72)
	at org.torproject.onionoo.writer.DocumentWriterRunner.writeDocuments(DocumentWriterRunner.java:29)
	at org.torproject.onionoo.cron.Main.main(Main.java:55)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2367)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
	at java.lang.StringBuffer.append(StringBuffer.java:237)
	at java.io.StringWriter.write(StringWriter.java:101)
	at org.apache.commons.lang.StringEscapeUtils.escapeJavaStyleString(StringEscapeUtils.java:196)
	at org.apache.commons.lang.StringEscapeUtils.escapeJavaStyleString(StringEscapeUtils.java:164)
	at org.apache.commons.lang.StringEscapeUtils.escapeJavaScript(StringEscapeUtils.java:131)
	at org.torproject.onionoo.docs.DetailsDocument.escapeJSON(DetailsDocument.java:21)
	at org.torproject.onionoo.docs.DetailsDocument.setContact(DetailsDocument.java:267)
	at org.torproject.onionoo.writer.DetailsDocumentWriter.updateRelayDetailsFiles(DetailsDocumentWriter.java:158)
	at org.torproject.onionoo.writer.DetailsDocumentWriter.writeDocuments(DetailsDocumentWriter.java:72)
	at org.torproject.onionoo.writer.DocumentWriterRunner.writeDocuments(DocumentWriterRunner.java:29)
	at org.torproject.onionoo.cron.Main.main(Main.java:55)

Neither stack trace points at an operation that should need a lot of memory by itself, so I set up a cronjob that runs jcmd $pid GC.class_histogram once every minute. Here's the top 10 right before the JVM exited with the second stack trace:

 num     #instances         #bytes  class name
----------------------------------------------
   1:       8533341      463113472  [C
   2:       8533040      204792960  java.lang.String
   3:       3705980      148239200  java.util.TreeMap$Entry
   4:       2995129      143766192  java.util.TreeMap
   5:        593855      114020160  org.torproject.onionoo.docs.NodeStatus
   6:        604450       48686880  [Ljava.util.HashMap$Entry;
   7:       2390562       38248992  java.util.TreeSet
   8:        931813       29818016  java.util.HashMap$Entry
   9:        604451       29013648  java.util.HashMap
  10:        611484       14675616  java.lang.Long

From this profile it seems that NodeStatus would be a good candidate to save some memory. I'm attaching a branch with some memory tweaks to it as soon as I have a ticket number.
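
To make the profile easier to compare between runs, the histogram can be totaled programmatically. The following is only a sketch, not part of Onionoo: the class name ClassHistogram, its Row type and the parse/totalBytes helpers are hypothetical, and it assumes the four-column layout shown above.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassHistogram {

  /** One data row of a class histogram (hypothetical helper type). */
  static final class Row {
    final long instances;
    final long bytes;
    final String className;

    Row(long instances, long bytes, String className) {
      this.instances = instances;
      this.bytes = bytes;
      this.className = className;
    }

    /** Average shallow size of one instance, rounded down. */
    long bytesPerInstance() {
      return bytes / instances;
    }
  }

  /** The top 10 from the description, copied verbatim. */
  static final String SAMPLE = String.join("\n",
      " num     #instances         #bytes  class name",
      "----------------------------------------------",
      "   1:       8533341      463113472  [C",
      "   2:       8533040      204792960  java.lang.String",
      "   3:       3705980      148239200  java.util.TreeMap$Entry",
      "   4:       2995129      143766192  java.util.TreeMap",
      "   5:        593855      114020160  org.torproject.onionoo.docs.NodeStatus",
      "   6:        604450       48686880  [Ljava.util.HashMap$Entry;",
      "   7:       2390562       38248992  java.util.TreeSet",
      "   8:        931813       29818016  java.util.HashMap$Entry",
      "   9:        604451       29013648  java.util.HashMap",
      "  10:        611484       14675616  java.lang.Long");

  /** Parses rows of the form "<rank>: <instances> <bytes> <class name>". */
  static List<Row> parse(String histogram) {
    List<Row> rows = new ArrayList<>();
    for (String line : histogram.split("\n")) {
      String[] parts = line.trim().split("\\s+");
      // Skip the header, the separator line and anything else that is not a data row.
      if (parts.length < 4 || !parts[0].matches("\\d+:")) {
        continue;
      }
      rows.add(new Row(Long.parseLong(parts[1]), Long.parseLong(parts[2]), parts[3]));
    }
    return rows;
  }

  /** Sums the occupied bytes of all parsed rows. */
  static long totalBytes(List<Row> rows) {
    long total = 0L;
    for (Row row : rows) {
      total += row.bytes;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Row> rows = parse(SAMPLE);
    System.out.println("total bytes in listed classes: " + totalBytes(rows));
    for (Row row : rows) {
      System.out.println(row.className + ": " + row.bytesPerInstance() + " bytes/instance");
    }
  }
}
```

For the numbers above this gives 192 bytes per NodeStatus instance before counting the Strings, TreeMaps and TreeSets each instance references, and roughly 1.2 GB for the ten listed classes together.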


Attachments (2)

classhist-2014-10-28.png (102.8 KB) - added by karsten 4 years ago.
Totals of top-20 classes in heap by occupied bytes
classhist-2014-11-04.png (87.3 KB) - added by karsten 4 years ago.
Totals of top-20 classes in heap by occupied bytes, updated


Change History (11)

comment:2 Changed 4 years ago by karsten

Here's another stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuffer.append(StringBuffer.java:237)
        at java.io.StringWriter.write(StringWriter.java:101)
        at org.apache.commons.lang.StringEscapeUtils.escapeJavaStyleString(StringEscapeUtils.java:196)
        at org.apache.commons.lang.StringEscapeUtils.escapeJavaStyleString(StringEscapeUtils.java:164)
        at org.apache.commons.lang.StringEscapeUtils.escapeJavaScript(StringEscapeUtils.java:131)
        at org.torproject.onionoo.docs.DetailsDocument.escapeJSON(DetailsDocument.java:21)
        at org.torproject.onionoo.docs.DetailsDocument.setContact(DetailsDocument.java:267)
        at org.torproject.onionoo.writer.DetailsDocumentWriter.updateRelayDetailsFiles(DetailsDocumentWriter.java:158)
        at org.torproject.onionoo.writer.DetailsDocumentWriter.writeDocuments(DetailsDocumentWriter.java:72)
        at org.torproject.onionoo.writer.DocumentWriterRunner.writeDocuments(DocumentWriterRunner.java:29)
        at org.torproject.onionoo.cron.Main.main(Main.java:55)

Added another commit to that branch that avoids String.replaceAll() and uses StringUtils.replace() instead. Trying out that patch right now.
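
For readers unfamiliar with the difference: String.replaceAll() treats its first argument as a regular expression and builds the result via Matcher.appendReplacement(), which are the frames in the first stack trace, while a literal replacement needs neither. The sketch below uses only the JDK; the class LiteralReplace and its replaceLiteral method are hypothetical stand-ins for what StringUtils.replace() does, not the code in the branch.

```java
final class LiteralReplace {

  /** Replaces every occurrence of search in text without compiling a regex. */
  static String replaceLiteral(String text, String search, String replacement) {
    if (text == null || search == null || search.isEmpty() || replacement == null) {
      return text;
    }
    int start = 0;
    int end = text.indexOf(search, start);
    if (end < 0) {
      return text; // nothing to replace, so the input is returned without copying
    }
    StringBuilder sb = new StringBuilder(text.length());
    while (end >= 0) {
      // Copy the unchanged part, then the replacement, and continue after the match.
      sb.append(text, start, end).append(replacement);
      start = end + search.length();
      end = text.indexOf(search, start);
    }
    sb.append(text, start, text.length());
    return sb.toString();
  }

  public static void main(String[] args) {
    String input = "a.b.c";
    // Regex-based: the dot must be escaped, otherwise it matches every character.
    System.out.println(input.replaceAll("\\.", "-"));
    // Literal: no pattern compilation and no Matcher involved.
    System.out.println(replaceLiteral(input, ".", "-"));
  }
}
```

Both calls print a-b-c. Whether the literal variant actually lowers peak heap usage in the updater is something only the run on the mirror can show.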

comment:3 Changed 4 years ago by iwakeh

I'll try to also take a closer look soon.

These escapes when generating JSON are 'features' of Gson (afaik), not really a protocol requirement. So a more long-term solution could be to replace Gson with Jackson or json.org; both are available for wheezy. That way, most of the string-replacement tweaks could maybe be avoided.

Changed 4 years ago by karsten

Attachment: classhist-2014-10-28.png added

Totals of top-20 classes in heap by occupied bytes

comment:4 Changed 4 years ago by karsten

Latest patch looks good so far, but earlier branches worked fine for a while before breaking, too. I just attached a graph with totals of top-20 classes in heap by occupied bytes. The red lines are timestamps of thrown exceptions. From the timing it seems like the next exception might be thrown within the next 12 hours or so. Let's see.

Regarding replacing Gson with something else, I agree that this would be a long-term solution. I'm actually happy with Gson, except for this particular problem which seems related to it. Though I never ran into this issue on the main Onionoo instance, just on the mirror. I'd rather not give up on Gson unless we're certain that it's the issue.

comment:5 Changed 4 years ago by iwakeh

Didn't have time for testing, but switching from commons-lang to commons-lang3 might be a good choice in addition to your patch.
The switch to commons-lang3 (lang3 changes) contained a complete rewrite of 'StringEscapeUtils'. And I remember that in a former project it noticeably reduced the number of 'String's created at runtime.

Code changes would only be the import statements

 -import org.apache.commons.lang.StringEscapeUtils;
 -import org.apache.commons.lang.StringUtils;
 +import org.apache.commons.lang3.StringEscapeUtils;
 +import org.apache.commons.lang3.StringUtils;

classpath

 -      <include name="commons-lang-2.6.jar"/>
 +      <include name="commons-lang3-3.1.jar"/>

and

-    return StringUtils.replaceEach(StringEscapeUtils.escapeJavaScript(s),
+    return StringUtils.replaceEach(StringEscapeUtils.escapeEcmaScript(s),

as well as the corresponding unescapes (and the addition of the debian package).

Might be worth a try?
commons-lang is only used for un/escaping.

comment:6 Changed 4 years ago by karsten

Good idea. I'll try it out as soon as the Onionoo mirror has the Debian package installed.

comment:7 Changed 4 years ago by karsten

Trying it out now. New branch is task-13574-2 (removed an unnecessary import in an earlier commit).

Changed 4 years ago by karsten

Attachment: classhist-2014-11-04.png added

Totals of top-20 classes in heap by occupied bytes, updated

comment:8 Changed 4 years ago by karsten

No further problems on the mirror, see the newly attached graph. I just merged task-13574-2 into master. At least these patches don't seem to hurt, and maybe they even help.

comment:9 Changed 4 years ago by karsten

Resolution: fixed
Status: needs_review → closed

Looks good so far. Closing this ticket. If new problems show up or we come up with new patches, we can always create new tickets.
