Opened 18 months ago

Last modified 7 months ago

#21354 new enhancement

Make Onionoo more memory-efficient

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Normal Keywords: metrics-2018
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

When I re-imported 2 years of bridge network statuses to fix #20994, I noticed that this is only possible when importing descriptors in batches of a few months. This is pretty bad, because it also means that we wouldn't be able to start a new Onionoo instance from scratch. And a new operator might not even figure out what's wrong and assume that Onionoo contains a serious bug. (Maybe not a totally unreasonable assumption.)

Today I found the reason for this problem: UptimeStatusUpdater stores sets of Long values of bridge network status publication times, and that simply does not scale for more than a few months of statuses. Unfortunately, we need to store all these publication times in memory, because opening and updating status files for each fingerprint contained in a status also does not scale, for a different reason.

But we can do better here. We can apply the same trick that we're using for storing relay flag strings, that is, resolve strings to indexes using a single, static map, and store indexes in a bitset rather than strings in a set. Here's a possible patch:

diff --git a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
index 2b5f5fc..6280902 100644
--- a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
+++ b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
@@ -86,14 +86,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
     }
   }
 
+  private static Map<Long, Integer> dateHourMillisToIndex = new HashMap<>();
+
+  private static Map<Integer, Long> dateHourIndexToMillis = new HashMap<>();
+
   private SortedMap<Long, Flags> newRelayStatuses = new TreeMap<>();
 
   private SortedMap<String, SortedMap<Long, Flags>>
       newRunningRelays = new TreeMap<>();
 
-  private SortedSet<Long> newBridgeStatuses = new TreeSet<>();
+  private BitSet newBridgeStatuses = new BitSet();
 
-  private SortedMap<String, SortedSet<Long>>
+  private SortedMap<String, BitSet>
       newRunningBridges = new TreeMap<>();
 
   private void processRelayNetworkStatusConsensus(
@@ -123,13 +127,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
     if (!fingerprints.isEmpty()) {
       long dateHourMillis = (status.getPublishedMillis()
           / DateTimeHelper.ONE_HOUR) * DateTimeHelper.ONE_HOUR;
+      if (!dateHourMillisToIndex.containsKey(dateHourMillis)) {
+        dateHourIndexToMillis.put(dateHourIndexToMillis.size(), dateHourMillis);
+        dateHourMillisToIndex.put(dateHourMillis, dateHourMillisToIndex.size());
+      }
+      int dateHourIndex = dateHourMillisToIndex.get(dateHourMillis);
       for (String fingerprint : fingerprints) {
         if (!this.newRunningBridges.containsKey(fingerprint)) {
-          this.newRunningBridges.put(fingerprint, new TreeSet<Long>());
+          this.newRunningBridges.put(fingerprint, new BitSet());
         }
-        this.newRunningBridges.get(fingerprint).add(dateHourMillis);
+        this.newRunningBridges.get(fingerprint).set(dateHourIndex);
       }
-      this.newBridgeStatuses.add(dateHourMillis);
+      this.newBridgeStatuses.set(dateHourIndex);
     }
   }
 
@@ -140,17 +149,19 @@ public class UptimeStatusUpdater implements DescriptorListener,
       this.updateStatus(true, e.getKey(), e.getValue());
     }
     this.updateStatus(true, null, this.newRelayStatuses);
-    for (Map.Entry<String, SortedSet<Long>> e :
-        this.newRunningBridges.entrySet()) {
+    for (Map.Entry<String, BitSet> e : this.newRunningBridges.entrySet()) {
+      BitSet dateHourIndexes = e.getValue();
       SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
-      for (long dateHourMillis : e.getValue()) {
-        dateHourMillisNoFlags.put(dateHourMillis, null);
+      for (int i = dateHourIndexes.nextSetBit(0); i >= 0;
+          i = dateHourIndexes.nextSetBit(i + 1)) {
+        dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
       }
       this.updateStatus(false, e.getKey(), dateHourMillisNoFlags);
     }
     SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
-    for (long dateHourMillis : this.newBridgeStatuses) {
-      dateHourMillisNoFlags.put(dateHourMillis, null);
+    for (int i = this.newBridgeStatuses.nextSetBit(0); i >= 0;
+        i = this.newBridgeStatuses.nextSetBit(i + 1)) {
+      dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
     }
     this.updateStatus(false, null, dateHourMillisNoFlags);
   }

But let's not merge this yet, as it only solves this particular problem of importing lots of bridge network statuses, and maybe there's an ever better way to fix this.

Let's rather look at this (possible) improvement together with earlier improvements towards memory efficiency, and let's think about what remains to be done.

For example, should we avoid certain data structures (like TreeSet/TreeMap) and replace them with more memory-efficient data structures?

Are there other tricks we should use more consistently?

Child Tickets

Change History (4)

comment:1 Changed 18 months ago by iwakeh

But let's not merge this yet, as it only solves this particular problem of importing lots of bridge network statuses, and maybe there's an ever better way to fix this.

Agreed, that this can wait a while.

Let's rather look at this (possible) improvement together with earlier improvements towards memory efficiency, and let's think about what remains to be done.

Maybe, rather revisit this in our Berlin meeting, where there is already the topic of making Onionoo more operator friendly? Tweaks and tricks might not suffice in the long run.

comment:2 Changed 10 months ago by karsten

Keywords: metrics-2018 added

comment:3 Changed 10 months ago by karsten

Keywords: metrics-2017 added; metrics-2018 removed

comment:4 Changed 7 months ago by iwakeh

Keywords: metrics-2018 added; metrics-2017 removed

Will be completed in 2018.

Note: See TracTickets for help on using tickets.