Opened 3 years ago

Closed 3 years ago

#20525 closed defect (fixed)

Be more careful when deleting extraneous local descriptor files

Reported by: karsten Owned by: karsten
Priority: Medium Milestone: metrics-lib 1.6.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: iwakeh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

As found in this ticket, DescriptorIndexCollector deletes descriptor files from a previous or concurrent collect run if it doesn't collect those files itself. This is unexpected behavior and differs from what DescriptorCollectorImpl does. Let's fix this.

Here's a minimal example (taken from that other ticket):

    DescriptorCollector dc =
        DescriptorSourceFactory.createDescriptorCollector();
    dc.collectDescriptors("https://collector.torproject.org",
        new String[] { "/recent/torperf/" }, 0L, new File("sync"), true);
    dc.collectDescriptors("https://collector.torproject.org",
        new String[] { "/recent/exit-lists/" }, 0L, new File("sync"), true);

This is the (annotated) output of running this code (1) with the default DescriptorIndexCollector, (2) with DescriptorCollectorImpl, and (3) a fixed version of DescriptorIndexCollector:

$ java -cp bin/:lib/descriptor-1.5.0.jar:lib/slf4j-api-1.7.7.jar:lib/commons-compress-1.9.jar:lib/gson-2.2.4.jar Main
$ du -hs sync/recent/*
 14M	sync/recent/exit-lists
  0B	sync/recent/torperf        # <- deleted, bad!
$ rm -rf sync/
$ java -cp bin/:lib/descriptor-1.5.0.jar:lib/slf4j-api-1.7.7.jar:lib/commons-compress-1.9.jar:lib/gson-2.2.4.jar -Ddescriptor.collector=org.torproject.descriptor.impl.DescriptorCollectorImpl Main
$ du -hs sync/recent/*
 14M	sync/recent/exit-lists
3.0M	sync/recent/torperf        # <- not deleted, good!
$ java -cp bin/:lib/descriptor-1.5.0-dev.jar:lib/slf4j-api-1.7.7.jar:lib/commons-compress-1.9.jar:lib/gson-2.2.4.jar Main
$ du -hs sync/recent/*
 14M	sync/recent/exit-lists
3.0M	sync/recent/torperf        # <- not deleted, good!

Possible patch used in (3) above:

diff --git a/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java b/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
index d32f811..daef379 100644
--- a/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
+++ b/src/main/java/org/torproject/descriptor/index/DescriptorIndexCollector.java
@@ -70,7 +70,8 @@ public class DescriptorIndexCollector implements DescriptorCollector {
     this.fetchRemoteFiles(index.path, remoteFiles, minLastModified,
         localDirectory, localFiles);
     if (deleteExtraneousLocalFiles) {
-      this.deleteExtraneousLocalFiles(remoteFiles, localDirectory, localFiles);
+      deleteExtraneousLocalFiles(remoteDirectories, remoteFiles, localDirectory,
+          localFiles);
     }
   }
 
@@ -108,12 +109,16 @@ public class DescriptorIndexCollector implements DescriptorCollector {
     }
   }
 
-  static void deleteExtraneousLocalFiles(
+  static void deleteExtraneousLocalFiles(String[] remoteDirectories,
       SortedMap<String, FileNode> remoteFiles,
       File localDir, SortedMap<String, Long> locals) {
     for (String localPath : locals.keySet()) {
-      if (!remoteFiles.containsKey(localPath)) {
-        new File(localDir, localPath).delete();
+      for (String remoteDirectory : remoteDirectories) {
+        if (localPath.startsWith(remoteDirectory)) {
+          if (!remoteFiles.containsKey(localPath)) {
+            new File(localDir, localPath).delete();
+          }
+        }
       }
     }
   }

I briefly thought about an alternative fix where we look at the actual index.json contents to decide whether a local descriptor file that we didn't care about in the current collect run would be safe to delete or not. After all, we have that information, so we could as well use it.

The two arguments against it are (1) that we'd change the previous behavior of DescriptorCollector in a somewhat backward-incompatible way and (2) that this code might be smarter than developers might think it is and hence confuse them in a bad way.

I think I'd rather go with something like the patch above.

Child Tickets

Change History (5)

comment:1 Changed 3 years ago by iwakeh

Cc: iwakeh added

comment:2 Changed 3 years ago by karsten

Milestone: metrics-lib 1.6.0

Let's try to fix this in 1.6.0.

comment:3 Changed 3 years ago by karsten

Status: newneeds_review

The patch above still seems like a good idea. Please review my branch task-20525 that even extends a test to cover this changed behavior.

comment:4 Changed 3 years ago by iwakeh

Status: needs_reviewmerge_ready

Looks fine and removes this indeed 'unexpected behavior' as mentioned in the commit message.
Passes tests (has the checkstyle complaints, which are fixed in a different ticket).
The new tests documents the wanted behaviour sufficiently.
Ready for merge.

comment:5 Changed 3 years ago by karsten

Resolution: fixed
Status: merge_readyclosed

Great! Merged to master, closing. Thanks!

Note: See TracTickets for help on using tickets.