Opened 3 years ago

Closed 3 years ago

#20521 closed enhancement (fixed)

Deprecate `DescriptorReader.setExcludeFiles()` and add two separate methods for loading and saving a history file

Reported by: karsten Owned by: karsten
Priority: Medium Milestone: metrics-lib 1.6.0
Component: Metrics/Library Version:
Severity: Normal Keywords: metrics-help
Cc: iwakeh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


The history file implementation in DescriptorReader has a design bug for much too long that I'd finally want to fix.

The issue is that DescriptorReader writes the history file passed in setExcludeFiles() immediately after reading and parsing the last descriptor and putting it into the queue, regardless of whether the application has finished processing those descriptors.

If the application fails after the history file is written, it may not be able to process descriptors in the next execution that have still been in the queue at the time of failing.

One way to reduce that gap would be to have BlockingIteratorImpl tell DescriptorReader when the application has first learned that hasNext() returned false or that next() threw a NoSuchElementException. The benefit of that solution would be that no interface change would be required. The downside would be that the application might not have fully cleaned up at the time of learning that there are no further descriptors available. In one example, the application would close a database import file, perform the database import, and only write the history file after learning that the database import has succeeded.

A better solution would be to decouple saving the history file to disk from completing the descriptor read operation. We would replace the setExcludeFiles() method by a setHistoryFile() and a saveHistoryFile() method. Applications would use setHistoryFile() before starting to read descriptors, process all descriptors, perform any cleaning up, and then call saveHistoryFile().

Here's an example of the code that this would save in applications that currently work around this known design bug by loading and saving history files themselves:

diff --git a/modules/hidserv/src/org/torproject/metrics/hidserv/ b/modules/hidserv/src/org/torproject/metrics/hidserv/
index 2ef404e..215b0c9 100644
--- a/modules/hidserv/src/org/torproject/metrics/hidserv/
+++ b/modules/hidserv/src/org/torproject/metrics/hidserv/
@@ -96,33 +96,7 @@ public class Parser {
   public void readParseHistory() {
     if (this.parseHistoryFile.exists()
         && this.parseHistoryFile.isFile()) {
-      SortedMap<String, Long> excludedFiles =
-          new TreeMap<String, Long>();
-      try {
-        BufferedReader br = new BufferedReader(new FileReader(
-            this.parseHistoryFile));
-        String line;
-        while ((line = br.readLine()) != null) {
-          try {
-            /* Each line is supposed to contain the last-modified time and
-             * absolute path of a descriptor file. */
-            String[] parts = line.split(" ", 2);
-            excludedFiles.put(parts[1], Long.parseLong(parts[0]));
-          } catch (NumberFormatException e) {
-            System.err.printf("Illegal line '%s' in parse history.  "
-                + "Skipping line.%n", line);
-          }
-        }
-        br.close();
-      } catch (IOException e) {
-        System.err.printf("Could not read history file '%s'.  Not "
-            + "excluding descriptors in this execution.",
-            this.parseHistoryFile.getAbsolutePath());
-      }
-      /* Tell the descriptor reader to exclude the files contained in the
-       * parse history file. */
-      this.descriptorReader.setExcludedFiles(excludedFiles);
+      this.descriptorReader.setHistoryFile(this.parseHistoryFile);
@@ -130,33 +104,7 @@ public class Parser {
    * and absolute paths to the parse history file to avoid parsing these
    * files again, unless they change until the next execution. */
   public void writeParseHistory() {
-    /* Obtain the list of descriptor files that were either parsed now or
-     * that were skipped in this execution from the descriptor reader. */
-    SortedMap<String, Long> excludedAndParsedFiles =
-        new TreeMap<String, Long>();
-    excludedAndParsedFiles.putAll(
-        this.descriptorReader.getExcludedFiles());
-    excludedAndParsedFiles.putAll(this.descriptorReader.getParsedFiles());
-    try {
-      this.parseHistoryFile.getParentFile().mkdirs();
-      BufferedWriter bw = new BufferedWriter(new FileWriter(
-          this.parseHistoryFile));
-      for (Map.Entry<String, Long> e
-          : excludedAndParsedFiles.entrySet()) {
-        /* Each line starts with the last-modified time of the descriptor
-         * file, followed by its absolute path. */
-        String absolutePath = e.getKey();
-        long lastModifiedMillis = e.getValue();
-        bw.write(String.valueOf(lastModifiedMillis) + " " + absolutePath
-            + "\n");
-      }
-      bw.close();
-    } catch (IOException e) {
-      System.err.printf("Could not write history file '%s'.  Not "
-          + "excluding descriptors in next execution.",
-          this.parseHistoryFile.getAbsolutePath());
-    }
+    this.descriptorReader.saveHistoryFile(this.parseHistoryFile);
   /** Set of all reported hidden-service statistics.

I'll push a branch to my repository in a moment.

Child Tickets

Change History (12)

comment:1 Changed 3 years ago by karsten

Status: newneeds_review

Please review my branch task-20521.

comment:2 Changed 3 years ago by iwakeh

Cc: iwakeh added

cc'ed myself in order to have this on my list.

comment:3 Changed 3 years ago by iwakeh

Please find a commit on top of your branch with some modernization (reduced number of lines) and a checkstyle complaint removed.

Wouldn't this be the right time to add a test class for DescriptorReader(Impl)?

comment:4 in reply to:  3 Changed 3 years ago by karsten

Status: needs_reviewnew

Replying to iwakeh:

Please find a commit on top of your branch with some modernization (reduced number of lines) and a checkstyle complaint removed.

Oops, I totally missed that checkstyle complaint. But I think your fix only removes the complaint and lets Javadoc think that the @deprecated line is yet another paragraph, which is not what we want. Indenting the two lines below that line is the better fix. Removed that part from your commit and added a fixup commit to remove the checkstyle warning.

I also found that you're using Files methods that are introduced in Java 8, but we're still at Java 7. Well, Eclipse found that for me. I added another fixup commit for those. Can you configure your IDE to use the Java 7 SE library? (But okay, I won't say anything after complaining about unresolved checkstyle warnings in other patches and then overlooking one myself, heh.)

Please find my updated task-20521 branch.

Wouldn't this be the right time to add a test class for DescriptorReader(Impl)?

Hmmmmmmmm, yes. Setting this ticket to new and not merging yet until we have such a test class.

comment:5 Changed 3 years ago by iwakeh

Oops, I was in fact relying on the javac-task catching it, after all source and target are set to 1.7.
Please find the changes to java 7 here.
Will look into that java versioning issue.

comment:6 Changed 3 years ago by karsten

Hmm, are you sure that the smaller fixes in my branch are insufficient for Java 7 compatibility?

comment:7 in reply to:  6 Changed 3 years ago by iwakeh

Replying to karsten:

Hmm, are you sure that the smaller fixes in my branch are insufficient for Java 7 compatibility?

That branch is fine java 7. Just ignore my comment:5 (I need more coffee ;-)

comment:8 Changed 3 years ago by karsten

Haha, okay. In that case we're back at needing unit tests for the DescriptorReader/DescriptorReaderImpl before merging my (squashed) task-20521 branch.

comment:9 Changed 3 years ago by iwakeh

Keywords: metrics-help added

These tests would also be a good start for a volunteer. Thus, adding the keyword.

comment:10 Changed 3 years ago by karsten

Status: newneeds_review

Please review my updated task-20521 branch with another fixup commit and the requested tests.

comment:11 Changed 3 years ago by iwakeh

Looks fine.

I added another test and cure for a corrupt history file.
Please see my branch task-20521-3.

comment:12 Changed 3 years ago by karsten

Resolution: fixed
Status: needs_reviewclosed

Looks good! Rebased, squashed, merged, pushed, closing! Thanks!

Note: See TracTickets for help on using tickets.