uncaught NFE and other bugs in weight status document processing

changed milestone to %Onionoo 3.1-1.0.0

added component::metrics/onionoo milestone::Onionoo 3.1-1.0.0 owner::iwakeh priority::high resolution::fixed severity::major status::closed type::defect labels

The data I used were the tars listed in #19249 (moved).

And, also scan for trys w/o a catch. The try-with-resources closes the resource, but if an exception was the reason for that it should be logged. (the above code excerpt is surrounded by a try-with-resources statement).

I see two tasks here:

checking for parsing statements that are not safeguarded against erroneous or unexpected input
check for trys w/o catch

maybe, have a general try-catch inside Main.run to log unexpected exceptions?

Good catch (no pun intended) and good suggestions there. I started looking through the code to find similar issues. Here's what I found:

diff --git a/src/main/java/org/torproject/onionoo/docs/BandwidthStatus.java b/src/main/java/org/torproject/onionoo/docs/BandwidthStatus.java
index 06f295b..9e4664c 100644
--- a/src/main/java/org/torproject/onionoo/docs/BandwidthStatus.java
+++ b/src/main/java/org/torproject/onionoo/docs/BandwidthStatus.java
@@ -62,7 +62,14 @@ public class BandwidthStatus extends Document {
               + "bandwidth history.  Skipping.");
           break;
         }
-        long bandwidth = Long.parseLong(parts[5]);
+        long bandwidth;
+        try {
+          bandwidth = Long.parseLong(parts[5]);
+        } catch (NumberFormatException e) {
+          log.error("Could not parse bandwidth value while reading "
+              + "bandwidth history.  Skipping.");
+          break;
+        }
         long previousEndMillis = history.headMap(startMillis).isEmpty()
             ? startMillis
             : history.get(history.headMap(startMillis).lastKey())[1];
diff --git a/src/main/java/org/torproject/onionoo/docs/WeightsStatus.java b/src/main/java/org/torproject/onionoo/docs/WeightsStatus.java
index 5d7f23e..b663fc4 100644
--- a/src/main/java/org/torproject/onionoo/docs/WeightsStatus.java
+++ b/src/main/java/org/torproject/onionoo/docs/WeightsStatus.java
@@ -67,13 +67,20 @@ public class WeightsStatus extends Document {
         }
         long[] interval = new long[] { validAfterMillis,
             freshUntilMillis };
-        double[] weights = new double[] { -1.0,
-            Double.parseDouble(parts[5]),
-            Double.parseDouble(parts[6]),
-            Double.parseDouble(parts[7]),
-            Double.parseDouble(parts[8]), -1.0, -1.0 };
-        if (parts.length == 11) {
-          weights[6] = Double.parseDouble(parts[10]);
+        double[] weights;
+        try {
+          weights = new double[] { -1.0,
+              Double.parseDouble(parts[5]),
+              Double.parseDouble(parts[6]),
+              Double.parseDouble(parts[7]),
+              Double.parseDouble(parts[8]), -1.0, -1.0 };
+          if (parts.length == 11) {
+            weights[6] = Double.parseDouble(parts[10]);
+          }
+        } catch (NumberFormatException e) {
+          log.error("Could not parse weights values in line '" + line
+              + "' while reading weights status file.  Skipping.");
+          break;
         }
         this.history.put(interval, weights);
       }

Of course, we'll also need to fix the parts where strings are written using the system locale rather than Locale.US.

And then we'll need unit tests for these. Would you be able to write one of those that breaks with the current code?

However, one thing I don't necessarily agree with: Not every try needs a catch, if we're only using the try as try-with-resources. For example, NumberFormatException is a runtime exception, and in some cases it's simply not possible for one of those to be thrown. So there may be valid cases for using try without catch. We might want to add a comment to those.

Yes, tests first is the right approach. These will identify the places where the error is home-made due to the unspecified locale.

I'll add all that to my todo-list.

The nfe was not the reason I wanted the catch-clause, I just noticed that in addition. Try-with-resources is used in order to make sure that a resource is closed properly. Resources can fail b/c of many reasons (full disk, weird input, etc.) and we should at least log the failures there, as they might be important for operation or bug detection.

Trac:
Status: new to accepted
Owner: N/A to iwakeh

Before reading through all code and documentation I ask:

Are the following lines acceptable WeightsStatus document lines. Currently they pass as such:

  private static final String[] correctLines = new String[]{
    "1970-01-02 22:00:00 1970-01-01 23:00:00  "
      + "0.000178279826 0.000261827632 0.000200304932 0.000000000000  6980.000000000000\n",
    "1970-02-01 22:00:00 1970-02-01 23:00:00  "
      + "-0.000178279826 -0.000261827632 0.000200304932 0.000000000000  -6980.000000000000\n",
    "1971-01-01 20:00:00 1971-01-01 10:00:00  "
      + "0.000178279826 0.000261827632 0.000200304932 0.000000000000  6980.000000000000\n"
  };

First line second date before first date, second line negative values, third line second time before first time on the same day?

Trac:
Status: accepted to needs_information

Fine questions! Let's see:

The first timestamp should not precede the second, though that might not be checked in the current code. This is independent of whether timestamps are on the same day or not, so should contain your lines 1 and 3. I could imagine that these lines don't work well with the compressHistory() method, but I didn't try that out. We could add a check to make sure these lines are rejected.
Negative values are fine and simply indicate that a value is missing. Normally, we'd put in -1.0, but any other negative value should do just fine. No need to break in such cases.

Note that I took a quick look at the code to answer these questions, but that doesn't guarantee that I'm right.

OK, I thought you could have a specification (maybe, on paper ;-) somewhere, that's why I asked :-)

Now, I basically just looked at the code and tried to infer what was intended. The resulting test draft is added for review. This is just concerned with WeightsStatus; locale issue follows.

Here some findings:

First, weights are only counted as missing if less than -.5:

        if (weights[i] < -0.5) {
          missingValues += 1 << i;
        }

See also testMissingValues. This should be moved into a separate method to make it testable. Does it really do what is intended?

There are design choices that could cause trouble:

The setHistory method could introduce a history set with a totally different comparator. As this method is not used anywhere, it can just be removed. Or, if it should be kept, just take the given history set and add the elements to the history after clearing it. But, better remove unused code.

The second more intricate issue is the Comparator itself:

  private SortedMap<long[], double[]> history =
      new TreeMap<long[], double[]>(new Comparator<long[]>() {
    public int compare(long[] a, long[] b) {
      return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
    }
  });

(As an aside, the comparison of long values should be accomplished with Long.compare() and analogously with other numbers.)

The used comparison is not consistent with 'equals', as no other measures are taken (except partially in addToHistory) this could lead to an inconsistent history list. The test compareTest prints an example (cf. the key value combination).

So, the comparator (and related code) needs to be revised in order to reflect all the assumptions that can be inferred from addToHistory and compressHistory. Here the beginning of such a list:

All history intervals at most overlap in one point or have an empty intersection. If the overlap contains more than a point the interval is rejected (should such a rejection be logged?).
History intervals are supposed to be a minimum size, which?
what else?

History compression: Here the condition for compressing, i.e. averaging respective values:

1: lastEndMillis == startMillis &&
2:          ((lastEndMillis - 1L) / intervalLengthMillis) ==
3:          ((endMillis - 1L) / intervalLengthMillis) &&
4:          lastMonthString.equals(monthString) &&
5:          lastMissingValues == missingValues

Looking more closely at lines 2 and 3. This condition evaluates to true, if endMillis - 1 == k *intervalLengthMillis + r_em and lastEndMillis - 1 == k * intervalLengthMillis + r_lem for a suitable natural Number k, and r_em, r_lem natural Numbers strictly less than intervalLengthMillis. With line 1 and the given that interval starts should be less than their ends (cf. wrong date test) this implies r_em >= r_lem and lines 2, 3 are equivalent to endMillis - lastEndMillis < intervalLengthMillis.

What did I overlook and what other assumptions are not reflected in the current code?

I didn't yet check. But, other comparator implementations and histories should be checked, too. All these could be the cause of inconsistencies. Add new trac issues for that?

Trac:
WeightsStatusTest.java

Replying to iwakeh:

OK, I thought you could have a specification (maybe, on paper ;-) somewhere, that's why I asked :-)

Ah, no, only in my head.

Now, I basically just looked at the code and tried to infer what was intended. The resulting test draft is added for review.

I noticed that some of those tests fail. Mind if I respond to your questions below and you figure out whether the code or the test is wrong?

This is just concerned with WeightsStatus; locale issue follows.

Here some findings:

First, weights are only counted as missing if less than -.5: {{{ if (weights[i] < -0.5) { missingValues += 1 << i; } }}} See also testMissingValues. This should be moved into a separate method to make it testable. Does it really do what is intended?

WeightsStatus is mostly an internal status class, and it seemed okay to use -1.0 to encode "missing value". The check for < -0.5 was to picked to avoid any trouble with double value comparisons, like < 0.0 or == -1.0. Under normal circumstances, values would never have any other negative value than -1.0. I'd say let's test this as is and then consider changing weights to Double[] to be able to store null values.

The missingValues += 1 << i part makes sure that we only combine two lines with the exact same pattern of missing values. We could have used a BitSet or anything else, but using bits in an int seemed okay here, also with regard to performance and avoiding the overhead of using an object there. Documentation would not have hurt, of course. ;) I'm not saying that this is perfect, but at least I believe the code does what it's intended to do.

There are design choices that could cause trouble:

The setHistory method could introduce a history set with a totally different comparator. As this method is not used anywhere, it can just be removed. Or, if it should be kept, just take the given history set and add the elements to the history after clearing it. But, better remove unused code.

Huh, I didn't know it's unused. Yes, unused code should just go away.

The second more intricate issue is the Comparator itself:

{{{ private SortedMap<long[], double[]> history = new TreeMap<long[], double[]>(new Comparator<long[]>() { public int compare(long[] a, long[] b) { return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; } }); }}}

(As an aside, the comparison of long values should be accomplished with Long.compare() and analogously with other numbers.)

The used comparison is not consistent with 'equals', as no other measures are taken (except partially in addToHistory) this could lead to an inconsistent history list. The test compareTest prints an example (cf. the key value combination).

Yep, I can see us changing this towards using Long.compare() and also being more consistent with equals() and the like.

So, the comparator (and related code) needs to be revised in order to reflect all the assumptions that can be inferred from addToHistory and compressHistory. Here the beginning of such a list:

All history intervals at most overlap in one point or have an empty intersection. If the overlap contains more than a point the interval is rejected (should such a rejection be logged?).

This seems correct. I can't say whether a rejection should be logged, because sometimes it might be useful to simply attempt to add an interval and not have the code yell if that is not possible.

History intervals are supposed to be a minimum size, which?

Huh, why?

what else?

Fine question. Nothing comes to mind, but that might change when reviewing the test class more closely.

History compression: Here the condition for compressing, i.e. averaging respective values:

{{{ 1: lastEndMillis == startMillis && 2: ((lastEndMillis - 1L) / intervalLengthMillis) == 3: ((endMillis - 1L) / intervalLengthMillis) && 4: lastMonthString.equals(monthString) && 5: lastMissingValues == missingValues }}}

Looking more closely at lines 2 and 3. This condition evaluates to true, if endMillis - 1 == k *intervalLengthMillis + r_em and lastEndMillis - 1 == k * intervalLengthMillis + r_lem for a suitable natural Number k, and r_em, r_lem natural Numbers strictly less than intervalLengthMillis. With line 1 and the given that interval starts should be less than their ends (cf. wrong date test) this implies r_em >= r_lem and lines 2, 3 are equivalent to endMillis - lastEndMillis < intervalLengthMillis.

Errrr, can you give an example for when this could go wrong? :) Or was that a suggestion for rewriting that code to make it "even" more readable?

What did I overlook and what other assumptions are not reflected in the current code?

Again, can't say off the top of my head.

I didn't yet check. But, other comparator implementations and histories should be checked, too. All these could be the cause of inconsistencies. Add new trac issues for that?

WeightsStatus "shares some code" with other classes in the same package. The reason is that it was easier to take an existing class and change it as needed than generalizing and writing subclasses. I'd say let's clean up WeightsStatus first and then see what changes we can adapt in other classes. In theory, by simplifying WeightsStatus, we'll get closer to a state where we can generalize classes and put common parts in a superclass. But let's not aim for that just yet, if that's okay for you?

Thanks for working your way through all this! First some responses and below some notes about the tests that might save you some time reviewing the test class.

... WeightsStatus is mostly an internal status class, and it seemed okay to use -1.0 to encode "missing value". The check for < -0.5 was to picked to avoid any trouble with double value comparisons, like < 0.0 or == -1.0. Under normal circumstances, values would never have any other negative value than -1.0. I'd say let's test this as is and then consider changing weights to Double[] to be able to store null values.

The missingValues += 1 << i part makes sure that we only combine two lines with the exact same pattern of missing values. We could have used a BitSet or anything else, but using bits in an int seemed okay here, also with regard to performance and avoiding the overhead of using an object there. Documentation would not have hurt, of course. ;) I'm not saying that this is perfect, but at least I believe the code does what it's intended to do.

I'm not concerned about the bit approach, that's fine. The condition should be tested; indirectly this is suggested in testMissingValues, so putting it into a separate method might not be necessary.

Regarding the symbol for missing values: Double.Nan is a double, so it could be used here; but of course this needs some adaptions in the methods that convert to doc string.

This seems correct. I can't say whether a rejection should be logged, because sometimes it might be useful to simply attempt to add an interval and not have the code yell if that is not possible. Maybe logging it in debug level will help with future troubleshooting?

History intervals are supposed to be a minimum size, which?

Huh, why?

I'll get back to this question later.

Errrr, can you give an example for when this could go wrong? :) Or was that a suggestion for rewriting that code to make it "even" more readable?

The latter :-) but I also wanted to make sure that it describes what was intended once upon the time.

WeightsStatus "shares some code" with other classes in the same package. The reason is that it was easier to take an existing class and change it as needed than generalizing and writing subclasses. I'd say let's clean up WeightsStatus first and then see what changes we can adapt in other classes. In theory, by simplifying WeightsStatus, we'll get closer to a state where we can generalize classes and put common parts in a superclass. But let's not aim for that just yet, if that's okay for you?

I think, we should limit this for the first release to verifying other comparator implementations and singling out unused methods.

Test class commentary

testSetup for warm-up, passes

testWeightHistory fails, document contains NaN after compression

testSetFromDocumentString passes

testMissingValues fails, b/c the small negative values are not recognized as missing

testSetFromDocumentStringWrongInput fails, error b/c of nfe

testSetFromDocumentStringWrongDateTime failes, b/c line with later beginning than end is not rejected.

testCompare fails, b/c there is an entry with the key from the first and values from the second element.

Sorry for the long delay in responding. This patch is not easy to review for me, because we're using somewhat different approaches to writing unit tests. Can we discuss those approaches first before going into the details of reviewing and merging this particular test class? I think there's a benefit for future test classes here.

I'm having a hard time connecting the test data at the beginning of the class with the methods where they're used. I see how it's useful to separate the two, but may I suggest that test data preceding actual tests should be self-explanatory without having to read subsequent tests? For example, the name compareLines doesn't really make much sense to me without reading the corresponding test method. And correctLines and correctHistory could be described a little bit better.
I prefer tests to be as small as possible and not mix different test scenarios. For example, rather than stuffing four (related) test input lines into wrongDateLines and commenting them there, I'd probably write four test methods with self-explanatory method names that simply include the test data themselves. That would also solve the previous aspect where test data preceding test methods is not self-explanatory to the reader.
Most of the test data lines violate the code style by being longer than 70 or 80 or whatever we said characters. And I think there should be a space in new String[]{, but I'm not sure. And there are missing newlines between variable declarations.
To me, test method names should be self-explanatory, even if that makes them rather lengthy. Rather than testSetup, I'd prefer something like testEmptyWeightsStatus.

There, these are some of the things I'd do differently. I can be convinced that your approach has advantages over mine. Or maybe some of these suggestions make sense to you and you can send me an updated patch? Happy to look again, and, as always, thanks for your hard work!

Trac:
Status: needs_information to needs_revision

I didn't phrase my question here clearly, sorry. The issue was/is set to needs information intentionally.

So the tests do not constitute a patch (that's why there a bit sloppy). I wanted information about the assumptions expressed in the tests, and then intended to write the real test patch, which will be provided as a branch from my repo. I agree with what you point out above.

Is it possible for you to make out the tests content/assumptions and indicate what makes sense and what should be different? What added?

Trac:
Status: needs_revision to needs_information

Oh, sorry for misunderstanding. Let me reply to this comment instead:

Replying to iwakeh:

Test class commentary

testSetup for warm-up, passes

Sounds good. :)

testWeightHistory fails, document contains NaN after compression

It seems there are two problems with this test case:

Intervals should be at least one hour long for the internal math to work. This is unfortunate, though the assumption is realistic for real data. But let's relax this requirement and accept intervals shorter than 1 hour. A plausible minimum would be 1 second, because we cannot represent anything shorter than that.
Entries are only compressed if they are a given number of days in the past, calculated from the current system time. A better approach would be to compress based on the end time of the most recent history entry to remove yet another dependency on system time. For the test to work you'll have to date entries back by 1 year or maybe even more.

testSetFromDocumentString passes

Okay.

testMissingValues fails, b/c the small negative values are not recognized as missing

How about we represent missing values on disk as empty string (""), which also saves disk space, and in memory as -1.0 to keep the current code working?

testSetFromDocumentStringWrongInput fails, error b/c of nfe

Right, let's fix that nfe.

testSetFromDocumentStringWrongDateTime failes, b/c line with later beginning than end is not rejected.

Right, let's reject those lines. That means additional tests, but those are cheap and might make the code more robust.

testCompare fails, b/c there is an entry with the key from the first and values from the second element.

Let's reject those lines, too.

Is that more what you had in mind?

That was the information I was intending to elicit :-) Thanks!

Just a few things left:

How about we represent missing values on disk as empty string (""), which also saves disk space, and in memory as -1.0 to keep the current code working?

I still would prefer Double.NaN (cf. comment:9) as being more obvious in code/in memory. On disk it could be the empty string even though not very nice, but it's used already anyway.

testCompare fails, b/c there is an entry with the key from the first and values from the second element.

Let's reject those lines, too. It's not that easy here; the test demonstrates would could happen b/c of the comparator not being compatible with equal. So the missing part is an ordering for the intervals. The only one I can think of is the lexicographical ordering according to interval ends, e.g., [10, 20] before [10, 21] before [12, 25]. That should capture the intend of the current approach?

Replying to iwakeh:

That was the information I was intending to elicit :-) Thanks!

Just a few things left:

How about we represent missing values on disk as empty string (""), which also saves disk space, and in memory as -1.0 to keep the current code working?

I still would prefer Double.NaN (cf. comment:9) as being more obvious in code/in memory.

Okay, let's use Double.NaN then.

On disk it could be the empty string even though not very nice, but it's used already anyway.

Okay. Well, the nice part is that it saves disk space. We'll just need to make it compatible with existing status files that contain negative doubles.

testCompare fails, b/c there is an entry with the key from the first and values from the second element.

Let's reject those lines, too. It's not that easy here; the test demonstrates would could happen b/c of the comparator not being compatible with equal. So the missing part is an ordering for the intervals. The only one I can think of is the lexicographical ordering according to interval ends, e.g., [10, 20] before [10, 21] before [12, 25]. That should capture the intend of the current approach?

Okay, sounds good to me.

Thanks for trying hard to understand this code!

All questions answered => assign

Trac:
Status: needs_information to assigned

Trac:
Milestone: N/A to Onionoo 3.1.1

Milestone renamed

Trac:
Milestone: Onionoo 3.1.1 to Onionoo 3.1-1.0.0

Please find my branch for review.

It has four commits on top of the current master. Two tweaking the build environment, one for the tests and another for making the new tests pass.

One question: Should negative values still be recognized as missing values (cf. comment:15)?

Trac:
Status: assigned to needs_review
Summary: uncaught NFE to uncaught NFE and other bugs in weight status document processing

Replying to iwakeh:

Please find my branch for review.

It has four commits on top of the current master. Two tweaking the build environment, one for the tests and another for making the new tests pass.

Looks good. I made a few tweaks in fixup commits in my task-19259 branch. Please take a look, and if you like them, I'll squash them into yours.

One question: Should negative values still be recognized as missing values (cf. comment:15)?

Uhhhhhm, do you mind rephrasing that question and giving a bit more context? I could re-read the comments above, but maybe you can explain this in 2 or 3 sentences? Thanks!

Replying to karsten:

Replying to iwakeh:

Please find my branch for review.

It has four commits on top of the current master. Two tweaking the build environment, one for the tests and another for making the new tests pass.

Looks good. I made a few tweaks in fixup commits in my task-19259 branch. Please take a look, and if you like them, I'll squash them into yours.

All fine. Thanks for checking.

One question: Should negative values still be recognized as missing values (cf. comment:15)?

Uhhhhhm, do you mind rephrasing that question and giving a bit more context? I could re-read the comments above, but maybe you can explain this in 2 or 3 sentences? Thanks!

I'm concerned about deployment: When making this code run on an existing Onionoo server, will it encounter old weight status docs (i.e. with negative values for missing entries) that need to be processed?

Replying to iwakeh:

I'm concerned about deployment: When making this code run on an existing Onionoo server, will it encounter old weight status docs (i.e. with negative values for missing entries) that need to be processed?

Oh, the new code certainly has to be able to process existing data. Would you want to have a dump of the status/ directory of the deployed Onionoo instance or a small subset thereof to try this out?

That'll be great!

https://people.torproject.org/~karsten/volatile/onionoo-status-weights-F-F.tar.bz2

Replying to karsten:

https://people.torproject.org/~karsten/volatile/onionoo-status-weights-F-F.tar.bz2

Thanks, downloaded all.

Please review two commits on top of your branch (comment:20).

Old docs using -1.000000000000 for missing values are read correctly.

Awesome, thanks for trying this out. Please find my task-19259-2 branch with a trivial fix to the unit test data set. If that looks okay to you, I'll merge.

Replying to karsten:

Awesome, thanks for trying this out. Please find my task-19259-2 branch with a trivial fix to the unit test data set. If that looks okay to you, I'll merge.

Thanks for catching the test data line endings! I forgot the commit.

Ready for merge.

Trac:
Status: needs_review to merge_ready

Okay! Merged to master. Closing. Thanks!

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

closed

uncaught NFE and other bugs in weight status document processing

Child items 0

Activity