Opened 5 years ago

Closed 5 years ago

#12651 closed defect (fixed)

Details documents of non-running nodes may contain "running":true

Reported by: karsten Owned by: karsten
Priority: Medium Milestone:
Component: Metrics/Onionoo Version:
Severity: Keywords:
Cc: iwakeh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The "running" field of details documents is not always updated correctly. Example:

https://onionoo.torproject.org/details?fields=running&running=false&limit=10

{"relays_published":"2014-07-18 13:00:00",
"relays":[
{"running":false},
{"running":true},
{"running":true},
{"running":true},
{"running":false},
{"running":false},
{"running":true},
{"running":true},
{"running":false},
{"running":true}
],
"bridges_published":"2014-07-18 12:37:04",
"bridges":[
]}

Here's how this happens: we only write details JSON documents if the node published a descriptor or was listed in a consensus published by the directory authorities. That's the general design for writing JSON documents, and it saves us from updating JSON documents that haven't changed. However, if a node drops out of the consensus, it's not contained in any descriptor, hence we won't rewrite its details JSON document.

I came up with an idea for a new approach that also rewrites the details JSON documents of relays that dropped out of the consensus: all JSON documents are based on the contents of internal status files, and we can keep record of which status files are updated in an execution. If we include whether a node is running in a node status file, we notice when the running state changed and can rewrite the details JSON document. I have a branch with this change, but I'm not brave enough to deploy it, because it has zero unit test coverage. I'll either do another careful review or include this code in unit tests.

Child Tickets

Change History (12)

comment:1 in reply to:  description Changed 5 years ago by karsten

Status: newneeds_review

comment:2 Changed 5 years ago by karsten

This bug doesn't only affect the "running" field, but also others:

https://onionoo.torproject.org/details?fields=guard_probability&running=false&limit=10

{"relays_published":"2014-07-19 11:00:00",
"relays":[
{},
{"guard_probability":0.0},
{"guard_probability":0.0},
{"guard_probability":0.0},
{},
{},
{"guard_probability":5.421985E-6},
{},
{},
{"guard_probability":0.0}
],
"bridges_published":"2014-07-19 10:37:05",
"bridges":[
]}

In theory, the "guard_probability" field should be "Omitted if the relay is not running", says the spec. In short, the details JSON documents of non-running relays must be rewritten.

comment:3 Changed 5 years ago by karsten

#13408 has more details on this issue. Just closed that one as duplicate of this ticket.

comment:4 Changed 5 years ago by karsten

Cc: iwakeh added

I refactored something, and besides resulting in much cleaner code, the refactored code should fix this bug. I'm going to run the refactored code in a local VM for a bit to see if it breaks apart. But I sure wouldn't mind some code review of my branch task-12651-2.

(This branch is unrelated to my branch task-12651 which took a different approach that turned out to be less flexible than what I did in task-12651-2 today. Not going into too many details here, but task-12651-2 is also a building block in improving bulk imports, because it allows us to update status files and write document files in separate runs.)

comment:5 Changed 5 years ago by iwakeh

I didn't do test runs, but finally got some time for reading through the changes.

Three things I want to mention:

Naming

The purpose of method UpdateStatus.fromDocumentString
would be easier to infer if it was named setFromDocumentString.

Without the prefix 'set' I would expect to see some return value other than void.

Equals and Gson Output

The equals method in SummaryDocument relies on the "deterministic"
output of Gson.toJson.
As the Json protocol does not impose a particular order of fields, this behavior is not guaranteed.
It could change with the current implementation depending on circumstances,
that might not be obvioulsly related, or it could change with some
future update. When the output of Gson.toJson differs for the
same input we could see weird consequences in DocumentStore.storeSummaryDocument.

I would prefer a real equals method that compares all the relevant fields.
This might be also performing better than the Gson-approach. Any decent
IDE should generate these equals methods, so not too much typing ;-)
(Well, but of course the short solution looks more elegant.)

It seems currently Gson.toJson reliably outputs the same
Json String for the same input. So, if there is some reason for having
an equals method like this, there should at least be some test cases in
order to confirm this behavior.

Testing

Are there any explicit test cases for the new functionality?
I did only find adaptions, but no new tests.

comment:6 in reply to:  5 Changed 5 years ago by karsten

Replying to iwakeh:

I didn't do test runs, but finally got some time for reading through the changes.

Thanks for reviewing!

Three things I want to mention:

Naming

The purpose of method UpdateStatus.fromDocumentString
would be easier to infer if it was named setFromDocumentString.

Without the prefix 'set' I would expect to see some return value other than void.

Indeed. I just changed this in a new branch based on master, because it's a good idea. I deployed it on the server and will push to master if it doesn't break within the next hour.

Equals and Gson Output

The equals method in SummaryDocument relies on the "deterministic"
output of Gson.toJson.
As the Json protocol does not impose a particular order of fields, this behavior is not guaranteed.
It could change with the current implementation depending on circumstances,
that might not be obvioulsly related, or it could change with some
future update. When the output of Gson.toJson differs for the
same input we could see weird consequences in DocumentStore.storeSummaryDocument.

I would prefer a real equals method that compares all the relevant fields.
This might be also performing better than the Gson-approach. Any decent
IDE should generate these equals methods, so not too much typing ;-)
(Well, but of course the short solution looks more elegant.)

It seems currently Gson.toJson reliably outputs the same
Json String for the same input. So, if there is some reason for having
an equals method like this, there should at least be some test cases in
order to confirm this behavior.

Huh, good point. Fortunately, in a later commit that I didn't push yet, I took out the equals() comparison and did something even smarter. But you're indeed right that implementing equals() like I did here was not a good idea.

Testing

Are there any explicit test cases for the new functionality?
I did only find adaptions, but no new tests.

Tests are the reason why I didn't push the other changes I'm talking about above. :) The other reason is that I wanted to merge #13673 first. Now that this is done, I should rebase what I have to master, look into unit tests again, and push the result for another review round. This might not happen this week, but I'm working on it.

Again, thanks for the review! Much appreciated.

comment:7 Changed 5 years ago by karsten

I finally found some time to rebase my branch to master: https://gitweb.torproject.org/user/karsten/onionoo.git/shortlog/refs/heads/task-12651-3. It has three commits:

  • 09d7774 should be very similar to the commit you reviewed in task-12651-2. (Maybe confirm that by comparing commits.)
  • e789855 is pretty big, but it greatly simplifies current complexity between node statuses and details statuses. It's also an important improvement towards supporting bulk imports better.
  • 61817b9 is an attempt to unit-test the changes in e789855, but it's not finished.

Would you want to take a look at this branch and maybe even complete the unit tests?

I'm starting some local test runs now. If nothing breaks horribly, I might feel brave enough to deploy the branch on the mirror later this week.

comment:8 Changed 5 years ago by iwakeh

I might have time on the weekend to take a closer look at the changes.

If local tests succeed, good luck on the mirror!

comment:9 Changed 5 years ago by karsten

Local tests did succeed, and tests on the mirror revealed a couple more bugs. I'm now running the latest task-12651-3 branch on the mirror which looks pretty good.

I'd like to merge to master (without the temp commit with unit tests) and deploy on the main instance this week. Please let me know if you think this is a bad idea.

comment:10 in reply to:  9 Changed 5 years ago by iwakeh

You did do quite some testing on the mirror. So, it should be ok to merge to master and deploy.

One issue I see is the "hostname-hack":
Is there a way to avoid the unnecessary additional lookup (after an update
to the new format) computationally rather than by changing the code after the first run?

Other than that I only have some minor things and coding style issues which could be implemented at some point later.

a) In NodeStatus:124

     if (orAddressAndPort.contains(":") &&
            orAddressAndPort.length() > 0)

wouldn't
if (orAddressAndPort.contains(":"))
suffice?

b) NodeStatus:432

      ...
      written = 0;
      for (String orAddressAndPort : this.orAddressesAndPorts) {
        sb.append((written++ > 0 ? "+" : "") + orAddressAndPort);
      }
      ...

As commons-lang3 is available, why not use StringUtils.join?

      sb.append(StringUtils.join(this.exitAddresses, "+"));

This would apply to a few other places, too:

src/main/java/org/torproject/onionoo/docs/NodeStatus.java:433:        sb.append((written++ > 0 ? "+" : "") + orAddressAndPort);
src/main/java/org/torproject/onionoo/docs/NodeStatus.java:440:        sb.append((written++ > 0 ? "+" : "") + exitAddress);
src/main/java/org/torproject/onionoo/docs/NodeStatus.java:449:      sb.append((written++ > 0 ? "," : "") + relayFlag);
src/main/java/org/torproject/onionoo/docs/NodeStatus.java:482:        sb.append((written++ > 0 ? ";" : "") + familyFingerprint);
src/main/java/org/torproject/onionoo/docs/ClientsHistory.java:125:      sb.append((written++ > 0 ? "," : "") + e.getKey() + "="
src/main/java/org/torproject/onionoo/server/PerformanceMetrics.java:67:          sb.append((written++ > 0 ? ", " : "") + string + " ("
src/main/java/org/torproject/onionoo/server/PerformanceMetrics.java:100:          sb.append((j > 0 ? ", " : "") + "." + permilles[j]
src/main/java/org/torproject/onionoo/server/PerformanceMetrics.java:111:        sb.append((j > 0 ? ", " : "") + "." + permilles[j] + "<null");
src/main/java/org/torproject/onionoo/server/ResponseBuilder.java:154:      addressesBuilder.append((written++ > 0 ? "," : "") + "\""

Coding Style

Well, when reading through code, the coding style also gets some focus.
Thus, a few suggestions for the contributer guide:

NodeDetailsStatusUpdater:236

      int orPort = entry.getOrPort(), dirPort = entry.getDirPort();

Such declarations are a little less readable imho;
and, they can trip up IDEs during refactoring.
I'd prefer

      int orPort = entry.getOrPort(); 
      int dirPort = entry.getDirPort();

instead.
So, maybe we could add this to the contributing guide?

In addition, there are many places where 'null' values are validated,
e.g. if(someVariable == null) ... .
It might be better to always use if(null == someVariable) ...
in order to avoid accidental assigment by style?

In quite a few places I encountered if-statements that lead to a silent method exit, e.g.

   if(someBooleanExpression){
	return;
   }

A debug logging statement with a reason might be of value in these cases;
for both reading the code and later on when programming maintenance or enhancements.

comment:11 Changed 5 years ago by karsten

Cool, thanks for the review! Let's first agree how to handle the hostname hack, and then squash, merge, deploy, and hope for the best.

To be honest, I don't like the hostname hack very much. But I wrote it, because I wanted to add as little temporary code as possible. And I figured this hack is okay, because a) host names are newly resolved after 12 hours after which nobody will figure out there was a hack, b) there's a yet unknown bug with host names in master (3270 of the current details files using master have a host_name field, compared to 9290 files on the mirror running the new branch), c) neither Atlas nor Globe show the hostname.

But let's talk about alternatives, also for the next time when we plan to deploy a hack like this: we could go through all node statuses and write their host names to details status files. That's probably another 10 lines of code which will only be executed once. Though we should probably test them before we deploy them. What do you think?

Your other suggestions all make sense and we should implement them all after merging this branch. The only thing that I'm not certain about yet is the debug log line whenever we silently exit a method. I'd rather postpone adding new debug log statements until we define common guidelines for when to log on what level. Adding a code comment seems very reasonable, though. I can make these changes after merging the other stuff to master, unless you'd rather want to make them. Let me know.

comment:12 Changed 5 years ago by karsten

Resolution: fixed
Status: needs_reviewclosed

I squashed and merged my branch and implemented most of the changes you suggested. That concludes this ticket. Thanks for all the input! Closing.

Note: See TracTickets for help on using tickets.