Opened 3 years ago

Closed 7 months ago

#16596 closed enhancement (implemented)

Change database queries towards making only a single query per request

Reported by: karsten Owned by: karsten
Priority: Medium Milestone:
Component: Metrics/ExoneraTor Version:
Severity: Normal Keywords:
Cc: karsten Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

We're currently making up to four database requests to handle a single user request. We're re-using the same database connection for them, but we can very likely do better.

Maybe we can reduce this to making only a single database query per request. This might require some mild hacking, because we'd want the query response to contain at least the following information:

  • table rows for the technical details part,
  • addresses in the same /24 or /48 without details
  • whether there were any consensuses in the database for the requested day or previous/next day, and
  • dates of first and last consensus in the database.

Child Tickets

Change History (17)

comment:1 Changed 11 months ago by rose

Severity: Normal

comment:2 Changed 9 months ago by karsten

Cc: karsten added

This might be related to the recently pushed fixes/enhancements. I'll let irl do a quick check whether that's the case. If it's potentially Onionoo-related, please let me know and I'll take a closer look!

comment:3 Changed 9 months ago by karsten

Owner: set to karsten
Status: newaccepted

Err, wrong ticket, this one just happened to be open. By the way, I picked up work on this one yesterday.

comment:4 Changed 8 months ago by karsten

Priority: LowMedium
Status: acceptedneeds_review

Please review my branch task-16596 which reduces database access to a single query and which prepares for separating ExoneraTor into a back-end and a front-end part.

comment:5 Changed 8 months ago by karsten

I just pushed another commit to that branch which starts using a JSP in preparation for moving the page to metrics-web.

comment:6 Changed 8 months ago by iwakeh

Already found a bit. There are also a few things that are part of other tickets (#19623, #19624, #21145), which I tried to omit here.

ExoneraTorServlet: exoneratorHost should not be configured via web.xml, rather use simple java properties file or even simpler hard code. After all it shouldn't change that often, should it?

In 'step 3' I see some problems with null values, for example, "".equals(null) would evaluate to false (line 147 following).

For most exceptions caught the error message should be logged; and, it might be time to switch to slf4j-api and logback, now.

In addition, reading the response query should also catch RuntimeExceptions (possibly from Gson).

The version received should also be logged, if it differs from the known version.
The known version(s) could be a constant in ExoneraTorServlet; either just the major part or all. This makes it more obvious.

QueryServlet: Helper methods private String parseTimestampParameter and private String parseIpParameter should be public static and tests should be added. Similarly, private String convertIpV*ToHex, which could be combined by simply adding another argument, i.e., public static String convertIpV4ToHex(String relayIp, boolean isIpV4).
A MILLISEC_IN_DAY = 24L * 60L * 60L * 1000L constant would be useful.

Other: Using the object Boolean exit for a triplet state is a bit too much of a hack. There could be a simple enum, as simple as, for example, public enum Exit { U, Y, N; }.

Regarding SQL:
Order-by statement should be using names not numbers.
Couldn't the exit-information be part of the query already?

(I could also look into providing some patches, if we agree on the changes and you don't have the time?)

comment:7 Changed 8 months ago by iwakeh

Status: needs_reviewneeds_revision

comment:8 Changed 8 months ago by karsten

Replying to iwakeh:

Thanks for starting this review!

Already found a bit. There are also a few things that are part of other tickets (#19623, #19624, #21145), which I tried to omit here.

Okay.

ExoneraTorServlet: exoneratorHost should not be configured via web.xml, rather use simple java properties file or even simpler hard code. After all it shouldn't change that often, should it?

Hmm, right. My idea was that we'll later copy over this servlet to metrics-web where it would make a little more sense to make the host name configurable. But on second thought that's not really the case. I'll just hard-code it.

In 'step 3' I see some problems with null values, for example, "".equals(null) would evaluate to false (line 147 following).

Note that we're using null for invalid parameter values and "" for empty parameter values. Do you see actual problems or potential problems there?

In the latter case (potential problems), maybe we can resolve them by documenting things a bit better, or by making things clearer in subsequent commits.

In the former case (actual problems), let's of course fix the issues now. But I'd have to see the actual problem first.

For most exceptions caught the error message should be logged; and, it might be time to switch to slf4j-api and logback, now.

Yes, we can switch to slf4j, though we should do that in a separate commit on top of these, probably in a new ticket.

In addition, reading the response query should also catch RuntimeExceptions (possibly from Gson).

Ugh, indeed. Will fix. Good catch. :)

The version received should also be logged, if it differs from the known version.

Yep, good idea.

The known version(s) could be a constant in ExoneraTorServlet; either just the major part or all. This makes it more obvious.

Also a good idea.

QueryServlet: Helper methods private String parseTimestampParameter and private String parseIpParameter should be public static and tests should be added. Similarly, private String convertIpV*ToHex, which could be combined by simply adding another argument, i.e., public static String convertIpV4ToHex(String relayIp, boolean isIpV4).

Agreed, though let's save that for another ticket and not overload this ticket with too many improvements all at once. I know it's tempting. Stay strong, we'll eventually fix these things.

A MILLISEC_IN_DAY = 24L * 60L * 60L * 1000L constant would be useful.

Agreed. That's probably simple enough for a separate commit on top of this branch.

Other: Using the object Boolean exit for a triplet state is a bit too much of a hack. There could be a simple enum, as simple as, for example, public enum Exit { U, Y, N; }.

Ugh, that would turn the JSON field into a string, which doesn't seem very intuitive for "is an exit". And whoever processes this JSON will need to check for null anyway, regardless of boolean or string. In fact, I think that we're using the boolean field exactly in the way it's supposed to be used: true means "is an exit", false means "is not an exit", and null means "we don't know". I think I'd like to leave this one unchanged. It's not a hack.

Regarding SQL:
Order-by statement should be using names not numbers.

Hmm, now that you mention that, I wonder if we even need results to be sorted at all. I'll try to take that out. Otherwise I'd change numbers to names in a subsequent commit, because it's not directly related to this ticket.

Couldn't the exit-information be part of the query already?

Yes, but I'd like to save database changes for a later ticket. There's so, so much more we can do to make the database schema more efficient. (I'd have to look at my notes from last year, but I believe that we're storing exit information directly in the database rather than the entire raw status entry.)

(I could also look into providing some patches, if we agree on the changes and you don't have the time?)

I'll try to make changes tomorrow morning. Thanks again for the initial review!

comment:9 in reply to:  8 Changed 8 months ago by karsten

Status: needs_revisionneeds_review

Replying to karsten:

Replying to iwakeh:

Thanks for starting this review!

Please find my updated task-16596 branch with 3 squash commits and 2 additional commits.

Already found a bit. There are also a few things that are part of other tickets (#19623, #19624, #21145), which I tried to omit here.

Okay.

ExoneraTorServlet: exoneratorHost should not be configured via web.xml, rather use simple java properties file or even simpler hard code. After all it shouldn't change that often, should it?

Hmm, right. My idea was that we'll later copy over this servlet to metrics-web where it would make a little more sense to make the host name configurable. But on second thought that's not really the case. I'll just hard-code it.

Fixed.

In 'step 3' I see some problems with null values, for example, "".equals(null) would evaluate to false (line 147 following).

Note that we're using null for invalid parameter values and "" for empty parameter values. Do you see actual problems or potential problems there?

In the latter case (potential problems), maybe we can resolve them by documenting things a bit better, or by making things clearer in subsequent commits.

In the former case (actual problems), let's of course fix the issues now. But I'd have to see the actual problem first.

For most exceptions caught the error message should be logged; and, it might be time to switch to slf4j-api and logback, now.

Yes, we can switch to slf4j, though we should do that in a separate commit on top of these, probably in a new ticket.

In addition, reading the response query should also catch RuntimeExceptions (possibly from Gson).

Ugh, indeed. Will fix. Good catch. :)

Fixed.

The version received should also be logged, if it differs from the known version.

Yep, good idea.

The known version(s) could be a constant in ExoneraTorServlet; either just the major part or all. This makes it more obvious.

Also a good idea.

Both fixed.

QueryServlet: Helper methods private String parseTimestampParameter and private String parseIpParameter should be public static and tests should be added. Similarly, private String convertIpV*ToHex, which could be combined by simply adding another argument, i.e., public static String convertIpV4ToHex(String relayIp, boolean isIpV4).

Agreed, though let's save that for another ticket and not overload this ticket with too many improvements all at once. I know it's tempting. Stay strong, we'll eventually fix these things.

A MILLISEC_IN_DAY = 24L * 60L * 60L * 1000L constant would be useful.

Fixed.

Agreed. That's probably simple enough for a separate commit on top of this branch.

Other: Using the object Boolean exit for a triplet state is a bit too much of a hack. There could be a simple enum, as simple as, for example, public enum Exit { U, Y, N; }.

Ugh, that would turn the JSON field into a string, which doesn't seem very intuitive for "is an exit". And whoever processes this JSON will need to check for null anyway, regardless of boolean or string. In fact, I think that we're using the boolean field exactly in the way it's supposed to be used: true means "is an exit", false means "is not an exit", and null means "we don't know". I think I'd like to leave this one unchanged. It's not a hack.

Regarding SQL:
Order-by statement should be using names not numbers.

Hmm, now that you mention that, I wonder if we even need results to be sorted at all. I'll try to take that out. Otherwise I'd change numbers to names in a subsequent commit, because it's not directly related to this ticket.

Fixed.

Couldn't the exit-information be part of the query already?

Yes, but I'd like to save database changes for a later ticket. There's so, so much more we can do to make the database schema more efficient. (I'd have to look at my notes from last year, but I believe that we're storing exit information directly in the database rather than the entire raw status entry.)

(I could also look into providing some patches, if we agree on the changes and you don't have the time?)

I'll try to make changes tomorrow morning. Thanks again for the initial review!

Want to take another look at the revised branch? Anything else that needs changing before this can go into master (after squashing the squash commits)? Thanks!

comment:10 Changed 8 months ago by iwakeh

Status: needs_reviewmerge_ready

Ok, I also took the time to make some changes, some of which I also found necessary for the reviewing itself.
More inline ...

Replying to karsten:

...

Hmm, right. My idea was that we'll later copy over this servlet to metrics-web where it would make a little more sense to make the host name configurable. But on second thought that's not really the case. I'll just hard-code it.

Fine.

...

In 'step 3' I see some problems with null values, for example, "".equals(null) would evaluate to false (line 147 following).

Note that we're using null for invalid parameter values and "" for empty parameter values. Do you see actual problems or potential problems there?

In the latter case (potential problems), maybe we can resolve them by documenting things a bit better, or by making things clearer in subsequent commits.

Maybe, think about a clearer way of documenting. There is not actual problem considering the checks before the call to writeForm.

...

For most exceptions caught the error message should be logged; and, it might be time to switch to slf4j-api and logback, now.

Yes, we can switch to slf4j, though we should do that in a separate commit on top of these, probably in a new ticket.

Please find a commit on my task branch for that (link below).

...

QueryServlet: Helper methods private String parseTimestampParameter and private String parseIpParameter should be public static and tests should be added. Similarly, private String convertIpV*ToHex, which could be combined by simply adding another argument, i.e., public static String convertIpV4ToHex(String relayIp, boolean isIpV4).

Agreed, though let's save that for another ticket and not overload this ticket with too many improvements all at once. I know it's tempting. Stay strong, we'll eventually fix these things.

Ok, new ticket.

A MILLISEC_IN_DAY = 24L * 60L * 60L * 1000L constant would be useful.

Fixed.

Fine.

Other: Using the object Boolean exit for a triplet state is a bit too much of a hack. There could be a simple enum, as simple as, for example, public enum Exit { U, Y, N; }.

Ugh, that would turn the JSON field into a string, which doesn't seem very intuitive for "is an exit". And whoever processes this JSON will need to check for null anyway, regardless of boolean or string. In fact, I think that we're using the boolean field exactly in the way it's supposed to be used: true means "is an exit", false means "is not an exit", and null means "we don't know". I think I'd like to leave this one unchanged. It's not a hack.

This is what I added to the third commit in my branch. The U,N,Y abbreviations are handy and more readable.
In addition, setting a value to null will make the default Gson omit the field from the json output. That is not that intuitive.

Regarding SQL:
Order-by statement should be using names not numbers.

Hmm, now that you mention that, I wonder if we even need results to be sorted at all. I'll try to take that out. Otherwise I'd change numbers to names in a subsequent commit, because it's not directly related to this ticket.

Fixed.

Fine, not ordering should also save some query time.

Couldn't the exit-information be part of the query already?

Yes, but I'd like to save database changes for a later ticket. There's so, so much more we can do to make the database schema more efficient. (I'd have to look at my notes from last year, but I believe that we're storing exit information directly in the database rather than the entire raw status entry.)

(I could also look into providing some patches, if we agree on the changes and you don't have the time?)

The changes I added in this branch start adding some tests also for the JSON/Gson related code. The tests document what JSON format to expect. I moved all Gson related into QueryResponse. If we ever switch to some other JSON generator/parser it will be easier to accomplish.
The version information is also all moved there. Exit is an enum now and relevantStatuses a primitive. The VERSION constant is the version implemnted by QueryServlet and 'version' what got read (default VERSION).
The test changes are split in two commits: one for the restructuring and one for the actual tests. Please check the tests carefully, as I simply made everything pass as it is right now.
Another commit streamlines all logging as slf4j and logback were already part of ExoneraTor.

The first commit adapted .gitignore and the fifth sets Java 8, which works fine.

Ready for merge hopefully with tests :-)

comment:11 Changed 7 months ago by iwakeh

Status: merge_readyneeds_information

Actually, after adding a 'minimalistic' test (cf. fixup commit), I'm wondering, if this should be merged as is. Shouldn't the JSON contain all fields?

comment:12 Changed 7 months ago by karsten

Status: needs_informationneeds_review

Thanks for the review. However, I think we'll need to resolve two open discussion points before moving forward here:

  1. I still believe that exit should be a boolean value rather than a one-letter string. This is quite clearly a true/false question, with null being the exception case where we don't know. I think it's valid to define the protocol with a field that may be omitted if unknown. We're doing that in Onionoo, too, and that's not a bug. We wouldn't, for example, change Onionoo's hibernating or recommended_version to string with Y/N/U only to reflect that the field may contain something else than true or false. Even worse, with Y/N/U as string values, an implementation would still have to check for null and handle that case, or risk running into a NullPointerException. In short, I don't see a reason for switching the Boolean there to String or Enum. We shouldn't start doing this in ExoneraTor, and we shouldn't do it elsewhere.
  1. Let's not move around any more code than necessary at this point. As I wrote in one of the commit messages: "Note that some code duplication between ExoneraTorServlet and QueryServlet was deemed acceptable, because ExoneraTorServlet will be moved to metrics-web in the medium term anyway." I understand the desire to create an ExoneraTorUtils class for methods that are otherwise duplicated. But I'd like us to move ExoneraTorServlet away to metrics-web soon after this branch is merged, and in that case a utils class wouldn't make much sense anymore. Similarly, let's hold back adding new unit tests until ExoneraTorServlet is gone. This branch is a work in progress, not the shiny new ExoneraTor that we'll keep unchanged for the next five years to come.

If you agree, I'm happy to cherry-pick the changes from your branch that I think can go in at this point. There are certainly a lot of changes that I'd like to keep. But if you disagree, I'd like to first discuss those two points before moving forward. What do you say?

(Setting to needs_review to indicate that this ticket needs input from the reviewer, not because there's new code to review.)

comment:13 in reply to:  12 Changed 7 months ago by iwakeh

Replying to karsten:

Thanks for the review. However, I think we'll need to resolve two open discussion points before moving forward here:

  1. I still believe that exit should be a boolean value rather than a one-letter string. This is quite clearly a true/false question, with null being the exception case where we don't know. I think it's valid to define the protocol with a field that may be omitted if unknown. We're doing that in Onionoo, too, and that's not a bug. We wouldn't, for example, change Onionoo's hibernating or recommended_version to string with Y/N/U only to reflect that the field may contain something else than true or false. Even worse, with Y/N/U as string values, an implementation would still have to check for null and handle that case, or risk running into a NullPointerException. In short, I don't see a reason for switching the Boolean there to String or Enum. We shouldn't start doing this in ExoneraTor, and we shouldn't do it elsewhere.

Hmm ..., ok. Let 'exit' stay Boolean.

  1. Let's not move around any more code than necessary at this point. As I wrote in one of the commit messages: "Note that some code duplication between ExoneraTorServlet and QueryServlet was deemed acceptable, because ExoneraTorServlet will be moved to metrics-web in the medium term anyway." I understand the desire to create an ExoneraTorUtils class for methods that are otherwise duplicated. But I'd like us to move ExoneraTorServlet away to metrics-web soon after this branch is merged, and in that case a utils class wouldn't make much sense anymore. Similarly, let's hold back adding new unit tests until ExoneraTorServlet is gone. This branch is a work in progress, not the shiny new ExoneraTor that we'll keep unchanged for the next five years to come.

I can do without the ExoneraTorUtils. Still the methods that only return something based on the input they received should become 'static', which will make it easier to integrate this code later in metrics-web (and also find out there what duplication there is).

Anyway, even work-in-progress should be readable, without redundancies, and well encapsulated. Thus, I would want to keep the Gson/JSON encapsulation and the tests for QueryResponse. These are living documentation of the protocol, which we don't want to write a spec for (yet). There has to be a way to determine what was intended to be programmed other than by looking at the running code. How to tell undiscovered bugs apart from features? (And, all what was agreed in tickets doesn't count as documentation, i.e., maybe document the thought about the move of ExoneraTorServlet in the class and explain the current state of code?)

If you agree, I'm happy to cherry-pick the changes from your branch that I think can go in at this point. There are certainly a lot of changes that I'd like to keep. But if you disagree, I'd like to first discuss those two points before moving forward. What do you say?

I assume you would want to keep logging streamlining as well as the QueryResponse tests.
(I noticed that QueryServlet is still using the jul; I'll provide a fixup commit.)

comment:14 Changed 7 months ago by karsten

Sounds good! I'll wait for your fixup commit and begin revising my branch either later this afternoon or tomorrow.

comment:15 Changed 7 months ago by karsten

Alright. I went through your commits and reverted the two parts discussed above. My updated task-16596 branch contains the cherry-picked commits from your branch together with fixup commits.

Please let me know whether this branch looks fine now. Once it does, I'll squash the fixup/squash commits and merge to master. Thanks!

comment:16 Changed 7 months ago by iwakeh

Status: needs_reviewmerge_ready

Looks fine and passes tests (has tests now) and checks.

I added the fixup commit from comment:13 on top of your branch.

Btw: here we could also apply and test the new 'internal release process', which is sketched here.

comment:17 Changed 7 months ago by karsten

Resolution: implemented
Status: merge_readyclosed

Merged to master and deployed with a tweak to the new query that makes use of an existing index. (It's really hard to test these things on a local database only.) Now it should work. Also tagged as 1.0.0 and 1.0.1, following the suggested internal release process. Closing. Thanks for all the input!

Note: See TracTickets for help on using tickets.