Process bridge pool assignments again

added BugSmashFund component::metrics/collector owner::metrics-team priority::medium resolution::fixed reviewer::irl severity::normal status::closed type::enhancement labels

Trac:
Cc: metrics-team to metrics-team, phw

irl, please review commit db2a6bd in my task-31558 branch. Thanks!

Trac:
Status: new to needs_review

Trac:
Reviewer: N/A to irl

Status: -> needs_revision_maybe

There is not another "Processor" class in CollecTor. We already have: Annotation, Builder, Checker, Configuration, Criterium, Downloader, Exception, Hook, Json, Key, Main, Manager, Map, Metadata, Parser, Persistence, Reader, Scheduler, Type, Utils, Weblogs, Writer. That's a lot of concepts, can we reuse one for this? (I put a table later in this comment with the number of times each noun appears in the codebase.)

The startProcessing() function is huge and there are no tests for it. ): This also makes the review difficult. I'm relatively confident that we're not going to accidentally publish a load of raw bridge fingerprints but as to the correctness of the code, and the reliability, I'm less confident.

Saying this, if this huge function was working before then I'm happier about adding it back now and then filing tickets to break it down and add tests later. At least then it's not completely unknown new code.

num  noun
   1 Annotation
   1 Builder
   1 Checker
   1 Configuration
   2 Criterium
   3 Downloader
   1 Exception
   1 Hook
   1 Json
   1 Key
   2 Main
   1 Manager
   1 Map
   1 Metadata
   2 Parser
  15 Persistence
   3 Reader
   1 Scheduler
   1 Type
   1 Utils
   1 Weblogs
   2 Writer

Trac:
Status: needs_review to needs_revision

Replying to notirl:

Status: -> needs_revision_maybe

There is not another "Processor" class in CollecTor. We already have: Annotation, Builder, Checker, Configuration, Criterium, Downloader, Exception, Hook, Json, Key, Main, Manager, Map, Metadata, Parser, Persistence, Reader, Scheduler, Type, Utils, Weblogs, Writer. That's a lot of concepts, can we reuse one for this? (I put a table later in this comment with the number of times each noun appears in the codebase.)

The name comes from the class that we retired a few years back. We could pick a new one, though. What this class does is: read files from disk, possibly split them by bridge pool assignment list, replace bridge fingerprints with sanitized ones, write one file per sanitized bridge pool assignment. I think processor is fine, but if you prefer something else, let's switch.

The startProcessing() function is huge and there are no tests for it. ): This also makes the review difficult. I'm relatively confident that we're not going to accidentally publish a load of raw bridge fingerprints but as to the correctness of the code, and the reliability, I'm less confident.

Saying this, if this huge function was working before then I'm happier about adding it back now and then filing tickets to break it down and add tests later. At least then it's not completely unknown new code.

I agree that the method is too long. I'm currently working on breaking it up. It did work a few years back, but now is a good time to clean it up a little. Will have something to review later this afternoon.

Please review commit db2a6bd in my task-31558 branch.

Trac:
Status: needs_revision to needs_review

Trac:
Keywords: N/A deleted, BugSmashFund added

Replying to karsten:

Please review commit db2a6bd in my task-31558 branch.

Hi karsten,

it is my first review so let me know if I could do better.

Please order statement following Java guidelines . This guidelines was been written in 1999 but is a good point to start and it is used also in static code analysis
Probably it make sense to create a constant for string "BridgePoolAssignments"
Instead of evaluate each time 330L * 60L * 1000L it is better to create a constant with meaningfull name
Instead of evaluate each time 3L * 24L * 60L * 60L * 1000L it is better to create a constant with meaningfull name
startProcessing it seems to complex, is it possible to split in more separated and testeable methods?
there are some for each loop, is it make sense use a functional approach using lambda?

Please let me know if I could help you in some way,

Best Regards

Trac:
Username: fava

Thanks, fava, I'm revising my branch now to incorporate your suggestions.

Trac:
Status: needs_review to needs_revision

Replying to fava:

Replying to karsten:

Please review commit db2a6bd in my task-31558 branch.

Hi karsten,

it is my first review so let me know if I could do better.

Please order statement following Java guidelines . This guidelines was been written in 1999 but is a good point to start and it is used also in static code analysis

We do have guidelines for ordering parts in a class: https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam/MetricsJavaStyleGuide

What we'd really need is a way to automatically check this, just like how we have Checkstyle to check code style violations. Would you want to try out some tools that can do these checks? Requirement is that we can obtain them using ant resolve (using Ant Ivy internally) and that we can run them from Ant just like we're running Checkstyle. (This would be something for a new ticket.)

Probably it make sense to create a constant for string "BridgePoolAssignments"

One might think so, but these two strings actually mean something different. I'd rather not want to use a single constant for them, because that would imply that they're the same thing. What I'd really want is give up on using these string constants, but that's a larger refactoring than I'd want to do at the moment.

Instead of evaluate each time 330L * 60L * 1000L it is better to create a constant with meaningfull name

This is already changed in the more recent branch.

Instead of evaluate each time 3L * 24L * 60L * 60L * 1000L it is better to create a constant with meaningfull name

I changed this in commit 9c1acf8 in my task-31558 branch.

startProcessing it seems to complex, is it possible to split in more separated and testeable methods?

I already changed this in the more recent branch, too.

there are some for each loop, is it make sense use a functional approach using lambda?

Maybe, but to me this falls into the category of prettying code rather than reviewing that it's good to go in. It's more like the try-with-resource stuff that we could do anytime but which shouldn't block the merge in this case.

Please let me know if I could help you in some way,

Best Regards

Thanks for doing this review!

Here's one suggestion for the next review: Be sure to not only review the first commit, but also consider any follow-up commits in that branch. The way I'd do that is check out the branch and look at git diff $beforefirstcommit.., e.g., git diff 8010084.. in this case.

Trac:
Status: needs_revision to needs_review

Looking good. That's a lot easier to read and follow now.

there are some for each loop, is it make sense use a functional approach using lambda?

We should look more at Java 8 Streams API because this gives us some parallel processing of many of the foreach loops. Not now, but something to think about.

https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html

Trac:
Status: needs_review to merge_ready

Great, thanks for checking. Rebased, squashed, tweaked, merged, and preparing a release now.

Regarding loops and streams: we started using streams in some places, but we're indeed not using them as much as we could. Happy to accept patches!

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

closed

mentioned in issue #31799 (moved)

Process bridge pool assignments again

Child items 0

Activity