Opened 7 weeks ago

Closed 4 weeks ago

#31558 closed enhancement (fixed)

Process bridge pool assignments again

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/CollecTor Version:
Severity: Normal Keywords: BugSmashFund
Cc: metrics-team, phw Actual Points:
Parent ID: Points:
Reviewer: irl Sponsor:

Description

Bridge pool assignments are being rsync'ed from polyanthum to colchicifolium again. What remains is that CollecTor processes them. We should be able to reuse old code for this, but we'll have to update it quite a bit.

Child Tickets

Change History (13)

comment:1 Changed 7 weeks ago by phw

Cc: phw added

comment:2 Changed 7 weeks ago by karsten

Status: newneeds_review

irl, please review commit db2a6bd in my task-31558 branch. Thanks!

comment:3 Changed 7 weeks ago by karsten

Reviewer: irl

comment:4 Changed 7 weeks ago by notirl

Status: -> needs_revision_maybe

There is not another "Processor" class in CollecTor. We already have: Annotation, Builder, Checker, Configuration, Criterium, Downloader, Exception, Hook, Json, Key, Main, Manager, Map, Metadata, Parser, Persistence, Reader, Scheduler, Type, Utils, Weblogs, Writer. That's a lot of concepts, can we reuse one for this? (I put a table later in this comment with the number of times each noun appears in the codebase.)

The startProcessing() function is huge and there are no tests for it. ): This also makes the review difficult. I'm relatively confident that we're not going to accidentally publish a load of raw bridge fingerprints but as to the correctness of the code, and the reliability, I'm less confident.

Saying this, if this huge function was working before then I'm happier about adding it back now and then filing tickets to break it down and add tests later. At least then it's not completely unknown new code.

num  noun
   1 Annotation
   1 Builder
   1 Checker
   1 Configuration
   2 Criterium
   3 Downloader
   1 Exception
   1 Hook
   1 Json
   1 Key
   2 Main
   1 Manager
   1 Map
   1 Metadata
   2 Parser
  15 Persistence
   3 Reader
   1 Scheduler
   1 Type
   1 Utils
   1 Weblogs
   2 Writer

comment:5 Changed 7 weeks ago by irl

Status: needs_reviewneeds_revision

comment:6 in reply to:  4 Changed 7 weeks ago by karsten

Replying to notirl:

Status: -> needs_revision_maybe

There is not another "Processor" class in CollecTor. We already have: Annotation, Builder, Checker, Configuration, Criterium, Downloader, Exception, Hook, Json, Key, Main, Manager, Map, Metadata, Parser, Persistence, Reader, Scheduler, Type, Utils, Weblogs, Writer. That's a lot of concepts, can we reuse one for this? (I put a table later in this comment with the number of times each noun appears in the codebase.)

The name comes from the class that we retired a few years back. We could pick a new one, though. What this class does is: read files from disk, possibly split them by bridge pool assignment list, replace bridge fingerprints with sanitized ones, write one file per sanitized bridge pool assignment. I think processor is fine, but if you prefer something else, let's switch.

The startProcessing() function is huge and there are no tests for it. ): This also makes the review difficult. I'm relatively confident that we're not going to accidentally publish a load of raw bridge fingerprints but as to the correctness of the code, and the reliability, I'm less confident.

Saying this, if this huge function was working before then I'm happier about adding it back now and then filing tickets to break it down and add tests later. At least then it's not completely unknown new code.

I agree that the method is too long. I'm currently working on breaking it up. It did work a few years back, but now is a good time to clean it up a little. Will have something to review later this afternoon.

comment:7 Changed 7 weeks ago by karsten

Status: needs_revisionneeds_review

comment:8 Changed 6 weeks ago by karsten

Keywords: BugSmashFund added

comment:9 in reply to:  7 ; Changed 6 weeks ago by fava

Replying to karsten:

Please review commit db2a6bd in my task-31558 branch.

Hi karsten,

it is my first review so let me know if I could do better.

  1. Please order statement following Java guidelines . This guidelines was been written in 1999 but is a good point to start and it is used also in static code analysis
  1. Probably it make sense to create a constant for string "BridgePoolAssignments"
  1. Instead of evaluate each time 330L * 60L * 1000L it is better to create a constant with meaningfull name
  1. Instead of evaluate each time 3L * 24L * 60L * 60L * 1000L it is better to create a constant with meaningfull name
  1. startProcessing it seems to complex, is it possible to split in more separated and testeable methods?
  1. there are some for each loop, is it make sense use a functional approach using lambda?


Please let me know if I could help you in some way,

Best Regards

comment:10 Changed 5 weeks ago by karsten

Status: needs_reviewneeds_revision

Thanks, fava, I'm revising my branch now to incorporate your suggestions.

comment:11 in reply to:  9 Changed 5 weeks ago by karsten

Status: needs_revisionneeds_review

Replying to fava:

Replying to karsten:

Please review commit db2a6bd in my task-31558 branch.

Hi karsten,

it is my first review so let me know if I could do better.

  1. Please order statement following Java guidelines . This guidelines was been written in 1999 but is a good point to start and it is used also in static code analysis

We do have guidelines for ordering parts in a class: https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam/MetricsJavaStyleGuide

What we'd really need is a way to automatically check this, just like how we have Checkstyle to check code style violations. Would you want to try out some tools that can do these checks? Requirement is that we can obtain them using ant resolve (using Ant Ivy internally) and that we can run them from Ant just like we're running Checkstyle. (This would be something for a new ticket.)

  1. Probably it make sense to create a constant for string "BridgePoolAssignments"

One might think so, but these two strings actually mean something different. I'd rather not want to use a single constant for them, because that would imply that they're the same thing. What I'd really want is give up on using these string constants, but that's a larger refactoring than I'd want to do at the moment.

  1. Instead of evaluate each time 330L * 60L * 1000L it is better to create a constant with meaningfull name

This is already changed in the more recent branch.

  1. Instead of evaluate each time 3L * 24L * 60L * 60L * 1000L it is better to create a constant with meaningfull name

I changed this in commit 9c1acf8 in my task-31558 branch.

  1. startProcessing it seems to complex, is it possible to split in more separated and testeable methods?

I already changed this in the more recent branch, too.

  1. there are some for each loop, is it make sense use a functional approach using lambda?

Maybe, but to me this falls into the category of prettying code rather than reviewing that it's good to go in. It's more like the try-with-resource stuff that we could do anytime but which shouldn't block the merge in this case.

Please let me know if I could help you in some way,

Best Regards

Thanks for doing this review!

Here's one suggestion for the next review: Be sure to not only review the first commit, but also consider any follow-up commits in that branch. The way I'd do that is check out the branch and look at git diff $beforefirstcommit.., e.g., git diff 8010084.. in this case.

comment:12 Changed 4 weeks ago by irl

Status: needs_reviewmerge_ready

Looking good. That's a lot easier to read and follow now.

there are some for each loop, is it make sense use a functional approach using lambda?

We should look more at Java 8 Streams API because this gives us some parallel processing of many of the foreach loops. Not now, but something to think about.

https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html

comment:13 Changed 4 weeks ago by karsten

Resolution: fixed
Status: merge_readyclosed

Great, thanks for checking. Rebased, squashed, tweaked, merged, and preparing a release now.

Regarding loops and streams: we started using streams in some places, but we're indeed not using them as much as we could. Happy to accept patches!

Note: See TracTickets for help on using tickets.