Opened 3 years ago

Closed 2 years ago

#19934 closed enhancement (worksforme)

CollecTor should use new metrics-lib json classes

Reported by: iwakeh Owned by: iwakeh
Priority: Medium Milestone: CollecTor 1.2.0
Component: Metrics/CollecTor Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

As described in #19791.
The new classes should be used for creating index.json

Depends on #19791.

Child Tickets

Change History (12)

comment:1 Changed 3 years ago by iwakeh

Status: newassigned

comment:2 Changed 3 years ago by iwakeh

Status: assignedneeds_review

Please review this branch.

comment:3 Changed 3 years ago by karsten

Without having reviewed the patch in detail, I'm worried about using types from metrics-lib's org.torproject.descriptor.index package which are still in alpha stage. Should we rather try to hide the details from CollecTor by providing a DescriptorIndexGenerator interface that takes a directory as input as well as a list of desired file extensions and that produces index.json* files accordingly?

comment:4 Changed 3 years ago by iwakeh

Some thoughts:

  1. The implementation of #18910 requires CollecTor to have a Java representation of index.json to choose the documents to download from the partner-synch-Collector instance(s), especially with the pick-and-choose requirements from comment:11 in #18910.
  1. Shouldn't a CollecTor instance have more fine grained control when creating index.json? More than just specifying a directory. Currently it includes already recent and archive.
  1. The package was designated as alpha to prevent too early reliance on the new API to have more flexibility when implementing #18910. It is well tested in metrics-lib.

comment:5 in reply to:  4 Changed 3 years ago by karsten

Replying to iwakeh:

Some thoughts:

  1. The implementation of #18910 requires CollecTor to have a Java representation of index.json to choose the documents to download from the partner-synch-Collector instance(s), especially with the pick-and-choose requirements from comment:11 in #18910.

So, this is about using metrics-lib *Node classes for obtaining descriptors, not for providing index.json* files, right?

I don't recall the exact requirements we discussed for #18910, and I think we discussed quite a few variations there. But what we can already do is specify an array of directories to synchronize. The local CollecTor instance would then decide locally from looking at synchronized files which ones to keep and copy over and which ones to ignore.

What we could do is pass a list of excluded paths to DescriptorCollector, which would contain paths of consensuses and votes, possibly even with last modified times, that we already have and that we don't want to synchronize. However, this feels a bit like premature optimization. There's no big harm in downloading the entire recent/ folder from a remote CollecTor instance and decide locally what to do with the data. We're moving around larger chunks of bytes than that.

  1. Shouldn't a CollecTor instance have more fine grained control when creating index.json? More than just specifying a directory. Currently it includes already recent and archive.

Well, we could accept an array of directories instead of just one, if that helps. Or a base directory and an array of contained subdirectories to include. Whatever is most intuitive and does the job.

  1. The package was designated as alpha to prevent too early reliance on the new API to have more flexibility when implementing #18910. It is well tested in metrics-lib.

Oh, I'm not worried that it might not be tested well enough. I'm worried about making the API bigger and having to maintain these parts in the future. This whole index.json* stuff is something that library users ideally shouldn't have to worry about. That's why I'm still trying hard to hide it away as best as I can. If this turns out to be impossible or impracticable, so be it. But I'm not there yet. :)

comment:6 Changed 3 years ago by iwakeh

Status: needs_reviewassigned

All index package *Node classes will be moved to CollecTor (incl. the related tests).

A copy of these some of these classes for accessing the index.json variants is also kept in metrics-lib., see #20039.

comment:7 Changed 3 years ago by iwakeh

Please review this branch containing the changes discussed in Berlin.

Set descriptor dependency version to 1.4.0-dev. For release it should be changed to the released version including the changes of #20039. I'm not that happy about keeping the same code in two products; more in #20039.

A small necessary tweak to SchedulerTest is included.

comment:8 Changed 3 years ago by iwakeh

Status: assignedneeds_review

comment:9 Changed 3 years ago by iwakeh

Status: needs_reviewassigned

Maybe change this based on #20405.
Set to assigned until #20405 is finished.

comment:10 Changed 3 years ago by iwakeh

Milestone: CollecTor 1.1.0CollecTor 1.2.0

comment:11 Changed 2 years ago by karsten

Status: assignedneeds_information

We closed #20405 as wontfix/worksforme, and I believe we can do the same here, right?

comment:12 in reply to:  11 Changed 2 years ago by iwakeh

Resolution: worksforme
Status: needs_informationclosed

Replying to karsten:

We closed #20405 as wontfix/worksforme, and I believe we can do the same here, right?

Yep, that's right.
Closing.

Note: See TracTickets for help on using tickets.