Opened 2 years ago

Closed 2 years ago

#22514 closed enhancement (implemented)

Replace DescriptorReader's setMaxDescriptorFilesInQueue() with setMaxMemory()

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone: metrics-lib 1.9.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

We currently let applications define a limit for parsed descriptor files in DescriptorReader#setMaxDescriptorFilesInQueue(). If the background can parse descriptor files faster than the application can process them, it pauses when the configured number of descriptor files is contained in the queue and only resumes as soon as the application has dequeued a descriptor file.

As we say in the docs, "the default is 100, but if descriptor files contain hundreds or even thousands of descriptors, that default may be too high". But honestly, it's impossible to define a useful number there that works for all descriptor types, from microdescriptors to votes or even files containing tens or hundreds of concatenated votes.

The real purpose of having a limit is to avoid running out of memory when parsing descriptors. So, we should just replace setMaxDescriptorFilesInQueue() with setMaxMemory(). This can't be a hard limit, because we cannot put partial descriptors in the queue (or not even partial descriptor files right now). But this limit can define a magnitute of available memory, which could be anything between, say, 50M and 1G.

This change is useful on its own, but it also prepares #22141 (after which we won't add DescriptorFile instances to the queue anymore, but Descriptor instances) and #21751 (when we'll likely want to define a setMaxThreads() method).

I'll attach a branch as soon as I have a ticket number. The code there is for discussion, not meant to be merged just yet. It provides another method to define an upper limit for descriptors (not descriptor files) which I briefly considered but deemed less useful than a limit for descriptor bytes.

Child Tickets

Change History (8)

comment:1 Changed 2 years ago by karsten

Status: newneeds_review

Please find my branch task-22514. Let's discuss, not decide about merging just yet.

comment:2 Changed 2 years ago by karsten

Milestone: metrics-lib 1.9.0

Turns out we'll need to resolve this before being able to merge #22141. What do you think about the linked branch?

comment:3 Changed 2 years ago by iwakeh

I really want direct unit tests for the new functionality of BlockingIteratorImpl.

There are some (older) System.err.printf statements that should be turned into log statements.

With this change BlockingIteratorImpl is not generic anymore as it now is tightly coupled with Descriptor and DescriptorFile. So, it seems better to either make it a BlockingDescriptorIterator or derive BlockingDescriptorIterator from the generic old implementation (if possible) or find another way of keeping it decoupled from Descriptor/Desc.File.

comment:4 in reply to:  3 Changed 2 years ago by karsten

Replying to iwakeh:

I really want direct unit tests for the new functionality of BlockingIteratorImpl.

Good idea.

There are some (older) System.err.printf statements that should be turned into log statements.

Makes sense to do that.

With this change BlockingIteratorImpl is not generic anymore as it now is tightly coupled with Descriptor and DescriptorFile. So, it seems better to either make it a BlockingDescriptorIterator or derive BlockingDescriptorIterator from the generic old implementation (if possible) or find another way of keeping it decoupled from Descriptor/Desc.File.

True. I guess we could go for the non-generic BlockingDescriptorIterator, because we'll only put Descriptors in it. That was different with DescriptorFile and DescriptorRequest a while ago. But we're going to stop using BlockingIteratorImpl<DescriptorFile> in 2.0.0 anyway, so we could just start a new class now and delete the existing class next week.

But before I make changes here: what's your preference for having a setting a maximum number of descriptors vs. raw descriptor bytes? The former is easier to implement (possibly even with existing Java Collections classes) whereas the latter might be more intuitive to use (people might have a better idea specifying a bytes limit rather than a more abstract descriptor limit). What do you think?

comment:5 Changed 2 years ago by iwakeh

I'd prefer the 'descriptor count' solution. Specifying bytes is in a way giving the illusion one could fine tune the jvm heap here, it hints a closeness to jvm details we don't have and want.
So, the abstract choice of descriptor count and using a Java collection is best in my view.

comment:6 Changed 2 years ago by karsten

Status: needs_reviewneeds_revision

Works for me. I'll work on an updated branch and will post that here. But given the number of changes that might take until late this evening or tomorrow morning. Thanks for the feedback!

comment:7 Changed 2 years ago by karsten

Turns out it was much simpler. The branch here was not based on the #22141 branch, and with the changes there we don't need to consider two different maxima (descriptor files and descriptors). Following that, I'm pushing a tiny commit to the #22141 branch in a minute (and commenting there), and we can close this ticket when #22141 gets merged. (We can still consider switching to a Java Collections class, but we should really do that with more time, not two days before a pretty big release.)

comment:8 Changed 2 years ago by karsten

Resolution: implemented
Status: needs_revisionclosed

Now that #22141 is merged, we can close this ticket. It's implemented.

Note: See TracTickets for help on using tickets.