Opened 2 years ago

Closed 2 years ago

#22141 closed enhancement (implemented)

Deprecate `DescriptorFile` and add relevant information to `Descriptor`

Reported by: karsten Owned by: karsten
Priority: Medium Milestone: metrics-lib 1.9.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: iwakeh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I'd want us to implement #20395 where we'd be able to handle much larger descriptor files without copying all contents to memory before even looking at them. But I realized that DescriptorFile#getDescriptors() makes this rather pointless. If we need to keep a list with all parsed descriptors in memory, each containing raw contents of some sort (see #22140), then what's the point of avoiding to read complete files to memory?

One way to fix this is to deprecate DescriptorFile and add all relevant information to Descriptor. And then DescriptorReader would return an Iterator<Descriptor>, internally backed by BlockingIteratorImpl<Descriptor>, rather than an Iterator<DescriptorFile>. Sounds easy!

Here's a catch though: DescriptorFile#getException() returns "the first exception that was thrown when reading this file or parsing its content". If we'd move this method to Descriptor, would we set an I/O exception to the first descriptor in that file, to all of them, or maybe to none of them? Turns out we don't have to worry too much about this. The only code that actually uses DescriptorFile#getException() is Onionoo's DescriptorQueue, and all it does is log the exception. We could just do the same.

What else could go wrong?

Child Tickets

Change History (40)

comment:1 Changed 2 years ago by karsten

See also some related thoughts on handling I/O exceptions in this ticket.

Also note that DescriptorParser#parseDescriptors() needs to be changed, too, and return Iterator<Descriptor> rather than List<Descriptor>. There may be even more.

comment:2 Changed 2 years ago by iwakeh

Trying to keep the topic's discussion in one place, see #20395 comment:5

comment:3 Changed 2 years ago by iwakeh

Milestone: metrics-lib 1.9.0

comment:4 Changed 2 years ago by karsten

Status: newneeds_review

Please find my branch task-22141 with an initial attempt towards implementing this change. (This ticket somewhat overlaps with #16225, and I wasn't sure where to put this branch.)

comment:5 Changed 2 years ago by iwakeh

Status: needs_reviewneeds_revision

It would be great, if the functionality in readDescriptorFiles could be broken down into
smaller and testable units. I know that there are other places like this in the code, but
we should rather work toward nicer code with new additions like this. (and then apply this
to improve the legacy code)

There are some checkstyle warnings about unused imports and javadoc.

Should the parts concerning #16225 be mentioned there for reference?

comment:6 Changed 2 years ago by karsten

Status: needs_revisionneeds_review

Please find my updated branch with a few tweaks. I also just commented on #16225 to say that this ticket addresses some aspects of what's discussed there. Regarding two suggestions made there:

  • Let's talk about switching to Future. One possible benefit I see is that we could create such an instance and put it into the queue as soon as we have split a descriptor file into descriptor-sized chunks and then parse it in the background. And we might not even do that now but change the interface to doing it in the future. A possible drawback is that the interface might become a little more complicated. I haven't tried out the change yet, so maybe I'm wrong. What other aspects are there?
  • I couldn't think of a good way to avoid returning null in getException(), at least without Optional which we don't have yet. Do you have any ideas?

What do you think of the change made in this ticket more generally? The commit I posted was mostly a conversation starter. It's not too late to do something differently or not change anything at all if there are aspects I have overlooked. Thanks!

comment:7 Changed 2 years ago by iwakeh

DescriptorReaderImpl became more readable. Maybe use Files.walkFileTree and an extension of SimpleFileVisitor to simplify the code which currently is in "readDescriptorFiles"?

Regarding Descriptor.getException: The use case here is to determine, if the Descriptor object is valid and decide whether to use it or drop it?
Then, why not add a Descriptor.valid method returning true or false? The detailed logging of exceptions could be configured if wanted in metrics-lib as an extra logging channel for parse exceptions (this code would need to be added still).

About parsing "future or not" I have more questions than suggestions as it depends highly on the implementation.
What do you have in mind?
It would also introduce some overhead regarding the cancelling, how should that be handled?
How should a client deal with the Futures it receives?

BTW DescriptorImpl: The NoSuchAlgorithmException should be escalated as runtime exceptions, b/c this is a problem with the jdk/jvm and shouldn't be hidden away. I'd suggest using a private method for the two digest calls (please find a suggestion here).

comment:8 in reply to:  7 ; Changed 2 years ago by karsten

Replying to iwakeh:

DescriptorReaderImpl became more readable. Maybe use Files.walkFileTree and an extension of SimpleFileVisitor to simplify the code which currently is in "readDescriptorFiles"?

Good idea! If you don't mind, let us focus on the interface part for the moment, and I'll make this implementation change later on (before we merge). Okay?

Regarding Descriptor.getException: The use case here is to determine, if the Descriptor object is valid and decide whether to use it or drop it?
Then, why not add a Descriptor.valid method returning true or false? The detailed logging of exceptions could be configured if wanted in metrics-lib as an extra logging channel for parse exceptions (this code would need to be added still).

Not quite. The application already knows that it can't use an invalid descriptor, because it's not an instanceof whatever interface it expected. It's just a Descriptor, but not, for example, a RelayServerDescriptor. The application can check whether getException() returns something if it wants to log that or possibly even handle that exception somehow. But it could as well ignore the descriptor of unknown/unexpected subinterface type.

About parsing "future or not" I have more questions than suggestions as it depends highly on the implementation.
What do you have in mind?
It would also introduce some overhead regarding the cancelling, how should that be handled?
How should a client deal with the Futures it receives?

Hmm, to be honest, I don't really know. Maybe we can postpone that discussion, even at the risk of having to wait for 3.0.0.

BTW DescriptorImpl: The NoSuchAlgorithmException should be escalated as runtime exceptions, b/c this is a problem with the jdk/jvm and shouldn't be hidden away. I'd suggest using a private method for the two digest calls (please find a suggestion here).

Good idea. As stated above, I'll do this as soon as we have an interface we like.

I just had a related thought on #22196 which I'll post there in the next hour or so. That thought might affect what we're doing here, so let's briefly switch over there and return here after that. :)

comment:9 in reply to:  8 Changed 2 years ago by iwakeh

Replying to karsten:

Replying to iwakeh:

DescriptorReaderImpl became more readable. Maybe use Files.walkFileTree and an extension of SimpleFileVisitor to simplify the code which currently is in "readDescriptorFiles"?

Good idea! If you don't mind, let us focus on the interface part for the moment, and I'll make this implementation change later on (before we merge). Okay?

Fine, I'll add a new ticket.

Regarding Descriptor.getException: The use case here is to determine, if the Descriptor object is valid and decide whether to use it or drop it?
Then, why not add a Descriptor.valid method returning true or false? The detailed logging of exceptions could be configured if wanted in metrics-lib as an extra logging channel for parse exceptions (this code would need to be added still).

Not quite. The application already knows that it can't use an invalid descriptor, because it's not an instanceof whatever interface it expected. It's just a Descriptor, but not, for example, a RelayServerDescriptor. The application can check whether getException() returns something if it wants to log that or possibly even handle that exception somehow. But it could as well ignore the descriptor of unknown/unexpected subinterface type.

Ok, in that case I'd vote for the InvalidDescriptor interface, which would make the problems with the faulty descriptor object more visible.
With the current setup getException is only useful for logging b/c there is only the mighty DescriptorParseException having the real problem hidden in the message text. The logging would be better achieved by having a "ParseExceptionsLog" logger that can be dis- or enabled via configuration and will spare the client to code logging on top of metrics-lib.

About parsing "future or not" I have more questions than suggestions as it depends highly on the implementation.
What do you have in mind?
It would also introduce some overhead regarding the cancelling, how should that be handled?
How should a client deal with the Futures it receives?

Hmm, to be honest, I don't really know. Maybe we can postpone that discussion, even at the risk of having to wait for 3.0.0.

Yes, there are so many improvements planned and recently introduced, it might be good to let these settle and then decide later what else can be improved.

BTW DescriptorImpl: The NoSuchAlgorithmException should be escalated as runtime exceptions, b/c this is a problem with the jdk/jvm and shouldn't be hidden away. I'd suggest using a private method for the two digest calls (please find a suggestion here).

Good idea. As stated above, I'll do this as soon as we have an interface we like.

I'll add this to the ticket mentioned above.

I just had a related thought on #22196 which I'll post there in the next hour or so. That thought might affect what we're doing here, so let's briefly switch over there and return here after that. :)

Already commented there :-)

comment:10 Changed 2 years ago by karsten

Owner: changed from metrics-team to karsten
Status: needs_reviewassigned

Okay, I'll resume coding on this ticket now and incorporate your suggestion/changes. No need to open a new ticket for the suggestions, I'll do them as part of this ticket. (I just wanted to focus on interface parts earlier, because that discussion is already complex enough.) Thanks!

comment:11 Changed 2 years ago by iwakeh

The new ticket for things to be done after this ticket is #22583
(oh, I didn't notice comment:10 before adding this; well, it won't hurt as these are really unrelated.)

Last edited 2 years ago by iwakeh (previous) (diff)

comment:12 in reply to:  11 Changed 2 years ago by karsten

Replying to iwakeh:

The new ticket for things to be done after this ticket is #22583
(oh, I didn't notice comment:10 before adding this; well, it won't hurt as these are really unrelated.)

Maybe you're right and they should happen independently. Leaving the other ticket as is and excluding these two aspects here.

comment:13 Changed 2 years ago by karsten

I'm afraid I found two flaws in the design above and the suggestion on #22196:

  1. We're using the Iterable<Descriptor> and InvalidDescriptor as one possible returned element as channel for all kinds of exceptions thrown while reading descriptor files. But we don't have such a channel in DescriptorCollector#collectDescriptors which returns, well, void. That means that the design above doesn't solve anything of this issue raised on the related ticket #16225. And ideally we'd handle exceptions in the same way in all descriptor sources.
  2. The idea of replacing the parse history with a minLastModified timestamp doesn't handle I/O problems very well. For example, in the current implementation, if there's a network problem with downloading a potentially large file from CollecTor, we would not include that in the parse history and retry collecting and reading it next time. But with only a timestamp we would simply skip that descriptor file, which is pretty bad.

New plan:

  • We rename InvalidDescriptor to UnparseableDescriptor and let it return the DescriptorParseException that made it unparseable in our view. This instance might be useful to the application, at least by containing the raw descriptor byte[] or descriptor File reference. And if the application produced the input descriptor itself, like CollecTor, knowing about it being unparseable is more important than for a consumer, like Onionoo!
  • We leave the parse history in place and postpone the idea of stateless, overloaded methods. Sad.
  • We add a method removeFromHistoryFile(File) that can be called by the application if it wants to reprocess a descriptor file containing an UnparseableDescriptor. This would not be necessary for I/O exceptions, because those descriptor files would not go into the parse history and retried in the next execution anyway.
  • We log any exceptions on warn level that we caught while collecting or reading or parsing descriptors. The application can't do anything to handle these issues anyway, except for telling the operator that something went wrong. DescriptorParseExceptions thrown while parsing invalid/unparseable descriptors are exempt from this and will only be logged on info level.
  • We could introduce an ExceptionListener for exceptions are than DescriptorParseException, or the "ParseExceptionsLog" that you mentioned (even though I'm not entirely certain what that would be). I don't think it's necessary though, because we wouldn't it use that ourselves, and we don't know whether it would be used by anyone.

Before I write more code (or longer comments!), what do you think? :)

comment:14 Changed 2 years ago by karsten

Status: assignedneeds_review

Okay, I wrote more code based on the plan mentioned earlier today. The only part I left out was the removeFromHistoryFile(File) method, because I didn't find that too important on second thought. And we could still add it if we need it or somebody asks for it.

Please find my updated task-22141 branch with four new commits that might be easiest to review together as a single diff.

comment:15 in reply to:  13 Changed 2 years ago by iwakeh

Replying to karsten:

I'm afraid I found two flaws in the design above and the suggestion on #22196:

  1. We're using the Iterable<Descriptor> and InvalidDescriptor as one possible returned element as channel for all kinds of exceptions thrown while reading descriptor files. But we don't have such a channel in DescriptorCollector#collectDescriptors which returns, well, void. That means that the design above doesn't solve anything of this issue raised on the related ticket #16225. And ideally we'd handle exceptions in the same way in all descriptor sources.

DescriptorCollectorImpl is deprecated:

* @deprecated Replaced by {@link DescriptorIndexCollector} which uses the
 *     remote instance's index.json file as a more robust alternative to parsing
 *     the remote instance's directory listings.

Using the DescriptorIndexCollector implementation no descriptors are parsed during download.
What did I overlook?

  1. The idea of replacing the parse history with a minLastModified timestamp doesn't handle I/O problems very well. For example, in the current implementation, if there's a network problem with downloading a potentially large file from CollecTor, we would not include that in the parse history and retry collecting and reading it next time. But with only a timestamp we would simply skip that descriptor file, which is pretty bad.

New plan:

  • We rename InvalidDescriptor to UnparseableDescriptor and let it return the DescriptorParseException that made it unparseable in our view. This instance might be useful to the application, at least by containing the raw descriptor byte[] or descriptor File reference. And if the application produced the input descriptor itself, like CollecTor, knowing about it being unparseable is more important than for a consumer, like Onionoo!

I assumed that InvalidDescriptor would provide the bytes and other descriptor meta-data, the renaming makes sense.

  • We leave the parse history in place and postpone the idea of stateless, overloaded methods. Sad.

Not so sad, actually. From an API point of view hiding away the history bookkeeping reduces complexity.

  • We add a method removeFromHistoryFile(File) that can be called by the application if it wants to reprocess a descriptor file containing an UnparseableDescriptor. This would not be necessary for I/O exceptions, because those descriptor files would not go into the parse history and retried in the next execution anyway.
  • We log any exceptions on warn level that we caught while collecting or reading or parsing descriptors. The application can't do anything to handle these issues anyway, except for telling the operator that something went wrong. DescriptorParseExceptions thrown while parsing invalid/unparseable descriptors are exempt from this and will only be logged on info level.

I think INFO logging to the regular log could bloat logs unnecessarily; more below.

  • We could introduce an ExceptionListener for exceptions are than DescriptorParseException, or the "ParseExceptionsLog" that you mentioned (even though I'm not entirely certain what that would be). I don't think it's necessary though, because we wouldn't it use that ourselves, and we don't know whether it would be used by anyone.

I think an exception listener is not really necessary. The 'logging channel' suggestion was meant as introducing another Logger as for example is used in Onionoo for statistics: LoggerFactory.getLogger("statistics")....
Such a LoggerFactory.getLogger("ParseExceptionsLog") for all parsing problems would 'channel' these log statements and the final receiver (be it log or /dev/null) can be changed via runtime configuration.
Does that make sense?

Before I write more code (or longer comments!), what do you think? :)

I didn't look at the branch below, yet. Should I now?

comment:16 Changed 2 years ago by karsten

DescriptorIndexCollector indeed does not parse any descriptors, and neither did DescriptorCollectorImpl. But I didn't refer to DescriptorParseException here. We wouldn't be able to include other exceptions than DescriptorParseException, like IOException, in an output by DescriptorIndexCollector, because there's no output. And that was the plan for DescriptorReader, where we were planning to include IOException in a returned InvalidDescriptor. (The code may make this more obvious.)

I also now understand your "ParseExceptionsLog" idea better. Basically, we would create a new structure for logging that is less dependent on classes and more related to operation. Not opposed to that idea. I could imagine that we'd want to look more at the other log statements and see what other channels would be useful. How about we leave that for after 2.0.0 is released?

Can you look at the branch? I think it makes sense to look now. Thanks!

comment:17 in reply to:  16 ; Changed 2 years ago by iwakeh

Replying to karsten:

DescriptorIndexCollector indeed does not parse any descriptors, and neither did DescriptorCollectorImpl. But I didn't refer to DescriptorParseException here. We wouldn't be able to include other exceptions than DescriptorParseException, like IOException, in an output by DescriptorIndexCollector, because there's no output. And that was the plan for DescriptorReader, where we were planning to include IOException in a returned InvalidDescriptor. (The code may make this more obvious.)

Hmm, maybe I forgot about something, but IOE rather belongs to a file than to a descriptor unless the two coincide.

I also now understand your "ParseExceptionsLog" idea better. Basically, we would create a new structure for logging that is less dependent on classes and more related to operation. Not opposed to that idea. I could imagine that we'd want to look more at the other log statements and see what other channels would be useful. How about we leave that for after 2.0.0 is released?

Yes, that's fine or later.

Can you look at the branch? I think it makes sense to look now. Thanks!

I think UnparseableDescriptor should extend Descriptor otherwise there is no access to the raw bytes etc. from the API.

  • getDescriptorFile, getRawDescriptorBytes: should work in Un.Desc. as in other Descr.
  • getAnnotations, getUnrecognizedLines: This behavior needs to be defined and documented in small tests. (Annotations might be corrupt, unrecognized lines might not be identifiable etc.)

Other than these this looks ok as a 1.9.0 release solution.

comment:18 in reply to:  17 ; Changed 2 years ago by karsten

Replying to iwakeh:

Replying to karsten:

DescriptorIndexCollector indeed does not parse any descriptors, and neither did DescriptorCollectorImpl. But I didn't refer to DescriptorParseException here. We wouldn't be able to include other exceptions than DescriptorParseException, like IOException, in an output by DescriptorIndexCollector, because there's no output. And that was the plan for DescriptorReader, where we were planning to include IOException in a returned InvalidDescriptor. (The code may make this more obvious.)

Hmm, maybe I forgot about something, but IOE rather belongs to a file than to a descriptor unless the two coincide.

Sorry for not being clear. Let me try to rephrase.

The issue with DescriptorCollector returning void is that we cannot return any exceptions caught while fetching descriptor files from the remote CollecTor host to the caller. This could be anything from fetching the index.json file or an actual descriptor file to differences between expected and actual file size or even indexing the local target directory.

What we can do is log these exceptions/issues, so that the operator gets aware of them. But we can't hand over exceptions to the application. At least not easily (see ExceptionListener).

But if we can't pass exceptions while collecting descriptors to the application, why should we try hard to do this while reading descriptors? My point is that we can similarly log these warnings and don't have to start creating InvalidDescriptor just to let the application know about these exceptions.

Regardless, we can pass parse exceptions encapsulated in UnparseableDescriptor to the application. That's limited to DescriptorReader and DescriptorParser, because DescriptorCollector doesn't parse and return descriptors.

Does this make more sense?

I also now understand your "ParseExceptionsLog" idea better. Basically, we would create a new structure for logging that is less dependent on classes and more related to operation. Not opposed to that idea. I could imagine that we'd want to look more at the other log statements and see what other channels would be useful. How about we leave that for after 2.0.0 is released?

Yes, that's fine or later.

Okay!

Can you look at the branch? I think it makes sense to look now. Thanks!

I think UnparseableDescriptor should extend Descriptor otherwise there is no access to the raw bytes etc. from the API.

  • getDescriptorFile, getRawDescriptorBytes: should work in Un.Desc. as in other Descr.
  • getAnnotations, getUnrecognizedLines: This behavior needs to be defined and documented in small tests. (Annotations might be corrupt, unrecognized lines might not be identifiable etc.)

Ugh, totally. That's an oversight. Fixed in a new commit in the same branch.

Other than these this looks ok as a 1.9.0 release solution.

Awesome! I'm also optimistic that this will work and that we won't run into too many yet unforseen issues. I'll test this branch (with the extends Descriptor fix) in Onionoo and a few other applications.

Thanks!

comment:19 in reply to:  18 ; Changed 2 years ago by iwakeh

Replying to karsten:

[snip]

The issue with DescriptorCollector returning void is that we cannot return any exceptions caught while fetching descriptor files from the remote CollecTor host to the caller. This could be anything from fetching the index.json file or an actual descriptor file to differences between expected and actual file size or even indexing the local target directory.

What we can do is log these exceptions/issues, so that the operator gets aware of them. But we can't hand over exceptions to the application. At least not easily (see ExceptionListener).

Well, throwing an Exception is a way to communicate problems to the caller. Anyway, there are not that many distinguishable error situations; e.g. w/o index.json there is no descriptor downloading -> Exception.

But if we can't pass exceptions while collecting descriptors to the application, why should we try hard to do this while reading descriptors? My point is that we can similarly log these warnings and don't have to start creating InvalidDescriptor just to let the application know about these exceptions.

Regardless, we can pass parse exceptions encapsulated in UnparseableDescriptor to the application. That's limited to DescriptorReader and DescriptorParser, because DescriptorCollector doesn't parse and return descriptors.

I chose the name InvalidDescriptor in the beginning just to have a name; 'unparsable' is a better prefix. In general this type should be there to communicate that there is a descriptor that couldn't be parsed and provide all information accessible w/o proper parsing. The exception itself is not (yet) of much use.

Does this make more sense?

I think our ideas/concepts converge :-)

I also now understand your "ParseExceptionsLog" idea better. Basically, we would create a new structure for logging that is less dependent on classes and more related to operation. Not opposed to that idea. I could imagine that we'd want to look more at the other log statements and see what other channels would be useful. How about we leave that for after 2.0.0 is released?

Yes, that's fine or later.

Okay!

Added a comment to #16225 as reminder.

Can you look at the branch? I think it makes sense to look now. Thanks!

I think UnparseableDescriptor should extend Descriptor otherwise there is no access to the raw bytes etc. from the API.

  • getDescriptorFile, getRawDescriptorBytes: should work in Un.Desc. as in other Descr.
  • getAnnotations, getUnrecognizedLines: This behavior needs to be defined and documented in small tests. (Annotations might be corrupt, unrecognized lines might not be identifiable etc.)

Ugh, totally. That's an oversight. Fixed in a new commit in the same branch.

Taking a look at the update next.

Other than these this looks ok as a 1.9.0 release solution.

Awesome! I'm also optimistic that this will work and that we won't run into too many yet unforseen issues. I'll test this branch (with the extends Descriptor fix) in Onionoo and a few other applications.

Thanks!

comment:20 Changed 2 years ago by iwakeh

Oh, only the interface was added.
What could getAnnotations, getUnrecognizedLines return when called on an UnparseableDescriptor?

comment:21 in reply to:  19 Changed 2 years ago by karsten

Replying to iwakeh:

Replying to karsten:

[snip]

The issue with DescriptorCollector returning void is that we cannot return any exceptions caught while fetching descriptor files from the remote CollecTor host to the caller. This could be anything from fetching the index.json file or an actual descriptor file to differences between expected and actual file size or even indexing the local target directory.

What we can do is log these exceptions/issues, so that the operator gets aware of them. But we can't hand over exceptions to the application. At least not easily (see ExceptionListener).

Well, throwing an Exception is a way to communicate problems to the caller. Anyway, there are not that many distinguishable error situations; e.g. w/o index.json there is no descriptor downloading -> Exception.

A missing index.json might indeed qualify for throwing an Exception, because DescriptorIndexCollector cannot recover from that. But there are other exception cases, like an issue with downloading a specific file, that DescriptorIndexCollector can work around by trying the next file. And if we want to provide best effort results, we'll have to recover from as many issues as possible. In such a case, where would we throw an Exception?

comment:22 in reply to:  20 ; Changed 2 years ago by karsten

Replying to iwakeh:

Oh, only the interface was added.
What could getAnnotations, getUnrecognizedLines return when called on an UnparseableDescriptor?

Right now they don't return anything, that is, they return an empty list or null.

But in theory we could return whatever annotations or unrecognized lines we found until running into a DescriptorParseException. I have an idea how to build this with relatively little code changes, but I'll hold off until I hear whether that's even a good idea.

Another solution that I don't like as much is that we introduce a new interface ParsedDescriptor that extends Descriptor and that other (parsed) descriptors extend. That new interface could then contain these two methods, so they're not available in UnparseableDescriptor. The part that I don't like there is that we'd have to deprecate the current methods in Descriptor, and applications depending on them would have to downcast their Descriptor instances to ParsedDescriptor instances just to get the annotations (or unrecognized lines). Slightly eww.

What do you think?

comment:23 in reply to:  22 Changed 2 years ago by iwakeh

Replying to karsten:

Replying to iwakeh:

Oh, only the interface was added.
What could getAnnotations, getUnrecognizedLines return when called on an UnparseableDescriptor?

Right now they don't return anything, that is, they return an empty list or null.

But in theory we could return whatever annotations or unrecognized lines we found until running into a DescriptorParseException. I have an idea how to build this with relatively little code changes, but I'll hold off until I hear whether that's even a good idea.

No client would really want to use the possibly mangled content of these two methods, would they?
A client would -- if at all -- use the raw bytes.

Another solution that I don't like as much is that we introduce a new interface ParsedDescriptor that extends Descriptor and that other (parsed) descriptors extend. That new interface could then contain these two methods, so they're not available in UnparseableDescriptor. The part that I don't like there is that we'd have to deprecate the current methods in Descriptor, and applications depending on them would have to downcast their Descriptor instances to ParsedDescriptor instances just to get the annotations (or unrecognized lines). Slightly eww.

What do you think?

I'd rather state in the javadoc that there won't be any useful content to expect.
The methods could be simply returning the default or even be implemented to throw an exception "Not available". That'll be sufficient.

comment:24 Changed 2 years ago by karsten

Where in the JavaDocs would you state that there won't be any useful content to expect? In Descriptor or in overridden methods in UnparseableDescriptor?

By the way, I'm still busy testing the changed branch in applications. One issue I ran into is that I didn't find an easy way to use the new readDescriptors() method in the CollecTor sync part. Would you mind replacing that code there, just to see that the new interface works for it, too?

comment:25 in reply to:  24 Changed 2 years ago by iwakeh

Replying to karsten:

Where in the JavaDocs would you state that there won't be any useful content to expect? In Descriptor or in overridden methods in UnparseableDescriptor?

Only in the overridden methods.
Actually, it might be better to throw the "not available" runtime exceptions in order to avoid future bug-hunts when the getAnnotations, getUnrecognizedLines methods get called on an Unp.Descr.
The exceptions would make the code fail early and clearly point to the problem.

By the way, I'm still busy testing the changed branch in applications. One issue I ran into is that I didn't find an easy way to use the new readDescriptors() method in the CollecTor sync part. Would you mind replacing that code there, just to see that the new interface works for it, too?

I'll take a look.

comment:26 Changed 2 years ago by karsten

Hmm, I'm not a fan of throwing "not available" runtime exceptions. That feels a lot like bad design, which we can, in theory, fix with the move to 2.0.0. But if we don't, any objections if we just document this in the overridden method and just return the default null or empty something?

comment:27 Changed 2 years ago by iwakeh

Throwing such an exception is a valid design decision, see for example collection interface, the `UnsupportedOperationException`.

The advantage is early failure with a direct message avoiding lengthy bug-hunts for empty/null values.
I also would prefer to avoid (accidentally) relying on the empty/null return values for whatever future use-case solution.

(This behavior should maybe also be documented in the Descriptor interface.)

Last edited 2 years ago by iwakeh (previous) (diff)

comment:28 Changed 2 years ago by iwakeh

Cc: iwakeh added

CC'ing myself in order to receive notifications.

comment:29 Changed 2 years ago by karsten

Okay. Can you write a patch?

comment:30 Changed 2 years ago by iwakeh

sure.

comment:31 Changed 2 years ago by iwakeh

Please review this commit.

comment:32 Changed 2 years ago by karsten

Thanks! Please review my task-22141-2 branch, which contains your patch, is rebased to master, contains more detailed change log entries, has a meaningful commit message for the big commit, and includes another commit with a small improvement found while testing.

comment:33 Changed 2 years ago by iwakeh

I partially reviewed, but would now want to wait for the rebase on the code of #22634.

comment:34 Changed 2 years ago by karsten

Makes sense! Please find my task-22141-3 branch that is rebased to master (including the #22634 fix).

comment:35 Changed 2 years ago by karsten

Status: needs_reviewneeds_revision

Hrmm, I just realized that we'll have to implement #22514 before we can get to this ticket. Right now, readDescriptors(File...) uses the int value set in setMaxDescriptorFilesInQueue(int) as maximum queue size, which is wrong. We'll need either a setMaxDescriptorsInQueue(int) or the suggested setMaxRawDescriptorBytesInQueue(long) before we can merge this branch. So, let's work on #22514 first!

(In the meantime it would still be useful to know whether the new interface will work with CollecTor's sync part!)

comment:36 in reply to:  24 Changed 2 years ago by iwakeh

Replying to karsten:
...

By the way, I'm still busy testing the changed branch in applications. One issue I ran into is that I didn't find an easy way to use the new readDescriptors() method in the CollecTor sync part. Would you mind replacing that code there, just to see that the new interface works for it, too?

Please see the branch attached to #22652.

comment:37 Changed 2 years ago by karsten

Status: needs_revisionneeds_review

Regarding #22514, please find another tiny commit on top of my existing task-22141-3 branch. Basically, all we need is a new setter and instance variable. That was easier than expected.

comment:38 Changed 2 years ago by iwakeh

Status: needs_reviewmerge_ready

Tests and checks pass, code looks ok.
I have another commit here.

Many improvements can follow once the deprecated methods are removed. For that there is #22154.

comment:39 Changed 2 years ago by karsten

Hmm, I'm not yet sure whether that commit makes things better or worse. In any case it requires a bit of discussion, and I'd rather postpone it until 1.9.0 and possibly even 2.0.0 are out, if you don't mind.

I'll wait until #22652 is resolved before rebasing, squashing, and merging my task-22141-3 branch. Thanks for checking!

comment:40 Changed 2 years ago by karsten

Resolution: implemented
Status: merge_readyclosed

Tests with #22652 look good, except for an unrelated fix that I'm going to post there in a bit. But it looks like this change won't break existing applications. Great!

Rebased to master, squashed, and pushed. Closing. (I'll open a ticket for that static-methods patch.) Thanks for all the reviews and input!

Note: See TracTickets for help on using tickets.