#25329 closed enhancement (fixed)

Enable metrics-lib to process large (> 2G) logfiles

Reported by: iwakeh
Owned by: iwakeh
Priority: Very High
Milestone:
Component: Metrics/Library
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID: #25317
Points:
Reviewer:
Sponsor:

Description

Metrics-lib receives compressed logs, usually smaller than 600kB. Logs of that size can be dealt with in memory; this ticket is about handling logs that decompress to much larger files (approx. 2G).

Commons Compress doesn't provide methods for determining the decompressed content size (as the command-line tool xz does). Other compression types that metrics-lib supports do offer this, but using it would also require more changes.

Compression can be very effective, so any cut-off based on compressed size is somewhat arbitrary. An example for xz compression: a log that decompresses to 3G has a compressed input array of length 589492; with extreme compression it even shrinks to a length of 405480; on the other hand, a file that decompresses to 64M can have an input array of length 509212.
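To illustrate why the compressed size says little about the decompressed size, here is a minimal sketch. It uses the JDK's gzip classes as a stand-in for xz (an assumption, so that it runs without Commons Compress), and the class and method names are made up for illustration. The decompressed size is counted by streaming, so the content never has to fit into a single byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of metrics-lib.
class CompressedSizeDemo {

  /** Compresses the given bytes with gzip (stand-in for xz). */
  static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(raw);
    }
    return out.toByteArray();
  }

  /** Counts decompressed bytes by streaming through a small buffer. */
  static long decompressedSize(byte[] compressed) throws IOException {
    long total = 0L;
    byte[] buffer = new byte[8192];
    try (InputStream in = new GZIPInputStream(
        new ByteArrayInputStream(compressed))) {
      int read;
      while ((read = in.read(buffer)) != -1) {
        total += read;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Highly repetitive content, similar to many near-identical log lines.
    byte[] repetitive = new byte[4 * 1024 * 1024];
    Arrays.fill(repetitive, (byte) 'a');

    // Random content, a quarter of the size, that barely compresses.
    byte[] random = new byte[1024 * 1024];
    new Random(42).nextBytes(random);

    for (byte[] raw : new byte[][] {repetitive, random}) {
      byte[] compressed = gzip(raw);
      System.out.println("compressed length " + compressed.length
          + " -> decompressed size " + decompressedSize(compressed));
    }
  }
}
```

In this sketch the larger input yields the smaller compressed array, which is the same effect as in the xz numbers above.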

Handling larger log files with metrics-lib will require some interface changes. Here is a suggestion:

 public interface LogDescriptor extends Descriptor {
 
   /**
-   * Returns the decompressed raw descriptor bytes of the log.
+   * Returns the compressed raw descriptor bytes of the log.
+   *
+   * <p>For access to the log's decompressed bytes
+   * use method {@code decompressedByteStream}.</p>
+   *
    * @since 2.2.0
    */

   public byte[] getRawDescriptorBytes();
 
   /**
+   * Returns the decompressed raw descriptor bytes of the log as stream.
+   *
+   * @since 2.2.0
+   */
+  public InputStream decompressedByteStream();
+

I think this might be the easiest to understand and use, and of course the implementation wouldn't need to process large and 'normal' logs differently. It also avoids having to decide how to find out whether a file is large or not.
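A rough sketch of how the suggested split could look from the caller's side. This is not the patch itself: the nested interface is a simplified stand-in for LogDescriptor, GzipLogDescriptor and countLines are hypothetical names, and gzip from the JDK replaces xz so that the example runs without Commons Compress.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of metrics-lib.
class StreamingLogDemo {

  /** Simplified stand-in for the two methods of the suggested interface. */
  interface LogDescriptorSketch {
    byte[] getRawDescriptorBytes();        // compressed bytes
    InputStream decompressedByteStream();  // decompressed content as stream
  }

  /** Keeps only the compressed bytes and decompresses on demand. */
  static class GzipLogDescriptor implements LogDescriptorSketch {
    private final byte[] compressed;

    GzipLogDescriptor(byte[] compressed) {
      this.compressed = compressed;
    }

    @Override
    public byte[] getRawDescriptorBytes() {
      return this.compressed;
    }

    @Override
    public InputStream decompressedByteStream() {
      try {
        return new GZIPInputStream(new ByteArrayInputStream(this.compressed));
      } catch (IOException e) {
        // The suggested signature declares no checked exception.
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Reads the log line by line; same code path for any log size. */
  static long countLines(LogDescriptorSketch log) throws IOException {
    long lines = 0L;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        log.decompressedByteStream(), StandardCharsets.UTF_8))) {
      while (reader.readLine() != null) {
        lines++;
      }
    }
    return lines;
  }

  /** Helper that gzips a string to produce test input. */
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("GET /index.html 200\n");
    }
    LogDescriptorSketch log = new GzipLogDescriptor(gzip(sb.toString()));
    System.out.println("compressed bytes: " + log.getRawDescriptorBytes().length);
    System.out.println("lines: " + countLines(log));
  }
}
```

The caller never sees a decompressed byte array, so no size check is needed before processing.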

Thoughts?

Change History (7)

comment:1 Changed 20 months ago by iwakeh

Cc: metrics-team added
Owner: changed from metrics-team to iwakeh
Priority: Medium → Very High
Status: new → accepted

Setting to very high as #25317 depends on this.

comment:2 Changed 20 months ago by iwakeh

Status: accepted → needs_information

comment:3 Changed 20 months ago by karsten

And the reason to have getRawDescriptorBytes() return compressed raw descriptor bytes is that it has to return something, as it's overridden from Descriptor?

If so, works for me.

If you make this change, please look out for surrounding comments that need changing. Thanks!

comment:4 in reply to: 3 Changed 20 months ago by iwakeh

Status: needs_information → accepted

Replying to karsten:

> And the reason to have getRawDescriptorBytes() return compressed raw descriptor bytes is that it has to return something, as it's overridden from Descriptor?

Exactly!

> If so, works for me.
>
> If you make this change, please look out for surrounding comments that need changing. Thanks!

Sure.

comment:5 Changed 20 months ago by iwakeh

Status: accepted → needs_review

Please review this patch branch based on the current master metrics-lib.

This is the basis for the patch branch of ticket #25317.

comment:6 Changed 20 months ago by karsten

Status: needs_review → merge_ready

As stated on #25161, this is ready to be merged. Will do so when merging #23046.

comment:7 Changed 20 months ago by karsten

Resolution: fixed
Status: merge_ready → closed

Merged, closing. Thanks!
