Opened 4 years ago

Last modified 4 months ago

#20350 assigned enhancement

Replace create-tarball.sh shell script with Java module

Reported by: iwakeh Owned by: metrics-team
Priority: Medium Milestone: CollecTor 2.0.0
Component: Metrics/CollecTor Version:
Severity: Normal Keywords: metrics-2018
Cc: metrics-team Actual Points:
Parent ID: #20518 Points:
Reviewer: Sponsor:

Description (last modified by iwakeh)

This script's functionality should be transferred to Java.

The new createtars module should:

  • provide at least the functionality of the script
  • be configurable like other CollecTor modules
  • not impede other modules

Please use the comments below to collect further features and functionality that the script can't or doesn't provide but that should be part of this module.

Change History (12)

comment:1 Changed 4 years ago by iwakeh

Parent ID: #20518

comment:2 Changed 3 years ago by iwakeh

Description: modified (diff)
Milestone: CollecTor 2.0.0

This module should also remove files from 'out' once they are archived.
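A minimal sketch of how such a cleanup step might look, assuming the module knows which files went into a tarball. The class name `ArchivedFileCleaner` and the method `removeArchived` are hypothetical and not part of CollecTor; the sketch only deletes files below 'out' and only if the tarball exists and is non-empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of CollecTor. */
final class ArchivedFileCleaner {

  /**
   * Deletes the given files below outDir, but only if the tarball exists and
   * is non-empty. Returns the files that were actually deleted.
   */
  static List<Path> removeArchived(Path outDir, Path tarball,
      List<Path> archivedFiles) throws IOException {
    List<Path> deleted = new ArrayList<>();
    if (!Files.isRegularFile(tarball) || Files.size(tarball) == 0L) {
      return deleted; // Never delete sources without a finished tarball.
    }
    Path root = outDir.toAbsolutePath().normalize();
    for (Path file : archivedFiles) {
      Path candidate = file.toAbsolutePath().normalize();
      // Only touch regular files that live below the 'out' directory.
      if (candidate.startsWith(root) && Files.isRegularFile(candidate)
          && Files.deleteIfExists(candidate)) {
        deleted.add(candidate);
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path out = Files.createTempDirectory("out");
    Path descriptor = Files.write(out.resolve("descriptor"), new byte[] {1});
    Path tarball = Files.write(
        Files.createTempFile("archive", ".tar.xz"), new byte[] {1});
    System.out.println("Deleted: "
        + removeArchived(out, tarball, Collections.singletonList(descriptor)));
  }
}
```

Checking the tarball before deleting anything is a deliberate choice in this sketch: a failed or interrupted archiving run should leave 'out' untouched.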

comment:3 Changed 3 years ago by karsten

Summary: changed from "replace create-tarball.sh shell script with java module" to "Replace create-tarball.sh shell script with Java module"

Capitalize some words in the summary.

comment:4 Changed 3 years ago by karsten

Keywords: metrics-2018 added

comment:5 Changed 3 years ago by karsten

Owner: set to metrics-team
Status: changed from new to assigned

comment:6 Changed 2 years ago by iwakeh

Reviewer: iwakeh

comment:7 Changed 2 years ago by iwakeh

Owner: changed from metrics-team to iwakeh
Reviewer: iwakeh
Status: changed from assigned to accepted

comment:8 Changed 2 years ago by irl

Cc: metrics-team added

Adding metrics-team to cc

comment:9 Changed 2 years ago by iwakeh

Owner: changed from iwakeh to metrics-team
Status: changed from accepted to assigned

comment:10 Changed 2 years ago by iwakeh

This create-tar code should call index-creation whenever reasonable (see patch in ticket #20351).

comment:11 Changed 2 years ago by iwakeh

The implementation should take the situation of #26193 into account and provide a solution, too.

comment:12 Changed 4 months ago by karsten

Looks like we had two tickets for this issue. Here's what I wrote in #31866 which I'm now closing as duplicate:

We're currently generating tarballs using the tar and xz command-line tools triggered by a cronjob. While this is very fast, it doesn't integrate that well with the rest of our code. For example, it would be much easier to extract descriptor types, publication times, and file digests for #31204 if tarball generation happened in Java.
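To illustrate the file-digest part, here is a minimal sketch using only the Java standard library. It computes a digest while the file is being read, which is the point at which a Java module would also copy the bytes into a tarball entry. The class name `FileDigestSketch` and the choice of SHA-256 are assumptions for illustration; the ticket does not say which digest #31204 needs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of CollecTor. */
final class FileDigestSketch {

  /** Returns the hex-encoded SHA-256 digest of the file's content. */
  static String sha256Hex(Path file) throws IOException {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform implementation.
      throw new IllegalStateException(e);
    }
    try (InputStream in =
        new DigestInputStream(Files.newInputStream(file), md)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // A real module would also write these bytes to the tarball entry;
        // this sketch only feeds them through the digest.
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("descriptor", ".txt");
    Files.write(file, "abc".getBytes(StandardCharsets.US_ASCII));
    System.out.println(sha256Hex(file));
  }
}
```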

One possible issue might be that generated tarballs are larger or that compression takes longer. This is something I wanted to figure out early, which is why I ran some tests today:

Compression     xz                   XZ for Java 1.6
preset level    Size    Time         Size    Time
1               269M    1m27.522s    269M    1m22.905s
3               77M     1m6.590s     77M     1m15.03s
6               30M     3m8.426s     30M     4m54.837s
9               18M     2m56.801s    18M     5m6.998s
9e              16M     7m2.364s     NA      NA

We're currently using xz -9e, but I can't find this option in XZ for Java. The closest is compression preset level 9. That means that our tarballs would be 18M/16M = 1.125 times the current size, i.e., 12.5% larger, and would be created in 306.998s/422.364s = 73% of the current time.
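The arithmetic behind these two figures can be reproduced with a short sketch. The class and method names are hypothetical; the inputs are the preset 9 row for XZ for Java and the 9e row for xz from the table above.

```java
import java.util.Locale;

/** Hypothetical helper that reproduces the comparison in this comment. */
final class XzComparison {

  /** Parses a duration such as "5m6.998s" into seconds. */
  static double parseSeconds(String text) {
    int minutesEnd = text.indexOf('m');
    int secondsEnd = text.lastIndexOf('s');
    int minutes = Integer.parseInt(text.substring(0, minutesEnd));
    double seconds =
        Double.parseDouble(text.substring(minutesEnd + 1, secondsEnd));
    return minutes * 60 + seconds;
  }

  /** Returns by how many percent newSize exceeds oldSize. */
  static double percentLarger(double newSize, double oldSize) {
    return (newSize / oldSize - 1.0) * 100.0;
  }

  /** Returns newSeconds as a percentage of oldSeconds. */
  static double percentOfTime(double newSeconds, double oldSeconds) {
    return newSeconds / oldSeconds * 100.0;
  }

  public static void main(String[] args) {
    double javaPreset9 = parseSeconds("5m6.998s"); // XZ for Java, preset 9
    double xzPreset9e = parseSeconds("7m2.364s");  // xz -9e
    System.out.println(String.format(Locale.ROOT,
        "Size: %.1f%% larger", percentLarger(18.0, 16.0)));
    System.out.println(String.format(Locale.ROOT,
        "Time: %.0f%% of current", percentOfTime(javaPreset9, xzPreset9e)));
  }
}
```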

Is it a blocker that our tarballs would be 12.5% larger? If so, we might try harder to configure XZ for Java in the same way as xz -9e operates, even though that would very likely increase tarball generation time.

Not working on this at the moment, just leaving my thoughts here for discussion and for picking this up as time permits.
