Opened 5 months ago

Last modified 5 months ago

#33692 new task

Add Git repository containing lots of large files

Reported by: karsten
Owned by: tor-gitadm
Priority: Medium
Milestone:
Component: Internal Services/Service - git
Version:
Severity: Normal
Keywords:
Cc: metrics-team, pili
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


I have been working on a Git repository that I use for running integration tests of metrics code bases. That repository contains libraries (to avoid downloading them over and over), a given start state (Tor descriptors, files written in a previous execution), and expected results (CSV files, JSON files). Here are the current file sizes for testing two metrics code bases (metrics-web and Onionoo):

Files                 Total size (MiB)   # of files
Expected results      313                60455
Provided libraries    0.2                2
Provided state        626                402

I'm currently hosting this repository on GitHub, but I'd like to move it over to Tor's Git server at some point. The total file size, and possibly the number of files, are what stop me right now. But the repository really belongs on the Tor server in some form.

Do we support Git Large File Storage (LFS) or something similar? If so, how do I use it? (I have never used it before and could try one of the tutorials on the internet, but maybe there is something I should pay special attention to before hitting git push?)

Is the number of expected results files going to be a problem? If so, I can probably tar them up and untar them on disk when running tests. Of course, they then become a single large binary file, and whenever one contained file changes, the whole archive changes, too. What's the preference here?
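
A minimal Python sketch of the tar-up/untar approach, assuming Python 3.8+ and that the expected results live under a single directory; the function names pack_expected_results and unpack_expected_results are hypothetical. Files are added in sorted order with zeroed timestamps and owner fields, so re-packing an unchanged directory on the same machine should yield a byte-identical archive that Git does not report as modified. The archive is left uncompressed on the untested assumption that Git's own compression and delta logic cope better with a plain tar than with a gzip stream.

```python
import tarfile
from pathlib import Path


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Drop metadata that varies between runs or machines.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def pack_expected_results(src_dir: Path, archive: Path) -> None:
    """Write every regular file under src_dir into an uncompressed tarball."""
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with tarfile.open(archive, "w") as tar:
        for path in files:
            # Store paths relative to src_dir, with forward slashes.
            tar.add(str(path), arcname=path.relative_to(src_dir).as_posix(),
                    filter=_normalize)


def unpack_expected_results(archive: Path, dest_dir: Path) -> None:
    """Extract the tarball into dest_dir before a test run."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest_dir, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Even with stable metadata, changing one expected-results file still produces a new archive blob, which is exactly the trade-off raised in the question; how much that costs in practice would need measuring on the real data.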

Change History (4)

comment:1 Changed 5 months ago by irl

git.tpo has no support for LFS or git-annex. The gitolite version we are using was installed manually and is unlikely to be upgraded to support these options in the near future.

comment:2 Changed 5 months ago by karsten

I see. Are there any alternatives for the use case I described?

comment:3 Changed 5 months ago by irl

One option would be to use git-annex with web remotes. You would create a git-annex repository and commit the smaller files to it, to be version-controlled as normal. For larger files, you would upload them to your people.tpo space and then include them using the web special remote.

This would allow the large files to be fetched while only their metadata is stored in the git repository. If we support git-annex on the gitolite server at a later time, you'd only have one command to run to migrate it into the gitolite setup.
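
A minimal Python sketch of that workflow, assuming the large files have already been uploaded under one base URL whose layout mirrors their paths in the repository; the base URL and the helpers addurl_commands and run_commands are hypothetical. It builds one `git annex addurl --file=PATH URL` invocation per file; addurl downloads the content, adds the file to the annex, and records the URL with git-annex's built-in web special remote so other clones can fetch it from there.

```python
from __future__ import annotations

import subprocess
from pathlib import PurePosixPath
from urllib.parse import quote


def addurl_commands(base_url: str, rel_paths: list[str]) -> list[list[str]]:
    """Build one `git annex addurl` command per large file.

    base_url: hypothetical directory the files were uploaded to.
    rel_paths: paths relative to that directory, reused as repository paths.
    """
    base = base_url.rstrip("/")
    commands = []
    for rel in sorted(rel_paths):
        path = PurePosixPath(rel).as_posix()
        # Percent-encode spaces and similar characters, keep "/" separators.
        url = f"{base}/{quote(path)}"
        commands.append(["git", "annex", "addurl", f"--file={path}", url])
    return commands


def run_commands(repo: str, commands: list[list[str]]) -> None:
    """Initialise git-annex in an existing clone, then add each URL."""
    subprocess.run(["git", "annex", "init"], cwd=repo, check=True)
    for cmd in commands:
        subprocess.run(cmd, cwd=repo, check=True)
```

After running the commands, the new files would be committed with a regular `git commit`; a fresh clone could then retrieve their contents with `git annex get`.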

You might want to play with git-annex a bit locally before you try to upload things, but joeyh has made a bunch of easy-to-follow tutorials.

It's also possible to use git-annex with a Git LFS remote, so if we update gitolite in the future, we can stop relying on the people.tpo files.

comment:4 Changed 5 months ago by pili

Cc: pili added