Repack Command not found #8

sepatel · 2023-12-22T21:16:38Z

Description

I need to repack a git repo after it has been cloned. Some of the pack files are over 1 GB in size, and our systems run out of memory as a result. Locally I can do this with git repack --max-pack-size=100m -Ad, but our production systems do not have git installed, so dropping down to a shell is not an option. The JGit library is the only means I have to make this adjustment (unless there is a clone option that I didn't find).

So I really need either the ability to repack the repo, or some kind of improved memory handling that keeps JGit from running out of memory in production when walking the tree of a really large pack.

Motivation

It is a core bit of git functionality and would help prevent out of memory issues when working with poorly packed git repositories.

Alternatives considered

Locally I can use git repack --max-pack-size=100m -Ad to work around the memory problems, but running that by hand in production isn't really an option because git is not installed. Only JGit can work with the code.

Additional context

Perhaps a way to clone it with a max pack size? Unsure how that would work as I didn't see a way to do that via the git cli.


msohn · 2024-02-26T20:21:02Z

JGit accesses objects in pack files via the WindowCache loading the raw data in pages. It doesn't load complete pack files into memory, though it fully caches pack indexes in memory.

Page size and cache size can be configured using the options core.packedGitWindowSize and core.packedGitLimit.
See [1].

Hence I don't understand how running out of memory and repacking pack files is directly related.
At the moment JGit doesn't expose an API to only run the repack part of a full gc.

[1] https://github.com/eclipse-jgit/jgit/blob/master/Documentation/config-options.md
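
As a rough illustration of the two options named above, a git config fragment could look like the following. The values are placeholders chosen for the example, not recommendations, and this assumes the application embedding JGit actually loads its window cache settings from a git config file; how that happens depends on the application.

```
[core]
	# size of one page (window) read from a pack file
	packedGitWindowSize = 16k
	# upper bound on the pack data cached in the window cache
	packedGitLimit = 64m
```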

sepatel · 2024-02-27T18:24:05Z

> Hence I don't understand how running out of memory and repacking pack files is directly related. At the moment JGit doesn't expose an API to only run the repack part of a full gc.

I can't explain why it happens. What I do know is that for some repos with a single pack file of 800 MB or more, reading a file from the repo (not always, but usually with older commit ids) leads to an out of memory error. If I repack by hand so that the largest pack file is 100 MB, the same commands run fine. It was super hard to track down: on systems with 2 GB of RAM, heap usage jumps from around 120 MB to an OOM error within a second or two of some of the known file reads. The stack traces (I don't have any to reference at the moment, as this was some time ago) said something that led me to chase down the pack size as the root issue.

I was never able to tell how much memory was used after I shrank the packs to 100 MB, because I guess the JVM recovered, and I can only see heap usage before and after the JGit call, not in the middle of it.
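
One way to get a rough view of heap usage during a call, rather than only before and after it, is to sample the JVM's memory bean on a background thread while the call runs. The sketch below uses only the standard `java.lang.management` API; the class name `HeapPeakProbe` and the placeholder workload are made up for illustration, and the JGit read would go where the placeholder is. Sampling is approximate: a spike that falls between two samples will be missed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: samples JVM heap usage on a background thread while a task runs. */
final class HeapPeakProbe {

    /** Runs the task and returns the highest heap usage (in bytes) sampled while it ran. */
    static long measurePeakBytes(Runnable task, long sampleIntervalMillis) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        AtomicLong peak = new AtomicLong(memory.getHeapMemoryUsage().getUsed());

        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                peak.accumulateAndGet(memory.getHeapMemoryUsage().getUsed(), Math::max);
                try {
                    Thread.sleep(sampleIntervalMillis);
                } catch (InterruptedException e) {
                    return; // the task finished, stop sampling
                }
            }
        }, "heap-peak-probe");
        sampler.setDaemon(true);
        sampler.start();
        try {
            task.run();
        } finally {
            sampler.interrupt();
            try {
                sampler.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            }
        }
        // Take one last sample in case the task ended between two ticks.
        peak.accumulateAndGet(memory.getHeapMemoryUsage().getUsed(), Math::max);
        return peak.get();
    }

    public static void main(String[] args) {
        // Placeholder workload; in the scenario above this would be the JGit file read.
        long peak = measurePeakBytes(() -> {
            byte[][] chunks = new byte[16][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[1 << 20];
            }
        }, 10);
        System.out.println("peak heap used: " + (peak / (1024 * 1024)) + " MiB");
    }
}
```

Because the sampler is stopped in a `finally` block, the last recorded peak is still available to log if the task fails, though after an actual `OutOfMemoryError` there is no guarantee the JVM is in a state to report it.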

I'll take a look and see if the configuration options you've mentioned help. Maybe those were the real problem all along, and it only looked related to the pack size for other reasons.

Edit: @msohn a dumb question, but could core.packedIndexGitUseStrongRefs maybe be a thing? It defaults to true, seems to relate to the pack index, and the docs say it will only drop references when heap space is low if it is set to false, which is not the default. Could the indexing of the pack files be what is using up excessive amounts of RAM?

msohn · 2024-03-04T22:10:49Z

If core.packedIndexGitUseStrongRefs=true the jgit pack index cache uses strong references to cache the pack index data. This has the consequence that the JVM cannot free the memory used for caching loaded pack indexes when it runs short on free heap space. You can try to set this option to false to use soft references instead which allows the JVM to reclaim the memory used to cache pack indexes. This may reduce memory consumption but will slow down access to pack index content since it needs to be reloaded from the filesystem if the JVM removed softly referenced objects from the heap.
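
The trade-off described here can be illustrated with a small, generic cache built on `java.lang.ref.SoftReference`. This is not JGit's implementation, only a sketch of the general pattern: the class name `ReloadingSoftCache` and the `simulateEviction` method are hypothetical, and the eviction method stands in for the garbage collector clearing the reference under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical example: holds a value behind a SoftReference and reloads it if it was cleared. */
final class ReloadingSoftCache<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);
    private int loads;

    ReloadingSoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader again if the reference was cleared. */
    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            value = loader.get(); // the expensive part, e.g. re-reading from disk
            ref = new SoftReference<>(value);
            loads++;
        }
        return value;
    }

    /** Stands in for the garbage collector clearing the reference when heap is short. */
    synchronized void simulateEviction() {
        ref.clear();
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ReloadingSoftCache<int[]> cache = new ReloadingSoftCache<>(() -> new int[1024]);
        int[] first = cache.get();   // loads the value
        int[] second = cache.get();  // served from the cache while 'first' keeps it reachable
        cache.simulateEviction();
        int[] third = cache.get();   // has to load again
        System.out.println("same instance before eviction: " + (first == second));
        System.out.println("same instance after eviction: " + (first == third));
        System.out.println("loads: " + cache.loadCount());
    }
}
```

With a strong reference in place of the `SoftReference`, the reload never happens, but the memory can never be reclaimed either, which is the behavior described for the `true` setting.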

If you need more details, you probably need to create heap dumps and analyze them, e.g. using Eclipse Memory Analyzer (MAT).
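
For systems where no external tooling can be attached, a heap dump can also be triggered from inside the application. The sketch below assumes a HotSpot-based JVM, since it relies on `com.sun.management.HotSpotDiagnosticMXBean`; the class name `HeapDumper` and the file naming are illustrative. On HotSpot, starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` is another way to get a dump at the moment the OOM occurs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: writes a heap dump of the running JVM (HotSpot only). */
final class HeapDumper {

    /** Writes a .hprof file into the given directory and returns its path. */
    static Path dumpHeap(Path directory) {
        // The target file must not exist yet and must use the .hprof extension.
        Path target = directory.resolve(
                "heap-" + System.currentTimeMillis() + "-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // 'true' dumps only live (reachable) objects, which keeps the file smaller.
            diagnostics.dumpHeap(target.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path dump = dumpHeap(Paths.get(System.getProperty("java.io.tmpdir")));
        System.out.println("heap dump written to " + dump);
    }
}
```

The resulting file can be copied off the machine and opened in a heap analyzer.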
