Repack Command not found #8

Open
sepatel opened this issue Dec 22, 2023 · 3 comments

Comments

sepatel commented Dec 22, 2023

Description

I need to repack a git repo after it has been cloned. Some of the pack files are over 1 GB in size, and our systems run out of memory as a result. Locally I can achieve this with git repack --max-pack-size=100m -Ad, but our production systems do not have git installed, so dropping down to a shell is not an option. The jgit library is the only means I have to make this adjustment (unless there is a clone option that I didn't find).

So I really need either the ability to repack the repo or some kind of improved memory handling that keeps jgit from running out of memory in production when walking the tree of a really large pack.

Motivation

It is a core bit of git functionality and would help prevent out of memory issues when working with poorly packed git repositories.

Alternatives considered

Locally I can use git repack --max-pack-size=100m -Ad to work around the memory problems, but running it by hand in production isn't an option because git is not installed. Only jgit can work with the code.

Additional context

Perhaps a way to clone it with a max pack size? Unsure how that would work as I didn't see a way to do that via the git cli.

msohn (Member) commented Feb 26, 2024

JGit accesses objects in pack files via the WindowCache, which loads the raw data in pages. It doesn't load complete pack files into memory, though it fully caches pack indexes in memory.

Page size and cache size can be configured using the options core.packedGitWindowSize and core.packedGitLimit.
See [1].

Hence I don't understand how running out of memory and repacking pack files are directly related.
At the moment JGit doesn't expose an API to only run the repack part of a full gc.

[1] https://github.com/eclipse-jgit/jgit/blob/master/Documentation/config-options.md
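
As a rough illustration of the two options named in the comment above, a git config file could carry entries like the following. The values are assumptions chosen only to show the syntax, not recommendations, and an application that embeds the library may have to hand this configuration to JGit itself (for example through its WindowCacheConfig class) before it takes effect; the documentation linked in [1] is the authority on defaults and valid ranges.

```ini
# Illustrative values only (assumptions, not tuning advice).
[core]
	# Size of a single WindowCache page read from a pack file.
	packedGitWindowSize = 8k
	# Upper bound on pack file data kept in the WindowCache.
	packedGitLimit = 64m
```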

sepatel (Author) commented Feb 27, 2024

> Hence I don't understand how running out of memory and repacking pack files are directly related. At the moment JGit doesn't expose an API to only run the repack part of a full gc.

I can't explain why. What I do know is that for some repos with a single pack file of 800 MB or more, reading a file from the repo (not always, but usually with older commit ids) leads to an out of memory error. If I repack by hand so that the largest pack file is 100 MB, the same commands run fine. It was super hard to track down: on our systems (2 GB of RAM) the heap jumps from around 120 MB used to an OOM error within a second or two of some of the known file reads. The stack traces (I don't have any to reference at the moment, as this was some time ago) said something that led me to chase down the pack size as the root issue.

I was never able to tell how much memory was used once I shrank the pack size down to 100 MB, because I guess the JVM recovered or did whatever, and I have no way to see how much heap space is in use in the middle of the jgit call, only before and after the read.
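
One possible way to get a reading from inside the call, sketched here with the JDK's standard management beans rather than anything JGit provides: reset the recorded peak of each heap memory pool, run the read, then collect the peaks. The class name PeakHeapProbe and the method peakHeapBytesDuring are hypothetical, and the byte array in main is only a stand-in for the actual read. The result is an approximation: it includes whatever was already on the heap, and the pools may not peak at the same instant.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of JGit.
final class PeakHeapProbe {

    /**
     * Runs the task and returns the sum, in bytes, of the peak usage each heap
     * memory pool recorded while it ran. This is an upper bound on the true
     * simultaneous peak and includes memory that was already in use.
     */
    static long peakHeapBytesDuring(Runnable task) {
        // Reset every heap pool's recorded peak to its current usage.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }

        task.run();

        // Sum the peaks recorded since the reset.
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long peak = peakHeapBytesDuring(() -> {
            // Stand-in for the repository read under investigation.
            byte[] block = new byte[16 * 1024 * 1024];
            block[block.length - 1] = 1;
        });
        System.out.printf("approximate peak heap during call: %d MiB%n",
                peak / (1024 * 1024));
    }
}
```

Comparing the reading for the same file read against the 800 MB pack and the repacked 100 MB packs would show whether pack size really changes the peak.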

I'll take a look and see if the configuration options you've mentioned help. Maybe those were the real problem and it only looked related to the pack size for other reasons.

Edit: @msohn a dumb question, but could core.packedIndexGitUseStrongRefs maybe be a factor? It defaults to true, seems to relate to the pack index, and according to the docs references are only dropped when heap space is low if it is set to false, which is not the default. Could the indexing of the pack files be what is using up excessive amounts of RAM?

msohn (Member) commented Mar 4, 2024

If core.packedIndexGitUseStrongRefs=true, the JGit pack index cache uses strong references to cache the pack index data. As a consequence, the JVM cannot free the memory used for caching loaded pack indexes when it runs short on free heap space. You can try setting this option to false to use soft references instead, which allow the JVM to reclaim the memory used to cache pack indexes. This may reduce memory consumption but will slow down access to pack index content, since it has to be reloaded from the filesystem if the JVM removed softly referenced objects from the heap.
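
The trade-off described above can be illustrated with a generic soft-reference cache. This is a minimal sketch of the general technique using java.lang.ref.SoftReference, not JGit's actual pack index cache; the class SoftIndexCache, its loader function, and the loads counter are all hypothetical names introduced for illustration.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of a soft-reference cache; not JGit code.
final class SoftIndexCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    // Counts how often the loader ran (kept simple, not thread-safe).
    int loads;

    SoftIndexCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        // ref.get() returns null once the JVM has reclaimed the value,
        // which it may do when heap space runs short.
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Cache miss or reclaimed entry: pay the reload cost again.
            value = loader.apply(key);
            loads++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftIndexCache<String, byte[]> cache =
                new SoftIndexCache<>(name -> new byte[1024]);
        byte[] first = cache.get("pack-1.idx");
        byte[] second = cache.get("pack-1.idx");
        System.out.println("same instance: " + (first == second));
        System.out.println("loads: " + cache.loads);
    }
}
```

With strong references the map would hold the values directly, so nothing could be reclaimed under memory pressure; with soft references the memory can be given back at the price of an occasional reload.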

If you need more details, you probably need to create heap dumps and analyze them, e.g. using Eclipse Memory Analyzer.
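
On HotSpot-based JVMs, a heap dump can be written automatically on failure by starting the JVM with -XX:+HeapDumpOnOutOfMemoryError. Where a dump is wanted at a chosen point instead, something like the following sketch may work. It assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class HeapDumper and the file name prefix are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes a HotSpot-based JVM.
final class HeapDumper {

    /** Writes a heap dump of live objects into dir and returns the file path. */
    static Path dumpHeap(Path dir) throws IOException {
        // The target file must not exist yet, hence the unique suffix.
        Path file = dir.resolve("heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only objects that are still reachable.
        bean.dumpHeap(file.toString(), true);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdumps");
        Path dump = dumpHeap(dir);
        System.out.println("heap dump written to " + dump
                + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting .hprof file can then be opened in a heap analysis tool such as Eclipse Memory Analyzer.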
