Performance degradation on huge repositories #27

Open
bozaro opened this issue Dec 9, 2016 · 9 comments

Comments

@bozaro
Owner

bozaro commented Dec 9, 2016

There is a performance degradation on huge repositories (>250,000 objects).
The root cause looks like the file count getting too big. It would be much better to generate a pack file for every 10,000 objects.
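
For reference, a minimal sketch of that idea (hypothetical names; git-lfs-migrate does not currently expose such a hook) could periodically run JGit's gc on the target repository so loose objects get folded into packs:

import java.io.File;
import org.eclipse.jgit.api.Git;

// Hypothetical helper, not part of git-lfs-migrate: call objectWritten()
// after each converted object and repack every 10,000 objects.
public class PeriodicRepack {
    private static final int PACK_EVERY = 10_000;
    private final File targetRepoDir;
    private long written;

    public PeriodicRepack(File targetRepoDir) {
        this.targetRepoDir = targetRepoDir;
    }

    public void objectWritten() throws Exception {
        if (++written % PACK_EVERY != 0) {
            return;
        }
        // JGit's gc consolidates loose objects into a pack and prunes the
        // loose copies, keeping the object directory's file count bounded.
        try (Git git = Git.open(targetRepoDir)) {
            git.gc().call();
        }
    }
}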

@mmv

mmv commented Dec 23, 2016

Hi @bozaro
I noticed that there was a significant performance degradation between release 0.2.4 and 0.2.5.

For the same repository

# with release 0.2.4:
[main] INFO git.lfs.migrate.Main - Convert time: 53224470

real    887m6.923s
user    1578m23.016s
sys     33m7.852s


# with release 0.2.5
[main] INFO git.lfs.migrate.Main - Convert time: 140884059

real    2348m5.986s
user    2231m8.880s
sys     191m58.924s

Also, while running with 0.2.5, writing of the objects seems to slow down quite a bit after a few of them are written.

While the process was outputting things like

[main] INFO git.lfs.migrate.Main -   processed: 1356541/1523578
[main] INFO git.lfs.migrate.Main -   processed: 1356542/1523578

I took a few file counts on the target directory:

$ find objects/ -type f | wc -l ; \
  find lfs/ -type f | wc -l ; \
  sleep 1 ; \
  find objects/ -type f | wc -l ; \
  find lfs/ -type f | wc -l
2305710
275383
2305711
275383

Regarding memory consumption, at about the same time,

$ ps -eo vsize,rssize,cmd
25498152 17806716 java -Xmx20g -Xms20g ...

And during this stage I had only 1 CPU core being used at 100% by the conversion essentially in user time. During the whole process I noticed no significant CPU wait time; it was mostly spent on user (and a bit on sys).

The filesystem was Btrfs on a SSD disk.

@comicfans

How do you pack files per every 10,000 objects? I tried to convert a big repo with lfs-test-server, but after the conversion (about 4 days) lfs-test-server only has the file-name metadata and no files appear inside its folder (I've tried the same setup with a small repo and it succeeded). It's too slow to debug.

@leth
Contributor

leth commented Feb 3, 2017

Between 0.2.4 and 0.2.5, the commit most likely to impact performance was 974270d, which was aimed at reducing memory consumption.

It looks like it dropped a DAG library in favour of a local implementation of commit-graph tracking.
If this is the cause, perhaps there's another way of solving the memory issue without dropping the performance gains of using a DAG library.
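
As a generic illustration only (this is not the code from 974270d, nor the dropped library's API): one way to keep a local original-to-converted id map from growing without bound is to reference-count how many unprocessed children still need each entry and evict it once that count reaches zero.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of bounded commit-graph tracking: entries are evicted
// once every child that references them has been processed.
public class EvictingIdMap {
    private final Map<String, String> converted = new HashMap<>();    // original id -> converted id
    private final Map<String, Integer> pendingUses = new HashMap<>(); // original id -> unprocessed children

    public void expectUses(String originalId, int childCount) {
        pendingUses.put(originalId, childCount);
    }

    public void put(String originalId, String convertedId) {
        converted.put(originalId, convertedId);
    }

    // Look up a parent's converted id and drop it once no child needs it any more.
    public String use(String originalId) {
        String result = converted.get(originalId);
        int left = pendingUses.merge(originalId, -1, Integer::sum);
        if (left <= 0) {
            converted.remove(originalId);
            pendingUses.remove(originalId);
        }
        return result;
    }
}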

@pwagland

pwagland commented Mar 30, 2017

While trying to convert a 3.6 GB repo to LFS, I noticed a dramatic slowdown at around 1289037/1371200 objects. It might have been slowing down before that… but I see the following:

[main] INFO git.lfs.migrate.Main -   processed: 10984/1371200
[main] INFO git.lfs.migrate.Main -   processed: 11815/1371200

and

[main] INFO git.lfs.migrate.Main -   processed: 1176360/1371200
[main] INFO git.lfs.migrate.Main -   processed: 1176644/1371200

and

[main] INFO git.lfs.migrate.Main -   processed: 1289268/1371200
[main] INFO git.lfs.migrate.Main -   processed: 1289273/1371200

So it dropped from around 600-700 objects per second to about 5. After leaving this running for a while, it seems to slow even further, to about one per second:

[main] INFO git.lfs.migrate.Main -   processed: 1307815/1371200
[main] INFO git.lfs.migrate.Main -   processed: 1307816/1371200
[main] INFO git.lfs.migrate.Main -   processed: 1307817/1371200
[main] INFO git.lfs.migrate.Main -   processed: 1307818/1371200

Running the VisualVM sampler over it, I see the following percentages:

git.lfs.migrate.Main.main()	100.0	46,804 ms (100%)	46,804 ms
 git.lfs.migrate.Main.processRepository()	100.0	46,804 ms (100%)	46,804 ms
  git.lfs.migrate.Main.processSingleThread()	100.0	46,804 ms (100%)	46,804 ms
   git.lfs.migrate.GitConverter.convertTask()	97.000435	45,400 ms (97%)	45,400 ms
    org.eclipse.jgit.revwalk.RevWalk.parseAny()	96.787125	45,300 ms (96.8%)	45,300 ms
     org.eclipse.jgit.lib.ObjectReader.open()	82.76274	38,736 ms (82.8%)	38,736 ms
      org.eclipse.jgit.internal.storage.file.WindowCursor.open()	82.76274	38,736 ms (82.8%)	38,736 ms
       org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject()	82.76274	38,736 ms (82.8%)	38,736 ms
        org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate()	82.76274	38,736 ms (82.8%)	38,736 ms
         org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject()	82.76274	38,736 ms (82.8%)	38,736 ms
          org.eclipse.jgit.internal.storage.file.PackFile.get()	82.76274	38,736 ms (82.8%)	38,736 ms
           org.eclipse.jgit.internal.storage.file.PackFile.load()	82.76274	38,736 ms (82.8%)	38,736 ms
            org.eclipse.jgit.internal.storage.file.PackFile.decompress()	79.77893	37,339 ms (79.8%)	37,339 ms
             org.eclipse.jgit.internal.storage.file.WindowCursor.inflate()	79.77893	37,339 ms (79.8%)	37,339 ms
              java.util.zip.Inflater.inflate()	61.16066	28,625 ms (61.2%)	28,625 ms
              org.eclipse.jgit.internal.storage.file.WindowCursor.prepareInflater()	16.888159	7,904 ms (16.9%)	7,904 ms
               java.util.zip.Inflater.reset()	16.888159	7,904 ms (16.9%)	7,904 ms

So 96.8% of the time is spent parsing the revision, and it becomes really slow at a certain point. I hope this information helps.
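
One hedged mitigation for the inflate/prepareInflater hot spots (my assumption; the tool does not expose this as an option) would be enlarging JGit's pack window cache before the repository is opened, so fewer pack windows have to be re-read and re-inflated:

import org.eclipse.jgit.storage.file.WindowCacheConfig;

// Hypothetical setup with illustrative, untuned values; install() applies
// the settings process-wide and must run before the repository is opened.
public class TunePackCache {
    public static void apply() {
        WindowCacheConfig cfg = new WindowCacheConfig();
        cfg.setPackedGitMMAP(true);                    // mmap pack files instead of copying windows
        cfg.setPackedGitLimit(1L << 30);               // ~1 GB of pack data cached in memory
        cfg.setDeltaBaseCacheLimit(256 * 1024 * 1024); // larger delta-base cache
        cfg.install();
    }
}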

@eberlid

eberlid commented Jun 25, 2017

Are there any ideas on how to fix the problem?

@szakib

szakib commented Mar 3, 2022

I am having this problem in 2022 with the latest version. It is using ~1 out of 32 cores, ~1 GB RAM out of 64 GB available, and <10% disk I/O. And the import has been running for 13 days so far. :D Is there a workaround for forcing it to be more parallel and/or use more RAM?

@leth
Contributor

leth commented Mar 3, 2022

This repository hasn't changed since 2016; it's a miracle it still works!
You could try reverting 974270d yourself locally, or just using version 0.2.4.
Or look at the 'network' around this repo to find forks which people have picked up and fixed/maintained!

It has been 5 years, so I'm no longer certain which I did at the time, but I have a vague recollection that the commit reverted cleanly!

@szakib

szakib commented Mar 3, 2022

@leth Thanks for attempting to help. This seems to be part of the current official release of Git (I have git-lfs/3.0.2 (GitHub; windows amd64; go 1.17.2)). What would be the way to get this change reverted in a new release?

@leth
Contributor

leth commented Mar 3, 2022

Sorry, I have a contributor badge here only because a PR of mine was merged once; I have no permissions on this project!

It sounds like you'd need to find the project you downloaded git/github/git-lfs from and let them know they're bundling an unmaintained tool with known bugs 🤷🏻‍♂️
