OOM when processing really large git logs #23

Open
rjayasinghe opened this issue Sep 17, 2015 · 8 comments

@rjayasinghe

Hi!

I tried to process a pretty large git log from a private git repo. I increased the max heap to 4 GB, but it still did not help. I can't go much higher because my laptop's memory is limited.

Best Regards,
Robin

@adamtornhill
Owner

Hi @rjayasinghe

I've analyzed fairly rich Git repositories (e.g. Rails with 10 years of history, Mono with 10+ years) and Code Maat's memory usage stays around 1.3 GB on those. I think your issue has to do with some pattern in your input data combined with some inefficiency in the analysis algorithms.

What analysis did you run?

Would it be possible for you to send me the git log? That would allow me to debug it. In the meantime I'd recommend that you use a shorter analysis time span until I've addressed the real problem.
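
One way to shorten the analysis time span is to restrict the log itself with git's `--after` option. The following is a minimal sketch, assuming the log format that the Code Maat README documents for the `git2` parser; the helper name `maat_git2_log`, the cutoff date, and the file name `recent.log` are hypothetical.

```shell
# Print a log in the format read by Code Maat's git2 parser, limited to
# commits made after the given date (YYYY-MM-DD). Run it inside the repository.
maat_git2_log() {
    git log --all --numstat --date=short \
        --pretty=format:'--%h--%ad--%aN' --no-renames \
        --after="$1"
}

# Example (the date and file name are placeholders):
#   maat_git2_log 2014-09-17 > recent.log
#   java -Xmx4g -jar code-maat-0.9.2-SNAPSHOT-standalone.jar -l recent.log -c git2 -a summary
```

If a window still runs out of memory, moving the cutoff date forward shrinks the input further.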

@rjayasinghe
Author

Hi!

Sorry, I cannot share the git log. It's built from a 10+ GB repository with ~15 years of history.

This is how I called code-maat:

java -Xmx4g -jar code-maat-0.9.2-SNAPSHOT-standalone.jar -l 

I know it's not very helpful if I cannot share the git log, but I at least wanted to let you know that your analysis algorithms run into problems when analyzing really large data sets.

Best Regards,
Robin

@adamtornhill
Owner

Alright, no problem. I will see if I can find some even larger open-source project where I can reproduce the problem.

Did any of the analyses work? For example, try -a identity. That would help me to isolate the potential problem.

@rjayasinghe
Author

-a identity resulted in OOM as well:

WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
Exception in thread "main" java.lang.OutOfMemoryError
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:536)
        at java.util.concurrent.ForkJoinTask.reportResult(ForkJoinTask.java:596)
        at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:640)
        at java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:1521)
        at clojure.core.reducers$fjinvoke.invoke(reducers.clj:49)
        at clojure.core.reducers$foldvec.invoke(reducers.clj:341)
        at clojure.core.reducers$fn__1915.invoke(reducers.clj:362)
        at clojure.core.reducers$fn__1798$G__1793__1809.invoke(reducers.clj:81)
        at clojure.core.reducers$fold.invoke(reducers.clj:98)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:139)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:158)
        at code_maat.parsers.git2$parse_log.invoke(git2.clj:74)
        at code_maat.app.app$git2__GT_modifications$fn__9421.invoke(app.clj:133)
        at code_maat.app.app$run_parser_in_error_handling_context.invoke(app.clj:97)
        at code_maat.app.app$git2__GT_modifications.invoke(app.clj:132)
        at code_maat.app.app$parse_commits_to_dataset.invoke(app.clj:202)
        at code_maat.app.app$run.invoke(app.clj:215)
        at code_maat.cmd_line$_main.doInvoke(cmd_line.clj:66)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at code_maat.cmd_line.main(Unknown Source)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at clojure.lang.PersistentHashMap.cloneAndSet(PersistentHashMap.java:1169)
        at clojure.lang.PersistentHashMap.access$000(PersistentHashMap.java:28)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:418)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
        at clojure.lang.RT.assoc(RT.java:778)
        at clojure.core$assoc__4142.invoke(core.clj:191)
        at clojure.lang.Atom.swap(Atom.java:65)
        at clojure.core$swap_BANG_.invoke(core.clj:2240)
        at instaparse.gll$node_get.invoke(gll.clj:286)
        at instaparse.gll$push_listener.invoke(gll.clj:339)
        at instaparse.gll$non_terminal_parse.invoke(gll.clj:818)
        at instaparse.gll$_parse.invoke(gll.clj:119)
        at instaparse.gll$push_listener$fn__1307.invoke(gll.clj:348)
        at instaparse.gll$step.invoke(gll.clj:409)
        at instaparse.gll$run.invoke(gll.clj:427)
        at instaparse.gll$run.invoke(gll.clj:413)
        at instaparse.gll$parse.invoke(gll.clj:894)
        at instaparse.core$parse.doInvoke(core.clj:91)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:27)
        at clojure.core$partial$fn__4527.invoke(core.clj:2493)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:40)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:47)
        at code_maat.parsers.hiccup_based_parser$parse_from$fn__1950.invoke(hiccup_based_parser.clj:144)
        at clojure.core.protocols$iter_reduce.invoke(protocols.clj:49)
        at clojure.core.protocols$fn__6510.invoke(protocols.clj:112)
        at clojure.core.protocols$fn__6452$G__6447__6465.invoke(protocols.clj:13)
        at clojure.core.reducers$reduce.invoke(reducers.clj:79)
        at clojure.core.reducers$foldvec.invoke(reducers.clj:335)
        at clojure.core.reducers$foldvec$fc__1904$fn__1905.invoke(reducers.clj:340)
        at clojure.core.reducers$foldvec$fn__1908.invoke(reducers.clj:345)

@adamtornhill
Owner

Thanks for the info, @rjayasinghe!
I've tested the last released version of Code Maat, 0.9.1, on a large repository and it seems to be able to handle it. If you have the possibility, please try version 0.9.1 (available here) and let me know if that solves your problem. We did some parallelization in the parsing stage of 0.9.2 and it might have introduced the problem (but I'm not sure yet).

@rjayasinghe
Author

OK. I downloaded and built 0.9.1 from GitHub. This time it ran longer. However, after ~1.5 hours the process died with:

WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at clojure.lang.PersistentHashMap.cloneAndSet(PersistentHashMap.java:1169)
        at clojure.lang.PersistentHashMap.access$000(PersistentHashMap.java:28)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:414)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:415)
        at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:415)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
        at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
        at clojure.lang.RT.assoc(RT.java:778)
        at clojure.core$assoc__4142.invoke(core.clj:191)
        at clojure.lang.Atom.swap(Atom.java:65)
        at clojure.core$swap_BANG_.invoke(core.clj:2240)
        at instaparse.gll$node_get.invoke(gll.clj:286)
        at instaparse.gll$push_listener.invoke(gll.clj:339)
        at instaparse.gll$CatListener$fn__1340.invoke(gll.clj:487)
        at instaparse.gll$push_message$f__1269.invoke(gll.clj:238)
        at instaparse.gll$step.invoke(gll.clj:409)
        at instaparse.gll$run.invoke(gll.clj:427)
        at instaparse.gll$run.invoke(gll.clj:413)
        at instaparse.gll$parse.invoke(gll.clj:894)
        at instaparse.core$parse.doInvoke(core.clj:91)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:26)
        at clojure.core$partial$fn__4527.invoke(core.clj:2493)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:47)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:55)
        at code_maat.parsers.hiccup_based_parser$extend_when_complete.invoke(hiccup_based_parser.clj:62)
        at code_maat.parsers.hiccup_based_parser$as_entry_tokens.invoke(hiccup_based_parser.clj:82)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:158)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:172)
        at code_maat.parsers.git2$parse_log.invoke(git2.clj:74)
        at code_maat.app.app$git2__GT_modifications$fn__9279.invoke(app.clj:133)
        at code_maat.app.app$run_parser_in_error_handling_context.invoke(app.clj:97)

Best Regards,
Robin

@janisz
Contributor

janisz commented Apr 11, 2016

@rjayasinghe @adamtornhill How about tuning the GC?
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
https://groups.google.com/forum/#!topic/clojure/yPaQN7JuKFY
http://nyeggen.com/post/2012-04-16-tuning-jvm-gc-for-a-big/
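
For reference, the flags above go before `-jar` on the command line. This is a minimal sketch, not a tested fix: the wrapper name `run_maat` and the example arguments are hypothetical, and the jar name and heap size are taken from earlier in this thread. Newer JVMs no longer accept these flags (`CMSIncrementalMode` was removed in JDK 9 and the CMS collector itself in JDK 14), so this only applies to older runtimes.

```shell
# GC flags suggested above (CMS collector; older JVMs only).
GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing"

# Hypothetical wrapper: passes all of its arguments on to Code Maat.
run_maat() {
    # $GC_OPTS is left unquoted on purpose so it splits into separate flags.
    java -Xmx4g $GC_OPTS -jar code-maat-0.9.2-SNAPSHOT-standalone.jar "$@"
}

# Example (log file name is a placeholder):
#   run_maat -l logfile.log -c git2 -a identity
```

GC tuning changes how the heap is collected, not how much live data the parser holds, so whether it avoids the OOM here is an open question.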

@Meffi42

Meffi42 commented Sep 13, 2018

I run out of heap space when analyzing wikimedia/mediawiki.

The evo-log file, produced as described in the book, is 23 MB.

Setting the JVM heap size in the .bat file does not fix the problem:

java -Xmx512M -Xms64M -jar t\winmaat0.8.5\code-maat-0.8.5-standalone.jar -l ../mediawiki/maat_evo.log -c git -a summary
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at clojure.lang.PersistentVector.rangedIterator(PersistentVector.java:238)
        at clojure.lang.PersistentVector.iterator(PersistentVector.java:261)
        at clojure.lang.Murmur3.hashOrdered(Murmur3.java:105)
        at clojure.lang.APersistentVector.hasheq(APersistentVector.java:166)
        at clojure.lang.Util.dohasheq(Util.java:177)
        at clojure.lang.Util.hasheq(Util.java:168)
        at clojure.lang.PersistentHashMap.hash(PersistentHashMap.java:120)
        at clojure.lang.PersistentHashMap.valAt(PersistentHashMap.java:152)
        at clojure.lang.RT.get(RT.java:672)
        at instaparse.gll$push_message.invoke(gll.clj:172)
        at instaparse.gll$push_result.invoke(gll.clj:255)
        at instaparse.gll$NodeListener$fn__588.invoke(gll.clj:374)
        at instaparse.gll$push_message$f__524.invoke(gll.clj:173)
        at instaparse.gll$step.invoke(gll.clj:328)
        at instaparse.gll$run.invoke(gll.clj:344)
        at instaparse.gll$run.invoke(gll.clj:332)
        at instaparse.gll$parse.invoke(gll.clj:758)
        at instaparse.core$parse.doInvoke(core.clj:83)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at code_maat.parsers.hiccup_based_parser$parse_with.invoke(hiccup_based_parser.clj:26)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invoke(core.clj:626)
        at clojure.core$partial$fn__4228.doInvoke(core.clj:2468)
        at clojure.lang.RestFn.invoke(RestFn.java:408)
        at code_maat.parsers.hiccup_based_parser$parse_entry.invoke(hiccup_based_parser.clj:48)
        at code_maat.parsers.hiccup_based_parser$parse_entry_from.invoke(hiccup_based_parser.clj:54)
        at code_maat.parsers.hiccup_based_parser$extend_when_complete.invoke(hiccup_based_parser.clj:62)
        at code_maat.parsers.hiccup_based_parser$as_entry_tokens.invoke(hiccup_based_parser.clj:82)
        at code_maat.parsers.hiccup_based_parser$parse_from.invoke(hiccup_based_parser.clj:157)
        at code_maat.parsers.hiccup_based_parser$parse_log.invoke(hiccup_based_parser.clj:172)
        at code_maat.parsers.git$parse_log.invoke(git.clj:62)

This is running the version downloaded from https://www.adamtornhill.com/code/crimescenetools.htm.
