limit / reduce disk space usage #37

Open
chrishah opened this issue Jan 24, 2022 · 3 comments

@chrishah

Expected Behavior

successfully run easy-predict on large chromosome-level genome assembly (within BUSCO)

Current Behavior

metaeuk runs but exhausts the available disk space (5TB), even with --disk-space-limit set to 3TB (3000G)

Steps to Reproduce (for bugs)

Don't think there's a bug - just looking for a way to limit disk space usage. I have access to a server with 2x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (14 Cores / 28 threads per CPU), with 1.5T RAM and atm 5TB of disk space.

Command (within BUSCO):
metaeuk easy-predict --threads 14 Neoceratodus_forsteri.fna run_vertebrata_odb10/metaeuk_output/refseq_db_rerun.faa run_vertebrata_odb10/metaeuk_output/rerun_results/Neoceratodus_forsteri.fna run_vertebrata_odb10/metaeuk_output/tmp --max-intron 130000 --max-seq-len 160000 --min-exon-aa 5 --max-overlap 5 --min-intron 1 --overlap 1 -s 6 --slice-search 1 --remove-tmp-files 1 --disk-space-limit 3000G --split-mode 0 --split-memory-limit 1500G

The last few parameters, from --slice-search onwards, were my attempts to limit disk space and RAM usage. The rest I can't control; they are set by BUSCO.
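
For anyone who wants to keep an eye on the tmp directory while a run like this is in progress, here is a minimal sketch in Python. It is not part of MetaEuk or BUSCO; the helper names (`dir_size_bytes`, `over_budget`, `free_bytes`) are made up for illustration, and the tmp path is whatever directory was passed to easy-predict.

```python
import os
import shutil


def dir_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # Temporary files can disappear between listing and stat.
                pass
    return total


def over_budget(path, budget_bytes):
    """True if the files under path already use more than budget_bytes."""
    return dir_size_bytes(path) > budget_bytes


def free_bytes(path):
    """Free space left on the filesystem that holds path."""
    return shutil.disk_usage(path).free
```

Calling `over_budget("run_vertebrata_odb10/metaeuk_output/tmp", 3 * 10**12)` every few minutes from a separate process would show whether usage has passed 3TB. This only observes the directory; it does not make metaeuk write less.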

Context

Running metaeuk as part of the BUSCO pipeline (v5.2.1) on a publicly available large Eukaryote genome (Australian lungfish)

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MetaEuk Version:" when you execute MetaEuk without any parameters):
    metaeuk Version: 4.a0f584d
@chrishah
Author

Hi,
I saw that there is the --compress 1 option, and that it has been fixed in the latest release (issue #20). I am assuming this will reduce disk usage and will try it as soon as my current run is done - I have one running with 6TB disk space available now. If you have any other suggestions on how to reduce disk space, please let me know - thanks!

Could I also ask what --slice-search 1 actually does? I found it suggested in a post for cases where RAM is limiting, so I am using it, but I don't really know how it affects the run or whether it helps in my situation. Thanks!

cheers,
Christoph

@chrishah
Author

chrishah commented Feb 2, 2022

Hi,
So, the last run with Version: 4.a0f584d actually finished successfully. The only thing I changed was to reduce the number of threads from 14 to 10. I had noticed that metaeuk writes large files (*.pred, *.aln) to the tmp directories during the run and that these files are numbered 0 to nthreads-1, so I thought that reducing the number of threads might reduce the amount of data written to disk. With 14 threads I ran out of disk space at 5T of disk usage. With 10 threads the maximum disk usage was 2.6T. I don't really understand why, but these were my observations, and I am happy that it ran through in the end.
It ran for about 170 hours. With respect to RAM, the limit I imposed with --split-memory-limit 1500G seems to have worked nicely: metaeuk maxed out the RAM at times (rss 1.5T) but didn't run out.
Thanks!

cheers,
Christoph
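
To see how much of the tmp usage comes from the per-thread files described above, something like the following could be used. This is only a sketch: the function name `size_by_suffix` is hypothetical, and it assumes the files really do end in `.pred` and `.aln`, as the glob patterns in the comment suggest. The exact file naming is not confirmed in this thread.

```python
from collections import defaultdict
from pathlib import Path


def size_by_suffix(tmp_dir, suffixes=(".pred", ".aln")):
    """Count files and sum their sizes under tmp_dir, grouped by suffix.

    Assumption: the large temporary files end in .pred or .aln.
    """
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for p in Path(tmp_dir).rglob("*"):
        if p.is_file() and p.suffix in suffixes:
            totals[p.suffix]["files"] += 1
            totals[p.suffix]["bytes"] += p.stat().st_size
    return dict(totals)
```

If usage grew strictly in proportion to the thread count, the 2.6T peak at 10 threads would scale to about 3.6T at 14 threads (2.6T x 14 / 10). The 14-thread run failed only once usage reached 5T, so the reported numbers do not fit a simple linear per-thread model.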

@elileka
Member

elileka commented Feb 7, 2022

Thank you very much for the feedback and I apologize for the late reply. I am glad you got it to run. We implemented the logic to limit disk space usage in MMseqs2 (the library MetaEuk uses) and it was quite demanding in terms of the possible scenarios it had to cover. The behavior you describe strongly indicates something is not fully working there. I will open an issue for MMseqs2 and refer to this issue. I hope we can get to this in future versions.
