
High RAM usage when using meta.retrieval prevents usage on low-end machines #40

barrel0luck opened this issue May 29, 2019 · 4 comments


@barrel0luck

The severity of this issue depends on the number of genomes/CDS/RNA files to be downloaded.
As the process goes on, the amount of RAM used by the R session progressively increases, to the point where the entire system slows down.
On a system with 8 GB of RAM (a low-end system), I've managed to download ~1600 files successfully, but I need to restart the system once the process is done because everything is extremely slow afterwards. So far I haven't managed to download more than that, as the system becomes unresponsive.
I suspect some variable grows in size as the process goes on and could (maybe) easily be cleaned up after each download to reduce RAM usage.
Also note that the RAM usage intermittently decreases, i.e., it isn't monotonically increasing, but over a long period it grows substantially, eventually overwhelming the system.
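
One hedged way to confirm this, using only base R (this monitoring helper is illustrative, not part of the original report): log the session's memory footprint between retrieval steps. gc() both triggers garbage collection and returns a matrix whose second column reports megabytes in use.

# Minimal monitoring sketch (illustrative, base R only):
log_mem <- function(label) {
  used_mb <- sum(gc()[, 2])  # total Mb used by Ncells + Vcells
  message(sprintf("[%s] R session using ~%.0f Mb", label, used_mb))
}

log_mem("before retrieval")
# ... run one retrieval step here ...
log_mem("after retrieval")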

@HajkD

HajkD commented May 30, 2019

Hi @barrel0luck,

Many thanks for contacting me and for making me aware of this issue.

Would you mind sharing a small example where this occurs? This will make my life much easier when troubleshooting.

Your help is very much appreciated.

Many thanks,
Hajk

@barrel0luck

Sure! And thanks for developing this awesome package! I hope you can maintain it for a long time!

Here's the code you can use to reproduce the issue on a low-end system (no biggie):
This should download ~1600 files:

library(biomartr)  # provides meta.retrieval() and clean.retrieval()
library(magrittr)  # provides the %>% pipe

meta.retrieval(kingdom = "bacteria", group = "Gammaproteobacteria", db = "refseq", type = "rna", reference = FALSE) %>%
  clean.retrieval()

This should download a much larger number of files (I'm not sure exactly how many; it has failed for me so far):

meta.retrieval(kingdom = "bacteria", group = "Gammaproteobacteria", db = "genbank", type = "rna", reference = FALSE) %>%
  clean.retrieval()
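
A possible workaround sketch while this is investigated (not from the thread, so treat it as an assumption): fetch one organism at a time with biomartr's getRNA() and force garbage collection between downloads. Here `organisms` is assumed to be a character vector of species names you have prepared in advance.

library(biomartr)

for (org in organisms) {
  # download the RNA sequences for a single organism into one folder
  getRNA(db = "refseq", organism = org, path = "refseq_rna")
  # release memory held by the finished iteration before starting the next
  gc()
}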

@HajkD

HajkD commented May 31, 2019

Perfect! Thank you so much :-)

I will have a look at it now.

Cheers,
Hajk

@barrel0luck

barrel0luck commented May 31, 2019

I should note that I'm using R on Fedora Linux. However, if the issue is in the code itself, e.g., one or more variables growing with each iteration of a loop, then it should be reproducible on other OSes as well...
I think the problem arises because the R session loads and keeps everything in RAM.
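
If the retained memory really is held by the R session itself, one hedged workaround (not discussed in this thread; it assumes the callr package is available) is to run each retrieval in a throwaway subprocess, so the operating system reclaims all of its memory as soon as the subprocess exits.

library(callr)

# Hypothetical sketch: run meta.retrieval() in a fresh R subprocess
# via callr::r(); `group` is passed through to biomartr.
callr::r(function(group) {
  biomartr::meta.retrieval(
    kingdom = "bacteria",
    group = group,
    db = "refseq",
    type = "rna",
    reference = FALSE
  )
}, args = list(group = "Gammaproteobacteria"))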
