Possible memory leaks when phenopath is run inside of a for loop? #12

Open
krinsman opened this issue Feb 26, 2021 · 0 comments


First, I acknowledge that I'm not currently providing an MWE for this; I just wanted to get a sense of whether it's already a known issue. In the unlikely scenario that the issue turns out to be a development priority, I can supply an MWE as well.

Second, I acknowledge that running phenopath in an embarrassingly parallel for loop on multiple instances of simulated data is not the primary intended use case for phenopath, so I imagine that even if this issue is already known, it seems unlikely to be a high development priority.

Anyway, my understanding is that R handles garbage collection of its own memory automatically, but that when a package uses Rcpp/C++, as phenopath does, a lot of that code might fall outside the "scope" of R's automatic garbage collection, and so its memory might not be cleared without manual calls to `gc()`. (At least this is my impression, based in part on similar-seeming issues found elsewhere, e.g. [1] [2] [3].) So it seems possible that something similar is happening with phenopath.

Specifically, I am running a for loop with 100 iterations, each of which calls the phenopath function four times (each time on an observations matrix and a covariates matrix, both of size 500000 x 11). Initially the memory used by each R process doing this is less than 2 GB on my computer, but towards the end of the for loop it has grown to more than 6 GB.

I also think it is probably a memory leak because the iterations do not depend on each other (so they should be embarrassingly parallelizable), so there is no a priori reason for memory from earlier iterations to be needed again. Moreover, my laptop hasn't crashed despite having only 8 GB of RAM (and 6 processes doing this), because Mac OS X seems to be pretty good at moving the leaked memory to swap (around 14 GB so far). If the memory moved to swap were being re-used, my understanding is that the resulting repeated transfers between swap and RAM would slow the process down considerably, but my computer is still running at more or less normal speed.

I have yet to confirm whether manually adding a call to `gc()` at the end of each iteration of the for loop fixes this, and I haven't looked through the codebase in detail to identify any specific location in the C++ part that might cause a memory leak if external garbage collection isn't applied. Also, I guess I just explained above why the memory leak, even if it exists, might not be a big deal in practice, since the leaked memory can just be moved to swap before finally being garbage collected.
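A minimal sketch of how the `gc()` check described above could be run, under some assumptions: `fit_once()` and `process_rss_mb()` are hypothetical helpers that are not part of phenopath, the body of `fit_once()` is a small placeholder to be replaced with the real phenopath call on the actual matrices, and the resident-memory reading relies on the `ps` command as found on macOS and Linux.

```r
# Hypothetical stand-in for one model fit. Replace the body with the real
# phenopath call on your own observations and covariates matrices.
fit_once <- function(n_obs = 1000, n_col = 11) {
  y <- matrix(rnorm(n_obs * n_col), nrow = n_obs)
  colMeans(y)  # keep only a small summary, not the full fitted object
}

# Hypothetical helper: resident set size of this R process in MB, read from
# `ps` (macOS/Linux). Returns NA if `ps` is unavailable or fails.
process_rss_mb <- function() {
  out <- tryCatch(
    suppressWarnings(
      system2("ps", c("-o", "rss=", "-p", Sys.getpid()), stdout = TRUE)
    ),
    error = function(e) NA_character_
  )
  suppressWarnings(as.numeric(trimws(out[1]))) / 1024
}

n_iter <- 5  # the run described above used 100 iterations
mem_log <- data.frame(
  iter      = seq_len(n_iter),
  r_heap_mb = NA_real_,  # memory R's own allocator reports as in use
  rss_mb    = NA_real_   # memory the operating system attributes to the process
)

for (i in seq_len(n_iter)) {
  for (j in 1:4) {       # four fits per iteration, as in the description
    res <- fit_once()
  }
  rm(res)                # drop the last reference before collecting
  g <- gc()              # explicit garbage collection at the end of the iteration
  mem_log$r_heap_mb[i] <- sum(g[, 2])  # column 2 of gc() output is "used" in Mb
  mem_log$rss_mb[i]    <- process_rss_mb()
}

print(mem_log)
```

If `rss_mb` keeps climbing while `r_heap_mb` stays roughly flat, that would suggest the growth sits in memory that R's collector does not manage, such as native allocations. It would not be conclusive on its own, since allocators do not always return freed memory to the operating system.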
