memory usage bottleneck #49

Open
hy395 opened this issue Nov 22, 2019 · 4 comments

hy395 commented Nov 22, 2019

Hi Avanti,

I've been trying to figure out the memory bottleneck when using tfmodisco. It turns out the initial dense matrix created by seqlets2patterns doesn't take that much memory with 40k seqlets per metacluster (~40GB). I then narrowed it down to graph2binary() in modisco/cluster/phenograph/core.py. graph2binary() creates a really large list before writing it out to a binary file:

181 f.writelines([e for t in zip(ij, s) for e in t])

For a 4GB sparse matrix, the list can be ~60GB. By avoiding the creation of this list, I can run tfmodisco with 40k seqlets per metacluster with 200G of memory.
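
One way to avoid materializing that list is to write the edge records in fixed-size chunks. The following is only a minimal sketch, not the change made in the PR: the function name `write_edges_chunked`, the `chunk_size` parameter, and the record layout (two int32 indices followed by one float64 weight per edge) are assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def write_edges_chunked(path, ij, s, chunk_size=1_000_000):
    """Write interleaved (i, j, weight) records without building a Python list.

    Hypothetical helper. Assumes ij is an (n, 2) integer array and s is a
    length-n array of weights.
    """
    # Assumed record layout: int32 i, int32 j, float64 weight (16 bytes, packed).
    record = np.dtype([("i", np.int32), ("j", np.int32), ("w", np.float64)])
    with open(path, "wb") as f:
        for start in range(0, len(s), chunk_size):
            stop = min(start + chunk_size, len(s))
            # Only one chunk of records is held in memory at a time.
            buf = np.empty(stop - start, dtype=record)
            buf["i"] = ij[start:stop, 0]
            buf["j"] = ij[start:stop, 1]
            buf["w"] = s[start:stop]
            f.write(buf.tobytes())


# Tiny usage example with three edges, written two records at a time.
ij = np.array([[0, 1], [1, 2], [2, 0]], dtype=np.int32)
s = np.array([0.5, 0.25, 1.0])
path = os.path.join(tempfile.mkdtemp(), "graph.bin")
write_edges_chunked(path, ij, s, chunk_size=2)
```

Peak extra memory here is bounded by `chunk_size` records rather than by the total number of edges. A smaller change with a similar effect would be to pass a generator expression to `f.writelines` instead of a list.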

I've submitted a PR with a revision for this. I'm not too familiar with the codebase yet, so let me know if I missed anything.

@AvantiShri
Collaborator

Sounds good to me! To clarify, does it currently take 200G for 40k seqlets even with this modification put in?

Author

hy395 commented Nov 23, 2019

For 40k seqlets per metacluster, currently the peak memory usage is ~120G with this modification put in, according to slurm seff. Thanks!


akundaje commented Nov 23, 2019 via email

@AvantiShri
Collaborator

Yes, agreed. I don't recall seeing any evidence that the implementation of Louvain/Leiden is causing the issue after Han's fix. Han's fix specifically addressed an issue in the borrowed-from-phenograph code that wrote the binary file that was subsequently read by Louvain.
