Are there ways to decrease the memory requirement of coverage2cytosine at the expense of computation time? #650

Open
onurcanbektas opened this issue Feb 2, 2024 · 2 comments

@onurcanbektas

onurcanbektas commented Feb 2, 2024

Hi,

We sequence about 10x deeper than is typical for scNMTseq experiments.
However, during the coverage2cytosine step of the pipeline, I need at least 400 GB of RAM for each cell; otherwise the job fails because it runs out of memory.
We have only a few servers with that much RAM, and since we receive data from hundreds of cells, it takes weeks to process them all one by one.
Processing each cell takes about 5 hours.

I was wondering whether there is a way to trade memory for computation time. For example, if processing each cell took 1 day but required only 100 GB of RAM, I could process all cells at once, because we have many servers with at least 100 GB of RAM.

I run coverage2cytosine with the following parameters: --nome-seq --gc

@FelixKrueger
Owner

Wow, that sounds like a huge amount of RAM. I don't think I have ever heard of such excessive amounts... In theory, coverage2cytosine should hold the genome in memory (typically some 3-4 GB for the human or mouse genome), plus all positions that were covered, per chromosome. Since this operation should run chromosome by chromosome, you should never really see the memory requirements go all that high... (also, 5 h seems a bit on the slow side...)
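To illustrate what "chromosome by chromosome" means for memory, here is a minimal Python sketch. It is not the actual coverage2cytosine code (Bismark is written in Perl); the function name `iter_chromosomes` is made up for this example, and it assumes a tab-separated coverage file (chromosome, start, end, methylation percentage, count methylated, count unmethylated) in which all lines of a chromosome are contiguous. Only one chromosome's covered positions are held in memory at any time.

```python
import io
from itertools import groupby


def iter_chromosomes(lines):
    """Yield (chromosome, {position: (methylated, unmethylated)}) per chromosome.

    Hypothetical helper for illustration only. Assumes all lines belonging to
    one chromosome are contiguous in the input.
    """
    # Split each non-empty line lazily, so the whole file is never loaded at once.
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for chrom, group in groupby(rows, key=lambda r: r[0]):
        # Only the positions of the current chromosome live in this dict.
        positions = {int(r[1]): (int(r[4]), int(r[5])) for r in group}
        yield chrom, positions


if __name__ == "__main__":
    # Tiny made-up coverage excerpt, not real data.
    demo = io.StringIO(
        "chr1\t100\t100\t75.0\t3\t1\n"
        "chr1\t200\t200\t0.0\t0\t5\n"
        "chr2\t50\t50\t50.0\t2\t2\n"
    )
    for chrom, positions in iter_chromosomes(demo):
        print(chrom, len(positions), "covered positions")
```

With this access pattern, peak memory should scale with the largest chromosome rather than with the whole file, which is why steadily growing memory use would be unexpected.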

Is there a way for you to monitor the memory consumption in more detail (as in: does it keep creeping up constantly over time)? We just had a quick look for an answer and found that the pidstat tool might be able to do this (with -r for memory, possibly combined with a sampling interval?). Alternatively, could you provide me with a sample coverage file and the genome you used for this so I can try out some things myself?
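If pidstat is not available, something along the following lines could log the memory use over time instead. This is only a sketch under a few assumptions: it is Linux-only (it reads the `VmRSS` field from `/proc/<pid>/status`), the function names `parse_vmrss_kb` and `sample_rss` are made up for this example, and the PID of the running coverage2cytosine job has to be passed on the command line.

```python
import sys
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def sample_rss(pid, interval_s=60.0, max_samples=None):
    """Yield (elapsed_seconds, rss_kb) until the process exits.

    Stops early after max_samples samples if that argument is given.
    """
    start = time.monotonic()
    taken = 0
    while max_samples is None or taken < max_samples:
        try:
            with open(f"/proc/{pid}/status") as fh:
                rss = parse_vmrss_kb(fh.read())
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        if rss is None:
            break  # no resident memory reported (e.g. kernel thread or zombie)
        yield time.monotonic() - start, rss
        taken += 1
        if max_samples is None or taken < max_samples:
            time.sleep(interval_s)


if __name__ == "__main__":
    # Usage: python monitor_rss.py <pid> > rss_log.tsv
    for elapsed, rss_kb in sample_rss(int(sys.argv[1])):
        print(f"{elapsed:.0f}\t{rss_kb / 1024 ** 2:.2f} GB", flush=True)
```

A log that climbs steadily for the whole run would point to something accumulating across chromosomes, whereas a sawtooth pattern would suggest that one very large chromosome (or a very large number of covered positions) drives the peak.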

@onurcanbektas
Author

Dear Felix, thanks a lot for the prompt reply.
I have sent you an email with sample data and the genome.
