Huge 10X scRNA-seq mouse data #70

Open
alexfernandes8a opened this issue Jul 20, 2022 · 1 comment

Comments

@alexfernandes8a

Hi! I have a huge 10x scRNA-seq mouse dataset (~60 GB BAM file, ~50K cells from 12 mice) that I am trying to run through cellSNP-lite. I compiled cellSNP-lite in an HPC environment and am running it there in mode 2a.
The problem is that, no matter how much RAM I use, I keep getting the message "Combined max depth is above 1M. Potential memory hog!", and it has been running for 11 days already.
I know it is a lot of data, so I am wondering what the best approach would be in this scenario. Perhaps splitting the cell barcodes file?
Any help is highly appreciated!
Thank you so very much.

@hxj5
Collaborator

hxj5 commented Jul 21, 2022

Hi, Mode 2a is better suited to small datasets. For large datasets, try Mode 2b followed by Mode 1a. Mode 2a does joint calling and genotyping, but it is substantially slower than first calling SNPs in a bulk manner with Mode 2b and then genotyping the cells with Mode 1a. To speed things up, you may also try the --minMAF 0.1 --minCOUNT 100 options in both modes.
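
The two-step workflow suggested above could look roughly like the sketch below. It is not an official recipe: the file paths, output directory names, and thread count are hypothetical placeholders, and the `run` wrapper with its `DRY_RUN` switch is only a convenience added here so the commands can be inspected before anything is launched. The flags are the ones I recall from the cellsnp-lite README (`--cellTAG None` for bulk-style pileup, `-R` pointing at the `cellSNP.base.vcf.gz` written by the first step); please confirm them against `cellsnp-lite --help` for your installed version.

```shell
#!/usr/bin/env bash
# Sketch: Mode 2b (bulk SNP calling) followed by Mode 1a (per-cell genotyping).
# All paths below are hypothetical placeholders; adjust them to your data.
set -euo pipefail

BAM="${BAM:-possorted_genome_bam.bam}"   # 10x BAM file (placeholder name)
BARCODES="${BARCODES:-barcodes.tsv}"     # cell barcodes file (placeholder name)
OUT_2B="${OUT_2B:-cellsnp_mode2b}"       # output dir for step 1
OUT_1A="${OUT_1A:-cellsnp_mode1a}"       # output dir for step 2
THREADS="${THREADS:-22}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1 (Mode 2b): treat the whole BAM as one bulk sample and call SNPs.
# No barcode file and no SNP list are given; --cellTAG None disables
# per-cell splitting. The UMI tag is left at its default here.
call_snps_bulk() {
  run cellsnp-lite -s "$BAM" -O "$OUT_2B" -p "$THREADS" \
    --minMAF 0.1 --minCOUNT 100 --cellTAG None --gzip
}

# Step 2 (Mode 1a): genotype individual cells, restricted to the SNPs
# found in step 1 (assumed to be written to cellSNP.base.vcf.gz).
genotype_cells() {
  run cellsnp-lite -s "$BAM" -b "$BARCODES" -O "$OUT_1A" \
    -R "$OUT_2B/cellSNP.base.vcf.gz" -p "$THREADS" \
    --minMAF 0.1 --minCOUNT 100 --gzip
}

call_snps_bulk
genotype_cells
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` runs them in order. The reason this should be faster than Mode 2a is that the expensive whole-chromosome pileup in step 1 ignores cell barcodes, and step 2 only has to visit the positions that passed the filters.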
