runtime and expected output #78

Open
ahoffrichter opened this issue Dec 5, 2022 · 6 comments

Comments

@ahoffrichter

Hi,
I ran cellsnp-lite with one bam file. It has now been running for 3 weeks. I was wondering whether this is to be expected.
Is there any information somewhere on the expected output files? There are already several files in my output folder, and I am not sure whether the program has actually finished and only appears to be still running.
Best,
Anne

@hxj5
Collaborator

hxj5 commented Dec 5, 2022

Hi, the files in the output folder are probably temporary files (with suffixes such as .0, .1, etc.). When the program finishes, the output folder should look like this example.
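A quick way to check for leftover temporary files is sketched below. It relies only on the suffix pattern mentioned above (names ending in .0, .1, and so on) and assumes that no final output file ends in a dot followed by digits. `list_temp_files` is a hypothetical helper name, not part of cellsnp-lite.

```shell
# Print files in a directory whose names end in ".<digits>" (e.g. ".0", ".1").
# Assumption: only temporary files use such a suffix, as described in this thread.
list_temp_files() {
    dir="$1"
    # "|| true" keeps the function from failing when grep finds no match.
    find "$dir" -maxdepth 1 -type f | grep -E '\.[0-9]+$' | sort || true
}

# Example (OUT_DIR is the folder passed to -O):
# if [ -n "$(list_temp_files "$OUT_DIR")" ]; then
#     echo "temporary files still present; the run has probably not finished"
# fi
```

An empty result does not prove the run succeeded; it only shows that no files with that suffix remain.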

How many cells does the bam file contain? It can take a long time for cellsnp-lite to genotype a big dataset, especially in mode 2a (i.e., piling up whole chromosomes for 10x data). Could you also share your command line and the version of cellsnp-lite?

Best,
Xianjie

@ahoffrichter
Author

Hi,
the bam file should contain about 15,600 cells.
I used mode 1a with the region vcf from here.
I'm running this on a cluster with cellsnp-lite version 1.2.2. I'm not exactly sure what you mean by "share your command line".

Best,
Anne

@hxj5
Collaborator

hxj5 commented Dec 5, 2022

The command line contains all the parameters you used to run cellsnp-lite, e.g.,

cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 20 --minMAF 0.1 --minCOUNT 20 --gzip

The 15,600 cells indicate that the bam is probably a big 10x dataset. To speed things up, you may

  • check whether the cell barcodes are "filtered", i.e., from filtered_gene_bc_matrices instead of from raw_gene_bc_matrices in the cellranger output folder (update -b);
  • try using the SNP list from the AF5e2 VCF file instead of the AF5e4 one in this folder (update -R);
  • use more threads or cores (update -p).
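The three suggestions above could be combined roughly as follows. This is only a sketch: `build_cellsnp_cmd` is a hypothetical helper that prints the command instead of running it, it uses only the flags already shown in this thread, and every path in the example is a placeholder.

```shell
# Print a cellsnp-lite command line from its five variable parts.
# The fixed flags (--minMAF 0.1 --minCOUNT 20 --gzip) are copied from this thread.
build_cellsnp_cmd() {
    bam="$1"          # -s: input BAM
    barcodes="$2"     # -b: filtered (not raw) cell barcodes
    out_dir="$3"      # -O: output folder
    region_vcf="$4"   # -R: SNP list, e.g. the AF5e2 VCF
    threads="$5"      # -p: number of threads
    printf 'cellsnp-lite -s %s -b %s -O %s -R %s -p %s --minMAF 0.1 --minCOUNT 20 --gzip\n' \
        "$bam" "$barcodes" "$out_dir" "$region_vcf" "$threads"
}

# Example with placeholder paths and 40 threads:
# build_cellsnp_cmd /path/to/sample.bam /path/to/filtered/barcodes.tsv.gz \
#     /path/to/out_dir /path/to/SNP_AF5e2.vcf.gz 40
```

Printing the command first makes it easy to confirm which barcode file and VCF are being passed before starting a long run.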

@ahoffrichter
Author

Ah ok, the command I used looks like this:

cellsnp-lite -s path/to/possorted_genome_bam.bam -b /path/to/raw_feature_bc_matrix/barcodes.tsv.gz -O /vireo/test -R /vireo/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz -p 20 --minMAF 0.1 --minCOUNT 20 --gzip

Yes indeed, it is a 10x dataset and I used the raw barcodes.
I will try out your suggestions, thank you very much for your help!
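To see how much the barcode list shrinks when switching from raw to filtered barcodes, one could compare line counts. A minimal sketch, assuming gzip-compressed barcode files with one barcode per line; `count_barcodes` is a hypothetical helper, and the `filtered_feature_bc_matrix` path is an assumed sibling of the raw folder in the cellranger output.

```shell
# Count barcodes (lines) in a gzip-compressed barcodes file.
count_barcodes() {
    # tr strips the padding some wc implementations add around the number.
    gzip -dc "$1" | wc -l | tr -d ' '
}

# Example (placeholder paths):
# count_barcodes /path/to/raw_feature_bc_matrix/barcodes.tsv.gz
# count_barcodes /path/to/filtered_feature_bc_matrix/barcodes.tsv.gz
```

The raw list normally contains far more barcodes than there are cells, which is why passing it to -b makes the run so much slower.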

@roshni-b

For 15k cells, roughly how long did this take to run?

@nansne

nansne commented Mar 18, 2024

Hello, I ran cellsnp-lite, and it failed with "[E::idx_find_and_load] Could not retrieve index file for '/h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam'". What should I do next? Thank you very much!
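This htslib message usually means that no index file (.bai or .csi) was found next to the BAM. A minimal sketch of a check, assuming samtools is installed and the BAM is coordinate-sorted; `has_bam_index` and `ensure_bam_index` are hypothetical helper names.

```shell
# Succeed if an index sits next to the BAM under one of the usual names.
has_bam_index() {
    bam="$1"
    [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ] || [ -f "$bam.csi" ]
}

# Create the index with samtools when it is missing.
ensure_bam_index() {
    bam="$1"
    if has_bam_index "$bam"; then
        echo "index found for $bam"
    elif command -v samtools >/dev/null 2>&1; then
        samtools index "$bam"   # writes "$bam.bai" next to the BAM
    else
        echo "no index for $bam and samtools is not on PATH" >&2
        return 1
    fi
}

# Example:
# ensure_bam_index /h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam
```

Indexing needs write permission in the BAM's folder. If the BAM is not coordinate-sorted, `samtools index` will fail and the file has to be sorted first.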
