Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in running scansnv #3

Open
qian0001 opened this issue Dec 7, 2019 · 10 comments
Open

Errors in running scansnv #3

qian0001 opened this issue Dec 7, 2019 · 10 comments

Comments

@qian0001
Copy link

qian0001 commented Dec 7, 2019

I got the foolowing error messages for about 1/3 of my single cell wgs after running latest scansnv as of today (12/6/19). Any ideas? For those cells with errors in the log, some produced file somatic_genotypes.rda while the others didn't. Over 90% of steps were done for those problem cells.
Thanks for assistance.

slurm-47955.out:Error in 0:dp : NA/NaN argument
slurm-47955.out:Error in rule scansnv_genotype_spikein_scatter:
slurm-47955.out:CalledProcessError in line 778 of /home/xxxxx/miniconda3/envs/scansnv/lib/scansnv/Snakefile:

@jluquette
Copy link
Member

@qian0001 Would you mind posting the entire slurm-47955.out file? Or is that the entire thing?

@jluquette
Copy link
Member

@qian0001 Are you still experiencing this issue?

@qian0001
Copy link
Author

qian0001 commented Dec 18, 2019 via email

@qian0001
Copy link
Author

qian0001 commented Dec 27, 2019 via email

@jluquette
Copy link
Member

Hi @qian0001,

Happy to hear that your run worked. A note about your parameter changes: I highly recommend that you do not decrease --abmodel-hsnp-chunk-size below the default value. We have found that sizes smaller than 100 can cause significant changes in the AB model fit.

Thank you for your question about the final output. All of the mutations called by SCAN-SNV are heterozygous. The columns you've shown here are actually just the raw genotype calls produced by GATK in the first step of SCAN-SNV's pipeline. These genotype calls are unused by SCAN-SNV and you should completely ignore them. The way to interpret the output is: for each "PASS" mutation in the data frame, there is a heterozygous somatic mutation at the indicated location.

Sorry for the confusion--this is clearly a case of lacking documentation. Please let me know if you have any other questions!

@qian0001
Copy link
Author

qian0001 commented Jan 8, 2020 via email

@jluquette
Copy link
Member

The called variants are in the R file

/path_to_your_output/scansnv/[single_cell_sample_name]/somatic_genotypes.rda

If you load this into R, the total number of heterozygous somatic mutations is simply:

R> load("/path_to_your_output/scansnv/[single_cell_sample_name]/somatic_genotypes.rda")
R> sum(somatic$pass)

If you want a rough estimate of the sensitivity of your calls, you can also look at the hSNP spikein analysis. In this analysis, a small number of hSNPs are withheld from the training set and then analyzed as if they were somatic mutation candidates.

R> load("/path_to_your_output/scansnv/[single_cell_sample_name]/hsnp_spikein_genotypes.rda")
R> mean(spikein$pass)

Typical values for sensitivity are 30-50%.

I should also mention that SCAN-SNV is only designed for high coverage, whole genome experiments (not exomes or WGS with mean coverage <15-20x). 100 such cells is a very large data set if it is indeed high depth and whole genome.

@qian0001
Copy link
Author

qian0001 commented Jan 9, 2020 via email

@jluquette
Copy link
Member

jluquette commented Jan 9, 2020

I see. SCAN-SNV does not attempt to call non-somatic mutations except for the small number of hSNPs used in the spike-in analysis I described above. If you would like to do an ad-hoc analysis, the relevant files are:

gatk/hc_raw.mmq60.vcf - the raw output of joint GATK HaplotypeCaller on all samples (includes indels)
scansnv/mmq60.vcf - the above VCF filtered down to biallelic SNPs that were also callable in bulk. callable in bulk just means not no-call ("./." genotype), so this includes sites where bulk is called homozygous reference ("0/0" genotype).

Just to be clear: these files are just GATK output. They are the input to SCAN-SNV but are not representative of SCAN-SNV mutation calls.

If your ultimate goal is to estimate the total number of somatic mutations, then I recommend using the hSNP spikein sensitivity method from the previous comment. This method is rough because hSNPs are not uniformly distributed over the genome (and especially not the phasable ones that SCAN-SNV uses), but is usually a pretty good guess. What fraction of your single cell genomes is usually at very low read depth? If it is reasonably low (ideally <10-20% but maybe even 50%), then I'd recommend trying the hSNP spikein sensitivity before an ad hoc analysis.

@qian0001
Copy link
Author

qian0001 commented Jan 9, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants