fewer heterozygous variants #183

Open
jsha129 opened this issue Oct 19, 2022 · 4 comments
Comments

@jsha129

jsha129 commented Oct 19, 2022

Dear FACETS team, thank you for developing this tool. I have been getting errors when running emcncf() because of an insufficient number of heterozygous variants. The following command reported 931 hets from a VCF containing 5 samples:
bcftools filter -i "FILTER = 'PASS' & FORMAT/GT = '0/1' & FORMAT/AF > 0.75 & FORMAT/AD > 25" 3_filtered.vcf.gz | grep -v "#" | wc -l
I tried segmentation values of 100, 1000 and 10000 when running the pileup (-g -q15 -Q20 -P100 -r25,0) and get roughly 45 hets. The median MQ in the INFO field is 60. Could you please help clarify this, and do you have any suggestions for increasing the number of hets? I tried reducing the values for '-Q' and '-r' and saw a modest improvement.
Thanks

@veseshan
Collaborator

Are you using a targeted panel? Typical whole-exome sequencing data will have more than 20k het SNPs. The targeted panel we use has more than 1,000. FACETS uses loci that are sufficiently spaced to avoid serial correlation. I wonder if your panel covers such a limited portion of the genome that you only get 45 hets.
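To illustrate why spacing alone can shrink the het count on a small panel, here is a minimal Python sketch of distance-based thinning. It is not the FACETS implementation: the function name `thin_by_spacing`, the greedy rule, and the example coordinates are all hypothetical. The 250 bp gap mirrors what appears to be the default of the `snp.nbhd` argument of `preProcSample`; treat that value as an assumption to check against your installed version.

```python
def thin_by_spacing(positions, min_gap=250):
    """Greedy thinning: keep a locus only if it lies at least `min_gap` bp
    after the previously kept locus on the same chromosome.

    positions: iterable of (chrom, pos) tuples, sorted by chrom then pos.
    This is a simplified stand-in, not the procedure FACETS actually uses.
    """
    kept = []
    last_kept = {}  # chrom -> position of the most recently kept locus
    for chrom, pos in positions:
        prev = last_kept.get(chrom)
        if prev is None or pos - prev >= min_gap:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


# Hypothetical panel: three tightly packed SNPs in one exon plus two distant ones.
loci = [("chr1", 1000), ("chr1", 1050), ("chr1", 1200), ("chr1", 5000), ("chr2", 1100)]
thinned = thin_by_spacing(loci, min_gap=250)
print(len(loci), "->", len(thinned))
```

On a panel where most SNPs cluster inside a few hundred base pairs, a rule like this discards most of them, which may be part of the gap between the 931 variants in the VCF and the roughly 45 hets that survive.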

@jsha129
Author

jsha129 commented Oct 19, 2022 via email

@veseshan
Collaborator

Then it could be due to low depth of coverage or a mismatch between the genome build of the BAM and the SNP file.

@jsha129
Author

jsha129 commented Oct 20, 2022

I see, thanks for pointing that out. I used hg38 and have the 1000G SNP file. Is there a way to supply a newer SNP file? I tried preProcSample(rcmat, gbuild = "hg38"), which made no difference. Median NOR.DP is ~100 for the example data vs ~20 for our data. Is that sufficient for CNV calling? Thanks
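
One way to gauge how much the normal depth matters is to count how many loci would survive a depth cutoff and still look heterozygous in the normal. The Python sketch below is only an approximation under stated assumptions: `count_hets` is a hypothetical helper, the parameter names `ndepth` and `het_thresh` echo the `ndepth` and `het.thresh` arguments of `preProcSample` (which appear to default to 35 and 0.25; verify against your installed version), and the example loci are made up.

```python
def count_hets(loci, ndepth=35, het_thresh=0.25):
    """Count loci that pass a normal-depth cutoff and look heterozygous.

    loci: iterable of (nor_dp, nor_rd) pairs, i.e. total depth and
    reference-read count in the normal sample.
    Approximates, but does not reproduce, the filtering done by FACETS.
    """
    hets = 0
    for nor_dp, nor_rd in loci:
        if nor_dp < ndepth:
            continue  # normal coverage too shallow at this locus
        vaf = 1 - nor_rd / nor_dp  # variant allele fraction in the normal
        if min(vaf, 1 - vaf) > het_thresh:
            hets += 1
    return hets


# Made-up loci, all near 50% VAF, with normal depths scattered around 20.
example = [(18, 9), (22, 10), (25, 13), (36, 18), (40, 21), (15, 7)]
for cutoff in (35, 25, 15):
    print(cutoff, count_hets(example, ndepth=cutoff))
```

If the median NOR.DP is around 20 and the depth cutoff is higher than that, most loci are dropped before heterozygosity is even evaluated. Lowering the cutoff recovers more hets, at the cost of noisier allele fractions at each locus.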
