fewer heterozygous variants #183

jsha129 · 2022-10-19T07:48:42Z

Dear FACETS team, thank you for developing this tool. I have been getting errors when running emcncf() because of an insufficient number of heterozygous variants. The following command reported 931 'hets' from a VCF containing 5 samples:
bcftools filter -i "FILTER = 'PASS' & FORMAT/GT = '0/1' & FORMAT/AF > 0.75 & FORMAT/AD > 25" 3_filtered.vcf.gz | grep -v "#" | wc -l
I tried segmentation values of 100, 1000 and 10000 when running pileup (-g -q15 -Q20 -P100 -r25,0) and got roughly 45 'hets'. The median MQ in the INFO field is 60. Could you please help clarify this, and do you have any suggestions for increasing the number of hets? I tried reducing the values for '-Q' and '-r' and saw a modest improvement.
Thanks

veseshan · 2022-10-19T18:56:26Z

Are you using a targeted panel? Typical whole exome sequencing data will have more than 20k het SNPs. The targeted panel we use has more than 1000. FACETS uses loci that are sufficiently spaced to avoid serial correlation. I wonder if your panel is covering such a limited space of the genome that you only get 45 hets.
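
The spacing point can be illustrated with a minimal Python sketch that greedily thins sorted positions so that kept loci are at least a minimum distance apart. The function name `thin_by_spacing`, the 250 bp gap, and the positions are assumptions made up for illustration; this is not FACETS's own code. In FACETS the spacing appears to be governed by the `snp.nbhd` argument of `preProcSample`, which should be checked against the package documentation.

```python
def thin_by_spacing(positions, min_gap=250):
    """Greedily keep positions that are at least min_gap bp apart."""
    kept = []
    for pos in sorted(positions):
        # Keep a locus only if it is far enough from the last kept one.
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


# Hypothetical positions on a single chromosome (illustration only).
positions = [100, 180, 260, 400, 1000, 1100, 1400]
print(thin_by_spacing(positions))  # [100, 400, 1000, 1400]
```

Densely clustered loci, as in a small targeted panel, collapse to a few representatives, which is why the usable het count can be much lower than the raw variant count.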

jsha129 · 2022-10-19T21:10:29Z

Thank you for the response. This is WGS.

veseshan · 2022-10-19T21:50:52Z

Then it could be due to low depth of coverage or a mismatch between the genome build of the BAM and the SNP file.
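
One way to look for a build mismatch is to compare contig names and lengths between the BAM header and the SNP VCF header. The sketch below assumes the two headers have already been parsed into name-to-length dictionaries (for example from the output of `samtools view -H` and `bcftools view -h`); the function name `compare_contigs` and the example dictionaries are hypothetical. Not every VCF header carries contig lengths, so this is only a partial check.

```python
def compare_contigs(bam_contigs, vcf_contigs):
    """Return (missing, length_mismatch) for contigs listed in the VCF header.

    Both arguments map contig name -> length in bp.
    """
    # Contigs named in the VCF that the BAM does not know (e.g. "2" vs "chr2").
    missing = sorted(name for name in vcf_contigs if name not in bam_contigs)
    # Contigs present in both but with different lengths (different build).
    length_mismatch = sorted(
        name
        for name, length in vcf_contigs.items()
        if name in bam_contigs and bam_contigs[name] != length
    )
    return missing, length_mismatch


# Hypothetical headers: BAM on GRCh38, VCF with GRCh37 lengths and mixed naming.
bam = {"chr1": 248956422, "chr2": 242193529}
vcf = {"chr1": 249250621, "2": 243199373}
print(compare_contigs(bam, vcf))  # (['2'], ['chr1'])
```

An empty result on both lists does not prove the builds match, but any entry in either list is a strong hint that the SNP file and the BAM use different references or naming conventions.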

jsha129 · 2022-10-20T02:58:56Z

I see. Thanks for pointing that out. I used hg38 and have the 1000G SNP file. Is there a way to supply a newer SNP file? I tried preProcSample(rcmat, gbuild = "hg38"), which made no difference. The median NOR.DP is ~100 for the example data vs ~20 for our data. Is that sufficient for CNV calling? Thanks
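
To see how normal depth interacts with het counting, here is a minimal Python sketch that counts loci whose normal sample is deep enough and whose variant allele fraction falls in a heterozygous band. It is a simplified illustration, not FACETS's implementation. The function name, the example read counts, and the thresholds are assumptions: the values 35 and 0.25 are chosen to resemble what are believed to be the `ndepth` and `het.thresh` defaults of `preProcSample`, which should be verified in the package documentation. Each locus is a pair of total depth and reference read count in the normal sample, assumed to correspond to the NOR.DP and NOR.RD columns.

```python
def count_candidate_hets(loci, min_normal_depth=35, het_thresh=0.25):
    """Count loci that pass a normal-depth filter and look heterozygous.

    loci: iterable of (nor_dp, nor_rd) pairs, i.e. total depth and
    reference read count in the normal sample.
    """
    n = 0
    for nor_dp, nor_rd in loci:
        if nor_dp < min_normal_depth:
            continue  # too shallow: dropped before any het test
        vaf = 1 - nor_rd / nor_dp  # variant allele fraction in the normal
        if het_thresh <= vaf <= 1 - het_thresh:
            n += 1
    return n


# Hypothetical loci (illustration only): two deep, three shallow, one homozygous.
loci = [(100, 52), (40, 19), (20, 11), (18, 9), (60, 60), (22, 10)]
print(count_candidate_hets(loci))                       # 2
print(count_candidate_hets(loci, min_normal_depth=15))  # 5
```

With a median normal depth near 20, a depth cutoff of 25 or 35 removes most loci before heterozygosity is even assessed, so lowering the depth threshold is one thing to try, at the cost of noisier allele fractions.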
