Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong AO/RO for adjacent SNPs #148

Closed
imgag opened this issue Apr 20, 2016 · 3 comments
Closed

Wrong AO/RO for adjacent SNPs #148

imgag opened this issue Apr 20, 2016 · 3 comments
Labels

Comments

@imgag
Copy link

imgag commented Apr 20, 2016

Hi,

we have found a bug in the reference/allele count calculation that occurs when a homozygous and a heterozygous SNP are adjacent to each other. Here the screenshot:

screenshot

This is the VCF line obtained from FreeBayes:

chr2 215632255 . CA TG,CG 3038.7 . AB=0.550459,0.422018;ABP=5.42083,8.76769;AC=1,1;AF=0.5,0.5;AN=2;AO=60,46;CIGAR=2X,1M1X;DP=109;DPB=109;DPRA=0,0;EPP=27.4756,21.8927;EPPR=3.0103;GTI=0;LEN=2,1;MEANALT=3,3;MQM=60,60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1,1;PAIREDR=1;PAO=0,0;PQA=0,0;PQR=0;PRO=0;QA=2270,1724;QR=75;RO=2;RPL=26,19;RPP=5.32654,6.03148;RPPR=7.35324;RPR=34,27;RUN=1,1;SAF=17,14;SAP=27.4756,18.305;SAR=43,32;SRF=1;SRP=3.0103;SRR=1;TYPE=mnp,snp;technology.ILLUMINA=1,1 GT:DP:RO:QR:AO:QA:GL 1/2:109:2:75:60,46:2270,1724:-320.576,-135.081,-123.163,-179.972,0,-172.268

Note that the RO/AO annotations (bold) are correct.

For the annotation with information from different databases (1000g, ExAC, etc.), we now need to decompose the two SNPs. We tried that with different orders of vcfbreakmulti and vcfallelicprimitives (with option -kg).

The output of first vcfallelicprimitives then vcfbreakmulti is:

chr2 215632255 . C T 3038.7 . AB=0.550459;ABP=5.42083;AC=1;AF=0.5;AN=2;AO=60;CIGAR=2X;DP=109;DPB=109;DPRA=0;EPP=27.4756;EPPR=3.0103;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=2270;QR=75;RO=2;RPL=26;RPP=5.32654;RPPR=7.35324;RPR=34;RUN=1;SAF=17;SAP=27.4756;SAR=43;SRF=1;SRP=3.0103;SRR=1;TYPE=snp;technology.ILLUMINA=1 GT:DP:RO:QR:AO:QA:GL 1|0:109:2:75:60,46:2270,1724:-320.576,-135.081,-123.163,-179.972,0,-172.268
chr2 215632256 . A G 3038.7 . AB=0.422018;ABP=8.76769;AC=2;AF=1;AN=2;AO=46;CIGAR=1M1X;DP=109;DPB=109;DPRA=0;EPP=21.8927;EPPR=3.0103;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1724;QR=75;RO=2;RPL=19;RPP=6.03148;RPPR=7.35324;RPR=27;RUN=1;SAF=14;SAP=18.305;SAR=32;SRF=1;SRP=3.0103;SRR=1;TYPE=snp;technology.ILLUMINA=1 GT:DP:RO:QR:AO:QA:GL 1|1:109:2:75:60,46:2270,1724:-320.576,-135.081,-123.163,-179.972,0,-172.268

Here, the variants are correctly decomposed in a herozygous variant and a homozygous variant, but the RO/AO annotations are wrong.

The output of first vcfbreakmulti then vcfallelicprimitives is:

chr2 215632255 . C T 3038.7 . AB=0.550459;ABP=5.42083;AC=1;AF=0.5;AN=2;AO=60;CIGAR=2X;DP=109;DPB=109;DPRA=0;EPP=27.4756;EPPR=3.0103;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=2270;QR=75;RO=2;RPL=26;RPP=5.32654;RPPR=7.35324;RPR=34;RUN=1;SAF=17;SAP=27.4756;SAR=43;SRF=1;SRP=3.0103;SRR=1;TYPE=snp;technology.ILLUMINA=1 GT:DP:RO:QR:AO:QA:GL **.|**1:109:2:75:60:2270:-320.576,-135.081,-123.163
chr2 215632256 . A G 3038.7 . AB=0.550459;ABP=5.42083;AC=1;AF=0.5;AN=2;AO=60;CIGAR=2X;DP=109;DPB=109;DPRA=0;EPP=27.4756;EPPR=3.0103;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=2270;QR=75;RO=2;RPL=26;RPP=5.32654;RPPR=7.35324;RPR=34;RUN=1;SAF=17;SAP=27.4756;SAR=43;SRF=1;SRP=3.0103;SRR=1;TYPE=snp;technology.ILLUMINA=1 GT:DP:RO:QR:AO:QA:GL .|1:109:2:75:60:2270:-320.576,-135.081,-123.163
chr2 215632256 . A G 3038.7 . AB=0.422018;ABP=8.76769;AC=1;AF=0.5;AN=2;AO=46;CIGAR=1M1X;DP=109;DPB=109;DPRA=0;EPP=21.8927;EPPR=3.0103;GTI=0;LEN=1;MEANALT=3;MQM=60;MQMR=60;NS=1;NUMALT=2;ODDS=261.153;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=1724;QR=75;RO=2;RPL=19;RPP=6.03148;RPPR=7.35324;RPR=27;RUN=1;SAF=14;SAP=18.305;SAR=32;SRF=1;SRP=3.0103;SRR=1;TYPE=snp;technology.ILLUMINA=1 GT:DP:RO:QR:AO:QA:GL .|1:109:2:75:46:1724:-320.576,-179.972,-172.268

Here, the homzygous variant is split into two heterozygous variants and the RO anntation is wrong.
Is there a way to generate correct counts (we need it to compute the real AF because the given AF is always 0.5 or 1.0).

A working example including the BAM file and a Makefile can be found here:
vcflib_bug_allele_counts.zip

Best,
Marc

@zeeev
Copy link
Collaborator

zeeev commented Apr 20, 2016

@imgag Thank you for the detailed bug report. You're not the only one (see #147). It is very high on our priority list.

@github-actions
Copy link

This issue is marked stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the stale label Dec 17, 2020
@github-actions
Copy link

This issue was closed for lack of activity. Feel free to re-open if someone feels like working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant