Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vt estimate converts QUAL to '-nan' for multi-allelic sites #94

Open
oleraj opened this issue Oct 22, 2018 · 0 comments
Open

vt estimate converts QUAL to '-nan' for multi-allelic sites #94

oleraj opened this issue Oct 22, 2018 · 0 comments

Comments

@oleraj
Copy link

oleraj commented Oct 22, 2018

Hi,

I noticed recently that when I use vt estimate command, the QUAL score is convert to -nan for multi-allelic sites. This is fine with variants with single allele. An example variant is here:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S1	S2	S3
21	10906293	rs28616746	T	C,G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

After vt estimate -e AB it looks like this:

21	10906293	rs28616746	T	C,G	-nan	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;HWEAF=0.5,0.5;HWEGF=0,0,0.25,0,0.5,0.25;MLEAF=0,1;AB=0.0985985	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

AB field is computed fine, but the QUAL is converted to -nan, which is not a valid value for QUAL column and this has issues when I try to run GATK VariantsToTable in a subsequent step.

Since I only saw this with multi-allelic variants, I thought maybe if I decomposed and normalized the variants first that might help. Decomposing and normalizing retains the QUAL score in tact:

bcftools view $vcf 21:10906293 | vt decompose - | vt normalize -r $ref - | grep -v "^#"
21	10906293	rs28616746	T	C	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

However, when I use that as input to vt estimate, the results are worse -- the QUAL score is still converted to either -0 or 5 and AB is now -nan:

21	10906293	rs28616746	T	C	-0	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	5	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

Any help you can provide or suggestions on how to avoid converting the QUAL score to invalid values?

Thanks!

Andrew

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant