How to extract fields that are not ##INFO from gnomAD vcf? #124

Open
francoiskroll opened this issue Jul 9, 2020 · 2 comments

@francoiskroll

This is probably a mistake on my part, but I can't work out how to pull some specific fields from a gnomAD VCF.

gnomAD vcf file header:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##hailversion=0.2.24-9cd88d97bedd
##FILTER=<ID=AC0,Description="Allele count is zero after filtering out low-confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls)">
##FILTER=<ID=AS_VQSR,Description="Failed Allele-Specific Variant Quality Score Recalibration threshold">
##FILTER=<ID=InbreedingCoeff,Description="InbreedingCoeff < -0.3">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count for samples">
...

Example of entry from gnomAD vcf file:

chr20 4694655 rs143783853 C G 1.51257e+06 PASS AC=1490
...

Say I want to extract all 8 fields above. As a test, I'm starting with rs ID, FILTER, and AC.

My conf.toml file is:

[[annotation]]
file="gnomad_chr20.vcf.gz"
fields = ["ID", "FILTER", "AC"]
ops=["self", "self", "self"]
names=["ID", "gnomad_FILTER", "AC"]

And my command:

vcfanno -p 4 conf.toml variants.vcf > annotated.vcf

rs ID and AC work great, but I can't seem to get FILTER out. I have also tried fields = ["PASS"].

Example of entry of my own vcf:

  • before annotation (variants.vcf):

chr20 4694655 . C G 59835.5 PASS BaseCalledReadsWithVariant=4037;BaseCalledFraction=0.401173;TotalReads=8792;AlleleCount=1;SupportFraction=0.487189;SupportFractionByBase=0.047,0.483,0.443,0.027 GT 0/1

  • after annotation (annotated.vcf):

chr20 4694655 . C G 59835.5 PASS BaseCalledReadsWithVariant=4037;BaseCalledFraction=0.401173;TotalReads=8792;AlleleCount=1;SupportFraction=0.487189;SupportFractionByBase=0.047,0.483,0.443,0.027;ID=rs143783853;AC=1490 GT 0/1

Can you help?

@brentp
Owner

brentp commented Jul 9, 2020

I think that the FILTER only gets added if it is not PASS (or .).

@francoiskroll
Author

I see. It seems the FILTER column in the gnomAD VCF is never left empty when a variant is not PASS; from what I can see, the AC0 filter is always present if PASS is absent? So that should be okay.

Example

from gnomAD vcf:

chr20 4694192 . A T 182 AC0;AS_VQSR AC=0
...

vcfanno adds annotation:

chr20 4694192 . A T 31.5 PASS ... ;gnomadv3_FILTER=AC0,AS_VQSR;gnomadv3_AC=0 GT 1/1
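
For a future reader, here is a rough Python sketch of the behaviour observed in this thread. It is not vcfanno's actual implementation. It only mimics what the examples above show: FILTER is skipped when the gnomAD record is PASS (or .), and multiple filters separated by semicolons come out separated by commas. The function name `gnomad_annotations` and the `prefix` argument are made up for illustration, and the input lines are assumed to be tab-separated.

```python
def gnomad_annotations(vcf_line, prefix="gnomadv3_"):
    """Mimic the FILTER/AC annotations seen in this thread (illustration only)."""
    cols = vcf_line.rstrip("\n").split("\t")
    # First 8 VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    filt, info = cols[6], cols[7]
    out = {}
    # Observed rule: FILTER is only carried over when it is not PASS or "."
    if filt not in ("PASS", "."):
        # Observed: "AC0;AS_VQSR" in gnomAD appears as "AC0,AS_VQSR" in INFO
        out[prefix + "FILTER"] = filt.replace(";", ",")
    # Pull AC out of the INFO column
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if key == "AC" and sep:
            out[prefix + "AC"] = value
    return out


failing = "chr20\t4694192\t.\tA\tT\t182\tAC0;AS_VQSR\tAC=0"
passing = "chr20\t4694655\trs143783853\tC\tG\t1.51257e+06\tPASS\tAC=1490"
print(gnomad_annotations(failing))
print(gnomad_annotations(passing))
```

The second record yields no FILTER key at all, which matches what I saw when the gnomAD variant is PASS.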


If useful for a future reader – I'm also able to pull the lcr (low complexity region) filter; for example:

My conf.toml:

[[annotation]]
file="gnomad_chr20_4680000to4705000.vcf.gz"
fields = ["FILTER", "lcr", "AC"]
ops=["self", "self", "self"]
names=["gnomadv3_FILTER", "gnomadv3_lcr", "gnomadv3_AC"]

Annotation looks like (this variant is PASS in both my vcf & on gnomAD):

chr20 4694861 . C A 31.5 PASS ... ;gnomadv3_lcr;gnomadv3_AC=86 GT 1/1

Thanks for your help!
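
Since a missing gnomadv3_FILTER seems to mean the gnomAD record was PASS (or .), one way to recover an explicit status downstream is to check whether any other gnomAD annotation is present. Below is a minimal Python sketch under that assumption. The helper names `parse_info` and `gnomad_filter_status` are hypothetical, the INFO strings are abridged examples, and it relies on gnomadv3_AC being annotated whenever the variant is found in gnomAD.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; bare flags (e.g. lcr) map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def gnomad_filter_status(info, prefix="gnomadv3_"):
    """Return the gnomAD filter string, "PASS" if inferred, or None if not in gnomAD."""
    fields = parse_info(info)
    filt = fields.get(prefix + "FILTER")
    if filt is not None:
        return filt
    # Assumption: AC present but FILTER absent means gnomAD had PASS (or ".")
    if prefix + "AC" in fields:
        return "PASS"
    return None


print(gnomad_filter_status("TotalReads=8792;gnomadv3_lcr;gnomadv3_AC=86"))
print(gnomad_filter_status("AlleleCount=1;gnomadv3_FILTER=AC0,AS_VQSR;gnomadv3_AC=0"))
print(gnomad_filter_status("TotalReads=8792"))
```

Note that this cannot distinguish PASS from an unfiltered (.) gnomAD record, since both appear to leave the FILTER annotation out.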
