Ensembl Variant Effect Predictor annotations support #2150

Open
znikasz opened this issue May 14, 2019 · 3 comments · Fixed by #2278

Comments

znikasz commented May 14, 2019

I have been using ADAM with SnpEff annotations and it worked great. However, when I tried it with a VCF file annotated by Ensembl VEP, it didn't work. I expected the ANN format to be standardized enough that transcriptEffects would be populated.

So I wanted to ask: am I doing something wrong and there is a way to make ADAM work with VEP, or is it not yet supported?

Thank you.

Member

heuermh commented May 14, 2019

Hello @znikasz

Ensembl VEP only writes its annotations in ANN format if you pass the --vcf_info_field ANN --terms so output options:

http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html#output

This is how we call Ensembl VEP in Cannoli:

https://github.com/bigdatagenomics/cannoli/blob/master/core/src/main/scala/org/bdgenomics/cannoli/Vep.scala#L83

Author

znikasz commented May 14, 2019

Dear @heuermh,

thank you for the answer.
I checked how Cannoli runs VEP and tried to do the same myself.

I used a very simple VCF file containing a single variant:

chr1    2309937 .       C       CTT     281.04  .       AC=2;AF=1;AN=2   GT:AD:DP:GQ:PL  1/1:0,8:8:24:295,24,0

I've annotated it with VEP, release 96:

vep --dir /opt/vep/.vep/ -i /data/input.vcf.gz -o /data/output2.vep.vcf --format vcf --vcf --vcf_info_field ANN --terms SO --no_stats --offline

The annotated VCF contained these lines:

##VEP="v96" time="2019-05-14 17:42:50" cache="/opt/vep/.vep/homo_sapiens/96_GRCh38" ensembl=96.7a35428 ensembl-funcgen=96.9c3a0cd ensembl-variation=96.70d2777 ensembl-io=96.6e65b30 1000genomes="phase3" COSMIC="87" ClinVar="201901" ESP="V2-SSA137" HGMD-PUBLIC="20184" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 30" genebuild="2014-07" gnomAD="r2.1" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID">
(...)
chr1    2309937 .       C       CTT     281.04  .       AC=2;AF=1;AN=2;DP=18;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=59.4;QD=28.73;SOR=1.863;ANN=TT|3_prime_UTR_variant|MODIFIER|SKI|ENSG00000157933|Transcript|ENST00000378536|protein_coding|7/7||||5807-5808|||||||1||HGNC|HGNC:10896       GT:AD:DP:GQ:PL  1/1:0,8:8:24:295,24,0
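To make it easier to see which header field each value belongs to, here is a minimal Scala sketch that pairs the Format string from the ##INFO header with the ANN value from the record. Both strings are copied from the lines above; the object and method names (AnnFields, split, labelled) are made up for illustration and are not part of ADAM or VEP.

```scala
object AnnFields {
  // Format string copied from the ##INFO=<ID=ANN,...> header line above.
  val format: String =
    "Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID"

  // ANN value copied from the annotated record above.
  val value: String =
    "TT|3_prime_UTR_variant|MODIFIER|SKI|ENSG00000157933|Transcript|ENST00000378536|protein_coding|7/7||||5807-5808|||||||1||HGNC|HGNC:10896"

  // Split on '|' with limit -1 so that empty fields are kept in place.
  def split(s: String): Array[String] = s.split("\\|", -1)

  // Pair each header field name with the value at the same position.
  def labelled(format: String, value: String): Seq[(String, String)] =
    split(format).toSeq.zip(split(value).toSeq)

  def main(args: Array[String]): Unit =
    labelled(format, value)
      .filter { case (_, v) => v.nonEmpty } // skip empty fields
      .foreach { case (k, v) => println(s"$k = $v") }
}
```

Running AnnFields.main prints one line per non-empty field; the header and the record both split into 23 fields, and the cDNA_position field holds 5807-5808.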

But now running:

ac.loadVcf("/tmp/output2.vep.vcf").toVariants.dataset.show

throws:

Caused by: java.lang.NumberFormatException: For input string: "5807-5808"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Integer.parseInt(Integer.java:580)
  at java.lang.Integer.parseInt(Integer.java:615)
  at org.bdgenomics.adam.converters.TranscriptEffectConverter$.parseFraction(TranscriptEffectConverter.scala:99)
  at org.bdgenomics.adam.converters.TranscriptEffectConverter$.parseTranscriptEffect(TranscriptEffectConverter.scala:143)

because the cDNA_position value 5807-5808 is a range, not a single integer, so Integer.parseInt rejects it.
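For illustration, a tolerant parser for this kind of field could look like the sketch below. This is not ADAM's TranscriptEffectConverter code and not necessarily how a fix would be implemented; PositionField, Position and parse are hypothetical names, and the set of accepted shapes (a single integer, a start-end range, and an optional /total suffix) is my assumption based on the values seen in this record.

```scala
import scala.util.Try

object PositionField {
  // Start of the position, optional end of a range, optional total length.
  final case class Position(start: Int, end: Option[Int], total: Option[Int])

  // Parse an integer, returning None instead of throwing on bad input.
  private def toInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

  // Accepts "5807", "5807-5808", "7/7" and "5807-5808/6000".
  // Returns None for empty or otherwise unparseable input.
  def parse(s: String): Option[Position] = {
    val parts = s.trim.split("/", -1)
    if (parts.length > 2) None
    else {
      // An unparseable total is dropped rather than failing the whole field.
      val total = if (parts.length == 2) toInt(parts(1)) else None
      parts(0).split("-", -1) match {
        case Array(a)    => toInt(a).map(x => Position(x, None, total))
        case Array(a, b) => for (x <- toInt(a); y <- toInt(b)) yield Position(x, Some(y), total)
        case _           => None
      }
    }
  }
}
```

With this sketch, PositionField.parse("5807-5808") yields a Position with start 5807 and end 5808, and an empty field yields None instead of a NumberFormatException.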

Is there anything that I'm doing wrong?

Member

heuermh commented May 23, 2019

Thank you for the detailed reply. VEP must not be using the same ANN format string as SnpEff, or its output otherwise differs from the ANN specification. I'll take a closer look.

heuermh referenced this issue in dmaziec/cannoli Oct 26, 2020
heuermh added this to the 0.33.0 milestone Oct 26, 2020
heuermh reopened this Oct 28, 2020
heuermh modified the milestones: 0.33.0, 0.34.0 Dec 16, 2020
heuermh modified the milestones: 0.34.0, 0.35.0 Mar 10, 2021
heuermh modified the milestones: 0.35.0, 0.36.0 Apr 26, 2021
heuermh modified the milestones: 0.36.0, 0.37.0 Jul 23, 2021
heuermh modified the milestones: 0.37.0, 0.38.0 Jan 12, 2022
heuermh removed this from the 0.38.0 milestone May 9, 2022