Post-Annotation for new SpliceAI VCFs #138

Open
Phillip-a-richmond opened this issue Feb 22, 2021 · 8 comments

@Phillip-a-richmond

Hello,

I'm using VCFAnno with the new SpliceAI VCFs from Illumina, and they've recoded the VCFs with this header (I...have...no...idea...why):

##fileformat=VCFv4.2
##fileDate=20191004
##reference=GRCh38/hg38
##contig=<ID=1,length=248956422>
##contig=<ID=2,length=242193529>
##contig=<ID=3,length=198295559>
##contig=<ID=4,length=190214555>
##contig=<ID=5,length=181538259>
##contig=<ID=6,length=170805979>
##contig=<ID=7,length=159345973>
##contig=<ID=8,length=145138636>
##contig=<ID=9,length=138394717>
##contig=<ID=10,length=133797422>
##contig=<ID=11,length=135086622>
##contig=<ID=12,length=133275309>
##contig=<ID=13,length=114364328>
##contig=<ID=14,length=107043718>
##contig=<ID=15,length=101991189>
##contig=<ID=16,length=90338345>
##contig=<ID=17,length=83257441>
##contig=<ID=18,length=80373285>
##contig=<ID=19,length=58617616>
##contig=<ID=20,length=64444167>
##contig=<ID=21,length=46709983>
##contig=<ID=22,length=50818468>
##contig=<ID=X,length=156040895>
##contig=<ID=Y,length=57227415>
##INFO=<ID=SpliceAI,Number=.,Type=String,Description="SpliceAIv1.3 variant annotation. These include delta scores (DS) and delta positions (DP) for acceptor gain (AG), acceptor loss (AL), donor gain (DG), and donor loss (DL). Format: ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL">

Where each line looks like this:

1       69091   .       A       C       .       .       SpliceAI=C|OR4F5|0.01|0.00|0.00|0.00|42|25|24|2
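To make the header's format string concrete, here is a minimal Lua sketch that maps one such record onto the ten field names from the INFO description. The helper name `parse_spliceai` and the `FIELD_NAMES` table are hypothetical, not part of vcfanno or SpliceAI, and the sketch assumes every field is populated (the `[^|]+` pattern skips empty fields, which would shift the positions).

```lua
-- Hypothetical helper (not part of vcfanno): map one SpliceAI record to named fields.
-- Field order is taken from the INFO header quoted above.
FIELD_NAMES = {"ALLELE", "SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
               "DP_AG", "DP_AL", "DP_DG", "DP_DL"}

function parse_spliceai(record)
    local out, i = {}, 0
    -- Each run of non-"|" characters is one field, in header order.
    for value in record:gmatch("[^|]+") do
        i = i + 1
        local name = FIELD_NAMES[i]
        if name then
            -- Fields 3..10 are numeric; ALLELE and SYMBOL stay strings.
            -- A value that does not parse as a number is kept as a string.
            out[name] = (i > 2) and tonumber(value) or value
        end
    end
    return out
end

local rec = parse_spliceai("C|OR4F5|0.01|0.00|0.00|0.00|42|25|24|2")
print(rec.SYMBOL, rec.DS_AG, rec.DP_AG)
```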

When I annotate, I get a line that looks like this:

1	3709604	.	G	A	52.7	PASS	CADD_v1.4=3.232;gnomad_genome_af_global=0.1707;gnomad_genome_hom_global=2233;gnomad_genome_ac_global=24444;gnomad_genome_an_global=143176;rs_ids=rs12130809;cosmic_coding_ids=COSV60699775;cosmic_count_observed=1,1;SpliceAI_NonSeparated_snv=A|TP73|0.00|0.00|0.00|0.00|11|0|11|-1	GT:GQ:DP:AD:VAF:PL	0/1:45:32:17,15:0.46875:44,0,67	1/1:52:38:0,37:0.973684:52,62,0	0/1:34:32:20,12:0.375:33,0,60

However, since I want this to be annotated and then placed into GEMINI, I need each of the spliceAI values as floats in the GEMINI database. I figured I could add this annotation from the native SpliceAI files, and then split the annotation by "|". I think my problem is an easy fix, but I haven't seen any examples of the split function called to return an array of annotations. My best attempt (and failure) below:

The Lua I'm using is the same as you provide for the "split" call:

function split(str, sep)
        local sep, fields = sep or ":", {}
        local pattern = string.format("([^%s]+)", sep)
        str:gsub(pattern, function(c) fields[#fields+1] = c end)
        return fields
end

My config.toml looks like:

#SpliceAI SNV
[[annotation]]
file="SPLICEAI/spliceai_scores.masked.snv.hg38.vcf.gz"
fields = ["SpliceAI"]
names = ["SpliceAI_NonSeparated_snv"]
ops = ["self"]

##SpliceAI SNV Final
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')"
name =["ALLELE_spliceai_snv","SYMBOL_spliceai_snv","DS_AG_spliceai_snv","DS_AL_spliceai_snv","DS_DG_spliceai_snv","DS_DL_spliceai_snv","DP_AG_spliceai_snv","DP_AL_spliceai_snv","DP_DG_spliceai_snv","DP_DL_spliceai_snv"]

An annotated line looks like:

1	3709604	.	G	A	52.7	PASS	CADD_v1.4=3.232;gnomad_genome_af_global=0.1707;gnomad_genome_hom_global=2233;gnomad_genome_ac_global=24444;gnomad_genome_an_global=143176;rs_ids=rs12130809;cosmic_coding_ids=COSV60699775;cosmic_count_observed=1,1;SpliceAI_NonSeparated_snv=A|TP73|0.00|0.00|0.00|0.00|11|0|11|-1	GT:GQ:DP:AD:VAF:PL	0/1:45:32:17,15:0.46875:44,0,67	1/1:52:38:0,37:0.973684:52,62,0	0/1:34:32:20,12:0.375:33,0,60

And the error I get looks like:

=============================================
vcfanno version 0.3.2 [built with go1.12.1]

see: https://github.com/brentp/vcfanno
=============================================
panic: toml: cannot load TOML value of type []interface {} into a Go string

goroutine 1 [running]:
main.main()
	/home/brentp/go/src/github.com/brentp/vcfanno/vcfanno.go:85 +0x192e

Thanks for your help,
Phil

@brentp
Owner

brentp commented Feb 22, 2021

Hi, you can probably do this with one post-annotation block per field, like so (untested):

[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[1]"
name ="ALLELE_spliceai_snv"

[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[2]"
name ="SYMBOL_spliceai_snv"
 ...

and so on. And if you want the resulting field to be a float type you can use e.g. name="DS_AL_spliceai_snv_float" and vcfanno will drop the "_float" and make it a Float field.
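As a quick check of the indexing used in those ops, the sketch below runs the same `split` helper on the record from the annotated line earlier in the thread. It only illustrates plain Lua behaviour outside vcfanno: the table returned by `split` starts at index 1, so `[1]` is ALLELE and `[10]` is DP_DL.

```lua
-- Same split helper as quoted earlier in this thread.
function split(str, sep)
    local sep, fields = sep or ":", {}
    local pattern = string.format("([^%s]+)", sep)
    str:gsub(pattern, function(c) fields[#fields + 1] = c end)
    return fields
end

local parts = split("A|TP73|0.00|0.00|0.00|0.00|11|0|11|-1", "|")

print(#parts)     -- number of fields in the record
print(parts[1])   -- ALLELE
print(parts[2])   -- SYMBOL
print(parts[10])  -- DP_DL
print(parts[0])   -- nil: tables built this way start at index 1
```

Every element is still a string at this point; the numeric typing comes from the `type`/`_float` handling on the vcfanno side, not from `split`.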

@Phillip-a-richmond
Author

Thanks for the quick reply! Turns out your fix helped! Odd that the indexing isn't zero-based, but your example highlighted that.

I also got an error when I didn't specify type, so I put in the float as the type and it kept the float in the output line:

1	3118822	.	C	T	47.2	PASS	CADD_v1.4=1.543;gnomad_genome_af_global=0.0056798;gnomad_genome_hom_global=11;gnomad_genome_ac_global=814;gnomad_genome_an_global=143314;rs_ids=rs139690036;SpliceAI_NonSeparated_snv=T|PRDM16|0.01|0.00|0.00|0.00|20|34|20|-44;ALLELE_spliceai_snv=T;SYMBOL_spliceai_snv=PRDM16;DS_AG_spliceai_snv_float=0.01;DS_AL_spliceai_snv_float=0.00;DS_DG_spliceai_snv_float=0.00;DS_DL_spliceai_snv_float=0.00;DP_AG_spliceai_snv_float=20;DP_AL_spliceai_snv_float=34;DP_DG_spliceai_snv_float=20;DP_DL_spliceai_snv_float=-44	GT:GQ:DP:AD:VAF:PL	0/1:39:35:21,14:0.4:38,0,55	0/1:47:49:20,28:0.571429:47,0,56	0/0:.:.:.:.:.

I'll drop the float from the names below for my final annotation but for now I'm happy with the fix! Next into VCF2DB! I'll continue to play with slivar for the next pipeline version but for now I'm happy with the VCFAnno-->VCF2DB-->GEMINI queries.

If it's useful for anyone, here is the code chunk from my .toml:

# The below SpliceAI code adds the entire string from the original VCFs, then post-annotation to grab individual scores into separate objects
# The scores are as follows:
# Format: ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
# I'll refer to them as follows:
# "ALLELE_spliceai_snv","SYMBOL_spliceai_snv","DS_AG_spliceai_snv","DS_AL_spliceai_snv","DS_DG_spliceai_snv","DS_DL_spliceai_snv","DP_AG_spliceai_snv","DP_AL_spliceai_snv","DP_DG_spliceai_snv","DP_DL_spliceai_snv"
# These are 1-based, so ALLELE comes out as: 

# #SpliceAI SNV Allele
# [[postannotation]]
# fields = ["SpliceAI_NonSeparated_snv"]
# op = "lua:split(SpliceAI_NonSeparated_snv,'|')[1]"
# type="String"
# name = "ALLELE_spliceai_snv"

#SpliceAI SNV
[[annotation]]
file="SPLICEAI/spliceai_scores.masked.snv.hg38.vcf.gz"
fields = ["SpliceAI"]
names = ["SpliceAI_NonSeparated_snv"]
ops = ["self"]

#SpliceAI Indel
[[annotation]]
file="SPLICEAI/spliceai_scores.masked.indel.hg38.vcf.gz"
fields = ["SpliceAI"]
names = ["SpliceAI_NonSeparated_indel"]
ops = ["self"]

#SpliceAI SNV Allele
[[postannotation]]
fields = ["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[1]"
type="String"
name = "ALLELE_spliceai_snv"

#SpliceAI SNV Symbol
[[postannotation]]
fields = ["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[2]"
type="String"
name = "SYMBOL_spliceai_snv"

#SpliceAI SNV DS_AG
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[3]"
type="Float"
name = "DS_AG_spliceai_snv_float" 

#SpliceAI SNV DS_AL
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[4]"
type="Float"
name = "DS_AL_spliceai_snv_float"

#SpliceAI SNV DS_DG
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[5]"
type="Float"
name = "DS_DG_spliceai_snv_float"

#SpliceAI SNV DS_DL
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[6]"
type="Float"
name = "DS_DL_spliceai_snv_float"

#SpliceAI SNV DP_AG
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[7]"
type="Float"
name = "DP_AG_spliceai_snv_float"

#SpliceAI SNV DP_AL
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[8]"
type="Float"
name = "DP_AL_spliceai_snv_float"

#SpliceAI SNV DP_DG
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[9]"
type="Float"
name = "DP_DG_spliceai_snv_float"

#SpliceAI SNV DP_DL
[[postannotation]]
fields=["SpliceAI_NonSeparated_snv"]
op = "lua:split(SpliceAI_NonSeparated_snv,'|')[10]"
type="Float"
name = "DP_DL_spliceai_snv_float"

#SpliceAI Indel Allele
[[postannotation]]
fields = ["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[1]"
type="String"
name = "ALLELE_spliceai_indel"

#SpliceAI Indel Symbol
[[postannotation]]
fields = ["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[2]"
type="String"
name = "SYMBOL_spliceai_indel"

#SpliceAI Indel DS_AG
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[3]"
type="Float"
name = "DS_AG_spliceai_indel_float" 

#SpliceAI Indel DS_AL
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[4]"
type="Float"
name = "DS_AL_spliceai_indel_float"

#SpliceAI Indel DS_DG
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[5]"
type="Float"
name = "DS_DG_spliceai_indel_float"

#SpliceAI Indel DS_DL
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[6]"
type="Float"
name = "DS_DL_spliceai_indel_float"

#SpliceAI Indel DP_AG
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[7]"
type="Float"
name = "DP_AG_spliceai_indel_float"

#SpliceAI Indel DP_AL
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[8]"
type="Float"
name = "DP_AL_spliceai_indel_float"

#SpliceAI Indel DP_DG
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[9]"
type="Float"
name = "DP_DG_spliceai_indel_float"

#SpliceAI Indel DP_DL
[[postannotation]]
fields=["SpliceAI_NonSeparated_indel"]
op = "lua:split(SpliceAI_NonSeparated_indel,'|')[10]"
type="Float"
name = "DP_DL_spliceai_indel_float"

@Phillip-a-richmond
Author

Hey Brent,

So the fix worked, but for annotations with multiple overlapping genes it throws an error. I'm tempted to just ignore these for now, but I figured I'd ask if you have any recommendations on how to handle this within post-annotation? How do you handle this for slivar?

Right now the line comes out like this:

1 74269135 . G T 59.5 PASS CSQ=T|intron_variant|MODIFIER|TNNI3K|ENSG00000116783|Transcript|ENST00000326637|protein_coding||4/24||||||||||1||HGNC|HGNC:19661,T|intron_variant|MODIFIER|FPGT-TNNI3K|ENSG00000259030|Transcript|ENST00000370895|protein_coding||6/18||||||||||1||HGNC|HGNC:42952,T|intron_variant|MODIFIER|FPGT-TNNI3K|ENSG00000259030|Transcript|ENST00000370899|protein_coding||6/23||||||||||1||HGNC|HGNC:42952,T|intron_variant|MODIFIER|FPGT-TNNI3K|ENSG00000259030|Transcript|ENST00000534632|protein_coding||3/4||||||||||1|cds_end_NF|HGNC|HGNC:42952,T|intron_variant|MODIFIER|FPGT-TNNI3K|ENSG00000259030|Transcript|ENST00000557284|protein_coding||6/26||||||||||1||HGNC|HGNC:42952,T|intron_variant&NMD_transcript_variant|MODIFIER|FPGT-TNNI3K|ENSG00000259030|Transcript|ENST00000648585|nonsense_mediated_decay||9/29||||||||||1||HGNC|HGNC:42952;CADD_v1.4=0.016;gnomad_genome_af_global=0.5605;gnomad_genome_hom_global=25615;gnomad_genome_ac_global=80034;gnomad_genome_an_global=142780;rs_ids=rs1341571;SpliceAI_NonSeparated_snv=T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1,T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 GT:GQ:DP:AD:VAF:PL 0/1:33:24:16,8:0.333333:32,0,54 0/0:.:.:.:.:. 1/1:57:41:0,41:1:59,59,00

That is for a single variant, and vcfanno throws these errors:

values were: [[C|LRRC8C|0.00|0.00|0.00|0.00|11|4|10|-1 C|RP11-302M6.4|0.00|0.00|0.00|0.00|11|-13|-28|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[2]
api.go:691: ERROR: in lua postannotation at 1:74269135 for ALLELE_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[1]
api.go:691: ERROR: in lua postannotation at 1:74269135 for SYMBOL_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[2]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DS_AG_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[3]
api.go:691: ERROR: in lua postannotation at 1:89633156 for DS_AG_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[C|LRRC8C|0.00|0.00|0.00|0.00|11|4|10|-1 C|RP11-302M6.4|0.00|0.00|0.00|0.00|11|-13|-28|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[3]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DS_AL_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[4]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DS_DG_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[5]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DS_DL_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[6]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DP_AG_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[7]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DP_AL_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [[T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1 T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1]]
code is: split(SpliceAI_NonSeparated_snv,'|')[8]
api.go:691: ERROR: in lua postannotation at 1:74269135 for DP_DG_spliceai_snv.
<string>:51: attempt to call a non-function object
stack traceback:
	<string>:51: in function 'split'
	<string>:1: in main chunk
	[G]: ?
empty values were: []

@brentp
Owner

brentp commented Feb 23, 2021

In that case, you can check whether the argument going to split is a string or a table.
You can see an example of checking for a table here.

in slivar, I took the largest value.
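A minimal sketch of that suggestion, assuming (from the "values were" lines in the error output above) that vcfanno hands Lua a single string when there is one record and an array-like table of strings when there are several. The function name `max_field` is hypothetical and this is not slivar's actual code; it simply returns the largest numeric value of one field position across all records.

```lua
-- Hypothetical helper, not part of vcfanno or slivar: return the largest value of
-- field number `idx` across one record (string) or several records (table of strings).
function max_field(entry, idx)
    local records = entry
    if type(entry) == "string" then
        records = {entry}   -- wrap a single record so both cases share one loop
    elseif type(entry) ~= "table" then
        return nil          -- missing annotation or unexpected type
    end
    local best = nil
    for _, record in ipairs(records) do
        local i = 0
        for value in record:gmatch("[^|]+") do
            i = i + 1
            if i == idx then
                local n = tonumber(value)
                if n and (best == nil or n > best) then best = n end
                break       -- field found, move on to the next record
            end
        end
    end
    return best
end

-- The two overlapping-gene records from the error output above; field 3 is DS_AG.
local records = {"C|LRRC8C|0.00|0.00|0.00|0.00|11|4|10|-1",
                 "C|RP11-302M6.4|0.00|0.00|0.00|0.00|11|-13|-28|-1"}
print(max_field(records, 3))
print(max_field("C|LRRC8C|0.00|0.00|0.00|0.00|11|4|10|-1", 3))
```

In a config this could presumably be called as `op = "lua:max_field(SpliceAI_NonSeparated_snv, 3)"`, but that usage is an untested assumption. Taking the maximum makes sense for the delta scores (fields 3 to 6); for the delta positions it would mix values from different genes.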

@Phillip-a-richmond
Author

I'm not familiar with Lua at all... any chance you could provide that code chunk for splitting the joined field, checking whether it's an array, and then getting the maximum between the two?

Otherwise, do you have the maxed version of this SpliceAI file for GRCh38 available through slivar?

Much appreciated,
Phil

@pj-sullivan

I'm assuming @Phillip-a-richmond that this has been solved for you now, but I came across this thread with the same question and just wanted to add the solution I put together. No guarantees it is the most efficient, but it works!
It checks each SpliceAI field to see if it is a table (i.e. multiple records), and if so, loops through each record and calculates the maximum value of Acceptor Gain, Acceptor Loss, Donor Gain or Donor Loss. Then it returns the first record that contains a value equal to the maximum. So, if two records have the same maximum (or are both 0), the first one listed in the pre-computed file will be returned.
A nice bonus is that it outputs an extra variable which can then be used to extract the scores, so you don't need the extra fields for SNVs and Indels.

.lua file

function split(str, sep) -- splits string by delimiter
	local sep, fields = sep or ":", {}
	local pattern = string.format("([^%s]+)", sep)
	str:gsub(pattern, function(c) fields[#fields+1] = c end)
	return fields
end

function indexOf(table, value) -- returns the first index of the table that matched the value
	for i, v in ipairs(table) do
		if v == value then
			return i
		end
	end
	return nil
end

function spliceai(entry) -- processes precomputed SpliceAI scores
	local t = type(entry)
	if t == "string" then
		return entry -- returns original value if single entry
	elseif t == "table" then
		local maximums = {}
		for i=1,#entry do -- calculate the maximum SpliceAI score of AG, AL, DG, DL for each entry
			maximums[i] = math.max(tonumber(split(entry[i], "|")[3]), tonumber(split(entry[i], "|")[4]), tonumber(split(entry[i], "|")[5]), tonumber(split(entry[i], "|")[6]))
		end
		return entry[indexOf(maximums, math.max(unpack(maximums)))] -- returns the full record which contains the maximum SpliceAI score
	end
end
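For anyone adapting the function above, here is a hedged variant that follows the same selection rule (highest of DS_AG, DS_AL, DS_DG, DS_DL; first record wins on ties) but splits each record only once and returns nil explicitly for anything that is neither a string nor a table. The name `pick_max_record` is illustrative, and treating non-numeric scores as 0 is an assumption of this sketch, not something the original function does.

```lua
-- Hypothetical variant of spliceai() above; the name pick_max_record is illustrative.
function pick_max_record(entry)
    if type(entry) == "string" then return entry end   -- single record: pass through
    if type(entry) ~= "table" then return nil end      -- missing or unexpected input
    local best_record, best_score = nil, nil
    for _, record in ipairs(entry) do
        -- Collect the fields once per record instead of re-splitting for each score.
        local fields = {}
        for value in record:gmatch("[^|]+") do fields[#fields + 1] = value end
        -- Fields 3..6 are DS_AG, DS_AL, DS_DG, DS_DL; non-numeric values count as 0
        -- (an assumption of this sketch).
        local score = math.max(tonumber(fields[3]) or 0, tonumber(fields[4]) or 0,
                               tonumber(fields[5]) or 0, tonumber(fields[6]) or 0)
        -- Strict ">" keeps the first record when scores tie.
        if best_score == nil or score > best_score then
            best_record, best_score = record, score
        end
    end
    return best_record
end

-- Both records from the earlier error output score 0.00, so the first one is returned.
print(pick_max_record({"T|FPGT-TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1",
                       "T|TNNI3K|0.00|0.00|0.00|0.00|13|1|41|-1"}))
```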

.toml file

## SpliceAI Annotations
#SpliceAI SNV
[[annotation]]
file = "annotations/spliceai_scores.masked.snv.hg38.vcf.gz"
fields = ["SpliceAI"]
names = ["SpliceAI_NonSeparated_snv"]
ops = ["self"]

#SpliceAI Indel
[[annotation]]
file = "annotations/spliceai_scores.masked.indel.hg38.vcf.gz"
fields = ["SpliceAI"]
names = ["SpliceAI_NonSeparated_indel"]
ops = ["self"]

#SpliceAI SNV Processing
[[postannotation]]
fields = ["SpliceAI_NonSeparated_snv"]
op = "lua:spliceai(SpliceAI_NonSeparated_snv)"
type = "String"
name = "SpliceAI_processed"

#SpliceAI Indel Processing
[[postannotation]]
fields = ["SpliceAI_NonSeparated_indel"]
op = "lua:spliceai(SpliceAI_NonSeparated_indel)"
type = "String"
name = "SpliceAI_processed"

#SpliceAI ALLELE
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[1]"
type = "String"
name = "ALLELE_spliceai"

#SpliceAI SYMBOL
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[2]"
type = "String"
name = "SYMBOL_spliceai"

#SpliceAI DS_AG
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[3]"
type = "Float"
name = "DS_AG"

#SpliceAI DS_AL
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[4]"
type = "Float"
name = "DS_AL"

#SpliceAI DS_DG
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[5]"
type = "Float"
name = "DS_DG"

#SpliceAI DS_DL
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[6]"
type = "Float"
name = "DS_DL"

#SpliceAI DP_AG
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[7]"
type = "Float"
name = "DP_AG"

#SpliceAI DP_AL
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[8]"
type = "Float"
name = "DP_AL"

#SpliceAI DP_DG
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[9]"
type = "Float"
name = "DP_DG"

#SpliceAI DP_DL
[[postannotation]]
fields = ["SpliceAI_processed"]
op = "lua:split(SpliceAI_processed,'|')[10]"
type = "Float"
name = "DP_DL"

#Delete SpliceAI fields
[[postannotation]]
fields = ["SpliceAI_processed", "SpliceAI_NonSeparated_snv", "SpliceAI_NonSeparated_indel"]
op = "delete"

@Phillip-a-richmond
Author

Hey that's cool! I actually just use the VEP plugin for spliceAI annotation and that works nicely https://uswest.ensembl.org/info/docs/tools/vep/script/vep_plugins.html

@kchennen

Hi @pj-sullivan,
I am in the same situation. I tried your solution. I have copied your Lua and conf file (updated the file lines) but unfortunately got the following error:

=============================================
vcfanno version 0.3.2 [built with go1.12.1]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 4 sources from 3 files
vcfanno.go:145: using 2 worker threads to decompress bgzip file
vcfanno.go:194: Info Error: SpliceAI_processed not found in INFO >> this error/warning may occur many times. reporting once here...
api.go:691: ERROR: in lua postannotation at 21:10649510 for SpliceAI_processed.
runtime error: invalid memory address or nil pointer dereference
goroutine 2081 [running]:
github.com/yuin/gopher-lua.(*LState).PCall.func1(0x81f2d0, 0xc000280600, 0xc000f8f6a8, 0x0, 0x0, 0x0)
	/home/brentp/go/src/github.com/yuin/gopher-lua/state.go:1622 +0x5b7
panic(0x79da80, 0xb0a240)
	/home/brentp/go/go/src/runtime/panic.go:522 +0x1b5
github.com/yuin/gopher-lua.init.2.func27(0xc000280600, 0x80000002, 0xc000300000, 0x0)
	/home/brentp/go/src/github.com/yuin/gopher-lua/vm.go:714 +0x1f3
github.com/yuin/gopher-lua.mainLoop(0xc000280600, 0xc000300000)
	/home/brentp/go/src/github.com/yuin/gopher-lua/vm.go:31 +0xdc
github.com/yuin/gopher-lua.(*LState).callR(0xc000280600, 0x0, 0xffffffffffffffff, 0x0)
	/home/brentp/go/src/github.com/yuin/gopher-lua/state.go:873 +0x235
github.com/yuin/gopher-lua.(*LState).Call(...)
	/home/brentp/go/src/github.com/yuin/gopher-lua/state.go:1601
github.com/yuin/gopher-lua.(*LState).PCall(0xc000280600, 0x0, 0xffffffffffffffff, 0x0, 0x8896a0, 0xc000f41140)
	/home/brentp/go/src/github.com/yuin/gopher-lua/state.go:1662 +0xe6
github.com/yuin/gopher-lua.(*LState).DoString(0xc000280600, 0xc010cde330, 0x2a, 0xc00002c244, 0x23)
	/home/brentp/go/src/github.com/yuin/gopher-lua/auxlib.go:403 +0x11b
github.com/brentp/goluaez.(*State).Run(0xc000217ed0, 0xc00002c244, 0x23, 0x0, 0x0, 0x0, 0x0, 0xc000f8f953, 0xc000f8f878, 0x5051cc)
	/home/brentp/go/src/github.com/brentp/goluaez/luaez.go:228 +0x3f9
github.com/brentp/vcfanno/api.(*Annotator).PostAnnotate(0xc000220900, 0xc000d00d34, 0x2, 0xa27fa5, 0xa27fa6, 0x893000, 0xc000ef9f80, 0x0, 0x0, 0xae0d2e, ...)
	/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:683 +0x724
github.com/brentp/vcfanno/api.(*Annotator).AnnotateEnds(0xc000220900, 0x893ae0, 0xc007d0f770, 0x0, 0x0, 0x889500, 0xc010aee9b0)
	/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:893 +0xc45
main.main.func1(0x893ae0, 0xc007d0f770)
	/home/brentp/go/src/github.com/brentp/vcfanno/vcfanno.go:182 +0x71
github.com/brentp/irelate.PIRelate.func1.1(0xc0015fd680, 0xc00ebff300, 0xca, 0x190, 0xc0105c0ea0)
	/home/brentp/go/src/github.com/brentp/irelate/parallel.go:202 +0x89
created by github.com/brentp/irelate.PIRelate.func1
	/home/brentp/go/src/github.com/brentp/irelate/parallel.go:199 +0x89

stack traceback:
	<string>:1: in main chunk
	[G]: ?
empty values were: []
values were: [T|IGHV1OR21-1|0.00|0.00|0.00|0.00|6|-13|-34|4]
code is: spliceai(SpliceAI_NonSeparated_snv)
api.go:691: ERROR: in lua postannotation at 21:10649604 for SpliceAI_processed.
runtime error: invalid memory address or nil pointer dereference
goroutine 2081 [running]:
github.com/yuin/gopher-lua.(*LState).PCall.func1(0x81f2d0, 0xc000281b00, 0xc000f8f6a8, 0x0, 0x0, 0x0)
	/home/brentp/go/src/github.com/yuin/gopher-lua/state.go:1622 +0x5b7
panic(0x79da80, 0xb0a240)
	/home/brentp/go/go/src/runtime/panic.go:522 +0x1b5
.
.
.

Could you help me, please?
