Skip to content

Latest commit

 

History

History
49 lines (40 loc) · 2.08 KB

gff_sv.md

File metadata and controls

49 lines (40 loc) · 2.08 KB

Annotating with GFF

GFF often contains the location of genes and features. It has a column that contains key=value pairs like the INFO field in a VCF. vcfanno does not support these natively, but one can access particular fields using lua. This is an example of how to do this.

Given a variant in sample.vcf like:

22	1000	DEL000SUR	N	<DEL>	.	PASS	END=1051060

And genomic features annotated in genes.gff3.gz:

22	ensembl_havana	gene	69091	70008	.	+	.	ID=ENST00000335137;geneID=ENSG00000186092;gene_name=OR4F5
22	ensembl	mRNA	182393	184158	.	+	.	ID=ENST00000624431;geneID=ENSG00000279928;gene_name=FO538757.3
22	ensembl	mRNA	184923	195411	.	-	.	ID=ENST00000623834;geneID=ENSG00000279457;gene_name=FO538757.2
22	ensembl	mRNA	184925	195411	.	-	.	ID=ENST00000623083;geneID=ENSG00000279457;gene_name=FO538757.2
22	ensembl	mRNA	184927	200322	.	-	.	ID=ENST00000624735;geneID=ENSG00000279457;gene_name=FO538757.2
22	ensembl_havana	mRNA	450740	451678	.	-	.	ID=ENST00000426406;geneID=ENSG00000278566;gene_name=OR4F29
22	ensembl_havana	mRNA	685716	686654	.	-	.	ID=ENST00000332831;geneID=ENSG00000273547;gene_name=OR4F16
22	havana	mRNA	924880	939291	.	+	.	ID=ENST00000420190;geneID=ENSG00000187634;gene_name=SAMD11
22	havana	mRNA	925150	935793	.	+	.	ID=ENST00000437963;geneID=ENSG00000187634;gene_name=SAMD11
...

We can write a conf file gff_sv.conf

[[annotation]]
file="genes.gff3.gz"
columns=[9]
names=["gene"]
ops=["lua:gff_to_gene_name(vals, 'gene_name')"]

This will extract the full text of the 9th column. Then we can use some lua to get just the gene_name field. The lua uses a table so each observed gene is reported only once. This is in gff_sv.lua

Then we can run vcfanno as:

vcfanno -lua gff_sv.lua gff_sv.conf sample.vcf

and gene=FO538757.3,OR4F29,FO538757.2,OR4F16,OR4F5 is added to the INFO. The lua allows the user to do more complex things like only reporting genes, or taking uniq values.