Skip to content

Commit

Permalink
add sv+gff example
Browse files Browse the repository at this point in the history
  • Loading branch information
brentp committed Aug 1, 2018
1 parent b4f48a1 commit 96742f2
Show file tree
Hide file tree
Showing 8 changed files with 3,498 additions and 0 deletions.
5 changes: 5 additions & 0 deletions api/api.go
@@ -1,6 +1,7 @@
package api

import (
"bytes"
"fmt"
"log"
"os"
Expand Down Expand Up @@ -359,6 +360,10 @@ func collect(v interfaces.IVariant, rels []interfaces.Relatable, src *Source, st
if !ok {
continue
}
if src.Column-1 >= len(o.Fields) {
log.Println(src)
log.Println(string(bytes.Join(o.Fields, []byte{'\t'})))
}
sval := string(o.Fields[src.Column-1])
if src.IsNumber() {

Expand Down
Binary file added docs/examples/gff_sv/genes.gff3.gz
Binary file not shown.
Binary file added docs/examples/gff_sv/genes.gff3.gz.tbi
Binary file not shown.
27 changes: 27 additions & 0 deletions docs/examples/gff_sv/gff_gene_name.lua
@@ -0,0 +1,27 @@
function gff_to_gene_name(infos, name_field)
if infos == nil or #infos == 0 then
return nil
end
local result = {}

for i=1,#infos do
info = infos[i]
local s, e = string.find(info, name_field .. "=")
if s ~= nil then
name = info:sub(e + 1)
s, e = string.find(name, ";")
if e == nil then
e = name:len()
else
e = e - 1
end

result[name:sub(0, e)] = 1
end
end
local keys = {}
for k, v in pairs(result) do
keys[#keys+1] = k
end
return table.concat(keys,",")
end
5 changes: 5 additions & 0 deletions docs/examples/gff_sv/gff_sv.conf
@@ -0,0 +1,5 @@
[[annotation]]
file="sample.gff3.gz"
columns=[9]
names=["xgene"]
ops=["lua:gff_to_gene_name(vals, 'gene_name')"]
49 changes: 49 additions & 0 deletions docs/examples/gff_sv/gff_sv.md
@@ -0,0 +1,49 @@
Annotating with GFF
===================

GFF often contains the location of genes and features.
It has a column that contains key=value pairs like the INFO
field in a VCF. `vcfanno` does not support these natively,
but one can access particular fields using lua. This is an example of
how to do this.

Given a variant in `sample.vcf` like:
```
22 1000 DEL000SUR N <DEL> . PASS END=1051060
```

And genomic features annotated in `genes.gff3.gz`:

```
22 ensembl_havana gene 69091 70008 . + . ID=ENST00000335137;geneID=ENSG00000186092;gene_name=OR4F5
22 ensembl mRNA 182393 184158 . + . ID=ENST00000624431;geneID=ENSG00000279928;gene_name=FO538757.3
22 ensembl mRNA 184923 195411 . - . ID=ENST00000623834;geneID=ENSG00000279457;gene_name=FO538757.2
22 ensembl mRNA 184925 195411 . - . ID=ENST00000623083;geneID=ENSG00000279457;gene_name=FO538757.2
22 ensembl mRNA 184927 200322 . - . ID=ENST00000624735;geneID=ENSG00000279457;gene_name=FO538757.2
22 ensembl_havana mRNA 450740 451678 . - . ID=ENST00000426406;geneID=ENSG00000278566;gene_name=OR4F29
22 ensembl_havana mRNA 685716 686654 . - . ID=ENST00000332831;geneID=ENSG00000273547;gene_name=OR4F16
22 havana mRNA 924880 939291 . + . ID=ENST00000420190;geneID=ENSG00000187634;gene_name=SAMD11
22 havana mRNA 925150 935793 . + . ID=ENST00000437963;geneID=ENSG00000187634;gene_name=SAMD11
...
```

We can write a conf file [gff_sv.conf](https://github.com/brentp/vcfanno/blob/master/docs/examples/gff_sv/gff_sv.conf)
```
[[annotation]]
file="genes.gff3.gz"
columns=[9]
names=["gene"]
ops=["lua:gff_to_gene_name(vals, 'gene_name')"]
```

This will extract the full text of the 9th column. Then we can use some lua to get just the `gene_name` field.
The lua uses a table so each observed gene is reported only once.
This is in [gff_sv.lua](https://github.com/brentp/vcfanno/blob/master/docs/examples/gff_sv/gff_sv.lua)


Then we can run `vcfanno` as:
```
vcfanno -lua gff_sv.lua gff_sv.conf sample.vcf
```

and `gene=FO538757.3,OR4F29,FO538757.2,OR4F16,OR4F5` is added to the INFO. The lua allows the user to do more complex things like only reporting genes, or taking uniq values.

0 comments on commit 96742f2

Please sign in to comment.