Vcfanno does not use pipes to delimit multiple annotations for a single ALT allele #114

Open
ptn24 opened this issue Jul 30, 2019 · 4 comments



ptn24 commented Jul 30, 2019

The by_alt operation should use pipes (perhaps this could be parameterized) to delimit multiple annotations for a single ALT allele. However, when adding BED annotations, vcfanno seems to use commas to delimit the annotations:

root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# zcat chr1.vcf.gz 
##fileformat=VCFv4.2
##hailversion=0.2.9-8588a25687af
##contig=<ID=1,length=249250621,assembly=GRCh37>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       10177   rs367896724     A       AC      .       .       .
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# zcat ENCFF171LNJ.sorted.bed.gz
chr1    10135   10285   .       0       .       28      -1      -1      75
chr1    10175   10325   .       0       .       20.0    -1      -1      75
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# cat by-alt.conf.toml 
[[annotation]]
names = [ "ENCFF171LNJ",]
file = "/tmp/ENCFF171LNJ.sorted.bed.gz"
columns = [ 7,]
ops = [ "by_alt",]
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# vcfanno by-alt.conf.toml chr1.vcf.gz 

=============================================
vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
##fileformat=VCFv4.2
##contig=<ID=1,length=249250621,assembly=GRCh37>
##INFO=<ID=ENCFF171LNJ,Number=A,Type=String,Description="calculated by by_alt of overlapping values in column 7 from /tmp/ENCFF171LNJ.sorted.bed.gz">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
1       10177   rs367896724     A       AC      .       .       ENCFF171LNJ=28,20.0
vcfanno.go:241: annotated 1 variants in 0.00 seconds (3213.9 / second)

Expected INFO to equal ENCFF171LNJ=28|20.0

brentp (Owner) commented Jul 30, 2019

This is an oversight, and therefore a deficiency, in vcfanno, but it doesn't make sense to use by_alt on a BED file, where there are no REF and ALT columns to indicate the exact allele.

ptn24 (Author) commented Jul 30, 2019

That makes sense. If the INFO tag describes the whole locus, though, would it be possible to make the metadata line for INFO/ENCFF171LNJ say Number=. (cf. https://samtools.github.io/hts-specs/VCFv4.2.pdf)? It could also be useful to add a line to the documentation and/or print a warning to stdout when by_alt is used with BED annotations (just a thought).

Alternatively, what do you think about duplicating the annotations across all ALT alleles when users combine by_alt with a BED file? It's not ideal, but users would stay in control.

brentp (Owner) commented Jul 30, 2019

I think it should probably be an error to use by_alt with a file that doesn't have REF and ALT columns. Why don't you use the concat op?
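Applying that suggestion to the config from the original report gives roughly the block below: the same source and column, with `concat` in place of `by_alt`. This is an untested sketch that reuses the reporter's paths. The idea is that `concat` joins all overlapping BED values into one field for the locus rather than claiming one value per ALT allele, which suits a source with no REF/ALT columns. The exact delimiter and the `Number=` value vcfanno writes into the header for `concat` are assumptions here and should be checked against the vcfanno README for the version in use.

```toml
# Same BED source as in the report, but the overlapping values from
# column 7 are concatenated into a single locus-level field instead of
# being split per ALT allele (by_alt needs REF/ALT columns in the source).
[[annotation]]
names = ["ENCFF171LNJ"]
file = "/tmp/ENCFF171LNJ.sorted.bed.gz"
columns = [7]
ops = ["concat"]
```

It is run the same way as before, e.g. `vcfanno concat.conf.toml chr1.vcf.gz`, where `concat.conf.toml` is a hypothetical file name for the block above.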

ptn24 (Author) commented Jul 30, 2019

Good suggestion, will do
