Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotating both ends of translocation #135

Open
lconde-ucl opened this issue Jan 29, 2021 · 2 comments
Open

Annotating both ends of translocation #135

lconde-ucl opened this issue Jan 29, 2021 · 2 comments

Comments

@lconde-ucl
Copy link

lconde-ucl commented Jan 29, 2021

Hi,

I was wondering if it's possible to annotate both ends of a BND/TRA structural variant with vcfanno? From what I can see, it seems to annotate only one of the ends, but not the second.

For example, a BND variant in a VCF from Delly looks like this:

[...]
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  E30_CL  E43_B
chr22   38186567        BND00000378     T       T[chr6:35221592[        1980    PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv0.8.7;END=38186568;CHR2=chr6;POS2=35221592;PE=22;MAPQ=60;CT=3to5;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=0;HOMLEN=1;SR=11;SRQ=1;CONSENSUS=TAGAGGGTGAGGAATGCTGGGGAGTAAACACAGGAAGGAGAGACCGGGGAGGGTCACAGTTGCCAAATCCCCAGCACCTAGCACGGTGCTTCGCATGTTGGAGGTGCTCAATAA;CE=1.9414;RDRATIO=1;SOMATIC   GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-50.9548,0,-92.0494:10000:PASS:28322:44779:16457:2:40:24:33:20       0/0:0,-14.7411,-165.291:147:PASS:30566:68123:37557:2:63:0:49:0 

Using the config below (which simply annotates overlapping genes), vcfanno will correctly annotate the variant with the gene that overlaps the chr22:38186567 end, but it doesn't annotate the other breakpoint (chr6:35221592). However, for BNDs, you would want to know which genes are disrupted at both ends, not only one. Is there any option within vcfanno that allows both ends of the breakpoint to be annotated?

[[annotation]]
file="ens102_hg38_protCod_HGNC_sorted.bed.gz"
columns = [4]
names=["Gene"]
ops=["uniq"]

Please note that some SV callers like manta, output 2 entries per BND, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  E20_B   E20_CL
chr11   15366906        MantaBND:321658:0:1:0:0:0:1     A       [chr20:49362013[ATCA    .       PASS    SVTYPE=BND;MATEID=MantaBND:321658:0:1:0:0:0:0;SVINSLEN=3;SVINSSEQ=ATC;SOMATIC;SOMATICSCORE=85;BND_DEPTH=87;MATE_BND_DEPTH=74    PR:SR    66,0:96,0       63,10:94,11
chr20   49362013        MantaBND:321658:0:1:0:0:0:0     G       [chr11:15366906[GATG    .       PASS    SVTYPE=BND;MATEID=MantaBND:321658:0:1:0:0:0:1;SVINSLEN=3;SVINSSEQ=GAT;SOMATIC;SOMATICSCORE=85;BND_DEPTH=74;MATE_BND_DEPTH=87    PR:SR    66,0:96,0       63,10:94,11

so in that case, this is not a big issue as both ends will be annotated, but other SV callers like Delly and SURVIVOR only output one line per BND and therefore only one of the breakpoints ends up being annotated

Thanks
Lucia

@brentp
Copy link
Owner

brentp commented Jan 29, 2021

Hi, unfortunately this would be too difficult to support in vcfanno as it expects the entries in sorted order and looking at the other end of a BND would be a random lookup. As you note, it will work when there is a BND entry in the VCF for each end (which I think is what the vcf spec would dictate). perhaps you could write a post-processor that splits single entries into 2 BNDs before sending to vcfanno?

@lconde-ucl
Copy link
Author

Hi Brent
thanks for your quick reply. Yes, I can see how the sorting issue can make it difficult to implement. Splitting the BDNs is an easy workaround, so I'll do that.
Thanks,
Lucia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants