Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overcalling BNDs? #472

Open
tsneddon opened this issue Apr 11, 2024 · 1 comment
Open

Overcalling BNDs? #472

tsneddon opened this issue Apr 11, 2024 · 1 comment

Comments

@tsneddon
Copy link

Hi:
I'm looking at some 30x ONP human genome data and seeing that sniffles appears to overcall variants as translocations (BND) when in fact they contain a duplication. These BND variants are being falsely prioritized in downstream processes. An example is below:

##fileformat=VCFv4.2
##source=Sniffles2_2.2
##command="/nas/longleaf/rhel8/apps/sniffles/2.2/venv/bin/sniffles --input GDN421LR.snf GDN121LR.snf GDN71LR.snf GDN501LR.snf GDN431LR.snf --vcf GDNLR_5multisample_sniffles.vcf"
##fileDate="2024/04/03 15:20:03"
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Breakend; Translocation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="Number of reference reads">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of variant reads">
##FORMAT=<ID=ID,Number=1,Type=String,Description="Individual sample SV ID for multi-sample output">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=GT,Description="Genotype filter">
##FILTER=<ID=SUPPORT_MIN,Description="Minimum read support filter">
##FILTER=<ID=STDEV_POS,Description="SV Breakpoint standard deviation filter">
##FILTER=<ID=STDEV_LEN,Description="SV length standard deviation filter">
##FILTER=<ID=COV_MIN,Description="Minimum coverage filter">
##FILTER=<ID=COV_CHANGE,Description="Coverage change filter">
##FILTER=<ID=COV_CHANGE_FRAC,Description="Coverage fractional change filter">
##FILTER=<ID=MOSAIC_AF,Description="Mosaic maximum allele frequency filter">
##FILTER=<ID=ALN_NM,Description="Length adjusted mismatch filter">
##FILTER=<ID=STRAND,Description="Strand support filter">
##FILTER=<ID=SVLEN_MIN,Description="SV length filter">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">
##INFO=<ID=MOSAIC,Number=0,Type=Flag,Description="Structural variation classified as putative mosaic">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Mate chromsome for BND SVs">
##INFO=<ID=SUPPORT,Number=1,Type=Integer,Description="Number of reads supporting the structural variation">
##INFO=<ID=SUPPORT_INLINE,Number=1,Type=Integer,Description="Number of reads supporting an INS/DEL SV (non-split events only)">
##INFO=<ID=SUPPORT_LONG,Number=1,Type=Integer,Description="Number of soft-clipped reads putatively supporting the long insertion SV">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position">
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length">
##INFO=<ID=COVERAGE,Number=.,Type=Float,Description="Coverages near upstream, start, center, end, downstream of structural variation">
##INFO=<ID=STRAND,Number=1,Type=String,Description="Strands of supporting reads for structural variant">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count, summed up over all samples">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="List of read support for all samples">
##INFO=<ID=CONSENSUS_SUPPORT,Number=1,Type=Integer,Description="Number of reads that support the generated insertion (INS) consensus sequence">
##INFO=<ID=RNAMES,Number=.,Type=String,Description="Names of supporting reads (if enabled with --output-rnames)">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=NM,Number=.,Type=Float,Description="Mean number of query alignment length adjusted mismatches of supporting reads">
##INFO=<ID=PHASE,Number=.,Type=String,Description="Phasing information derived from supporting reads, represented as list of: HAPLOTYPE,PHASESET,HAPLOTYPE_SUPPORT,PHASESET_SUPPORT,HAPLOTYPE_FILTER,PHASESET_FILTER">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GDN421LR GDN121LR GDN71LR GDN501LR GDN431LR
chr7 | 31776967 | Sniffles2.BND.29AM6 | N | ]chr1:65381783]N | 60 | PASS | PRECISE;SVTYPE=BND;SUPPORT=12;COVERAGE=36,24,39,41,41;STRAND=+-;AC=1;STDEV_LEN=0;STDEV_POS=0;SUPP_VEC=10000 | GT:GQ:DR:DV:ID | 0/1:55:23:12:Sniffles2.BND.3BF4S6 | 0/0:0:31:0:NULL | 0/0:0:24:0:NULL | 0/0:0:21:0:NULL | 0/0:0:27:0:NULL

This variant corresponds to the common chr1 ~8kb duplication in gnomAD:
https://gnomad.broadinstitute.org/variant/DUP_CHR1_A70DA16C?dataset=gnomad_sv_r4

The duplication is inserted at around position chr7:31776967.
It is not a translocation.

Is there a way to return less of these spurious BNDs?
Thanks,
Tam

@fritzsedlazeck
Copy link
Owner

Thanks Tam,
may I ask if you know if this duplciation is tandem or interspersed duplciation ? That could make the difference here.
CNV caller woudl call a region amplified but SV caller would show the molecular connection e.g. it could be that the other copy is connected to chr 1.. ?

Thanks
Fritz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants