Not all svimmer INS/DEL/DUP calls are genotyped by graphtyper as expected #133

Open
jjfarrell opened this issue Aug 18, 2023 · 2 comments

Comments

@jjfarrell

jjfarrell commented Aug 18, 2023

@hannespetur

A number of svimmer SVs are not genotyped. The problem mainly affects INS and DUP calls, but some DEL calls are also skipped.

INSERTIONS

zgrep -v ^# svimmer/n48299/chr22_svimmer.vcf.gz|grep SVTYPE=INS|wc -l
3800
zgrep -v ^# graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz|grep SVTYPE=INS|grep AGGREG|wc -l
3208

DELETIONS

zgrep -v ^# svimmer/n48299/chr22_svimmer.vcf.gz|grep SVTYPE=DEL|wc -l
12838
zgrep -v ^# graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz|grep SVTYPE=DEL|grep AGGREG|wc -l
12780

DUPLICATIONS

zgrep -v ^# svimmer/n48299/chr22_svimmer.vcf.gz|grep SVTYPE=DUP|wc -l
3885
zgrep -v ^# graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz|grep SVTYPE=DUP|grep AGGREG|wc -l
4374
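The three per-type comparisons above can be rerun in one go. This is a minimal sketch, assuming bash, GNU gzip and a POSIX awk, and assuming that the `AGGREG` pattern used in the commands above selects one graphtyper record per SV. `count_svtype` and `compare_counts` are hypothetical helper names. Unlike `grep SVTYPE=INS`, the sketch matches the whole `SVTYPE=<type>` entry in the INFO column rather than any substring of the line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of svimmer or graphtyper).

# count_svtype VCF SVTYPE [PATTERN]
#   Counts non-header records whose INFO column (field 8) contains the exact
#   entry SVTYPE=<SVTYPE>. If PATTERN is given, the line must also contain it
#   (mirrors the `grep AGGREG` filter used in this issue).
#   gzip -dcf reads both bgzipped and plain-text VCFs.
count_svtype() {
  local vcf=$1 svtype=$2 pattern=${3:-}
  gzip -dcf -- "$vcf" | awk -v want="SVTYPE=$svtype" -v pat="$pattern" '
    /^#/ { next }                                  # skip header lines
    pat != "" && index($0, pat) == 0 { next }      # optional extra filter
    {
      n = split($8, kv, ";")                       # INFO entries
      for (i = 1; i <= n; i++) if (kv[i] == want) { count++; break }
    }
    END { print count + 0 }'
}

# compare_counts SVIMMER_VCF GRAPHTYPER_VCF
#   Prints, per SV type, the svimmer count, the graphtyper AGGREG count and
#   their difference (svimmer minus graphtyper).
compare_counts() {
  local sv=$1 gt=$2 t a b
  printf 'SVTYPE\tsvimmer\tgraphtyper\tsvimmer_minus_graphtyper\n'
  for t in INS DEL DUP; do
    a=$(count_svtype "$sv" "$t")
    b=$(count_svtype "$gt" "$t" AGGREG)
    printf '%s\t%s\t%s\t%s\n' "$t" "$a" "$b" "$((a - b))"
  done
}
```

With the paths from this issue: `compare_counts svimmer/n48299/chr22_svimmer.vcf.gz graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz`.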

In #116, I listed some svimmer examples for deletions.

Looking at one of them more closely, graphtyper reported that DEL shifted one bp to the right and without any SVTYPE annotation:

zgrep 10950658   svimmer/n48299/chr22_svimmer.vcf.gz
chr22   10950658        .       AGACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG   AAGACCA 0       .       END=10950710;SVTYPE=DEL;SVLEN=-52;CIGAR=1M6I52D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00

zgrep 10950659  graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz
chr22   10950659        chr22:10950659:XG       GACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG    AGACCA  0       LowQD;LowQUAL   ABHet=-1;ABHom=0.9976;AC=0;AF=0;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=1;MaxAASR=0.02041;NHet=0;NHomAlt=0;NHomRef=10;PASS_AC=0;PASS_AN=20;PASS_ratio=1;QD=0;RefLen=52;SDal=0,0;SeqDepth=842;VarType=XG  GT:AD:MD:DP:GQ:PL       0/0:80,0:0:80:99:0,255,255      0/0:73,1:1:75:99:0,200,255      0/0:63,0:0:63:99:0,200,255      0/0:98,0:2:100:99:0,255,255     0/0:48,1:3:52:99:0,125,255      0/0:69,0:1:70:99:0,200,255      0/0:65,0:0:65:99:0,200,255      0/0:106,0:0:106:99:0,255,255    0/0:122,0:1:123:99:0,255,255    0/0:108,0:0:108:99:0,255,255

These DELs all had a CRAM CIGAR containing an insertion (e.g. CIGAR=1M6I52D), so a CIGAR with both an INS and a DEL may be related to the issue.
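To check the mixed-CIGAR idea across many records, the I and D operations in svimmer's `CIGAR` INFO value can be summed. This is a minimal sketch using a hypothetical helper, `cigar_indel_summary`, and relying only on POSIX awk `match`/`RLENGTH`. For `1M6I52D` it reports 6 inserted and 52 deleted bases, a net allele length change of -46 bp, while svimmer's SVLEN is -52. Whether graphtyper sizes variants by such a net change is not confirmed here; the sketch only makes the two numbers easy to compare.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of svimmer or graphtyper).
# cigar_indel_summary CIGAR
#   Sums inserted (I) and deleted (D) bases in a CIGAR string such as 1M6I52D
#   and prints "ins=<n> del=<n> net=<ins-del>". Other SAM operations are parsed
#   but ignored; an unparseable remainder is reported on stderr (exit status 1).
cigar_indel_summary() {
  awk -v cigar="$1" 'BEGIN {
    ins = 0; del = 0; s = cigar
    while (match(s, /^[0-9]+[MIDNSHPX=]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0   # numeric length of this operation
      op  = substr(s, RLENGTH, 1)           # operation letter
      if (op == "I") ins += len
      else if (op == "D") del += len
      s = substr(s, RLENGTH + 1)            # consume the parsed operation
    }
    if (s != "") { print "unparsed CIGAR remainder: " s > "/dev/stderr"; exit 1 }
    printf "ins=%d del=%d net=%d\n", ins, del, ins - del
  }'
}
```

For example, `cigar_indel_summary 1M6I52D` prints `ins=6 del=52 net=-46`, and `cigar_indel_summary 1M51I3D` (the INS shown further down) prints `ins=51 del=3 net=48`.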

Here is a list of similar calls (all PASS except one LowQUAL record) that are missing SVTYPE and MODEL annotations:

zgrep -v ^#    graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz|grep -v SVTYPE|grep -v LowQD|cut -f1-8

chr22   11027397        chr22:11027397:XG       GAA     AAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG     402     PASS    ABHet=0.3162;ABHom=0.8517;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=27;MaxAASR=0.4314;NHet=4;NHomAlt=0;NHomRef=6;PASS_AC=3;PASS_AN=14;PASS_ratio=0.7;QD=10.05;RefLen=3;SDal=0,0;SeqDepth=714;VarType=XG
chr22   11455435        chr22:11455435:XG       ATGAGGGACAAACATTCAGACCACGGGAGCAGTGTTCTGGAATCCTACGT      GA      211     PASS    ABHet=0.45;ABHom=0.9167;AC=9;AF=0.45;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=7;MaxAASR=1;NHet=3;NHomAlt=3;NHomRef=4;PASS_AC=0;PASS_AN=4;PASS_ratio=0.2;QD=7.536;RefLen=50;SDal=0,0;SeqDepth=96;VarType=XG
chr22   16306115        chr22:16306115:IG       GATTCCATTTGATGATGATTCTATTTGAGTCCATTCGATGATTCCATTTG      T       135     PASS    ABHet=0.2899;ABHom=1;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=8;MaxAASR=0.4;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=18;PASS_ratio=0.9;QD=6.429;RefLen=50;SDal=0,0;SeqDepth=412;VarType=IG
chr22   17260105        chr22:17260105:IG       G       ACTTTAGCCTCCTGAGTCTATAGGTGCACACCACCACACCTATCCTCCCA      665     PASS    ABHet=0.4676;ABHom=-1;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=9;MaxAASR=0.5556;NHet=10;NHomAlt=0;NHomRef=0;PASS_AC=10;PASS_AN=20;PASS_ratio=1;QD=9.779;RefLen=1;SDal=0,0;SeqDepth=142;VarType=IG
chr22   17756435        chr22:17756435:XG       CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCT     TC      325     PASS    ABHet=0.2692;ABHom=0.8913;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=7;MaxAASR=1;NHet=6;NHomAlt=2;NHomRef=2;PASS_AC=1;PASS_AN=2;PASS_ratio=0.1;QD=10.48;RefLen=51;SDal=0,0;SeqDepth=124;VarType=XG
chr22   20916632        chr22:20916632:XG       AAGA    GAAAAGAAAAGAAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAAGAAAG      383     PASS    ABHet=0.4595;ABHom=0.9714;AC=10;AF=0.5;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=6;MaxAASR=1;NHet=6;NHomAlt=2;NHomRef=2;PASS_AC=1;PASS_AN=4;PASS_ratio=0.2;QD=12.77;RefLen=4;SDal=0,0;SeqDepth=73;VarType=XG
chr22   23969007        chr22:23969007:XG       AAAACTGTTACTCTAACAACAAGTGTTATACACTTACCATGTGCTAGGTCCTCTACAGGTACTTTACACTCATGATCCCATTTGATCCTTACAATCCCTATC  CTTACTGAATGTCTAAAAAAACAAGTTTAAACTGTTTGTTACCCAAAGTTTGGTG 634     PASS    ABHet=0.488;ABHom=0.9363;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=25;MaxAASR=0.8095;NHet=4;NHomAlt=0;NHomRef=6;PASS_AC=2;PASS_AN=12;PASS_ratio=0.6;QD=16.86;RefLen=102;SDal=0,0;SeqDepth=377;VarType=XG
chr22   24725832        chr22:24725832:XG       TGGTTCCT        ATAGGCGAAACTGCAGAGGGAATGCAATAAAAGGAAATCCCTGTGCTCCCCCTGAGG       674     PASS    ABHet=0.3246;ABHom=0.9744;AC=8;AF=0.4;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=12;MaxAASR=1;NHet=4;NHomAlt=2;NHomRef=4;PASS_AC=2;PASS_AN=10;PASS_ratio=0.5;QD=12.48;RefLen=8;SDal=0,0;SeqDepth=280;VarType=XG
chr22   25781483        chr22:25781483:IG       G       ACCTGTGGTCCCAGCTACTCGGGAGGCTGAGGCAGAAGAATAGGTGGGCA      145     PASS    ABHet=0.2472;ABHom=0.93;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=8;MaxAASR=0.2917;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=14;PASS_ratio=0.7;QD=6.304;RefLen=1;SDal=0,0;SeqDepth=350;VarType=IG
chr22   26691836        chr22:26691836:IG       TTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT      C       20      PASS    ABHet=-1;ABHom=1;AC=2;AF=0.1;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=1;MaxAASR=1;NHet=0;NHomAlt=1;NHomRef=9;PASS_AC=0;PASS_AN=8;PASS_ratio=0.4;QD=20;RefLen=50;SDal=0,0;SeqDepth=117;VarType=IG
chr22   32988115        chr22:32988115:IG       T       CGGCCAACATGGATGGGCGGTTCACGAGGTCAAGAGATCAAGACCATCCC      174     PASS    ABHet=0.2703;ABHom=0.984;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=9;MaxAASR=0.5;NHet=3;NHomAlt=0;NHomRef=7;PASS_AC=2;PASS_AN=16;PASS_ratio=0.8;QD=7.25;RefLen=1;SDal=0,0;SeqDepth=266;VarType=IG
chr22   34278980        chr22:34278980:IG       AACATATATATATAATATATATAATATATAATATATATAAAATATATATA      T       687     PASS    ABHet=0.4324;ABHom=0.98;AC=12;AF=0.6;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=10;MaxAASR=1;NHet=4;NHomAlt=4;NHomRef=2;PASS_AC=4;PASS_AN=8;PASS_ratio=0.4;QD=16.36;RefLen=50;SDal=0,0;SeqDepth=90;VarType=IG
chr22   36751600        chr22:36751600:XG       CATATATGTCATATATATCATATATATCATATATATATCATATATATCAT      ATC     0       LowQUAL ABHet=-1;ABHom=1;AC=0;AF=0;AN=4;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=0;MaxAASR=0;NHet=0;NHomAlt=0;NHomRef=10;PASS_AC=0;PASS_AN=4;PASS_ratio=1;QD=0;RefLen=50;SDal=0,0;SeqDepth=41;VarType=XG
chr22   38083743        chr22:38083743:XG       GGAGGGTGTACTCAGAGACAGGTGCACCAGGAGCCGGGGGCTGGGGATAG      CGGCGCTCCTGC    510     PASS    ABHet=0.4667;ABHom=1;AC=3;AF=0.15;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=49;MaxAASR=1;NHet=1;NHomAlt=1;NHomRef=8;PASS_AC=3;PASS_AN=20;PASS_ratio=1;QD=25;RefLen=50;SDal=0,0;SeqDepth=561;VarType=XG
chr22   39653084        chr22:39653084:XG       GC      TTCCCCCACACAGTGGCTAAGAGGGCTGACTGCATTGTGGGTGCACGGATT     765     PASS    ABHet=0.5493;ABHom=1;AC=4;AF=0.2;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=46;MaxAASR=1;NHet=2;NHomAlt=1;NHomRef=7;PASS_AC=4;PASS_AN=20;PASS_ratio=1;QD=25;RefLen=2;SDal=0,0;SeqDepth=350;VarType=XG
chr22   41552567        chr22:41552567:XG       GTAGTATTGA      TTTTGTTTGAGATCACAGCTCACTGCAGCCTCTACCTCCTAGGCTCAAGT      1608    PASS    ABHet=0.7458;ABHom=0.9492;AC=15;AF=0.75;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=16;MaxAASR=1;NHet=5;NHomAlt=5;NHomRef=0;PASS_AC=8;PASS_AN=10;PASS_ratio=0.5;QD=18.37;RefLen=10;SDal=0,0;SeqDepth=118;VarType=XG
chr22   48027078        chr22:48027078:XG       TTCC    CAGACCAGGCCAGACCGTGGTCTCGAGACCAGACCGTGGTCTAGAGACCAT     2190    PASS    ABHet=0.5487;ABHom=1;AC=15;AF=0.75;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=46;MaxAASR=1;NHet=3;NHomAlt=6;NHomRef=1;PASS_AC=15;PASS_AN=20;PASS_ratio=1;QD=23.89;RefLen=4;SDal=0,0;SeqDepth=360;VarType=XG
chr22   49299624        chr22:49299624:XG       CT      TCAGCACAGCACAGCCATCAACTCCAGATCCTGGCCTGGGGCACTCCCTC      50      PASS    ABHet=0.2941;ABHom=1;AC=1;AF=0.05;AN=20;CRal=0,0;MMal=0,0;MQSal=0,0;MaxAAS=5;MaxAASR=0.2941;NHet=1;NHomAlt=0;NHomRef=9;PASS_AC=1;PASS_AN=20;PASS_ratio=1;QD=10;RefLen=2;SDal=0,0;SeqDepth=273;VarType=XG
chr22   50073048        chr22:50073048:IG       TTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT      C,CTCCCGGGCAGGCGTGGGCCCCTTCTCGGCAGTCCACCCGGCCACACTGGTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT   455     PASS    ABHet=0.433;ABHom=0.9971;AC=0,2;AF=0,0.1;AN=20;CRal=0,0,0;MMal=0,0,0;MQSal=0,0,0;MaxAAS=1,25;MaxAASR=0.02703,0.5556;NHet=2;NHomAlt=0;NHomRef=8;PASS_AC=0,2;PASS_AN=20;PASS_ratio=1;QD=22.5;RefLen=50;SDal=0,0,0;SeqDepth=621;VarType=IG
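To see how large each of these records is from graphtyper's point of view, the REF/ALT lengths can be tabulated. This is a minimal sketch using a hypothetical helper, `allele_lengths`. For each ALT allele it prints CHROM, POS, REF length, ALT length, ALT minus REF, and the longer of the two alleles. Which of these measures graphtyper compares against the 50 bp size discussed in this issue is an assumption to check against the graphtyper source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of graphtyper).
# allele_lengths < VCF_RECORDS
#   For every non-header record on stdin, prints one tab-separated line per ALT:
#   CHROM POS REF_LEN ALT_LEN ALT_MINUS_REF LONGER_ALLELE_LEN
#   awk's default field splitting accepts both tab- and space-separated input.
allele_lengths() {
  awk '
    /^#/ { next }
    {
      n = split($5, alts, ",")          # multi-allelic sites: one line per ALT
      for (i = 1; i <= n; i++) {
        r = length($4); a = length(alts[i])
        printf "%s\t%s\t%d\t%d\t%d\t%d\n", $1, $2, r, a, a - r, (a > r ? a : r)
      }
    }'
}
```

Applied to the list above: `zgrep -v ^# graphtyper/test/pVCF/graphtyper.autosome.raw.vcf.gz | grep -v SVTYPE | grep -v LowQD | allele_lengths`.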

So for each of these sites, graphtyper appears to call a variant, but without SVTYPE and MODEL annotations. Because the full REF and ALT sequences are printed, graphtyper seems to treat these variants as shorter than 50 bp, whereas svimmer reports them with SVLEN >= 50 bp. It looks like graphtyper drops the first base of the reference relative to the original svimmer call, which results in an SVLEN < 50 bp, so the SV models are not written out. For example, the original svimmer call for the first record above (chr22:11027397) is:

chr22 11027396 . AGAA AAAGAAAGAAAGAGAGAGAGAAAGAAAGAAAGAAAGATAGAGAGAGAGAAAG 0 . END=11027399;SVTYPE=INS;SVLEN=51;CIGAR=1M51I3D;NUM_MERGED_SVS=322;STDDEV_POS=65.78,65.78
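The "dropped first base" observation can be checked mechanically by trimming bases shared at the start of REF and ALT and shifting POS. This is a minimal sketch using a hypothetical helper, `trim_shared_prefix`. It is a generic normalisation step, not graphtyper's own code. Applied to the svimmer record above, it reproduces the POS, REF and ALT of the graphtyper record at chr22:11027397. Applied to the DEL at chr22:10950658, it reproduces the graphtyper record at chr22:10950659.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of svimmer or graphtyper).
# trim_shared_prefix POS REF ALT
#   Removes leading bases shared by REF and ALT, keeping at least one base in
#   each allele, and increments POS once per removed base.
#   Prints POS, REF and ALT separated by tabs.
trim_shared_prefix() {
  local pos=$1 ref=$2 alt=$3
  while (( ${#ref} > 1 && ${#alt} > 1 )) && [[ ${ref:0:1} == "${alt:0:1}" ]]; do
    ref=${ref:1}; alt=${alt:1}; pos=$((pos + 1))
  done
  printf '%s\t%s\t%s\n' "$pos" "$ref" "$alt"
}
```

For example, `trim_shared_prefix 10950658 AGACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG AAGACCA` prints position 10950659 with REF `GACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG` and ALT `AGACCA`, matching the graphtyper record shown earlier.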

@ValentinaPeona

@jjfarrell I also have a similar problem with a deletion. Did you find a solution?

@jjfarrell
Author

@ValentinaPeona
Not yet. Did your deletion also have a nearby INS in the CRAM CIGAR for this region?
