MNP handling in version 1.0.9 #398

Open
marc-sturm opened this issue Dec 14, 2023 · 2 comments
Labels
bug Genuine bug wavefront

Comments

@marc-sturm

Hi,

I have a question about MNP handling in 1.0.9.

I created the following minimal test VCF:

##fileformat=VCFv4.2
##fileDate=20231214
##source=freeBayes v1.3.6
##reference=/tmp/local_ngs_data_GRCh38//GRCh38.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	vc_freebayes_in
chr22	11066446	.	G	A	5700.41	.	.	GT:GQ:DP:AD:RO:QR:AO:QA:GL	0/1:160.002:436:198,238:198:7122:238:8563:-588.912,0,-509.596
chr22	11066465	.	TG	CA	4699.45	.	.	GT:GQ:DP:AD:RO:QR:AO:QA:GL	0/1:135.948:455:216,198:216:7876:198:7070:-482.411,0,-583.824

When using vcfallelicprimitives -kg (version 1.0.3), I get the following variants as expected:

#CHROM  POS     ID      REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in
chr22   11066446        . G A 5700.41 . . GT 0/1
chr22   11066465        . T C 4699.45 . LEN=1;TYPE=snp GT 0|1
chr22   11066466        . G A 4699.45 . LEN=1;TYPE=snp GT 0|1
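
For reference, the decomposition expected here can be sketched in a few lines of Python. This is only an illustration of the expected result, not how vcflib implements it; the function name `decompose_mnp` is hypothetical, and the sketch ignores genotype phasing and INFO fields.

```python
def decompose_mnp(pos, ref, alt):
    """Split an equal-length REF/ALT pair into per-base SNPs.

    Returns a list of (pos, ref_base, alt_base) tuples and skips
    positions where the two alleles agree.
    """
    if len(ref) != len(alt):
        raise ValueError("not an MNP: REF and ALT differ in length")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# The MNP from the test VCF above.
for record in decompose_mnp(11066465, "TG", "CA"):
    print(*record, sep="\t")
```

For the TG>CA record this yields one SNP at 11066465 (T>C) and one at 11066466 (G>A), matching the two lines in the 1.0.3 output.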

When using vcfallelicprimitives -k (version 1.0.9), the MNP is missing:

#CHROM  POS     ID      REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in
chr22   11066446        . G A 5700.41 . . GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:160.002:436:198,238:198:7122:238:8563:-588.912,0,-509.596

I don't understand why it is not in the output.

Since vcfwave is recommended anyway, I tried vcfwave --quiet, but it crashes when processing the MNP:

#CHROM  POS     ID      REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in
chr22   11066446        . G A 5700.41 . . GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:160.002:436:198,238:198:7122:238:8563:-588.912,0,-509.596
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted (core dumped)

Any suggestions?

Best,
Marc

@marc-sturm marc-sturm added the bug Genuine bug label Dec 14, 2023
@pjotrp
Contributor

pjotrp commented Apr 22, 2024

Try the legacy vcfallelicprimitives -a SW -kg.

@marc-sturm
Author

The legacy version 1.0.3 works, as I said above.

Version 1.0.10 still does not work.

vcfwave --quiet no longer crashes, but does not decompose the MNP:

  ##fileformat=VCFv4.2
  ##fileDate=20231214
  ##source=freeBayes v1.3.6
  ##reference=/tmp/local_ngs_data_GRCh38//GRCh38.fa
  ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
  ##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
  ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
  ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
  ##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
  ##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
  ##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
  ##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
  ##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
  ##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
  ##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
  ##INFO=<ID=ORIGIN,Number=1,Type=String,Description="Decomposed from a complex record using vcflib vcfwave and alignment with WFA2-lib.">
  ##INFO=<ID=INV,Number=0,Type=Flag,Description="Inversion detected">
  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  vc_freebayes_in
  chr22   11066446        .       G       A       5700.41 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/1:160.002:436:198,238:198:7122:238:8563:-588.912,0,-509.596
  chr22   11066465        ._1     TG      CA      4.7e+03 .       ORIGIN=chr22:11066465;LEN=2;TYPE=mnp    GT      0|1
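
A quick way to confirm that a record is still an MNP after decomposition is to look for REF/ALT pairs of equal length greater than one. The following is a minimal sketch under that assumption; `undecomposed_mnps` is a hypothetical helper, it splits on whitespace because the output pasted here is space-aligned, and it does not handle symbolic ALT alleles.

```python
def undecomposed_mnps(vcf_lines):
    """Yield (chrom, pos, ref, alt) for records that still hold an MNP."""
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # Same length and longer than one base: not split into SNPs.
            if len(ref) == len(alt) > 1:
                yield chrom, pos, ref, alt


vcfwave_output = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in",
    "chr22 11066446 . G A 5700.41 . . GT 0/1",
    "chr22 11066465 ._1 TG CA 4.7e+03 . ORIGIN=chr22:11066465;LEN=2;TYPE=mnp GT 0|1",
]
print(list(undecomposed_mnps(vcfwave_output)))
```

On the two records above it reports only the chr22:11066465 TG>CA record.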

vcfallelicprimitives -k -a SW still removes the MNP. When using -a WF the MNP is also removed:

##fileformat=VCFv4.2
##fileDate=20231214
##source=freeBayes v1.3.6
##reference=/tmp/local_ngs_data_GRCh38//GRCh38.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Number of observation for each allele">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##FORMAT=<ID=QR,Number=1,Type=Integer,Description="Sum of quality of the reference observations">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=QA,Number=A,Type=Integer,Description="Sum of quality of the alternate observations">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">
##INFO=<ID=ORIGIN,Number=1,Type=String,Description="Decomposed from a complex record using vcflib vcfallelicprimitives and alignment with obsolete SW.">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  vc_freebayes_in
chr22   11066446        .       G       A       5700.41 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/1:160.002:436:198,238:198:7122:238:8563:-588.912,0,-509.596
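
The dropped record can also be detected automatically by comparing input and output. The sketch below treats an input record as dropped when no output record on the same chromosome starts inside its REF span; that rule is an assumption chosen so that a correctly decomposed MNP still counts as present. The helper names `parse_sites` and `dropped_records` are hypothetical.

```python
def parse_sites(vcf_lines):
    """Return (chrom, pos, ref) for every non-header record."""
    sites = []
    for line in vcf_lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        sites.append((fields[0], int(fields[1]), fields[3]))
    return sites


def dropped_records(input_lines, output_lines):
    """List input sites with no output record starting inside their REF span."""
    out_starts = {(chrom, pos) for chrom, pos, _ in parse_sites(output_lines)}
    dropped = []
    for chrom, pos, ref in parse_sites(input_lines):
        span = range(pos, pos + len(ref))
        if not any((chrom, p) in out_starts for p in span):
            dropped.append((chrom, pos, ref))
    return dropped


test_input = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in",
    "chr22 11066446 . G A 5700.41 . . GT 0/1",
    "chr22 11066465 . TG CA 4699.45 . . GT 0/1",
]
primitives_output = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT vc_freebayes_in",
    "chr22 11066446 . G A 5700.41 . . GT 0/1",
]
print(dropped_records(test_input, primitives_output))
```

With the input and output shown in this comment, the only site reported is chr22:11066465 (REF TG).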
