vcfwave generates VCF files with invalid rows #354

Open
AndreaGuarracino opened this issue Jun 11, 2022 · 2 comments
Assignees
Labels
bug Genuine bug

Comments

@AndreaGuarracino
Contributor

Given this pipeline:

fin=scerevisiae8.fa.gz.a9c917e.e34d4cd.d31edad.smooth.final.SGDref.haplo.vcf.gz
fout=scerevisiae8.fa.gz.a9c917e.e34d4cd.d31edad.smooth.final.SGDref.haplo.waved.vcf.gz

vcfbub -l 0 -a 100000 --input $fin | vcfwave -I 1000 -t 16 | bgzip -c -@ 16 > $fout

The output contains invalid rows:

  • the TYPE field is sometimes wrong and incomplete
zgrep 987945 $fout

SGDref#1#chrIV	987945	>420266>421972_55	TATGTCATATTGAATTTCCTCATCACTTTCGCCTAGATTTATAATCTTGGTGTCGTATTGCATCTTAAGCTTCTCTATAATTCTTTTGTTTGAATTTAGATTTTTGCTAAACAATACCATATCATCTACGAATAAACAAATTGTCACTTGAC	AATGTCATATTGAATTTCCTCATCACTTTCGCCTAGATTTATAATCTTGGTGTCGTATTGCATCTTAAGCTTCTCTATAATTCTTTTGTTTGAATTTAGATTTTTGCTAAACAATACCATATCATCTACGAATAAACAAATTGTCACTTGAC,TATGTCGTACTGAATTTCGTTATCACTTTCACCCAGATTTATTATCTTTGTATCGTATTGTTTCTTGAGTGTTGTTATGATTTTCTTATTTGCATTTAAGTCTTTGCTGAACAATATCATATCATCAACGAATAAGCAAATTGTTACTTGAC	60	.	48=SGDref#1#chrIV:981143,SGDref#1#chrIV:981143;AC=1,2;AF=0.142857,0.285714;INV=1,1;LEN=1,0;TYPE=snp,	GT	2	.	0	2	.	1	.
  • the ALT allele is sometimes missing
zgrep SGDref#1#chrMT $fout | grep 9878

SGDref#1#chrMT	9878	>627557>627676_51	A		60	.	48=SGDref#1#chrMT:9296;AC=1;AF=0.142857;INV=0;LEN=1;TYPE=complex	GT	0	0	0	0	0	1	0
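
The two failure modes above can be searched for across the whole file rather than one position at a time. The following is a minimal sketch, not part of vcflib: `check_vcf_rows` is a hypothetical helper name, and it assumes that a well-formed row has one non-empty TYPE entry per ALT allele. Rows without any TYPE key are not reported.

```shell
# Hypothetical helper (not part of vcflib): reads uncompressed VCF text on
# stdin and prints one line per suspicious data row.
check_vcf_rows() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      # An empty ALT column means no alternate allele was written.
      if ($5 == "") {
        print "missing ALT: " $1 ":" $2
        next
      }
      n_alt = split($5, alts, ",")

      # Find the TYPE=... entry in the INFO column and count its
      # non-empty, comma-separated values.
      has_type = 0
      n_type = 0
      n_info = split($8, info, ";")
      for (i = 1; i <= n_info; i++) {
        if (info[i] ~ /^TYPE=/) {
          has_type = 1
          n = split(substr(info[i], 6), types, ",")
          for (j = 1; j <= n; j++) if (types[j] != "") n_type++
        }
      }

      # Assumption: one TYPE entry is expected per ALT allele.
      if (has_type && n_type != n_alt) {
        print "bad TYPE: " $1 ":" $2 " (ALT alleles=" n_alt ", TYPE entries=" n_type ")"
      }
    }
  '
}
```

With the output file from the pipeline above, it could be run as `gzip -dc "$fout" | check_vcf_rows`. The check only catches a TYPE list whose length disagrees with ALT; it does not judge whether a given TYPE value is the right one for its allele.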

scerevisiae8.fa.gz.a9c917e.e34d4cd.d31edad.smooth.final.SGDref.haplo.vcf.gz
scerevisiae8.fa.gz.a9c917e.e34d4cd.d31edad.smooth.final.SGDref.haplo.waved.vcf.gz

@pjotrp
Contributor

pjotrp commented Jun 12, 2022

The new vcfwave code on trunk (--nextgen switch) should not have these issues because it does not try to put multiple alleles on one line and is a lot cleaner: https://github.com/pjotrp/vcflib/blob/master/src/vcfwave.cpp#L240. I'll make a release in the coming days after more testing.
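
If the newer code path does write one allele per line, a rough sanity check on its output is to count the data rows whose ALT column still lists several alleles. This is only a sketch: `count_multiallelic_rows` is a hypothetical helper name, and a count of zero is what the comment above implies, not a documented guarantee.

```shell
# Hypothetical helper (not part of vcflib): reads uncompressed VCF text on
# stdin and prints the number of data rows with more than one ALT allele.
count_multiallelic_rows() {
  # Column 5 is ALT; a comma there separates multiple alternate alleles.
  # "n + 0" forces a numeric 0 when no row matched.
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}
```

For example, `gzip -dc "$fout" | count_multiallelic_rows` prints the count for the output file named in the original report.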

@pjotrp pjotrp self-assigned this Jun 12, 2022
@pjotrp pjotrp added the bug Genuine bug label Jun 12, 2022
@pjotrp
Contributor

pjotrp commented Mar 24, 2023

@AndreaGuarracino can you confirm this is fixed?
