Compatibility issue for consensus creation #442

Open
Mossy-Frog opened this issue Nov 27, 2023 · 6 comments

Comments

@Mossy-Frog

Mossy-Frog commented Nov 27, 2023

Hi!
First of all, thank you very much for Sniffles, it's a life-saver.
I need to create a consensus sequence from a VCF I made with Sniffles, which contains a large deletion in a viral genome. I tried different tools, but each time they seemed to be unable to read the VCF. Tried BCFTools, GATK and others.
Is there a way to transform the VCF into something other tools can read?

@wdecoster
Contributor

but each time they seemed to be unable to read the VCF. Tried BCFTools, GATK and others.

Hmm, I have never had issues with bcftools. Could you share the error message you got?

@Mossy-Frog
Author

Sure! I would be glad if there were a solution to my problem.
Here is my command:
bcftools consensus -f /referenceNIH.fasta /mvlsniffles.vcf.gz > /consensus_sniffled.fasta
Here is the error message:
""
The fasta sequence does not match the REF allele at DQ12345:6121:
REF .vcf: [ACGCAATTGTCCCACGGAAGTGAATCCTTCAACTCACCACCAAAGAGCTCCGTTGCATCAGTTCTGAAAGAGATGAGAAGCCTGTAGAGAGACCCTGCGCTTTCTCTATGGGTCCATCTATGAGAAACCCACAGGATGTATTCAGTCAGACAATGTCTGACGTCGGCCACGGTATTCAGGGAGTCCTTAGTAGCGTGGCAATGACAGGGTCTGAACTTGGCACAAGGAGAGGCCATTGTGAAGGTAGACCTGTAGCCGTCTATGCTAATAGAGGGCTTTAATTTCCATTTTTTTAATGGGGTTGTGGATGAGGAATGAGAGTGATATCATATTGAGATACGTAGTTATGTAGAGGTGTATTTCCTATATTATTTACTTTCGGTTTCATATTTTACCAACTCTTTAATAAATTTCTTTTCACGATGCATCTTATTAAATGACGTTTTCTCATAAGTGGACATATAGATGCAAAAGTAATGAAGAAAAGTATTACCTCTATCATCTACATAATTAGGGTCTGCTCCTTTTTTTAACAACTTATACAGTACGTAGTAGTAGTTTATCGGTTTTAAATCAAGTCTAGAATATATAGTGGATTAATATATTTTTATATTCGCTAAAGCTATCTATACTATCAGAAAGCATATCATTCTCAACTTCATCATGAGTTAAATATTTGTGTAATGGAATGTGACCATCACTGTCATGACATACTCCTTTAATAGGTTTTTTAAAACAGATGATTCAAATCCTTCATTCATTAGATAACAGTGTAACGGAGTCGTACCTTCTACTAGTTTGTTTATATCACAGCATTCTACAAACAGTCTAAACAATAGAGAAGACGGACAGACTTTAACGTATAAATGACACATGTTATCGATATTCGTTGATGAATTATTATTAAACGTAGTTATGATAAATGATTCTAACGACATCTCTCGCTAGAGATAAAATCTAGTATCGTATCATACTCGCATAGCATAGTTTTTCATAATTAATACAATATTTAAAAGACTTATTCGGAAAGTATTTTAATACATGTATCATCGATGGAGATCCATATGAGGAGTCACTTGTAGTTCTTCAGTAGTAATAACAGTGCTATCATCGATAGTATAATTATATGTTGTTGTAATTGGAGTAACTGTTGGTAGTTCTTCCGTGGAATCAATAATTATACTAACAGCAATAGTATAATTATATAAATATGTTCCGTTGATATCACATATTTTAATGAACTCATTTCTAACACCCTCAGCTATATCTGTCCAATTAAATGTAGCCAACAATCTACTACGTTCTCTTTGATTGACTACTTGTACGGTAGCGACGCTACACTATCTTTATTGTCTTCTACATGCTCCAATTGAATGTCATGATACAACGCAGTTTTTCTTATGCATGTTTCATAACACCACGAACATGTCGCAGTAAGATAATTTCTGTAAATTCATGATTGCCGGTCATAAACAAGCCCGTCAATAATTGTGGCTATATATTCAGTTTATAGAGCAAAATAATTAAGCACAATAGCGCTTAATCTCAAAATATGTTATGTTTATTTTTTTCATATTAAACATACTGGTTAAAATCCTCTAAAGGCTGATCTTCATCTATAAATCAAGATCATAATTACATTTAGACAGTGGTTTCATGTTTATAAAAATGTTCTTTTTGTGTGAATAAGGAATATACTAATCAATAATCAACCATCGACCCCATTACGATAGTATGCAGGCAACCCCCCATTAGAGAGGTACGTGTAATCAGTCTCTCCAGTTTTAGTATTTTTATAAGTCATTGTTACATAAACGGCTTTTAAACAGTCTCCTCGATAATAAGCCATATCTGGAAATTTATTAAATACTCGAGTCATTTTACGCACGGTCAAAAAAGTAAGTAATGTCGACGACTTCTTACATTCTATAGAAACACCTAGAATACTCATTTTCTTTTGGAAAATATCCTCAGACTCTGATTTGAACA
ATGCACGACCTATAGTAAACCGTGACCAATAAGTTATATTAGTCAATGGTATATCCAAACCATCAGGTGTGGATAGTACGCCGATAGTCCAGTCTTTGGTATCGATAGTGTAGTTATTGAACTGAGAAGTTACCGTATAGTCTTTTTGGTCATCTCTAAACAAGGAAACTAATACCTCTACACTATTGAACGATTTATCTTCCGTAATGGGTGGAATAACGGGAATATAAAGTGGACTAGCGATGGATGAAGTCACGAATATAAGACACGCTATTAATCCGTATATCATCATTTTGATATTACTTATAATAACGATTTGTTTAATTTTTAGTTTATACTATTAATTGTAAATGATATTATTATTTTTTTTTAAGTATTATCAGCTTTAGTTTATACTATTACTATTTGTAATATTTAGACATAGATAAACGTGATAAAAGTCTATTTGTTTATATTTATTGCGGATAGCAGTATTTCCCTATAAAAAGTATACGTCCTGTGTTGTCTTTAATCATGTACATGAATGGATGGTTTATGTAGACCTTCGTACGATATACCATCGAAAAGTTAGTCATAAATACTCCTGTAACGGCCGATGCTTCTGTATACTCCTCATTAACATCTATAAACGTCGTATGTAGAAATTTTTCTACAGTGATAGTTTCATTACACATCTTGCTAAAATCTGCATAATATCCGAATATATTAGTAAGTCCTAAATTTTCTAAAATCGGTACCAGATTATACGGTTCTGTCATTTCCACTTTAAACTTTGGCATATACAAGTCTATACTTTTAGTAGATAACATACCACACCATTTTTTAAATTTTTCATCTGTTATATTTTTTTCTATGTTATATATACCTTCTATGTCGTCCGGTAGTATAATTACCATACTAGAGTTTCCCTCGTATGGAATATCGATAATAGAGAATCCTCCGAATAATTCATTAATATGTACATATTGCAAGTTATTCTCGGTACCCACCATCATATCAACGCTGGTAACTATATTCTTAGAAATATAAAACTTGTCTGTATATGTAAGATGTTTAGAAAATGGATATTTCCACATTGCTTTAAAATGGACGGCGCTAACAACTGTCATACGAGTATTAATGGATAGCGGACTAGTCAATAAGGAATTAATTTTACCATTTGTCATTGTCTTAACCCATTCGTTGATTAGTTCCTTTGTTTGGTTAGCATTATTAAAGTTTACAGTTTGAAAATCGTCTTTTATTTTTTGTAGGAAGGAGGCATGGAACTCGATACTATCGCTACCGTATATTTTATTTGCGGTAGCTAGTGTCGCACAATACGGAATATCTACGTCCATGTCATTATTGTCATCGGGTGTATTCTCATTCATATTCTCTATATATTTTGATAGTTGTTCAGCTGTAGAACCAGCTGCTCCATGATTTAGAATAGATAAAGTAGATAAAATAGAAACTGGAGAAATCAAAACATTTTCATCAGGGTGTTTTACGATTAGTTCTTTAAAGATATCCATGGTATAGACCAAACAATAACGATAACGATATATATCATAAATAAATAATGT]
ALT .vcf: [N]
REF .fa : [ACGCAATTGTCCCACGGAAGTGAATCCTTCAACTCACCACCAAAGAGCTCCGTTGCATCAGTTCTGAAAGAGATGAGAAGCCTGTAGAGAGACCCTGCGCTTTCTCTATGGGTCCATCTATGAGAAACCCACAGGATGTATTCAGTCAGACAATGTCTGACGTCGGCCACGGTATTCAGGGAGTCCTTAGTAGCGTGGCAATGACAGGGTCTGAACTTGGCACAAGGAGAGGCCATTGTGAAGGTAGACCTGTAGCCGTCTATGCTAATAGAGGGCTTTAATTTCCATTTTTTTAATGGGGTTGTGGATGAGGAATGAGAGTGATATCATATTGAGATACGTAGTTATGTAGAGGTGTATTTCCTATATTATTTACTTTCGGTTTCATATTTTACCAACTCTTTAATAAATTTCTTTTCACGATGCATCTTATTAAATGACGTTTTCTCATAAGTGGACATATAGATGCAAAAGTAATGAAGAAAAGTATTACCTCTATCATCTACATAATTAGGGTCTGCTCCTTTTTTTAACAACTTATACAGTACGTAGTAGTAGTTTATCGGTTTTAAATCAAGTCTAGAATATATAGTGGATTAATATATTTTTATATTCGCTAAAGCTATCTATACTATCAGAAAGCATATCATTCTCAACTTCATCATGAGTTAAATATTTGTGTAATGGAATGTGACCATCACTGTCATGACATACTCCTTTAATAGGTTTTTTAAAACAGATGATTCAAATCCTTCATTCATTAGATAACAGTGTAACGGAGTCGTACCTTCTACTAGTTTGTTTATATCACAGCATTCTACAAACAGTCTAAACAATAGAGAAGACGGACAGACTTTAACGTATAAATGACACATGTTATCGATATTCGTTGATGAATTATTATTAAACGTAGTTATGATAAATGATTCTAACGACATCTCTCGCTAGAGATAAAATCTAGTATCGTATCATACTCGCATAGCATAGTTTTTCATAATTAATACAATATTTAAAAGACTTATTCGGAAAGTATTTTAATACATGTATCATCGATGGAGATCCATATGAGGAGTCACTTGTAGTTCTTCAGTAGTAATAACAGTGCTATCATCGATAGTATAATTATATGTTGTTGTAATTGGAGTAACTGTTGGTAGTTCTTCCGTGGAATCAATAATTATACTAACAGCAATAGTATAATTATATAAATATGTTCCGTTGATATCACATATTTTAATGAACTCATTTCTAACACCCTCAGCTATATCTGTCCAATTAAATGTAGCCAACAATCTACTACGTTCTCTTTGATTGACTACTTGTACGGTAGCGACGCTACACTATCTTTATTGTCTTCTACATGCTCCAATTGAATGTCATGATACAACGCAGTTTTTCTTATGCATGTTTCATAACACCACGAACATGTCGCAGTAAGATAATTTCTGTAAATTCATGATTGCCGGTCATAAACAAGCCCGTCAATAATTGTGGCTATATATTCAGTTTATAGAGCAAAATAATTAAGCACAATAGCGCTTAATCTCAAAATATGTTATGTTTATTTTTTTCATATTAAACATACTGGTTAAAATCCTCTAAAGGCTGATCTTCATCTATAAATCAAGATCATAATTACATTTAGACAGTGGTTTCATGTTTATAAAAATGTTCTTTTTGTGTGAATAAGGAATATACTAATCAATAATCAACCATCGACCCCATTACGATAGTATGCAGGCAACCCCCCATTAGAGAGGTACGTGTAATCAGTCTCTCCAGTTTTAGTATTTTTATAAGTCATTGTTACATAAACGGCTTTTAAACAGTCTCCTCGATAATAAGCCATATCTGGAAATTTATTAAATACTCGAGTCATTTTACGCACGGTCAAAAAAGTAAGTAATGTCGACGACTTCTTACATTCTATAGAAACACCTAGAATACTCATTTTCTTTTGGAAAATATCCTCAGACTCTGATTTGAACA
ATGCACGACCTATAGTAAACCGTGACCAATAAGTTATATTAGTCAATGGTATATCCAAACCATCAGGTGTGGATAGTACGCCGATAGTCCAGTCTTTGGTATCGATAGTGTAGTTATTGAACTGAGAAGTTACCGTATAGTCTTTTTGGTCATCTCTAAACAAGGAAACTAATACCTCTACACTATTGAACGATTTATCTTCCGTAATGGGTGGAATAACGGGAATATAAAGTGGACTAGCGATGGATGAAGTCACGAATATAAGACACGCTATTAATCCGTATATCATCATTTTGATATTACTTATAATAACGATTTGTTTAATTTTTAGTTTATACTATTAATTGTAAATGATATTATTATTTTTTTTTAAGTATTATCAGCTTTAGTTTATACTATTACTATTTGTAATATTTAGACATAGATAAACGTGATAAAAGTCTATTTGTTTATATTTATTGCGGATAGCAGTATTTCCCTATAAAAAGTATACGTCCTGTGTTGTCTTTAATCATGTACATGAATGGATGGTTTATGTAGACCTTCGTACGATATACCATCGAAAAGTTAGTCATAAATACTCCTGTAACGGCCGATGCTTCTGTATACTCCTCATTAACATCTATAAACGTCGTATGTAGAAATTTTTCTACAGTGATAGTTTCATTACACATCTTGCTAAAATCTGCATAATATCCGAATATATTAGTAAGTCCTAAATTTTCTAAAATCGGTACCAGATTATACGGTTCTGTCATTTCCACTTTAAACTTTGGCATATACAAGTCTATACTTTTAGTAGATAACATACCACACCATTTTTTAAATTTTTCATCTGTTATATTTTTTTCTATGTTATATATACCTTCTATGTCGTCCGGTAGTATAATTACCATACTAGAGTTTCCCTCGTATGGAATATCGATAATAGAGAATCCTCCGAATAATTCATTAATATGTACATATTGCAAGTTATTCTCGGTACCCACCATCATATCAACGCTGGTAACTATATTCTTAGAAATATAAAACTTGTCTGTATATGTAAGATGTTTAGAAAATGGATATTTCCACATTGCTTTAAAATGGACGGCGCTAACAACTGTCATACGAGTATTAATGGATAGCGGACTAGTCAATAAGGAATTAATTTTACCATTTGTCATTGTCTTAACCCATTCGTTGATTAGTTCCTTTGTTTGGTTAGCATTATTAAAGTTTACAGTTTGAAAATCGTCTTTTATTTTTTGTAGGAAGGAGGCATGGAACTCGATACTATCGCTACCGTATATTTTATTTGCGGTAGCTAGTGTCGCACAATACGGAATATCTACGTCCATGTCATTATTGTCATCGGGTGTATTCTCATTCATATTCTCTATATATTTTGATAGTTGTTCAGCTGTAGAACCAGCTGCTCCATGATTTAGAATAGATAAAGTAGATAAAATAGAAACTGGAGAAATCAAAACATTTTCATCAGGGTGTTTTACGATTAGTTCTTTAAAGATATCCATGGTATAGACCAAACAATAACGATAACGATATATATCATAAATAAATAATGTT]AAATTTCAGTTTATGTTTGTACCCCGTATTCATACTTAACAAATTGGTATTG
""

I should mention that, as far as I understand, this is a reference-matching problem. However, I made sure that the reference used in this step is the same one I used to create the VCF. I also created other VCFs in the same fashion with other tools (BCFtools, Nanopolish, Dysgu), and those worked, so I suspect the problem comes from an incompatibility between the two tools. Open to any input!

@fritzsedlazeck
Owner

Mmh, the only thing I became aware of recently is that we are off by 1 bp. It's on our to-do list to correct this. It comes from the fact that VCF wants the one base before the sequence from the reference. I don't know if that's the case here?
Fritz
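
Note: the convention mentioned above is the VCF padding base. For a deletion, POS points at the base before the deleted sequence, REF starts with that base, and ALT keeps only that base. The following is a minimal sketch of that convention on a toy sequence. The function name `encode_deletion` and the toy reference are made up for illustration and are not part of Sniffles or bcftools.

```python
from typing import Tuple


def encode_deletion(reference: str, del_start: int, del_len: int) -> Tuple[int, str, str]:
    """Return (POS, REF, ALT) for a deletion using the VCF padding-base convention.

    del_start is the 1-based position of the first deleted base.
    This sketch only handles deletions that have a preceding base.
    """
    if del_start < 2:
        raise ValueError("deletion at the first base needs special handling")
    if del_start - 1 + del_len > len(reference):
        raise ValueError("deletion runs past the end of the reference")
    pos = del_start - 1                  # 1-based position of the padding base
    anchor = reference[pos - 1]          # Python strings are 0-based
    deleted = reference[del_start - 1 : del_start - 1 + del_len]
    return pos, anchor + deleted, anchor


# Toy example: delete "GT" at 1-based positions 3-4 of ACGTACGT.
print(encode_deletion("ACGTACGT", del_start=3, del_len=2))  # (2, 'CGT', 'C')
```

A record whose REF holds only the deleted bases, without the preceding base, would differ from this encoding by exactly one position.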

@Mossy-Frog
Author

Thank you for your reply!
I do not know how I could test this theory. I fear that directly changing the nucleotide position in the VCF might do more harm than good. Do you have anything in mind?
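
Note: one way to test the off-by-one idea without editing the VCF is to compare the REF allele against the reference at POS and at nearby positions, and see where (if anywhere) it matches. The sketch below does this on plain strings. `matching_shifts` is a hypothetical helper, and loading the contig sequence and the record fields is left to whichever FASTA/VCF reader is at hand.

```python
def matching_shifts(reference, pos, ref_allele, max_shift=2):
    """Return the shifts (in bp) at which ref_allele matches the reference.

    pos is the 1-based POS from the VCF record. A shift of 0 means the
    record matches as written; -1 means it matches one base upstream.
    """
    hits = []
    for shift in range(-max_shift, max_shift + 1):
        start = pos - 1 + shift          # convert to a 0-based start index
        if start < 0:
            continue
        window = reference[start : start + len(ref_allele)]
        if window.upper() == ref_allele.upper():
            hits.append(shift)
    return hits


# Toy example: the allele really starts at position 3, but the record says 4.
toy_reference = "TTACGCAATTGTCC"
print(matching_shifts(toy_reference, pos=4, ref_allele="ACGCAA"))  # [-1]
```

An empty result would suggest the mismatch is not a simple positional shift. A result of `[0]` would suggest that POS and REF agree with the reference and the problem lies elsewhere in the record.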

@wdecoster
Contributor

I think another explanation is that POS + SVLEN != END, which could lead to the wrong sequence being taken from the reference. What do you think, @fritzsedlazeck?
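
Note: in the error pasted above, the bracketed REF .fa string appears to end one base later than REF .vcf (`...AATGTT]` versus `...AATGT]`). That would be consistent with the record spanning one base more than its REF allele, although this is an inference from the output and not something confirmed in this thread. The sketch below compares the span implied by REF, END and SVLEN for a single record. It assumes the VCF convention that, for a precise variant, END equals POS + len(REF) - 1. `span_report` is a hypothetical helper.

```python
def span_report(pos, ref_allele, end=None, svlen=None):
    """Compare the reference span implied by REF, END and SVLEN (1-based)."""
    report = {"ref_span_end": pos + len(ref_allele) - 1}
    if end is not None:
        report["end_matches_ref"] = end == report["ref_span_end"]
    if svlen is not None:
        # abs() because deletions are often written with a negative SVLEN
        report["svlen_end"] = pos + abs(svlen)
        if end is not None:
            report["end_matches_svlen"] = end == report["svlen_end"]
    return report


# Deletion of 4 bases with a padding base in REF: all three fields agree.
print(span_report(pos=100, ref_allele="ACGTA", end=104, svlen=-4))
# Same END and SVLEN, but REF holds only the deleted bases: END is 1 bp past REF.
print(span_report(pos=100, ref_allele="CGTA", end=104, svlen=-4))
```

Running this over every record in a VCF would show whether the one-base disagreement is systematic or limited to a few calls.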

@fritzsedlazeck
Owner

So for insertions this should not be the case: the END is the position on the reference. Ideally END = START + 1, but that can deviate if we are uncertain about the location in a given region.
We are finally getting back to planning regular releases of Sniffles, and this is high on our to-do list. So I don't see an obvious quick solution right now, but it will be fixed soon.
Fritz
