Hits missed when clustering or searching with short sequences #328
Comments
Hello, @torognes, has there been any update on the underlying cause of these missed hits? I am currently attempting to search for short sequences (primers < 25 bp) in inverted repeat regions, which can be interspersed in regions of up to 7 kbp. I have attached (in a zip file) a smaller example of trying to search for a primer ( Unfortunately, even when I set Thanks in advance!
Thanks for your detailed comment. Unfortunately, there have not been any changes to this in vsearch lately. I hope to get time to look into it when I am back from vacation.
I've now looked more closely at your example. In this case vsearch will not find the second match, simply because vsearch never reports more than one match per database sequence unless the matches are on different strands. The program is not designed to do that.
Yes, vsearch performs (semi)-global pairwise alignments (Needleman–Wunsch), not local pairwise alignments (Smith–Waterman).
Tests covering that assertion have been added to our test suite: frederic-mahe/vsearch-tests@e256e9d
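For anyone who needs every primer site in a long target rather than one hit per strand, a brute-force scan outside vsearch is one possible workaround. The sketch below is not how vsearch works and uses no vsearch API; `find_all_sites` and the sequences are hypothetical, and it only handles ungapped matches (mismatches, no indels), which is usually tolerable for primers under 25 bp.

```python
# Hypothetical helper: report every ungapped primer site on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all_sites(primer, target, max_mismatches=0):
    """Return sorted (start, strand, mismatches) tuples for all primer sites.

    Every window of len(primer) in target is compared position by position
    (Hamming distance) against the primer and its reverse complement.
    """
    primer = primer.upper()
    target = target.upper()
    hits = []
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        n = len(probe)
        for start in range(len(target) - n + 1):
            window = target[start:start + n]
            mismatches = sum(a != b for a, b in zip(probe, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)


# Made-up example: two forward sites and one reverse-complement site.
primer = "ACGTTGCATCGGATCCTAGA"
target = ("TTTT" + primer + "CCCCCC" + reverse_complement(primer)
          + "GG" + primer)
for start, strand, mismatches in find_all_sites(primer, target):
    print(start, strand, mismatches)
```

The scan costs O(len(target) × len(primer)) per strand, which is fine for targets of a few kbp but would need a smarter index for large databases.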
When clustering or searching with short sequences, obvious hits may be missed. This is also a problem when much of a sequence is masked. It is probably due to the small number of distinct unmasked k-mers in such sequences combined with the required minimum number of shared k-mers (12, set by the `--minwordmatches` option). These heuristics may need to be tuned to work better in these cases. See also a VSEARCH Forum post where this issue was described.
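To make the k-mer argument concrete, here is a minimal sketch that counts distinct shared k-mers between a short query and a candidate site. It assumes a word length of 8 and ignores masking; the primer sequence and the function names are made up for illustration, and vsearch's actual k-mer prefilter may count or adjust things differently.

```python
# Illustration only: not vsearch code. Word length 8 is assumed.
def distinct_kmers(seq, k=8):
    """Return the set of distinct k-mers in seq (masking is ignored)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(query, target, k=8):
    """Count distinct k-mers present in both query and target."""
    return len(distinct_kmers(query, k) & distinct_kmers(target, k))


primer = "ACGTTGCATCGGATCCTAGA"       # hypothetical 20-nt primer
exact_site = primer                   # perfect match
# Same site with a single substitution (G -> T) at position 10.
mutated_site = primer[:10] + "T" + primer[11:]

print(len(distinct_kmers(primer)))         # k-mers available in the query
print(shared_kmers(primer, exact_site))    # shared with a perfect site
print(shared_kmers(primer, mutated_site))  # shared after one mismatch
```

A 20-nt query has only 20 - 8 + 1 = 13 windows of length 8, so a threshold of 12 leaves almost no slack, and one central mismatch removes every window that overlaps it (8 of the 13 here). Under these assumptions, that would be enough for a one-mismatch hit to be skipped before any alignment is attempted.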