Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior with only insertions allowed #29

Open
artemuk opened this issue Apr 22, 2020 · 2 comments
Open

Unexpected behavior with only insertions allowed #29

artemuk opened this issue Apr 22, 2020 · 2 comments

Comments

@artemuk
Copy link

artemuk commented Apr 22, 2020

Hi! I'm using fuzzysearch v0.7.1 and I got a bit unexpected results when only insertions are allowed.
Here is the code:

matches = find_near_matches('12345', '123aa', max_deletions=0, max_substitutions=0, max_insertions=3)
print(matches)

Expected output:

[Match(start=0, end=3, dist=2, matched='123')]

I expected that substring '123' can be matched since we can insert '4' and '5' after it

Actual output

[]

But actually it doesn't return any matches.

Please tell me, am I missing something? Is this an intended behavior or a bug?
Thank you in advance!

OS: Ubuntu 19.04
Setup: Python 3.7 (miniconda), fuzzysearch 0.7.1

@taleinat
Copy link
Owner

taleinat commented Apr 22, 2020

Hi @artemuk,

Thank you for such a clear and thorough description of the issue!

fuzzysearch uses the opposite meaning of what insertion/deletion means: It defines them relative to the searched sub-string. In your case, '12345' can be turned into '123' using 2 deletions, so allowing deletions rather than insertions gives the expected result:

>>> find_near_matches('12345', '123aa', max_deletions=3, max_substitutions=0, max_insertions=0)
[Match(start=0, end=3, dist=2, matched='123')]

In other words, find_near_matches() finds parts of the searched sequence, which can be arrived at from the searched string (e.g. '12345') by making the manipulations allowed by the search parameters.

The docs currently don't mention this at all, unfortunately, and they certainly should! I'll add something about this to the docs soon.

@artemuk
Copy link
Author

artemuk commented Apr 23, 2020

Hi @taleinat,

Thank you very much for the explanation! It makes perfect sense now. I should've figured it out myself since it's actually mentioned in the docstring (now I see it :)) But yea, a bit more explicit mention of it will definitely do good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants