Near matches get lost with increasing values of max_l_dist #38

Open
davidefiocco opened this issue Dec 23, 2021 · 1 comment


davidefiocco commented Dec 23, 2021

To reproduce, I am using fuzzysearch==0.7.3 and running:

import fuzzysearch
text = "foo bar spam eggs "
query = "four"

With max_l_dist=2 I get one match:

fuzzysearch.find_near_matches(query, text, max_l_dist=2)
[Match(start=0, end=4, dist=2, matched='foo ')]

With max_l_dist=3 I get the previous match plus an additional one:

fuzzysearch.find_near_matches(query, text, max_l_dist=3)
[Match(start=0, end=4, dist=2, matched='foo '),
 Match(start=6, end=7, dist=3, matched='r')]

But with max_l_dist=4 the previous matches are no longer returned; I only get empty matches:

fuzzysearch.find_near_matches(query, text, max_l_dist=4)
[Match(start=0, end=0, dist=4, matched=''),
 Match(start=1, end=1, dist=4, matched=''),
 Match(start=2, end=2, dist=4, matched=''),
 Match(start=3, end=3, dist=4, matched=''),
 Match(start=4, end=4, dist=4, matched=''),
 Match(start=5, end=5, dist=4, matched=''),
 Match(start=6, end=6, dist=4, matched=''),
 Match(start=7, end=7, dist=4, matched=''),
 Match(start=8, end=8, dist=4, matched=''),
 Match(start=9, end=9, dist=4, matched=''),
 Match(start=10, end=10, dist=4, matched=''),
 Match(start=11, end=11, dist=4, matched=''),
 Match(start=12, end=12, dist=4, matched=''),
 Match(start=13, end=13, dist=4, matched=''),
 Match(start=14, end=14, dist=4, matched=''),
 Match(start=15, end=15, dist=4, matched=''),
 Match(start=16, end=16, dist=4, matched=''),
 Match(start=17, end=17, dist=4, matched=''),
 Match(start=18, end=18, dist=4, matched='')]

Is this intended behaviour?

Owner

taleinat commented Mar 9, 2022

Hi @davidefiocco, apologies for the late response.

Yes, this is currently the intended behavior.

The reason is that once the maximum distance is equal to (or greater than) the length of what you're searching for (query in your example), even an empty string is a valid match.
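To illustrate why an empty string qualifies, here is a minimal sketch that computes the plain Levenshtein distance between the query and a few of the matched substrings reported above. The `levenshtein` helper is a hypothetical illustration written for this thread; it is not fuzzysearch's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


query = "four"
for candidate in ["foo ", "r", ""]:
    print(repr(candidate), levenshtein(query, candidate))
```

The distance from any query to the empty string equals the query's length (every character is deleted), so with max_l_dist=4 and a 4-character query, the empty string at every position is within the bound.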

However, looking at your example, I can see that this behavior isn't great: there are matches with a lower distance in the text, but these are no longer returned once the maximum distance is at least the length of the query.

I'll think about how this can be improved without complicating things.
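Until the library's behavior changes, one possible caller-side workaround (an assumption on my part, not something proposed by the maintainer) is to cap the distance at one less than the query length, so that an empty string can never be within the bound. Both `capped_max_l_dist` and `find_non_empty_matches` are hypothetical helper names; only `find_near_matches` and its `max_l_dist` parameter come from fuzzysearch.

```python
def capped_max_l_dist(query: str, max_l_dist: int) -> int:
    """Cap the distance below len(query) so an empty match is never valid."""
    return max(0, min(max_l_dist, len(query) - 1))


def find_non_empty_matches(query: str, text: str, max_l_dist: int):
    """Hypothetical wrapper: search with the capped distance."""
    # Imported lazily so the helper above can be used without fuzzysearch installed.
    from fuzzysearch import find_near_matches

    return find_near_matches(
        query, text, max_l_dist=capped_max_l_dist(query, max_l_dist)
    )
```

For the example in this issue, a requested max_l_dist of 4 would be capped to 3, which corresponds to the second call shown above. The trade-off is that a caller who really wants distances equal to the query length no longer gets them.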
