Inconsistent results from extractOne and extractTop #83

Open
eswarn24 opened this issue Aug 7, 2020 · 0 comments
eswarn24 commented Aug 7, 2020

I get different results from extractOne and extractTop for the same query string and collection.

I have a fairly long collection (15k strings) to search for each query.

For instance, take the following scenario.
Query - ABC 1721
The collection contains the following strings:
ABC1721
ABC1721-FGH/L9
ABC MERAKI Z1
EFGD3111/Z1-ABC
and many more

extractOne("ABC 1721", collection)
gives - ABC1721, Ratio - 95

extractTop("ABC 1721", collection,1)
gives - ABC1721, Ratio - 95

The problem appears when I ask for the top 5 results. The 95-ratio match is no longer returned first:
extractTop("ABC 1721", collection,5)
Match 1 - ABC1721-FGH/L9, Ratio - 86
Match 2 - ABC MERAKI Z1, Ratio - 86
Match 3 - EFGD3111/Z1-ABC, Ratio - 86
and so on

I tried 'extractSorted' as well; its results are not consistent with extractOne either.

I ran extractTop (top 5) and extractOne on 1000+ queries. For around 70% of them, the 1st match from extractTop differs from the result of extractOne.
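To make the expectation concrete: for any k of at least 1, the first element of a top-k result should be the same item the single-best lookup returns. Below is a minimal, self-contained sketch of that invariant. It does not call the library. The class `TopKConsistency`, its `Scored` type, and the `best` and `topK` methods are hypothetical names made up for illustration, and the score is a plain Levenshtein similarity, so its numbers will not match the ratios quoted above. Ties are broken by position in the collection so both paths agree deterministically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKConsistency {
    /** Hypothetical result type: candidate string, its score, and its position in the collection. */
    static final class Scored {
        final String value;
        final int score;
        final int index;
        Scored(String value, int score, int index) {
            this.value = value;
            this.score = score;
            this.index = index;
        }
    }

    /** Higher score first; ties go to the earlier position in the collection. */
    static final Comparator<Scored> BEST_FIRST =
            Comparator.comparingInt((Scored s) -> s.score).reversed()
                      .thenComparingInt(s -> s.index);

    /** Classic two-row dynamic-programming edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Stand-in similarity in 0..100 (assumption: NOT the library's ratio). */
    static int score(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 100;
        return (int) Math.round(100.0 * (max - levenshtein(a, b)) / max);
    }

    /** Single best match by linear scan; returns null for an empty collection. */
    static Scored best(String query, List<String> choices) {
        Scored best = null;
        for (int i = 0; i < choices.size(); i++) {
            Scored c = new Scored(choices.get(i), score(query, choices.get(i)), i);
            if (best == null || BEST_FIRST.compare(c, best) < 0) best = c;
        }
        return best;
    }

    /** Top k matches using a bounded heap whose head is the weakest kept candidate. */
    static List<Scored> topK(String query, List<String> choices, int k) {
        PriorityQueue<Scored> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (int i = 0; i < choices.size(); i++) {
            heap.add(new Scored(choices.get(i), score(query, choices.get(i)), i));
            if (heap.size() > k) heap.poll(); // evict the weakest candidate
        }
        List<Scored> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST); // heap iteration order is unspecified, so sort explicitly
        return out;
    }

    public static void main(String[] args) {
        List<String> collection = Arrays.asList(
                "ABC1721-FGH/L9", "ABC MERAKI Z1", "EFGD3111/Z1-ABC", "ABC1721");
        Scored one = best("ABC 1721", collection);
        List<Scored> top = topK("ABC 1721", collection, 3);
        System.out.println("best  : " + one.value + " (" + one.score + ")");
        System.out.println("top[0]: " + top.get(0).value + " (" + top.get(0).score + ")");
        System.out.println("consistent: " + one.value.equals(top.get(0).value));
    }
}
```

Running the same comparison against the real extractOne and extractTop calls over a batch of queries would give a mismatch count like the 70% figure above.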

BTW, I appreciate your effort in porting the Python logic to Java without any performance lag.
