Add threshold (fuzzy match) parameters in match_text #460

Open: wants to merge 2 commits into base: main

sc-stbt commented Oct 15, 2017

I added a fuzzy_match based on difflib to tolerate incorrect matches, especially on special characters.
I also added unidecode to the _hocr_find_phrase function.
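A minimal sketch of a difflib-based threshold check follows. It assumes a hypothetical `fuzzy_match(expected, actual, threshold)` signature and an illustrative default of `0.8`. Neither is taken from the PR's code.

```python
import difflib


def fuzzy_match(expected, actual, threshold=0.8):
    """Return True if `actual` is similar enough to `expected`.

    Hypothetical helper: the name, signature and 0.8 default are
    illustrative assumptions, not stbt's API.
    """
    # ratio() returns 2*M/T, where M is the number of matching characters
    # and T is the combined length of both strings (1.0 = identical).
    ratio = difflib.SequenceMatcher(None, expected, actual).ratio()
    return ratio >= threshold
```

With these assumptions, `fuzzy_match("Français", "Fran5ais")` is `True` (ratio 0.875), so a single misread character no longer causes the match to fail.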

Youssef TRIKI added 2 commits October 15, 2017 19:40
Sometimes match_text did not find the text, so we added a threshold to make it more flexible.
It is reproducible with special characters. We based it on fuzzy matching.
I added the unidecode lib; there may be some impact.
drothlis (Contributor)

_stbt/core.py:2895:4: [E0401(import-error), _hocr_find_phrase] Unable to import 'unidecode'

The unit tests fail because you've added a new dependency unidecode.

What problem does this new dependency solve? Would it be possible to add the fuzzy_match functionality without adding a new dependency?
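Here is one possible way to avoid the new dependency. It is sketched under the assumption that only accented Latin characters need handling. The standard library's `unicodedata` module can fold characters such as "ç" to "c" without `unidecode`. `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text):
    """Fold accented Latin letters to their base letter using only the
    standard library (a partial substitute for unidecode).

    Assumption: only characters with a Unicode decomposition are handled
    (e.g. "ç" -> "c", "é" -> "e"); other characters pass through unchanged.
    """
    # NFKD splits "ç" into "c" + U+0327 COMBINING CEDILLA.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

`strip_accents("Français")` returns `"Francais"`. Unlike `unidecode`, this leaves characters without a Unicode decomposition unchanged. Examples include "ß" and non-Latin scripts.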

sc-stbt (Author) commented Nov 6, 2017

Sorry for the delay, I was busy with other urgent tasks.
We used the unidecode lib to avoid failures caused by some special characters; for instance, "ç" is read as "5". To move forward with our tests, we passed the unidecoded word to be matched. But of course we can drop this lib.
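The following sketch combines the two ideas in a phrase search. It normalises both the expected text and the OCR output, then compares them against a threshold. `find_phrase_fuzzy` is hypothetical and operates on a plain list of words. It is not a drop-in replacement for stbt's `_hocr_find_phrase`, which works on hOCR output rather than a list of strings.

```python
import difflib
import unicodedata


def _normalise(word):
    # Lower-case and strip combining accents (stdlib stand-in for unidecode).
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_phrase_fuzzy(ocr_words, phrase, threshold=0.8):
    """Return the index of the first run of OCR'd words whose similarity
    to `phrase` is >= `threshold`, or None if there is no such run.

    Hypothetical sketch: not stbt's `_hocr_find_phrase`.
    """
    target = " ".join(_normalise(w) for w in phrase.split())
    n = len(phrase.split())
    if n == 0:
        return None
    words = [_normalise(w) for w in ocr_words]
    # Slide a window of n words over the OCR output.
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        ratio = difflib.SequenceMatcher(None, target, candidate).ratio()
        if ratio >= threshold:
            return i
    return None
```

Here `find_phrase_fuzzy(["Menu", "Fran5ais", "English"], "Français")` returns `1`. The OCR misread of "ç" as "5" costs one character, and the resulting ratio (0.875) still clears the threshold.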

drothlis changed the title from "Add threshold parameters in match_text" to "Add threshold (fuzzy match) parameters in match_text" on Jun 19, 2018

2 participants