Substring search #122

Open
MatasGds opened this issue May 19, 2022 · 0 comments

@MatasGds

Hi,

Thank you for this excellent library. So far, I have successfully used SymSpell in a categorization algorithm, and it works well and fast. I am looking for suggestions on how to extend my current algorithm with substring search:

I am using a list of keywords as the dictionary. Words that are misspelled or truncated are corrected to the matching keywords, which determine the category of a string. For example, 'salar for April' and 'Life Insuranse' are changed to 'salary for April' and 'Life Insurance', respectively, since 'salary' and 'insurance' are in the keyword list. However, some strings are not only misspelled but are also missing spaces or contain too many errors. So 'salaryfor April', 'LifeInsurance' and 'salaryyyy' are not recognized and therefore cannot be categorized by the current solution. Using the whole vocabulary as the dictionary is not feasible. Instead, I want to implement substring search, which would let me find strings that contain certain substrings such as 'salar', 'insuran', 'accommod' and so on.
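One way to approach this outside SymSpell is approximate substring matching (Sellers' algorithm). It is a Levenshtein dynamic program whose first row is all zeros, so the keyword stem may start anywhere in the string, and whose answer is the minimum of the last row, so the stem may end anywhere. A stem like 'salar' then matches 'salaryfor April' or 'salaryyyy' with zero edits, regardless of missing spaces or trailing junk. This is a plain-Python sketch under that assumption, not part of SymSpell; the names `fuzzy_substring_distance` and `contains_stem` are hypothetical.

```python
def fuzzy_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and ANY substring of `text`.

    Sellers-style semi-global alignment (case-insensitive). Hypothetical helper,
    not a SymSpell API.
    """
    pattern, text = pattern.lower(), text.lower()
    n = len(text)
    # Row 0 is all zeros: the empty pattern prefix matches at every position,
    # so a match is allowed to start anywhere in `text`.
    prev = [0] * (n + 1)
    for i in range(1, len(pattern) + 1):
        # Column 0: the first i pattern characters aligned to nothing.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # pattern character missing from text
                curr[j - 1] + 1,     # extra character in text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # Minimum over the last row: the match is allowed to end anywhere.
    return min(prev)


def contains_stem(text: str, stem: str, max_distance: int = 1) -> bool:
    """True if some substring of `text` is within `max_distance` edits of `stem`."""
    return fuzzy_substring_distance(stem, text) <= max_distance


if __name__ == "__main__":
    print(fuzzy_substring_distance("salar", "salaryfor April"))   # 0
    print(fuzzy_substring_distance("insuran", "LifeInsurance"))   # 0
    print(fuzzy_substring_distance("salar", "salaryyyy"))         # 0
    print(fuzzy_substring_distance("accommod", "acomodation"))    # 2
```

Each comparison costs O(len(stem) × len(text)), which is usually acceptable for short transaction descriptions and a modest keyword list. Keep `max_distance` small relative to the stem length: allowing even one edit on a three- or four-letter stem will match many unrelated strings.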

Can SymSpell be used for substring search? Or do you have other suggestions on how to implement this idea effectively and combine it with SymSpell?
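To combine the two, the fuzzy substring search could run only as a fallback when the existing SymSpell pass finds no category. In this sketch, `first_pass` is a hypothetical callable standing in for that SymSpell step (no SymSpell API is reproduced here). `STEM_TO_CATEGORY` and its categories are made-up examples, and the tolerance of one edit per four stem characters (`max_ratio=0.25`) is an assumed starting point, not a tuned value.

```python
from typing import Callable, Dict, Optional

# Hypothetical stems and categories, for illustration only.
STEM_TO_CATEGORY: Dict[str, str] = {
    "salar": "Income",
    "insuran": "Insurance",
    "accommod": "Travel",
}


def substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`."""
    pattern, text = pattern.lower(), text.lower()
    prev = [0] * (len(text) + 1)  # zeros: a match may start anywhere
    for i in range(1, len(pattern) + 1):
        curr = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return min(prev)  # a match may end anywhere


def categorize(
    text: str,
    first_pass: Optional[Callable[[str], Optional[str]]] = None,
    max_ratio: float = 0.25,
) -> Optional[str]:
    # 1. Existing word-level pass (hypothetical wrapper around the SymSpell step).
    if first_pass is not None:
        category = first_pass(text)
        if category is not None:
            return category
    # 2. Fallback: pick the stem with the fewest edits, within a
    #    length-dependent tolerance (5 chars -> 1 edit, 8 chars -> 2 edits).
    best_category: Optional[str] = None
    best_distance: Optional[int] = None
    for stem, category in STEM_TO_CATEGORY.items():
        allowed = int(len(stem) * max_ratio)
        d = substring_distance(stem, text)
        if d <= allowed and (best_distance is None or d < best_distance):
            best_distance, best_category = d, category
    return best_category


if __name__ == "__main__":
    for s in ["salaryfor April", "LifeInsurance", "acomodation in Riga", "Groceries"]:
        print(s, "->", categorize(s))
```

With these example stems, 'salaryfor April' maps to Income, 'LifeInsurance' to Insurance, 'acomodation in Riga' to Travel (two edits against the eight-character stem 'accommod'), and 'Groceries' to None. Ties go to the stem listed first in the dictionary.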

Thank you in advance.
