Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to implement time limit for find_near_matches() #44

Open
levitation opened this issue Jul 26, 2023 · 3 comments
Open

Need to implement time limit for find_near_matches() #44

levitation opened this issue Jul 26, 2023 · 3 comments

Comments

@levitation
Copy link

Need to implement time limit for find_near_matches() else the process may just suddenly hang with certain values of max_l_dist and not even debugger can pause it.

@levitation
Copy link
Author

levitation commented Jul 27, 2023

Currently I am using my own time limit handler. You can find the code here https://github.com/levitation-opensource/Manipulative-Expression-Recognition/blob/main/TimeLimit.py

But having a built-in time limit argument in the find_near_matches() function would make the behaviour of the function less surprising and thus error prone. People might otherwise not foresee that the find_near_matches() function may go into a long running loop / result in a hang.

The hang appears very suddenly. For example, on the data I was testing on, max_l_dist = 21 returned just as quickly as with smaller max_l_dist values, but max_l_dist = 22 computed indefinitely.

@holub008
Copy link

For any application taking variable inputs (in the searched text or pattern, both in my case), it's essential to be able to give up and move on for "bad" inputs where there are exponentially many close matches per edit distance parameters.

I'm not terribly python proficient, but it seems challenging to implement a timeout in user space (hard to interrupt C callouts, platform dependent, thread safety, etc.). It would be awesome if this library could internally handle this! I'd be happy to take this on if there's interest from maintainers.

@taleinat
Copy link
Owner

taleinat commented Dec 11, 2023

Hi, maintainer here. You're welcome to give this a shot!

The issue is that this codebase currently includes quite a few different search functions, and find_near_matches() chooses the most appropriate among them. The implementation methods very a lot.

I highly recommend starting with one or two or the search methods, getting something working, and getting my feedback on it before progressing further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants