Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow match and crash for single-error strings #65

Open
jquense opened this issue May 7, 2024 · 3 comments
Open

Slow match and crash for single-error strings #65

jquense opened this issue May 7, 2024 · 3 comments

Comments

@jquense
Copy link

jquense commented May 7, 2024

I am not sure what specifically is causing the regex to take such a long time to match, and the example is somewhat contrived, but this did happen to us when trying to get match info from a small list of google places. The problem in this case was easily avoided by using googles metadata but I can imagine that this can happen elsewhere:

https://codepen.io/jquense/pen/GRaKqwv

@leeoniya
Copy link
Owner

leeoniya commented May 7, 2024

this is something that has come up multiple times, but i'm not sure whether the lib should address this, or the user should.

for needles that look like copy-pasted things, just falling back to exact substring match is probably the way to go. there's already a similar guard behavior to prevent outOfOrder from exploding like this when you have > 5 terms.

maybe there should be a default threshold of 25 chars or something (whatever is reasonable to type by hand). generally i'm not a huge fan of search behavior changing silently, but it's probably better than the current situation.

@jquense
Copy link
Author

jquense commented May 16, 2024

I can appreciate that this is a hard thing to have a default for. Is the recommendation just to bail out of using ufuzzy in the case of long needles? Even if the fallback isn't built in it may be nice for ufuzzy to provide a mode that produces info and ranges for a simple substring token match as a convenience so that you can still pass stuff through the sort and highlight paths?

@leeoniya
Copy link
Owner

leeoniya commented May 18, 2024

Is the recommendation just to bail out of using ufuzzy in the case of long needles?

my recommendation is to use another uFuzzy instance configured with intraMode: 0, this way you get case-insensitive substring matching and highlighting.

however, uFuzzy has no way of making a term optional. they're all required. so even in intraMode: 0 your example would return 0 results since your needle includes , USA but the haystack does not. to get matching like this you need to use a library that builds an index and can account for term frequency / similarity (something like a trigram index); uFuzzy is just a clever regexp compiler :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants