Slow match and crash for single-error strings #65

jquense · 2024-05-07T12:43:33Z

I'm not sure what specifically causes the regex to take so long to match, and the example is somewhat contrived, but this did happen to us when trying to get match info for a small list of Google Places results. In this case the problem was easily avoided by using Google's metadata, but I can imagine it happening elsewhere:

https://codepen.io/jquense/pen/GRaKqwv

leeoniya · 2024-05-07T13:12:54Z

this is something that has come up multiple times, but i'm not sure whether the lib should address this, or the user should.

for needles that look copy-pasted, falling back to an exact substring match is probably the way to go. there's already a similar guard that prevents `outOfOrder` from exploding like this when you have > 5 terms.

maybe there should be a default threshold of 25 chars or something (whatever is reasonable to type by hand). generally i'm not a huge fan of search behavior changing silently, but it's probably better than the current situation.
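A minimal sketch of that guard, written in TypeScript and kept library-agnostic. `MAX_FUZZY_NEEDLE_LEN`, `MAX_OUT_OF_ORDER_TERMS`, `pickStrategy`, `substringFilter`, and `searchWithFallback` are hypothetical names, not uFuzzy options or functions. The injected `fuzzy` callback stands in for whatever fuzzy search you already run (for example a thin wrapper around a uFuzzy instance), and its signature is an assumption. The 25-char limit is the value floated above; the 5-term cap just mirrors the existing `outOfOrder` guard.

```typescript
// Hypothetical limits for this sketch; these are not uFuzzy option names.
const MAX_FUZZY_NEEDLE_LEN = 25; // roughly what someone would type by hand
const MAX_OUT_OF_ORDER_TERMS = 5; // mirrors the existing > 5 terms guard

type Strategy = "fuzzy" | "fuzzy-out-of-order" | "substring";

// Long needles look copy-pasted, so they skip fuzzy matching entirely.
// Out-of-order matching is only allowed for a few terms since, assuming term
// orderings are permuted, 6 terms would already mean 6! = 720 orderings.
function pickStrategy(needle: string, wantOutOfOrder: boolean): Strategy {
  const trimmed = needle.trim();
  if (trimmed.length > MAX_FUZZY_NEEDLE_LEN) return "substring";
  const termCount = trimmed.split(/\s+/).filter(Boolean).length;
  if (wantOutOfOrder && termCount <= MAX_OUT_OF_ORDER_TERMS) return "fuzzy-out-of-order";
  return "fuzzy";
}

// Case-insensitive exact substring filter; returns matching haystack indices.
function substringFilter(haystack: string[], needle: string): number[] {
  const n = needle.trim().toLowerCase();
  const idxs: number[] = [];
  haystack.forEach((item, i) => {
    if (item.toLowerCase().indexOf(n) !== -1) idxs.push(i);
  });
  return idxs;
}

// Hypothetical signature for the fuzzy search you already run;
// returns matching haystack indices.
type FuzzySearch = (haystack: string[], needle: string, outOfOrder: boolean) => number[];

function searchWithFallback(
  haystack: string[],
  needle: string,
  fuzzy: FuzzySearch,
  wantOutOfOrder = false,
): number[] {
  const strategy = pickStrategy(needle, wantOutOfOrder);
  if (strategy === "substring") return substringFilter(haystack, needle);
  return fuzzy(haystack, needle, strategy === "fuzzy-out-of-order");
}
```

Routing on needle length keeps the expensive regex path away from inputs nobody types by hand. The trade-off is the one noted above: results change shape silently once a needle crosses the threshold, so a UI may want to signal when the fallback is active.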

jquense · 2024-05-16T13:34:04Z

I can appreciate that this is a hard thing to pick a default for. Is the recommendation just to bail out of using uFuzzy for long needles? Even if the fallback isn't built in, would it be worth having uFuzzy provide a mode that produces info and ranges for a simple substring token match, as a convenience, so that you can still pass results through the sort and highlight paths?
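A library-agnostic sketch of what such a convenience mode could return: a case-insensitive substring matcher where every whitespace-separated term is required, producing haystack indices plus flat `[start, end)` highlight ranges, and a small highlighter that consumes them. `SubstringMatch`, `substringMatches`, and `highlight` are hypothetical names; this is not uFuzzy's own info or highlight output, only a shape that could feed similar sort and highlight code.

```typescript
// Hypothetical result shape for this sketch (not uFuzzy's own info output).
interface SubstringMatch {
  idx: number; // index into the haystack
  ranges: number[]; // flat [start, end) pairs, sorted and non-overlapping
}

// Case-insensitive substring match where every whitespace-separated term is
// required. Each term contributes the range of its first occurrence.
// Assumes toLowerCase() does not change string length (true for ASCII).
function substringMatches(haystack: string[], needle: string): SubstringMatch[] {
  const terms = needle.toLowerCase().split(/\s+/).filter(Boolean);
  const out: SubstringMatch[] = [];
  if (terms.length === 0) return out;

  haystack.forEach((item, idx) => {
    const lower = item.toLowerCase();
    const pairs: [number, number][] = [];
    for (const term of terms) {
      const start = lower.indexOf(term);
      if (start === -1) return; // one missing term disqualifies the item
      pairs.push([start, start + term.length]);
    }
    // Sort by start and merge overlapping or touching ranges for highlighting.
    pairs.sort((a, b) => a[0] - b[0]);
    const ranges: number[] = [];
    for (const [s, e] of pairs) {
      const last = ranges.length - 1;
      if (last >= 0 && s <= ranges[last]) ranges[last] = Math.max(ranges[last], e);
      else ranges.push(s, e);
    }
    out.push({ idx, ranges });
  });
  return out;
}

// Wraps each [start, end) range of `text` in open/close markers.
function highlight(text: string, ranges: number[], open = "<mark>", close = "</mark>"): string {
  let out = "";
  let pos = 0;
  for (let i = 0; i < ranges.length; i += 2) {
    out += text.slice(pos, ranges[i]) + open + text.slice(ranges[i], ranges[i + 1]) + close;
    pos = ranges[i + 1];
  }
  return out + text.slice(pos);
}
```

For example, `substringMatches(["Austin, TX", "Boston, MA"], "bos ma")` yields one match at index 1 with ranges `[0, 3, 8, 10]`, which `highlight` renders as `<mark>Bos</mark>ton, <mark>MA</mark>`.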

leeoniya · 2024-05-18T00:39:36Z

> Is the recommendation just to bail out of using ufuzzy in the case of long needles?

my recommendation is to use another uFuzzy instance configured with `intraMode: 0`; this way you get case-insensitive substring matching and highlighting.

however, uFuzzy has no way of making a term optional; they're all required. so even with `intraMode: 0` your example would return 0 results, since your needle includes `, USA` but the haystack does not. to get matching like this you need a library that builds an index and can account for term frequency / similarity (something like a trigram index); uFuzzy is just a clever regexp compiler :)
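To make the trigram-index idea concrete, here is a minimal sketch of that swapped-in technique (not something uFuzzy does). Each string is split into overlapping 3-character grams, an inverted index maps each gram to haystack indices, and candidates are ranked by Jaccard similarity of their gram sets, so an extra `, USA` in the needle lowers a score instead of eliminating the match. `trigrams` and `TrigramIndex` are hypothetical names; normalization is ASCII-only for brevity, and the sketch does not weight grams by term frequency.

```typescript
// Hypothetical helpers for illustration; uFuzzy does not work this way.
// Breaks a string into overlapping 3-char grams. Normalization is ASCII-only:
// lowercase, and any run of non [a-z0-9] chars becomes a single space.
function trigrams(s: string): Set<string> {
  const norm = s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const padded = "  " + norm + " "; // padding lets word starts/ends form grams
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) grams.add(padded.slice(i, i + 3));
  return grams;
}

class TrigramIndex {
  private postings = new Map<string, number[]>(); // gram -> haystack indices
  private sizes: number[] = []; // distinct gram count per haystack item

  constructor(haystack: string[]) {
    haystack.forEach((item, idx) => {
      const grams = trigrams(item);
      this.sizes.push(grams.size);
      grams.forEach((g) => {
        const list = this.postings.get(g);
        if (list) list.push(idx);
        else this.postings.set(g, [idx]);
      });
    });
  }

  // Ranks items by Jaccard similarity of gram sets. No term is mandatory:
  // needle grams absent from an item just lower that item's score.
  search(needle: string, limit = 10): { idx: number; score: number }[] {
    const q = trigrams(needle);
    const shared = new Map<number, number>(); // haystack idx -> shared gram count
    q.forEach((g) => {
      const list = this.postings.get(g);
      if (list) list.forEach((idx) => shared.set(idx, (shared.get(idx) ?? 0) + 1));
    });
    const results: { idx: number; score: number }[] = [];
    shared.forEach((count, idx) => {
      results.push({ idx, score: count / (q.size + this.sizes[idx] - count) });
    });
    results.sort((a, b) => b.score - a.score || a.idx - b.idx);
    return results.slice(0, limit);
  }
}
```

With the haystack `["Austin, TX", "Boston, MA", "Houston, TX"]` and the needle `Austin, TX, USA`, "Austin, TX" ranks first with a score of 10/14 (all 10 of its grams appear among the needle's 14), "Houston, TX" follows at 4/21, and "Boston, MA" shares no grams and is not returned, even though `usa` appears nowhere in the haystack.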
