Use bytes instead of strings, ditch fancy_regex for regex crate #8

Closed
SkeletalDemise opened this issue Jul 21, 2022 · 1 comment · Fixed by #9

Comments

@SkeletalDemise
Contributor

Currently lemmeknow uses the fancy_regex crate for regex matching. The problem is that it doesn't support matching on bytes. The regex crate, however, does: https://docs.rs/regex/1.0.0/regex/bytes/index.html

If there is no reason to use fancy_regex, then we should switch. Both pyWhat and lemmeknow only support ASCII strings. We need to support UTF-8, UTF-16, etc., as well as raw bytes.
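To illustrate the distinction between strings and bytes: a matcher whose methods take &str (fancy_regex's Regex works this way) can only see input that is valid UTF-8. The std-only sketch below is an illustration, not lemmeknow code; as_text and as_text_lossy are hypothetical helpers. Latin-1 bytes are rejected outright, lossy conversion replaces them with U+FFFD, and UTF-16 bytes can pass as "valid" UTF-8 while no longer reading as the intended text.

```rust
use std::borrow::Cow;
use std::str;

/// Hypothetical helper: the text a `&str`-only matcher would receive,
/// or `None` when the input is not valid UTF-8 and must be rejected.
fn as_text(input: &[u8]) -> Option<&str> {
    str::from_utf8(input).ok()
}

/// Hypothetical helper: lossy conversion replaces each invalid sequence
/// with U+FFFD, which can hide the very bytes a pattern is looking for.
fn as_text_lossy(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}

fn main() {
    let utf8: &[u8] = "café".as_bytes(); // valid UTF-8 (é is 0xC3 0xA9)
    let latin1: &[u8] = b"caf\xE9"; // "café" in Latin-1: not valid UTF-8
    let utf16le: &[u8] = &[0x68, 0x00, 0x69, 0x00]; // "hi" in UTF-16LE

    println!("{:?}", as_text(utf8)); // accepted unchanged
    println!("{:?}", as_text(latin1)); // rejected: None
    println!("{:?}", as_text_lossy(latin1)); // 0xE9 replaced by U+FFFD
    // Valid UTF-8, but it decodes as 'h', NUL, 'i', NUL rather than "hi".
    println!("{:?}", as_text(utf16le));
}
```

A bytes-based matcher such as regex::bytes::Regex sidesteps this by searching the &[u8] directly, without requiring a UTF-8 conversion first.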

See this equivalent pyWhat issue: bee-san/pyWhat#34

@swanandx
Owner

The regex crate does not support look-around, so if we switch to it, we cannot compile the regular expressions in the regexs.json file.


That is why we use fancy-regex.
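One way to gauge how much of regexs.json is affected is to flag every pattern containing a look-around group ((?=, (?!, (?<=, (?<!). The sketch below is a rough, std-only heuristic; uses_lookaround is a hypothetical helper, not part of lemmeknow or either regex crate. It skips escaped characters and ignores anything inside a character class, but it is not a full regex parser (for example, it does not handle a literal ] at the start of a class).

```rust
/// Hypothetical heuristic: does `pattern` contain a look-around group,
/// which the `regex` crate refuses to compile?
fn uses_lookaround(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut i = 0;
    let mut in_class = false; // inside `[...]`, `(` is a literal
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => {
                // Skip the escaped character: `\(` is a literal paren, not a group.
                i += 2;
                continue;
            }
            b'[' if !in_class => in_class = true,
            b']' if in_class => in_class = false,
            b'(' if !in_class => {
                let rest = &bytes[i..];
                // Named groups like `(?<name>` or `(?P<name>` do not match these prefixes.
                if rest.starts_with(b"(?=")
                    || rest.starts_with(b"(?!")
                    || rest.starts_with(b"(?<=")
                    || rest.starts_with(b"(?<!")
                {
                    return true;
                }
            }
            _ => {}
        }
        i += 1;
    }
    false
}

fn main() {
    let patterns = [
        r"^\d{3}-\d{2}-\d{4}$",       // no look-around
        r"(?<![0-9])\d{16}(?![0-9])", // look-behind and look-ahead
        r"\(?=literal",               // escaped paren: not a group
        r"[(?=]x",                    // inside a character class
    ];
    for p in patterns {
        println!("{:<30} look-around: {}", p, uses_lookaround(p));
    }
}
```

Patterns it flags would need rewriting (for example, by checking the surrounding context in code after a match) before they could be compiled with the regex crate.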

lemmeknow does support UTF-8, as Strings in Rust are UTF-8 (and we also tested with UTF-16, right?).
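Because a Rust String is always UTF-8, UTF-16 input has to be decoded before a &str-based matcher can run on it. The sketch below is an illustration under stated assumptions, not a description of how lemmeknow was tested: decode_utf16le is a hypothetical helper that assumes little-endian input with no byte-order mark, and it returns None for an odd byte count or unpaired surrogates.

```rust
/// Hypothetical helper: decode UTF-16LE bytes (no BOM) into a UTF-8 `String`.
fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None; // a UTF-16 code unit is exactly two bytes
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Fails on unpaired surrogates instead of guessing.
    String::from_utf16(&units).ok()
}

fn main() {
    // Build some UTF-16LE input from a known string.
    let raw: Vec<u8> = "hi ✓".encode_utf16().flat_map(|u| u.to_le_bytes()).collect();
    let text = decode_utf16le(&raw).expect("valid UTF-16LE");
    // Once decoded, ordinary &str searching (or a regex) sees the real characters.
    println!("{} contains ✓: {}", text, text.contains('✓'));
}
```

The trade-off is that the caller must know (or detect) the encoding up front, whereas a bytes-based matcher can search the raw input as-is.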

swanandx linked a pull request on Aug 12, 2022 that will close this issue