Allow to specify characters that have to be converted to character class #48

Open
NightWatch0 opened this issue Sep 28, 2021 · 2 comments

Comments


NightWatch0 commented Sep 28, 2021

First of all, thank you for this great tool.
When using it, I often need to convert text into more specific character classes, not just non-digit or non-blank characters.
Is it possible to customize the range of characters to be converted into character classes, such as [a-e\d] or [①-⑨⒈-⒙], or ranges for specific languages such as Chinese and Japanese?
For example, if the source text is 我的名字是Tom, I would like to get the regular expression [\u{4e00}-\u{9fa5}]{5}\w{3} instead of \w{8} by specifying the character class [\u{4e00}-\u{9fa5}].
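To make the request concrete, here is a rough Python sketch of the behaviour described above. It is not grex's API or implementation; `CUSTOM_CLASSES`, `classify`, and `generalize` are hypothetical names invented for illustration. It assumes each user-supplied class is tried first, with `\w` and an escaped literal as fallbacks.

```python
import re
from itertools import groupby

# Hypothetical user-supplied classes: (regex fragment to emit, membership test).
# The fragment is emitted verbatim; the lambda decides which characters it covers.
CUSTOM_CLASSES = [
    (r"[\u{4e00}-\u{9fa5}]", lambda ch: "\u4e00" <= ch <= "\u9fa5"),
]


def classify(ch):
    """Return the regex fragment that should stand in for a single character."""
    for fragment, matches in CUSTOM_CLASSES:
        if matches(ch):
            return fragment
    if ch.isalnum() or ch == "_":
        return r"\w"
    # Anything else is kept as an escaped literal.
    return re.escape(ch)


def generalize(text):
    """Collapse runs of characters from the same class into fragment{n}."""
    parts = []
    for fragment, run in groupby(text, key=classify):
        n = len(list(run))
        parts.append(fragment if n == 1 else f"{fragment}{{{n}}}")
    return "".join(parts)


print(generalize("我的名字是Tom"))
```

Because custom classes are checked before the `\w` fallback, the five Chinese characters are grouped under the user's class rather than being absorbed into `\w{8}`.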
I would also like to specify the minimum and maximum length of repeated substrings. Sometimes I get results like (\w{5}|\w{7,8}|\w{10,17}), but the regular expression I expect is (\w{3,20}). So I would like to be able to specify the minimum and maximum number of repetitions of a substring, or to merge the repetition counts into a single interval instead of multiple branches.
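The merging step could look roughly like the following Python sketch. Again, this is only an illustration of the idea and not anything grex provides; `merge_repetitions`, `min_len`, and `max_len` are hypothetical names. It assumes the branches all repeat the same fragment and differ only in their repetition counts.

```python
def merge_repetitions(fragment, intervals, min_len=None, max_len=None):
    """Merge several (low, high) repetition ranges of one fragment into one.

    min_len and max_len are optional user-supplied bounds (hypothetical
    options) that widen the merged interval when they lie outside it.
    """
    low = min(a for a, _ in intervals)
    high = max(b for _, b in intervals)
    if min_len is not None:
        low = min(low, min_len)
    if max_len is not None:
        high = max(high, max_len)
    quantifier = f"{{{low}}}" if low == high else f"{{{low},{high}}}"
    return f"({fragment}{quantifier})"


# The branches of (\w{5}|\w{7,8}|\w{10,17}) expressed as intervals.
branches = [(5, 5), (7, 8), (10, 17)]
print(merge_repetitions(r"\w", branches))
print(merge_repetitions(r"\w", branches, min_len=3, max_len=20))
```

Note that the merged expression is more permissive than the original alternation: it also accepts the lengths that fell between the branches (6 and 9 here), which is the trade-off being asked for.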
I think these two features could be specified together, using one or more specifications in a format similar to \w{3,20} to indicate which characters must be converted into character classes.

@pemistahl
Owner

Thanks for your ideas @NightWatch0. I will think about how to improve grex even further and I will keep your suggestions in mind.

@suliveevil

Awesome idea.
