Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline flag (?x) with comment doesn't work due to automatic wrapping of patterns inside non-capturing group #2677

Open
1 task done
learnbyexample opened this issue Dec 6, 2023 · 1 comment
Labels
bug A bug.

Comments

@learnbyexample
Copy link

learnbyexample commented Dec 6, 2023

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 14.0.3

How did you install ripgrep?

Using ripgrep_14.0.3-1_amd64.deb

What operating system are you using ripgrep on?

Ubuntu 20.04

Describe your bug.

Using (?x) to add comments isn't working due to automatic wrapping of the supplied pattern within (?:).

What are the steps to reproduce the behavior?

Any pattern which uses (?x) and a comment.

What is the actual behavior?

$ echo 'fig a#b 123' | rg -o '(?x)\d+ #extract digits'
regex parse error:
    (?:(?x)\d+ #extract digits)
    ^
error: unclosed group

What is the expected behavior?

I think automatic wrapping of patterns inside non-capturing group was introducted for this issue: #2480. If possible, don't modify the pattern when there's a comment. Or may be add a note in the documentation for this corner case.

FWIW, GNU grep doesn't support multiple patterns via -e or -f for PCRE:

$ echo 'fig a#b 123' | grep -oP '(?x)\d+ #extract digits'
123

$ echo 'MyText' | grep -P -e '(?i)notintext' -e 'text'
grep: the -P option only supports a single pattern
@BurntSushi BurntSushi added the bug A bug. label Dec 6, 2023
@BurntSushi
Copy link
Owner

There's really no way to get this completely right without individually validating each regex. I very specifically wanted to avoid doing that because of the extra cost it would add. With that said, now that I'm thinking about it more, it might be possible to do that with a fast path that skips the check in most cases. (For example, if a pattern doesn't contain a meta character anywhere, then you know it's valid. That would help when searching large dictionaries of literals.)

But yeah, when I added a fix for #2480, I knew it wouldn't work in all cases. Another example of this is rg ')('. That should be invalid, but it actually works.

With that said, one thing I realized here is that by adding the group around the regex, that group (which the user did not type) now appears in error messages. That's bad juju and I'll need to revert that for that reason alone. When I do that, I'll look into validating each pattern manually.

For PCRE2, we'll probably just need to accept a wontfix bug there. I don't see a good reason to completely disable multi-pattern matching there just for this reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

No branches or pull requests

2 participants