Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce cases where unnecessary (?:) separator is added to pattern #179

Open
slevithan opened this issue Apr 17, 2017 · 2 comments
Open

Reduce cases where unnecessary (?:) separator is added to pattern #179

slevithan opened this issue Apr 17, 2017 · 2 comments

Comments

@slevithan
Copy link
Owner

slevithan commented Apr 17, 2017

Update the XRegExp constructor (more specifically the runTokens function) to pass preceding tokens to token handler functions. Alternatively they could be made available on the token handler context (like the existing this.captureNames and this.hasNamedCapture).

In addition to this being a generally useful feature that could enable more powerful syntax addons, this would enable much better handling of where (?:) separators are inserted when stripping whitespace (with flag x) and comments out of a pattern. Currently the getContextualTokenSeparator function is used for this, but it's not very robust. E.g. it avoids adding (?:) if the preceding character is (, but it doesn't deal with (?<name> and other cases where a separator isn't needed.

It would make things even easier if preceding tokens were annotated with a type string (provided as a new property on the options argument of XRegExp.addToken). E.g. from XRegExp('(?<name>\n.)', 'x') you could get:

[
  {type: 'named-capture-start', output: '('},
  {type: 'x-ignored', output: ''},
  {type: 'native-token', output: '.'},
  {type: 'native-token', output: ')'}
]

Then the getContextualTokenSeparator function can easily check whether the preceding token is something that requires the (?:) separator be added.

This above idea of making preceding tokens available to token handler functions would require special handling for the reparse option, since only the final reparsed version of token should be added to the list of prior tokens.

After this change:

  • Check whether the lines in build.js for handling (?:) are still needed.
  • Consider removing the cleanup of double (?:)(?:).
  • Upgrade to support at least (?<name> as an additional case when (?:) doesn't need to be added.

See related discussion in #164 (comment).

josephfrazier added a commit to josephfrazier/xregexp that referenced this issue May 1, 2017
Following up on slevithan#164, this
change prevents a `(?:)` from being inserted in the following places:

* At the beginning of a non-capturing group (the end is already handled)
* Before or after a `|`
* At the beginning or the end of the pattern

This solution isn't as complete as the one suggested in
slevithan#179, but it's a decent
stopgap.
@slevithan
Copy link
Owner Author

slevithan commented Jan 10, 2021

New diffs 076f950 and d78a262 cleaned up more cases where (?:) isn't needed. Specifically, the following are now covered:

  • Before a group (left of ().
  • After a group (right of )).
  • After the opening of a lookbehind (right of (?<= and (?<!).

@josephfrazier
Copy link
Collaborator

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants