[editorial] Unconditionally use LookaheadAssertion in Assertion #3191

Open
wants to merge 2 commits into
base: main

nicolo-ribaudo
Member

This PR simplifies the Annex B RegExp Assertion production by removing the branching on the UnicodeMode parameter. The two branches were effectively the same: each one hard-coded UnicodeMode to the value that its own guard had already established, so passing ?UnicodeMode through is equivalent.

Additionally, it renames QuantifiableAssertion to LookaheadAssertion, because the production is now also used in contexts where it cannot be followed by a quantifier (i.e. when UnicodeMode is enabled).
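
For illustration only (this is not part of the PR), the observable behavior behind the rename can be sketched as follows. The helper name `compiles` is hypothetical, and the results assume an engine that implements the Annex B web-compat RegExp syntax, as browsers and Node.js do.

```typescript
// Hypothetical helper: true when the pattern/flags pair parses,
// false when the engine rejects it with a SyntaxError.
function compiles(pattern: string, flags: string = ""): boolean {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// ~UnicodeMode (Annex B): a lookahead may be followed by a quantifier.
console.log(compiles("(?=a)*")); // true

// +UnicodeMode: the same lookahead is a valid assertion on its own...
console.log(compiles("(?=a)", "u")); // true

// ...but it cannot be followed by a quantifier.
console.log(compiles("(?=a)*", "u")); // false

// Lookbehinds are never quantifiable, even without the u flag.
console.log(compiles("(?<=a)*")); // false
```

The second and third calls show the case the new name covers: with the `u` flag the lookahead still parses, just never with a quantifier after it.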

Note that [~UnicodeMode, +UnicodeSetsMode] can never happen, because of https://tc39.es/ecma262/#sec-parsepattern (and https://tc39.es/ecma262/#sec-parsepattern-annexb).
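
A minimal sketch of why that combination is unreachable, assuming the flag handling described in ParsePattern (the `u` and `v` flags together are rejected, `v` enables both modes, `u` enables only UnicodeMode). The type `GrammarParams` and the function `grammarParams` are hypothetical names, and the NamedCaptureGroups parameter is left out for brevity.

```typescript
// Hypothetical model of the grammar parameters chosen from the RegExp flags.
type GrammarParams = { UnicodeMode: boolean; UnicodeSetsMode: boolean };

// Assumed mapping, following the steps of ParsePattern:
// u and v together are an error, v implies UnicodeMode, u alone sets only UnicodeMode.
function grammarParams(u: boolean, v: boolean): GrammarParams {
  if (u && v) throw new SyntaxError("the u and v flags cannot be combined");
  if (v) return { UnicodeMode: true, UnicodeSetsMode: true };
  if (u) return { UnicodeMode: true, UnicodeSetsMode: false };
  return { UnicodeMode: false, UnicodeSetsMode: false };
}

// Enumerate every accepted flag combination and collect the reachable parameter pairs.
const reachable: GrammarParams[] = [];
for (const u of [false, true]) {
  for (const v of [false, true]) {
    if (u && v) continue; // rejected before any parsing happens
    reachable.push(grammarParams(u, v));
  }
}

// No reachable pair has UnicodeSetsMode enabled while UnicodeMode is disabled.
const impossible = reachable.some((p) => !p.UnicodeMode && p.UnicodeSetsMode);
console.log(impossible); // false
```

Under that assumption, passing ?UnicodeSetsMode where ~UnicodeSetsMode used to be hard-coded in a ~UnicodeMode branch does not change the set of accepted patterns.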

@@ -49014,7 +49014,7 @@ <h2>Syntax</h2>
[+UnicodeMode] Assertion[+UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
Member Author

It's also possible to reduce branching in this production, but I'm not sure it increases clarity as it mixes the +UnicodeMode and ~UnicodeMode branches:

        Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
          [~UnicodeMode] LookaheadAssertion[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] Quantifier
          Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
          [~UnicodeMode] ExtendedAtom[?NamedCaptureGroups] Quantifier?
          [+UnicodeMode] Atom[+UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier?

It could probably be further simplified by merging ExtendedAtom with Atom and moving the branching on UnicodeMode inside it, but the refactor becomes larger:

        Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
          [~UnicodeMode] LookaheadAssertion[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] Quantifier
          Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
          Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier?

        Atom[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
          `.`
          `\` AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
          [~UnicodeMode] `\` [lookahead == `c`]
          CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
          `(` GroupSpecifier[?UnicodeMode]? Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
          `(?:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
          [~UnicodeMode] InvalidBracedQuantifier
          [~UnicodeMode] ExtendedPatternCharacter
          [+UnicodeMode] PatternCharacter

This is because the production is now used in cases where it cannot be followed by a quantifier.
@ljharb requested a review from a team on December 6, 2023 at 23:20