[editorial] Unconditionally use LookaheadAssertion in Assertion #3191

nicolo-ribaudo · 2023-10-10T03:14:56Z

This PR tries to simplify the Annex B RegExp Assertion production by removing the branching on the UnicodeMode parameter. The two branches were effectively the same: each one hard-coded the UnicodeMode parameter to the value that its own guard ([+UnicodeMode] or [~UnicodeMode]) had already established.

Additionally, it renames QuantifiableAssertion to LookaheadAssertion since now it's also used in contexts where it cannot be followed by a quantifier (i.e. when UnicodeMode is enabled).
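As a rough illustration of the "cannot be followed by a quantifier" case, the sketch below probes a host's RegExp constructor. The helper name `compiles` is made up for this example, and the result for the legacy pattern assumes a host that implements the Annex B grammar (web browsers and Node.js do).

```typescript
// Hypothetical helper: reports whether `source` parses as a RegExp with `flags`.
function compiles(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    // A pattern rejected by the grammar surfaces as a SyntaxError.
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Without UnicodeMode, Annex B lets a lookahead be followed by a quantifier.
const legacyQuantified = compiles("(?=a)*");

// With UnicodeMode (the u flag), the same lookahead cannot take a quantifier.
const unicodeQuantified = compiles("(?=a)*", "u");

// The lookahead on its own is still a valid assertion in UnicodeMode.
const unicodeStandalone = compiles("(?=a)b", "u");
```

Under those assumptions only the second pattern is rejected, which is why a name tied to quantifiers no longer describes every use of the production.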

Note that [~UnicodeMode, +UnicodeSetsMode] can never happen, because of https://tc39.es/ecma262/#sec-parsepattern (and https://tc39.es/ecma262/#sec-parsepattern-annexb).
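A minimal sketch of why that combination is unreachable, modelling only the flag handling of the linked ParsePattern operation. The names `GrammarParams` and `patternParams` are invented for illustration, and NamedCaptureGroups is left out.

```typescript
// Illustrative model of the grammar parameters chosen from the u and v flags.
type GrammarParams = { unicodeMode: boolean; unicodeSetsMode: boolean };

function patternParams(u: boolean, v: boolean): GrammarParams {
  // The u and v flags are mutually exclusive.
  if (u && v) {
    throw new SyntaxError("the u and v flags cannot be combined");
  }
  // v enables both UnicodeMode and UnicodeSetsMode.
  if (v) return { unicodeMode: true, unicodeSetsMode: true };
  // u enables UnicodeMode only.
  if (u) return { unicodeMode: true, unicodeSetsMode: false };
  // No flag: neither mode is enabled.
  return { unicodeMode: false, unicodeSetsMode: false };
}

// Collect every parameter set that can actually be produced.
const reachable: GrammarParams[] = [];
for (const u of [false, true]) {
  for (const v of [false, true]) {
    try {
      reachable.push(patternParams(u, v));
    } catch {
      // u and v together never yield a parse, so nothing is recorded.
    }
  }
}
```

In this model UnicodeSetsMode is only ever enabled together with UnicodeMode, so no entry in `reachable` pairs ~UnicodeMode with +UnicodeSetsMode.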

nicolo-ribaudo · 2023-10-10T03:19:18Z

spec.html

@@ -49014,7 +49014,7 @@ <h2>Syntax</h2>
          [+UnicodeMode] Assertion[+UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]


It's also possible to reduce branching in this production, but I'm not sure it increases clarity as it mixes the +UnicodeMode and ~UnicodeMode branches:

Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
    [~UnicodeMode] LookaheadAssertion[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] Quantifier
    Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
    [~UnicodeMode] ExtendedAtom[?NamedCaptureGroups] Quantifier?
    [+UnicodeMode] Atom[+UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier?

It could probably be further simplified by merging ExtendedAtom with Atom and moving the branching on UnicodeMode inside it, but the refactor becomes larger:

Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
    [~UnicodeMode] LookaheadAssertion[~UnicodeMode, ~UnicodeSetsMode, ?NamedCaptureGroups] Quantifier
    Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
    Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier?

Atom[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
    `.`
    `\` AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
    [~UnicodeMode] `\` [lookahead == `c`]
    CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
    `(` GroupSpecifier[?UnicodeMode]? Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
    `(?:` Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] `)`
    [~UnicodeMode] InvalidBracedQuantifier
    [~UnicodeMode] ExtendedPatternCharacter
    [+UnicodeMode] PatternCharacter

nicolo-ribaudo added 2 commits October 10, 2023 12:24

Unconditionally use QuantifiableAssertion in Assertion

4157844

Rename QuantifiableAssertion to LookaheadAssertion

c17b51a

This is because now the production is used in cases where it cannot be followed by a quantifier

nicolo-ribaudo force-pushed the regexp-assertions-simplify branch from 4d4b299 to c17b51a Compare October 10, 2023 03:42

ljharb requested a review from a team December 6, 2023 23:20
