Add spec text for RegExp Modifiers #3221

rbuckton · 2023-11-15T21:52:40Z

This adds the specification text for the Stage 3 RegExp Modifiers proposal.

Test262 tests can be found at tc39/test262#3960.

jmdyck · 2023-11-16T02:05:13Z

spec.html

+        <emu-grammar>Atom :: `(?` RegularExpressionFlags `:` Disjunction `)`</emu-grammar>
+        <ul>
+          <li>
+            It is a Syntax Error if the source text matched by |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or if it contains the same code point more than once.


Suggested change

- It is a Syntax Error if the source text matched by |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or if it contains the same code point more than once.
+ It is a Syntax Error if the source text matched by |RegularExpressionFlags| contains any code point other than `i`, `m`, or `s`, or contains the same code point more than once.

(for consistency with the others)
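To make the rule under review concrete, here is a minimal TypeScript sketch of the check it describes. The helper name `checkModifierFlags`, the `ALLOWED_MODIFIERS` constant, and the message strings are hypothetical and do not come from the spec text; the sketch only models the two conditions in the quoted sentence.

```typescript
// Hypothetical helper modelling the early error quoted above.
// Returns a description of the error, or null when the rule is satisfied.
const ALLOWED_MODIFIERS = "ims";

function checkModifierFlags(flags: string): string | null {
  let seen = "";
  for (let i = 0; i < flags.length; i++) {
    const ch = flags.charAt(i);
    // Condition 1: any code point other than i, m, or s.
    if (ALLOWED_MODIFIERS.indexOf(ch) === -1) {
      return `unsupported modifier '${ch}'`;
    }
    // Condition 2: the same code point more than once.
    if (seen.indexOf(ch) !== -1) {
      return `duplicate modifier '${ch}'`;
    }
    seen += ch;
  }
  return null;
}
```

Under these assumptions, `checkModifierFlags("im")` reports no error, while `"ii"` and `"g"` are both rejected.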

michaelficarra · 2023-12-15T21:25:56Z

spec.html

+            1. If _add_ contains *"i"*, set _ignoreCase_ to *true*.
+            1. If _add_ contains *"m"*, set _multiline_ to *true*.
+            1. If _add_ contains *"s"*, set _dotAll_ to *true*.
+            1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
+            1. If _remove_ contains *"m"*, set _multiline_ to *false*.
+            1. If _remove_ contains *"s"*, set _dotAll_ to *false*.


Suggested change

- 1. If _add_ contains *"i"*, set _ignoreCase_ to *true*.
- 1. If _add_ contains *"m"*, set _multiline_ to *true*.
- 1. If _add_ contains *"s"*, set _dotAll_ to *true*.
- 1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
- 1. If _remove_ contains *"m"*, set _multiline_ to *false*.
- 1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
+ 1. If _add_ contains *"i"*, set _ignoreCase_ to *true*.
+ 1. Else if _remove_ contains *"i"*, set _ignoreCase_ to *false*.
+ 1. If _add_ contains *"m"*, set _multiline_ to *true*.
+ 1. Else if _remove_ contains *"m"*, set _multiline_ to *false*.
+ 1. If _add_ contains *"s"*, set _dotAll_ to *true*.
+ 1. Else if _remove_ contains *"s"*, set _dotAll_ to *false*.

Use else-if to further emphasise that add/remove are disjoint.

rbuckton · 2023-12-15

IMO, this doesn't indicate disjointness any more than the independent Ifs do. Also, this suggestion prioritizes modifiers in add, while the existing text prioritizes modifiers in remove. Existing implementations in other languages that don't error on `(?i-i:)`, such as C#/.NET, prioritize removal (i.e., set and then unset). While the consensus was to be more restrictive and issue an error, I'd prefer that we maintain remove priority in the specification text for the sake of consistency.

rbuckton · 2023-12-15

If you would still prefer the "Else if", I would prefer to rewrite it as follows:

Suggested change

- 1. If _add_ contains *"i"*, set _ignoreCase_ to *true*.
- 1. If _add_ contains *"m"*, set _multiline_ to *true*.
- 1. If _add_ contains *"s"*, set _dotAll_ to *true*.
- 1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
- 1. If _remove_ contains *"m"*, set _multiline_ to *false*.
- 1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
+ 1. If _remove_ contains *"i"*, set _ignoreCase_ to *false*.
+ 1. Else if _add_ contains *"i"*, set _ignoreCase_ to *true*.
+ 1. If _remove_ contains *"m"*, set _multiline_ to *false*.
+ 1. Else if _add_ contains *"m"*, set _multiline_ to *true*.
+ 1. If _remove_ contains *"s"*, set _dotAll_ to *false*.
+ 1. Else if _add_ contains *"s"*, set _dotAll_ to *true*.

Such that remove continues to take precedence.

michaelficarra · 2023-12-16

That's fine too.
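The three orderings discussed in this thread can be compared side by side. The following TypeScript sketch is illustrative only: `ModifierState`, `Resolve`, `updateModifiers`, and the three resolver names are hypothetical, and `add` and `remove` are modelled as plain strings of modifier characters.

```typescript
// Hypothetical model of the three flags the algorithm steps read and write.
interface ModifierState {
  ignoreCase: boolean;
  multiline: boolean;
  dotAll: boolean;
}

// Decides one flag from its current value and whether it is added or removed.
type Resolve = (current: boolean, added: boolean, removed: boolean) => boolean;

// Existing text: independent Ifs, add steps first, remove steps last.
const independentIfs: Resolve = (current, added, removed) => {
  let value = current;
  if (added) value = true;
  if (removed) value = false;
  return value;
};

// First suggestion: "If add ..., Else if remove ...".
const addThenElseIfRemove: Resolve = (current, added, removed) =>
  added ? true : removed ? false : current;

// Second suggestion: "If remove ..., Else if add ...".
const removeThenElseIfAdd: Resolve = (current, added, removed) =>
  removed ? false : added ? true : current;

function updateModifiers(
  state: ModifierState,
  add: string,
  remove: string,
  resolve: Resolve
): ModifierState {
  const has = (flags: string, ch: string): boolean => flags.indexOf(ch) !== -1;
  return {
    ignoreCase: resolve(state.ignoreCase, has(add, "i"), has(remove, "i")),
    multiline: resolve(state.multiline, has(add, "m"), has(remove, "m")),
    dotAll: resolve(state.dotAll, has(add, "s"), has(remove, "s")),
  };
}
```

When `add` and `remove` are disjoint, as the early errors require, all three resolvers produce the same state. They differ only on overlapping input such as `add = "i"`, `remove = "i"`: there the add-first form sets the flag, while the other two clear it, which is the remove priority described above.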

michaelficarra · 2023-12-15T21:27:31Z

spec.html

+          <dl class="header">
+          </dl>
+          <emu-alg>
+            1. Let _ignoreCase_ be _rer_.[[IgnoreCase]].


Suggested change

- 1. Let _ignoreCase_ be _rer_.[[IgnoreCase]].
+ 1. Assert: _add_ and _remove_ have no elements in common.
+ 1. Let _ignoreCase_ be _rer_.[[IgnoreCase]].
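As a rough illustration of the invariant this Assert states, the TypeScript sketch below checks that two modifier strings share no character. The function names are hypothetical. In the specification an Assert step documents an invariant that is established elsewhere and adds no new requirement, so the thrown error here exists only to make a violation visible in the sketch.

```typescript
// Hypothetical helper: true when some modifier appears in both strings.
function modifiersOverlap(add: string, remove: string): boolean {
  for (let i = 0; i < add.length; i++) {
    if (remove.indexOf(add.charAt(i)) !== -1) {
      return true;
    }
  }
  return false;
}

// Models "Assert: add and remove have no elements in common."
function assertDisjoint(add: string, remove: string): void {
  if (modifiersOverlap(add, remove)) {
    throw new Error(`invariant violated: '${add}' and '${remove}' overlap`);
  }
}
```

For example, `assertDisjoint("i", "ms")` returns normally, whereas `assertDisjoint("i", "i")`, which corresponds to `(?i-i:)`, throws.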

michaelficarra · 2023-12-15

Is there a reason we use the RegularExpressionFlags production and restrict it with an early error instead of introducing a new, more restricted production?

rbuckton · 2023-12-15T22:08:18Z

> Is there a reason we use the RegularExpressionFlags production and restrict it with an early error instead of introducing a new, more restricted production?

Is there a reason not to? Modifiers are a strict subset of RegularExpressionFlags, and RegularExpressionFlags itself is not restricted in the grammar, only via semantics. If I were to create a RegularExpressionModifiers production, it would have the same definition as RegularExpressionFlags.

michaelficarra · 2023-12-16T00:47:25Z

I believe RegularExpressionFlags is not restricted in the grammar because RegularExpressionLiteral needs to not change when we add new flags. This is due to the overlap with division. We don't have that same constraint with the modifiers grammar though, so it should be able to be done with grammar restrictions and not early errors.

rbuckton · 2023-12-16T01:16:18Z

> I believe RegularExpressionFlags is not restricted in the grammar because RegularExpressionLiteral needs to not change when we add new flags. This is due to the overlap with division. We don't have that same constraint with the modifiers grammar though, so it should be able to be done with grammar restrictions and not early errors.

How would you propose it be written so as not to require early errors? Modifiers can appear in any order, cannot be duplicated within one or both of the modifier locations, and I do plan to extend the set of allowed modifiers over time with things like x-mode, so it needs to be fairly flexible. Even if we limit it to a subset of characters like i, m, and s for now, we either need an early error for duplicates, or a complex grammar like:

RegularExpressionModifiers:
  RegularExpressionModifierChars[+IgnoreCase, +Multiline, +DotAll]

RegularExpressionModifierChars[IgnoreCase, Multiline, DotAll]:
  [empty]
  [+IgnoreCase] `i` RegularExpressionModifierChars[~IgnoreCase, ?Multiline, ?DotAll]?
  [+Multiline] `m` RegularExpressionModifierChars[?IgnoreCase, ~Multiline, ?DotAll]?
  [+DotAll] `s` RegularExpressionModifierChars[?IgnoreCase, ?Multiline, ~DotAll]?

The more modifiers we add, the more production parameters we need, and the more complex the production becomes. Plus, this doesn't avoid the need for an early error to forbid (?i-i:).
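To see what the parameterized grammar above accepts, it can be transcribed into a small recursive recognizer. This TypeScript sketch is an informal reading of that grammar, not spec text: each boolean parameter mirrors one grammar parameter, and all function names are hypothetical.

```typescript
// Each boolean mirrors a grammar parameter: true means the modifier
// may still appear, false means it has already been consumed.
function matchesModifierChars(
  src: string,
  ignoreCase: boolean,
  multiline: boolean,
  dotAll: boolean
): boolean {
  if (src.length === 0) return true; // the [empty] alternative
  const rest = src.slice(1);
  switch (src.charAt(0)) {
    case "i":
      return ignoreCase && matchesModifierChars(rest, false, multiline, dotAll);
    case "m":
      return multiline && matchesModifierChars(rest, ignoreCase, false, dotAll);
    case "s":
      return dotAll && matchesModifierChars(rest, ignoreCase, multiline, false);
    default:
      return false; // no alternative starts with this character
  }
}

// RegularExpressionModifiers starts with all three parameters set.
function matchesRegularExpressionModifiers(src: string): boolean {
  return matchesModifierChars(src, true, true, true);
}

// Counts accepted strings over `alphabet` with length 0..maxLength.
function countAccepted(alphabet: string, maxLength: number): number {
  let count = 0;
  let frontier: string[] = [""];
  for (let len = 0; len <= maxLength; len++) {
    const next: string[] = [];
    for (const candidate of frontier) {
      if (matchesRegularExpressionModifiers(candidate)) count++;
      for (let k = 0; k < alphabet.length; k++) {
        next.push(candidate + alphabet.charAt(k));
      }
    }
    frontier = next;
  }
  return count;
}
```

Under this reading, `countAccepted("ims", 4)` is 16: the empty string, 3 single modifiers, 6 ordered pairs, and 6 ordered triples. Every ordering has to be threaded through the parameters, which illustrates why the production grows with each added modifier. The recognizer sees one modifier list at a time, so it cannot reject `(?i-i:)` on its own.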

michaelficarra · 2023-12-19T18:12:00Z

@rbuckton I'm not saying we can't use any early errors at all on the production, just that we don't need to use early errors to enforce the allowed flag characters when an alternative grammar could do the job. I'm happy to continue using early errors to check for duplicates.
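The split proposed here, with allowed characters enforced by the grammar and duplicates by an early error, could be modelled roughly as follows. This TypeScript sketch is an assumption-laden illustration: the pattern `/^[ims]*$/` stands in for a hypothetical restricted production, and the type and function names are invented for the example.

```typescript
type ModifierCheck = "ok" | "rejected-by-grammar" | "rejected-by-early-error";

// Stands in for a hypothetical production that only admits i, m, and s.
const MODIFIER_CHARS = /^[ims]*$/;

function checkModifiersTwoStage(src: string): ModifierCheck {
  // Stage 1: the grammar decides which characters can appear at all.
  if (!MODIFIER_CHARS.test(src)) {
    return "rejected-by-grammar";
  }
  // Stage 2: an early error still rejects repeated modifiers.
  for (let i = 0; i < src.length; i++) {
    if (src.indexOf(src.charAt(i)) !== i) {
      return "rejected-by-early-error";
    }
  }
  return "ok";
}
```

With this split, `"x"` never parses as a modifier list, while `"ii"` parses and is then rejected by the duplicate check.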

Add spec text for RegExp Modifiers

7f39d7b

ljharb marked this pull request as draft November 16, 2023 00:28

jmdyck reviewed Nov 16, 2023

bakkot approved these changes Nov 16, 2023

michaelficarra reviewed Dec 15, 2023
