Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct Regular Expressions Behavior Related to Annex B #58320

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
28 changes: 19 additions & 9 deletions src/compiler/scanner.ts
Expand Up @@ -2638,6 +2638,10 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
const digitsStart = pos;
scanDigits();
const min = tokenValue;
if (annexB && !min) {
isPreviousTermQuantifiable = true;
break;
}
if (text.charCodeAt(pos) === CharacterCodes.comma) {
pos++;
scanDigits();
Expand All @@ -2647,25 +2651,28 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
error(Diagnostics.Incomplete_quantifier_Digit_expected, digitsStart, 0);
}
else {
if (unicodeMode) {
error(Diagnostics.Unexpected_0_Did_you_mean_to_escape_it_with_backslash, start, 1, String.fromCharCode(ch));
}
error(Diagnostics.Unexpected_0_Did_you_mean_to_escape_it_with_backslash, start, 1, String.fromCharCode(ch));
isPreviousTermQuantifiable = true;
break;
}
}
if (max && Number.parseInt(min) > Number.parseInt(max)) {
else if (max && Number.parseInt(min) > Number.parseInt(max) && (!annexB || text.charCodeAt(pos) === CharacterCodes.closeBrace)) {
error(Diagnostics.Numbers_out_of_order_in_quantifier, digitsStart, pos - digitsStart);
}
}
else if (!min) {
if (unicodeMode) {
if (!annexB) {
error(Diagnostics.Unexpected_0_Did_you_mean_to_escape_it_with_backslash, start, 1, String.fromCharCode(ch));
}
isPreviousTermQuantifiable = true;
break;
}
scanExpectedChar(CharacterCodes.closeBrace);
if (!annexB) {
Copy link
Member

@rbuckton rbuckton May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it may be redundant, I think it might be better to still indicate unicodeMode here so that someone editing this code in the future doesn't mistakenly think this only applies to non-Annex B code. It may be better to use unicodeMode || !annexB and remove the if (unicodeMode) { annexB = false; } at the top of scanRegularExpressionWorker.

The same would go for other uses of annexB as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you are really fine with a dozen of occurrences of unicodeMode || !annexB?

Copy link
Member

@rbuckton rbuckton May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but

const anyUnicodeModeOrNonAnnexB = unicodeMode || !annexB;

would work.

scanExpectedChar(CharacterCodes.closeBrace);
}
else if (text.charCodeAt(pos) === CharacterCodes.closeBrace) {
pos++;
}
pos--;
// falls through
case CharacterCodes.asterisk:
Expand Down Expand Up @@ -2707,7 +2714,7 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
// Assume what starting from the character to be outside of the regex
return;
}
if (unicodeMode || ch === CharacterCodes.closeParen) {
if (!annexB || ch === CharacterCodes.closeParen) {
error(Diagnostics.Unexpected_0_Did_you_mean_to_escape_it_with_backslash, pos, 1, String.fromCharCode(ch));
}
pos++;
Expand Down Expand Up @@ -2767,7 +2774,10 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
scanGroupName(/*isReference*/ true);
scanExpectedChar(CharacterCodes.greaterThan);
}
else if (unicodeMode) {
else {
// This actually is allowed in Annex B if there are no named capturing groups in the regex,
// but if we were going to slience these errors, we would have to record the positions of all '\k's
// and defer the errors until after the scanning to know if the regex has any named capturing groups.
error(Diagnostics.k_must_be_followed_by_a_capturing_group_name_enclosed_in_angle_brackets, pos - 2, 2);
}
break;
Expand Down Expand Up @@ -3390,7 +3400,7 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
error(Diagnostics.Unicode_property_value_expressions_are_only_available_when_the_Unicode_u_flag_or_the_Unicode_Sets_v_flag_is_set, start, pos - start);
}
}
else if (unicodeMode) {
else if (!annexB) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Annex B, braces after p actually should not be parsed at all, but it does provide helpful errors like #58275 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really outdated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outdated label is just because the comment is on an old revision of the PR and GitHub can't figure out where the comment goes after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I could’ve removed the review and re-comment but finally chose not to.

error(Diagnostics._0_must_be_followed_by_a_Unicode_property_value_expression_enclosed_in_braces, pos - 2, 2, String.fromCharCode(ch));
}
return true;
Expand Down
Expand Up @@ -38,7 +38,6 @@ regularExpressionScanning.ts(15,59): error TS1515: Named capturing groups with t
regularExpressionScanning.ts(17,31): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,32): error TS1506: Numbers out of order in quantifier.
regularExpressionScanning.ts(17,40): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,61): error TS1505: Incomplete quantifier. Digit expected.
regularExpressionScanning.ts(19,12): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,15): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,28): error TS1517: Range out of order in character class.
Expand Down Expand Up @@ -203,7 +202,7 @@ regularExpressionScanning.ts(47,89): error TS1518: Anything that would possibly
regularExpressionScanning.ts(47,101): error TS1501: This regular expression flag is only available when targeting 'esnext' or later.


==== regularExpressionScanning.ts (203 errors) ====
==== regularExpressionScanning.ts (202 errors) ====
const regexes: RegExp[] = [
// Flags
/foo/visualstudiocode,
Expand Down Expand Up @@ -301,8 +300,6 @@ regularExpressionScanning.ts(47,101): error TS1501: This regular expression flag
!!! error TS1506: Numbers out of order in quantifier.
~~~~~~~~~
!!! error TS1507: There is nothing available for repetition.

!!! error TS1505: Incomplete quantifier. Digit expected.
// Character classes
/[-A-Za-z-z-aZ-A\d_-\d-.-.\r-\n\w-\W]/,
~~~
Expand Down
Expand Up @@ -39,7 +39,6 @@ regularExpressionScanning.ts(15,59): error TS1515: Named capturing groups with t
regularExpressionScanning.ts(17,31): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,32): error TS1506: Numbers out of order in quantifier.
regularExpressionScanning.ts(17,40): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,61): error TS1505: Incomplete quantifier. Digit expected.
regularExpressionScanning.ts(19,12): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,15): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,28): error TS1517: Range out of order in character class.
Expand Down Expand Up @@ -210,7 +209,7 @@ regularExpressionScanning.ts(47,89): error TS1518: Anything that would possibly
regularExpressionScanning.ts(47,101): error TS1501: This regular expression flag is only available when targeting 'esnext' or later.


==== regularExpressionScanning.ts (210 errors) ====
==== regularExpressionScanning.ts (209 errors) ====
const regexes: RegExp[] = [
// Flags
/foo/visualstudiocode,
Expand Down Expand Up @@ -310,8 +309,6 @@ regularExpressionScanning.ts(47,101): error TS1501: This regular expression flag
!!! error TS1506: Numbers out of order in quantifier.
~~~~~~~~~
!!! error TS1507: There is nothing available for repetition.

!!! error TS1505: Incomplete quantifier. Digit expected.
// Character classes
/[-A-Za-z-z-aZ-A\d_-\d-.-.\r-\n\w-\W]/,
~~~
Expand Down
Expand Up @@ -26,7 +26,6 @@ regularExpressionScanning.ts(15,59): error TS1515: Named capturing groups with t
regularExpressionScanning.ts(17,31): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,32): error TS1506: Numbers out of order in quantifier.
regularExpressionScanning.ts(17,40): error TS1507: There is nothing available for repetition.
regularExpressionScanning.ts(17,61): error TS1505: Incomplete quantifier. Digit expected.
regularExpressionScanning.ts(19,12): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,15): error TS1517: Range out of order in character class.
regularExpressionScanning.ts(19,28): error TS1517: Range out of order in character class.
Expand Down Expand Up @@ -177,7 +176,7 @@ regularExpressionScanning.ts(47,5): error TS1518: Anything that would possibly m
regularExpressionScanning.ts(47,89): error TS1518: Anything that would possibly match more than a single character is invalid inside a negated character class.


==== regularExpressionScanning.ts (177 errors) ====
==== regularExpressionScanning.ts (176 errors) ====
const regexes: RegExp[] = [
// Flags
/foo/visualstudiocode,
Expand Down Expand Up @@ -251,8 +250,6 @@ regularExpressionScanning.ts(47,89): error TS1518: Anything that would possibly
!!! error TS1506: Numbers out of order in quantifier.
~~~~~~~~~
!!! error TS1507: There is nothing available for repetition.

!!! error TS1505: Incomplete quantifier. Digit expected.
// Character classes
/[-A-Za-z-z-aZ-A\d_-\d-.-.\r-\n\w-\W]/,
~~~
Expand Down