UTS46 IdnaTestV2.txt: add 5 normalization corrections #687

markusicu · 2024-02-06T23:21:13Z

Add test cases for the five characters whose Decomposition_Mapping values were corrected in Unicode 4.0:

https://www.unicode.org/versions/corrigendum4.html
https://www.unicode.org/Public/UCD/latest/ucd/NormalizationCorrections.txt
“Normalization Changes (CJK Compatibility Characters)” in https://www.unicode.org/reports/tr46/#TableDerivationStep3

Include strings with both the actual characters and their Punycode forms. For example, test with both:

\U0002F9BF.com
xn--8c3n.com

As @hsivonen found, for these five characters it matters whether the UTS46 implementation leaves them in the input until normalization (as the spec says) or handles them in a combined disallowed+mapping+normalization step that treats them like any other disallowed character (as ICU does).

When they appear literally, the characters should be normalized to valid ones; when they occur inside Punycode, they are disallowed.

See https://util.unicode.org/UnicodeJsps/idna.jsp?a=%5CU0002F9BF.com%0D%0Axn--8c3n.com%0D%0Axn--gro.com
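
The three labels in that link can be related to each other with a minimal Python sketch. It uses only the standard library (`unicodedata` and the `punycode` codec), not a UTS46 implementation, and it treats "the decoded label is not in NFC" as a stand-in for the UTS46 validity check; the variable names are illustrative.

```python
import unicodedata

RAW = "\U0002F9BF"   # one of the five corrected characters
ACE = b"8c3n"        # Punycode payload of the label xn--8c3n

# Path 1: the character appears literally, so normalization replaces it.
normalized = unicodedata.normalize("NFC", RAW)
print(f"U+{ord(normalized):04X}")                               # U+45D7
print("xn--" + normalized.encode("punycode").decode("ascii"))   # xn--gro

# Path 2: the character arrives inside Punycode. Decoding restores U+2F9BF,
# which is not stable under NFC, so a check on the decoded label fails.
decoded = ACE.decode("punycode")
print(decoded == RAW)                                           # True
print(unicodedata.normalize("NFC", decoded) == decoded)         # False
```

So `\U0002F9BF.com` is expected to end up as `xn--gro.com`, while `xn--8c3n.com` decodes to a label that a conforming implementation should reject.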

@macchiati @eggrobin

markusicu · 2024-05-08T00:00:48Z

I would say this is no longer necessary, since in Unicode 16 we no longer treat these five characters specially. They are simply mapped consistently with their Decomposition_Mapping values.

hsivonen · 2024-05-08T07:34:09Z

It seems prudent to have tests for these characters, given the history.

eggrobin · 2024-05-08T07:36:41Z

I tend to agree with @hsivonen here; for instance, an implementation could be special-casing them because that used to be needed, and we would want to catch that.
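
A check of that kind could be sketched roughly as follows. This is not the IdnaTestV2.txt format; it only derives, for each character, the expectation that it maps to its Decomposition_Mapping, using Python's `unicodedata` with NFC as a stand-in for the UTS46 mapping of these characters. The code point list is an assumption meant to match the Corrigendum #4 entries in NormalizationCorrections.txt and should be verified against that file; `check` and `CORRECTED` are hypothetical names.

```python
import unicodedata

# Assumed list: the five Corrigendum #4 code points from
# NormalizationCorrections.txt; verify against the file before relying on it.
CORRECTED = [0x2F868, 0x2F874, 0x2F91F, 0x2F95F, 0x2F9BF]

def check(cp):
    ch = chr(cp)
    # The Decomposition_Mapping is a singleton canonical mapping, e.g. "45D7".
    target = chr(int(unicodedata.decomposition(ch), 16))
    mapped = unicodedata.normalize("NFC", ch)
    label = mapped.encode("punycode").decode("ascii")
    return mapped == target, f"U+{cp:04X} -> U+{ord(target):04X} (xn--{label})"

results = [check(cp) for cp in CORRECTED]
for ok, line in results:
    print("ok  " if ok else "FAIL", line)
```

An implementation that still special-cases these characters would diverge from the expected label in the right-hand column.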

markusicu closed this as completed May 8, 2024

markusicu reopened this May 8, 2024

markusicu mentioned this issue May 22, 2024

idnatest with NormalizationCorrections #829

Merged

markusicu closed this as completed in #829 May 23, 2024
