Include strings with both the actual characters and their Punycode forms. For example, test with both of these:
\U0002F9BF.com
xn--8c3n.com
As @hsivonen found, for these five characters it makes a difference whether the UTS46 implementation leaves them in the input until normalization (as the spec says) or whether a combined disallowed/mapping/normalization step treats them like any other disallowed character (as ICU does).
The characters themselves should be normalized to valid ones, whereas when they occur inside Punycode they are disallowed.
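The expected outcomes can be sketched with the standard library. This is deliberately partial: it skips the UTS46 mapping step and checks only the NFC validity criterion for decoded `xn--` labels, whereas a real implementation also checks each code point's status in the mapping table, hyphen rules, and more. The function names are hypothetical.

```python
import unicodedata


def normalized_label(label: str) -> str:
    """Normalization step only: UTS46 normalizes the (mapped) input to NFC."""
    return unicodedata.normalize("NFC", label)


def punycode_label_is_nfc(ace_label: str) -> bool:
    """Decode an 'xn--' label and check only the NFC validity criterion.

    A decoded Punycode label is not normalized again; it must already be
    in NFC. Real UTS46 validation checks many more criteria than this.
    """
    if not ace_label.lower().startswith("xn--"):
        raise ValueError("expected an 'xn--' label")
    decoded = ace_label[4:].encode("ascii").decode("punycode")
    return unicodedata.is_normalized("NFC", decoded)


if __name__ == "__main__":
    # The raw character normalizes to a different, valid ideograph.
    print(ascii(normalized_label("\U0002F9BF")))  # '\u45d7'
    # Inside Punycode, the same character fails the NFC check.
    print(punycode_label_is_nfc("xn--8c3n"))      # False
    print(punycode_label_is_nfc("xn--gro"))       # True
```

`unicodedata.is_normalized` requires Python 3.8 or later.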
I'd say this is no longer necessary, since in Unicode 16 we no longer treat these five in any special way. They are simply mapped consistently with their Decomposition_Mapping values.
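To check that claim, a test could compare the expected mapping target against the canonical decomposition from the Unicode Character Database. A minimal sketch using Python's `unicodedata`, which reflects whatever Unicode version the interpreter ships; the helper name `canonical_decomposition` is hypothetical:

```python
import unicodedata
from typing import Optional


def canonical_decomposition(ch: str) -> Optional[str]:
    """Return ch's canonical Decomposition_Mapping as a string, or None.

    unicodedata.decomposition() returns space-separated hex code points,
    prefixed with a <tag> for compatibility decompositions; those (and
    characters with no decomposition) yield None here.
    """
    dm = unicodedata.decomposition(ch)
    if not dm or dm.startswith("<"):
        return None
    return "".join(chr(int(cp, 16)) for cp in dm.split())


if __name__ == "__main__":
    ch = "\U0002F9BF"
    target = canonical_decomposition(ch)
    print(ascii(target))                               # '\u45d7'
    # For this singleton decomposition, NFC yields the same target.
    print(target == unicodedata.normalize("NFC", ch))  # True
```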
I tend to agree with @hsivonen here; for instance, an implementation could be special-casing them because that used to be needed, and we would want to catch that.
Add test cases for the five characters whose Decomposition_Mapping values were corrected in Unicode 4.0.
See https://util.unicode.org/UnicodeJsps/idna.jsp?a=%5CU0002F9BF.com%0D%0Axn--8c3n.com%0D%0Axn--gro.com
@macchiati @eggrobin