Characters.operator == should document that it doesn't compare normalized forms #76

Open
jamesderlin opened this issue Mar 23, 2023 · 1 comment
Labels
type-documentation A request to add or improve documentation

Comments

@jamesderlin

I expected that Characters.operator == would compare normalized forms, but it doesn't. (See https://stackoverflow.com/q/64094438/.)

If it intentionally doesn't, it would be nice if the operator == documentation explicitly stated that (and ideally recommended what people should do to normalize Unicode strings instead).

@lrhn
Member

lrhn commented Mar 23, 2023

This package does exactly one thing: Grapheme cluster segmentation in the default locale.

The documentation for == definitely needs fixing (what's it even saying?), but the fix will be to say that characters are equal if their underlying strings are equal, which means containing the same sequence of UTF-16 code units.
(Or, what it tries to say now, that the Characters iterable values contain the same sequence of grapheme cluster substrings, which amounts to the same thing.)
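
For illustration, here is a minimal sketch of the behavior described above, assuming the `characters` package is available as a dependency. The names `composed`, `decomposed`, and `charactersEqual` are made up for this example. It compares two canonically equivalent spellings of "é": each is one grapheme cluster, but the underlying UTF-16 code units differ, so `==` reports them as unequal.

```dart
import 'package:characters/characters.dart';

// Two canonically equivalent spellings of "é" (hypothetical example values).
const composed = '\u00E9'; // Precomposed: U+00E9.
const decomposed = 'e\u0301'; // Decomposed: U+0065 followed by U+0301.

// Hypothetical helper: compares two strings through their Characters views.
bool charactersEqual(String x, String y) => x.characters == y.characters;

void main() {
  // Each spelling is segmented into a single grapheme cluster.
  print(composed.characters.length); // 1
  print(decomposed.characters.length); // 1

  // The underlying UTF-16 code units are different.
  print(composed.codeUnits); // [233]
  print(decomposed.codeUnits); // [101, 769]

  // Equality follows the underlying strings, so no normalization happens.
  print(composed == decomposed); // false
  print(charactersEqual(composed, decomposed)); // false
}
```

As far as I know, `dart:core` has no built-in Unicode normalization, so callers who want normalized comparison would have to normalize both strings themselves (for example with a separate normalization package) before comparing.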

@lrhn lrhn added the type-documentation A request to add or improve documentation label Apr 18, 2023