Use colander.Length() to validate emoji grapheme clusters #327

Closed
dwt opened this issue Feb 18, 2019 · 9 comments

@dwt
Contributor

dwt commented Feb 18, 2019

Not sure colander.Length() is the right object to start from, but since Python's built-in support for Unicode grapheme handling is limited, maybe it is the right place to start?

Our use case is that we want to have an initials field where people can enter up to 2 characters to be rendered on their user icon.

Of course emojis are a great choice for this, but they frequently fail the length test, as a single emoji can be composed of many code points. E.g. "🤔 🙈 me así, se 😌 ds 💕👭👙 hello 👩🏾‍🎓 emoji hello 👨‍👩‍👦‍👦 how are 😊 you today🙅🏽🙅🏽"

The problem with colander.Length() is that it is naive, in the sense that it only counts code points, while we would like it to count grapheme clusters, to find how many characters would be rendered from that string.
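To illustrate the gap between code points and rendered characters, here is a minimal sketch. It assumes Python 3, where `len()` on a `str` counts code points; the variable names are only illustrative, and the emoji are written as escapes so the individual code points are visible.

```python
# Each value below renders as ONE visible character, but len() reports
# the number of code points it is built from (Python 3 semantics).

# U+1F914 THINKING FACE: a single code point
thinking_face = "\U0001F914"

# Woman + medium-dark skin tone + ZERO WIDTH JOINER + graduation cap
student = "\U0001F469\U0001F3FE\u200D\U0001F393"

# Man + ZWJ + woman + ZWJ + boy + ZWJ + boy
family = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466"

for label, text in [
    ("thinking face", thinking_face),
    ("student", student),
    ("family", family),
]:
    print(label, len(text))
# thinking face 1
# student 4
# family 7
```

A length check with a maximum of 2 would therefore reject the student and family emoji, even though each one is displayed as a single character.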

Does that make sense? Do you guys have a proposal how to handle that better?

@stevepiercy
Member

Have you tried a custom validator?

@dwt
Contributor Author

dwt commented Feb 19, 2019

I did, but working with grapheme clusters in Python is quite hard (especially in Python 2), which is why I would very much like it if the validation library knew what they are and could handle them.

I found a Python 3 library, grapheme (that I can't use), which seems like it could help, but I still think it would be really nice if I were able to express that an input should have just a certain number of visible characters.

Ideally that would also take care (and allow) stuff like this:

Ȳ̶̧̙̺̪͕̰̬̹̫̟̫̥̺̓́̍͜͜͠e̸͇̽̊̇͆̐ä̸̛̠́̽̑̃̃̃̈́̐̏͘̕͜͠h̷̨̡̛̦̲̯̰̪̜̭͎̠̹̏̈́̌̉̽͌̌͜ ̷̨͈͚̬̮͈̦́͒̍͂͘ͅẗ̶̨̮̩̭̘͕̤͈̰̣͔̝͝h̶̭̹̘̰͚̬͖̗͐i̵̮͍͓̰̣̱͎̤͕̽̀s̸̜͐̽͛̅̀͑̎̅̕͠͝͠͠ ̵̢̧̘͇̱͇̠̝͚͔̱̙͔̀̀̀͗ͅi̶͉̱̐̿͗͂͋s̶̮̫̝͇͓̤̲̼̮̟̝̫̳̫̿̀́̍͂̋͌̽̂͊̈́͛̚͠ ̷̧̘̘̙̳̬̻̱͑̄̇̊̒͌͒t̴͕͙̜͕̦͚̥͉̳̿̿͑̓̈́͐͘h̸͓̬̱̙̎͊͛ͅę̶̧̢̼͇͈͖̘̼̜̠͊̍̊́̕ͅͅ ̶̧͓͖̥̗̝̤̜̣̣̘̓̍́̌̉̉̔̂̈́̽̓͗̀̕̚s̴̨͕̳͕̟͇̬̳͚͔̻̦̺̟͌̓ͅţ̵̡̧͍̙̺̳̪͇̟̝̫͚̺́ụ̵̹͔̝̩͊͌͐f̷̗̦̗̟͇̃̓̉f̵̨̡̨̪̗̯̩̞͇̞̞̫͔̏̈̏̈́́̑͗̃͋͘̕͘͠ͅ!̵̛̻͕̓̀̀͛͂̃̈͘

.

@jenstroeger
Contributor

jenstroeger commented Feb 19, 2019

@dwt, thanks for pointing this out! Also, you might be interested in Python bug 30717 which I think is related to this…

@dwt
Contributor Author

dwt commented Feb 19, 2019

That is indeed interesting, but the bug report looks dead, and I think it will take a long time for Python to rework its string handling around proper Unicode grapheme support. People are still burnt from the unicode -> str transition.

@digitalresistor
Member

@dwt this is not something that we are likely to pick up and work on... especially for Python 2.7.

If you are still deploying on Python 2.7 I would recommend you look at porting, and then you may take advantage of the grapheme library.

Unless you can point me at documentation in the standard library that we can take advantage of, colander.Length() will continue to stay naive and a custom validator using an external third-party library will be your best bet.
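For readers who want a feel for what such a custom validator has to compute, the following is a rough, standard-library-only sketch for Python 3. It is an approximation and not the segmentation algorithm from the Unicode standard (UAX #29), which dedicated libraries implement: it only folds combining marks, emoji skin-tone modifiers, and zero-width-joiner sequences into the preceding character, so flag emoji and Hangul jamo sequences, for example, are still over-counted. The names `approx_grapheme_count` and `fits_initials` are hypothetical and not part of colander.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji into one rendered glyph


def _attaches(ch):
    """Return True if ch is treated as extending the previous cluster.

    Assumption: only combining marks, skin-tone modifiers and ZWJ are
    handled; this is a simplification of the real Unicode rules.
    """
    return (
        unicodedata.category(ch) in ("Mn", "Me", "Mc")  # combining marks
        or 0x1F3FB <= ord(ch) <= 0x1F3FF  # emoji skin-tone modifiers
        or ch == ZWJ
    )


def approx_grapheme_count(text):
    """Approximate the number of rendered characters in text."""
    count = 0
    prev = None
    for ch in text:
        # A new cluster starts unless ch attaches to the previous code
        # point or directly follows a zero-width joiner.
        if prev is None or not (_attaches(ch) or prev == ZWJ):
            count += 1
        prev = ch
    return count


def fits_initials(value, min_len=1, max_len=2):
    """Check an initials field against approximate visible length."""
    return min_len <= approx_grapheme_count(value) <= max_len


print(fits_initials("\U0001F469\U0001F3FE\u200D\U0001F393"))  # True
print(fits_initials("abc"))  # False
```

In colander, a validator is a callable that takes `(node, value)` and raises `colander.Invalid(node, msg)` on failure, so a check like `fits_initials` could be wrapped that way, with the counting function swapped for a third-party library where full correctness matters.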

@digitalresistor
Member

While doing a quick search I also found https://pypi.org/project/uniseg/ which is Python 2.7 compatible.

@dwt
Contributor Author

dwt commented Feb 20, 2019

Yes, but unfortunately that library supports an old version of the Unicode spec. :-(

Ah well, I'd be perfectly happy if you had grapheme counting support on python3, as indeed I'll be switching soon.

I still think that in this world a validation library for strings should know about the concept of grapheme clusters to allow people to enter one smiley in a field that requires a one to two character input.

@digitalresistor
Member

@dwt I will be happy to review patches/PR. I am not saying we won't ship or provide that, but it is not a use case that the core developers have and is not something that is easy to add because of the standard around it.

stevepiercy changed the title from "Use colander.Lenth() to validate emoji grapheme clusters" to "Use colander.Length() to validate emoji grapheme clusters" on Aug 6, 2020
@tseaver
Member

tseaver commented May 20, 2024

I'm closing because Colander 2.0 dropped support for Python 2.7.

tseaver closed this as completed on May 20, 2024

5 participants