Use colander.Length() to validate emoji grapheme clusters #327
Have you tried a custom validator?
I did, but working with grapheme clusters in Python is quite hard (especially in Python 2), which is why I would very much like the validation library to know what they are and to handle them. I found a Python 3 library, grapheme (which I can't use), that seems like it could help, but I still think it would be really nice to be able to express that an input should have only a certain number of visible characters. Ideally that would also take care of (and allow) stuff like this: Ȳ̶̧̙̺̪͕̰̬̹̫̟̫̥̺̓́̍͜͜͠e̸͇̽̊̇͆̐ä̸̛̠́̽̑̃̃̃̈́̐̏͘̕͜͠h̷̨̡̛̦̲̯̰̪̜̭͎̠̹̏̈́̌̉̽͌̌͜ ̷̨͈͚̬̮͈̦́͒̍͂͘ͅẗ̶̨̮̩̭̘͕̤͈̰̣͔̝͝h̶̭̹̘̰͚̬͖̗͐i̵̮͍͓̰̣̱͎̤͕̽̀s̸̜͐̽͛̅̀͑̎̅̕͠͝͠͠ ̵̢̧̘͇̱͇̠̝͚͔̱̙͔̀̀̀͗ͅi̶͉̱̐̿͗͂͋s̶̮̫̝͇͓̤̲̼̮̟̝̫̳̫̿̀́̍͂̋͌̽̂͊̈́͛̚͠ ̷̧̘̘̙̳̬̻̱͑̄̇̊̒͌͒t̴͕͙̜͕̦͚̥͉̳̿̿͑̓̈́͐͘h̸͓̬̱̙̎͊͛ͅę̶̧̢̼͇͈͖̘̼̜̠͊̍̊́̕ͅͅ ̶̧͓͖̥̗̝̤̜̣̣̘̓̍́̌̉̉̔̂̈́̽̓͗̀̕̚s̴̨͕̳͕̟͇̬̳͚͔̻̦̺̟͌̓ͅţ̵̡̧͍̙̺̳̪͇̟̝̫͚̺́ụ̵̹͔̝̩͊͌͐f̷̗̦̗̟͇̃̓̉f̵̨̡̨̪̗̯̩̞͇̞̞̫͔̏̈̏̈́́̑͗̃͋͘̕͘͠ͅ!̵̛̻͕̓̀̀͛͂̃̈͘
@dwt, thanks for pointing this out! Also, you might be interested in Python bug 30717, which I think is related to this…
That is indeed interesting, but the bug report looks dead, and I think it will take a long time for Python to convert its string handling to proper Unicode grapheme support. People are still burnt from the unicode -> str transition.
@dwt, this is not something that we are likely to pick up and work on, especially for Python 2.7. If you are still deploying on Python 2.7, I would recommend you look at porting; then you can take advantage of the grapheme library. Unless you can point me at documentation in the standard library that we can take advantage of,
While doing a quick search, I also found https://pypi.org/project/uniseg/ which is Python 2.7 compatible.
Yes, but unfortunately that library supports an old version of the Unicode spec. :-( Ah well, I'd be perfectly happy if you had grapheme counting support on Python 3, as indeed I'll be switching soon. I still think that in this world a validation library for strings should know about the concept of grapheme clusters, so that people can enter one smiley in a field that requires a one-to-two-character input.
@dwt, I will be happy to review patches/PRs. I am not saying we won't ship or provide that, but it is not a use case that the core developers have, and it is not easy to add because of the complexity of the standard around it.
I'm closing because Colander 2.0 dropped support for Python 2.7.
Not sure this is the right place to start, but since Python itself has limited Unicode support, maybe this is a good starting point?
Our use case is that we want to have an initials field where people can enter up to 2 characters to be rendered on their user icon.
Of course, emojis are a great choice for this, but they frequently fail the length test, because a single emoji can be composed of many code points. For example:
"🤔 🙈 me así, se 😌 ds 💕👭👙 hello 👩🏾🎓 emoji hello 👨👩👦👦 how are 😊 you today🙅🏽🙅🏽"
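To make the mismatch concrete, here is a small sketch using Python 3's built-in len(), which counts code points (the unit the issue says colander.Length() effectively measures). Each string below renders as a single emoji. Joiner characters (U+200D ZERO WIDTH JOINER) are easily lost when copying text like the example above, so the sequences are written with explicit escapes; the descriptive names are only labels chosen for this illustration.

```python
# Each value renders as ONE emoji, but len() counts the code points inside it.
samples = {
    "thinking face": "\U0001F914",
    # base emoji + skin-tone modifier
    "person gesturing no, medium skin tone": "\U0001F645\U0001F3FD",
    # woman + skin tone + ZERO WIDTH JOINER + graduation cap
    "woman student, medium-dark skin tone": "\U0001F469\U0001F3FE\u200d\U0001F393",
    # man, woman, boy, boy glued together by three ZERO WIDTH JOINERs
    "family: man, woman, boy, boy": "\U0001F468\u200d\U0001F469\u200d\U0001F466\u200d\U0001F466",
}

for name, emoji in samples.items():
    print(f"{name}: {len(emoji)} code point(s)")
```

The counts come out as 1, 2, 4 and 7, so any limit of "up to 2 characters" based on len() rejects the last two even though each is one visible character.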
The problem with colander.Length() is that it is naive, in the sense that it only counts code points, while we would like it to count grapheme clusters, to find out how many characters would be rendered from that string. Does that make sense? Do you guys have a proposal for how to handle that better?
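As a rough illustration of counting clusters instead of code points, below is a standard-library-only heuristic. It is an approximation of the Unicode segmentation rules (UAX #29), not a full implementation: it treats combining marks, skin-tone modifiers, tag characters and anything following a ZERO WIDTH JOINER as part of the previous character, and pairs regional indicators into flags, but it ignores Hangul jamo, CRLF, prepend characters and Indic conjuncts. The function names approx_grapheme_count and check_max_graphemes are hypothetical, invented for this sketch.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji into one sequence


def _extends_previous(ch: str) -> bool:
    """Simplified check for code points that never start a new cluster."""
    cp = ord(ch)
    return (
        # combining, spacing and enclosing marks (accents, variation selectors, ...)
        unicodedata.category(ch) in ("Mn", "Mc", "Me")
        or 0x1F3FB <= cp <= 0x1F3FF  # emoji skin-tone modifiers
        or 0xE0020 <= cp <= 0xE007F  # tag characters used in subdivision flags
    )


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def approx_grapheme_count(text: str) -> int:
    """Approximate the number of user-perceived characters in text.

    Heuristic only, NOT full UAX #29: Hangul jamo, CRLF, prepend characters
    and Indic conjuncts are not handled, and anything after a ZWJ is joined.
    """
    count = 0
    after_zwj = False  # previous code point was a ZERO WIDTH JOINER
    open_flag = False  # one regional indicator is waiting for its partner
    for ch in text:
        if count and (ch == ZWJ or _extends_previous(ch)):
            after_zwj = ch == ZWJ
            open_flag = False
            continue
        if count and after_zwj:
            # a ZWJ glues this code point onto the current cluster
            after_zwj = False
            open_flag = False
            continue
        if _is_regional_indicator(ch):
            if open_flag:
                open_flag = False  # second half of a two-letter flag
                continue
            open_flag = True
        else:
            open_flag = False
        after_zwj = False
        count += 1
    return count


def check_max_graphemes(value: str, max_len: int = 2) -> None:
    """Hypothetical validator body: raise if value has too many clusters."""
    n = approx_grapheme_count(value)
    if n > max_len:
        raise ValueError(f"{n} characters entered, at most {max_len} allowed")
```

With this, "e\u0301" and the ZWJ family sequence each count as 1, and two skin-toned "person gesturing no" emoji count as 2, so they pass a two-character limit. To use it with colander, check_max_graphemes could be adapted into a callable validator taking (node, value) and raising colander.Invalid(node, msg) instead of ValueError. On Python 3, the grapheme package's grapheme.length() or the third-party regex module's \X pattern implement the full segmentation rules and would be preferable to this heuristic.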