[shared] [webassembly] pyexec_event_repl_process_char unable to understand unicode #14255
Comments
I think I can elaborate on what does not work and what is missing: the "readline" functionality implemented in MicroPython does not handle Unicode characters at all (maybe this also explains why "raw" mode and "paste" mode work). You can see it in the implementation: after escape sequences are handled, any non-ASCII character is ignored (which is what the leading bytes of those emoji look like): https://github.com/micropython/micropython/blob/master/shared/readline/readline.c#L279 I'm not sure where exactly the webassembly port interfaces with the rest of MicroPython, because it seems like I cannot interact with readline, even though the call path that I traced seems to lead to readline. (For example, it does not react to Ctrl-R or Tab completion. Ctrl-C seems to work; maybe that's implemented explicitly, and the other special characters never fully reach the input?) Extending that statement to parse multi-byte UTF-8 characters and handle them correctly would probably solve this problem at its root. (I'm also interested in Unicode input working in other instances of the MicroPython REPL.)
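To illustrate the suggestion above about parsing multi-byte UTF-8 in a per-character handler, here is a minimal sketch in Python rather than the C of readline.c. The class `Utf8Accumulator` and the function `decode_stream` are hypothetical names invented for this example, not part of MicroPython. It only shows the general idea of buffering lead and continuation bytes until a full code point is available.

```python
class Utf8Accumulator:
    """Hypothetical helper: collects UTF-8 bytes one at a time,
    the way a per-character REPL handler receives them."""

    def __init__(self):
        self.pending = bytearray()
        self.needed = 0  # continuation bytes still expected

    def feed(self, byte):
        """Return a decoded character once complete, otherwise None."""
        if self.needed == 0:
            if byte < 0x80:
                return chr(byte)  # plain ASCII passes straight through
            if 0xC2 <= byte <= 0xDF:
                self.needed = 1  # lead byte of a 2-byte sequence
            elif 0xE0 <= byte <= 0xEF:
                self.needed = 2  # lead byte of a 3-byte sequence
            elif 0xF0 <= byte <= 0xF4:
                self.needed = 3  # lead byte of a 4-byte sequence
            else:
                return None  # stray continuation or invalid lead byte: drop
            self.pending = bytearray([byte])
            return None
        if 0x80 <= byte <= 0xBF:
            self.pending.append(byte)
            self.needed -= 1
            if self.needed == 0:
                try:
                    return self.pending.decode("utf-8")
                except UnicodeDecodeError:
                    return None  # e.g. overlong or surrogate encoding
            return None
        # Not a continuation byte: abandon the partial sequence and
        # treat this byte as the start of something new.
        self.needed = 0
        self.pending = bytearray()
        return self.feed(byte)


def decode_stream(data):
    """Feed bytes one by one and join whatever characters come out."""
    acc = Utf8Accumulator()
    out = []
    for b in data:
        ch = acc.feed(b)
        if ch is not None:
            out.append(ch)
    return "".join(out)
```

A real fix would also need to account for cursor movement and display width, which this sketch ignores entirely.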
I think this is a duplicate of #2789.
Apologies, I wasn't sure it was strictly REPL related, but @felixdoerre explained it well (with code) and @dpgeorge has known about this for 6+ years (I was able to debug up to the pyexec_event_repl_process_char behind replProcessChar, but no further). If you feel like closing it I will update related issues to point at that 2017 issue, but I hope that handling at least UTF-8, without caring much about arrows and deletion, would be a very welcome first step: we can tell our users those are known limitations, but we can't really tell our users "please just speak English or see surprises in your live/REPL code". Thanks for understanding and hopefully moving that old issue forward incrementally 🙏 P.S. for @felixdoerre: it is possible the editor or the browser intercepts those Ctrl+X chars when there is no explicit preventDefault on all combinations, so that's more on us than on the WASM REPL.
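As a rough illustration of why deletion is a separate step from plain UTF-8 input, the sketch below removes the last code point from a UTF-8 byte buffer. The function name `backspace` and the use of a Python `bytearray` as the line buffer are assumptions made for this example; they do not reflect how readline.c stores its line.

```python
def backspace(buf):
    """Remove the last UTF-8 encoded code point from a bytearray, in place."""
    # Continuation bytes have the bit pattern 10xxxxxx; pop them first.
    while buf and (buf[-1] & 0xC0) == 0x80:
        buf.pop()
    # Then pop the lead byte (or the single ASCII byte).
    if buf:
        buf.pop()
    return buf
```

This works on code points, not on what a user perceives as one character: a combined emoji made of several code points would still need several backspaces.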
Checks
I agree to follow the MicroPython Code of Conduct to ensure a safe and respectful space for everyone.
I've searched for existing issues matching this bug, and didn't find any.
Port, board and/or hardware
webassembly, linux shell
MicroPython version
latest
Reproduction
Open a MicroPython REPL or visit this page (which is half patched, but not fully): https://webreflection.github.io/coincident/test/micropython.html
Try typing the following into it:
On a native shell you'll see python instead of µpython. On the Web REPL you see even less, because the count goes off due to replProcessChar (even the Asyncify one), and this is just the tip of the iceberg ... now try a combined emoji: ... you'll see emptiness or awkward results ...
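The symptom described above is consistent with non-ASCII bytes being discarded, as the comments suggest. The following sketch simulates that in plain Python; `drop_non_ascii` is a hypothetical stand-in for the filtering, not the actual MicroPython code, and the micro sign is assumed to be U+00B5.

```python
def drop_non_ascii(data):
    """Keep only printable ASCII bytes (32..126), discarding everything else."""
    return bytes(b for b in data if 32 <= b <= 126)


typed = "\u00b5python"  # micro sign followed by "python"
raw = typed.encode("utf-8")
print(len(typed), len(raw))  # 7 characters, but 8 bytes
print(drop_non_ascii(raw).decode("ascii"))  # python

# A combined emoji: man, zero-width joiner, woman, zero-width joiner, girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family), len(family.encode("utf-8")))  # 5 code points, 18 bytes
print(drop_non_ascii(family.encode("utf-8")))  # b'' (nothing survives)
```

The gap between character count and byte count is also a plausible reason for a front end's count to drift when it feeds the REPL one byte at a time.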
Most emoji are indeed just broken out of the box unless you ask for them via input(...):
Coincidentally, if you explicitly go into "REPL paste mode" (\5) you can paste anything you like, then get out (\4), and see that all pasted code was processed without issues, just like in the input(...) case.
Related PR that fixes at least the output side of affairs: pyscript/pyscript#2018. It cannot fix users' typing on the terminal, though, as replProcessChar misses chars in the process (and yes, it has no linebuffer, but it's the same with linebuffer; to me the issue is within the code behind replProcessChar).
Expected behaviour
if I type the following in the REPL I expect things to just work and output the correct result:
Observed behaviour
if I type the following in the REPL this happens instead:
Additional Information
Pinging @dpgeorge as I've done already in Discord but this looks and feels like a broader issue with REPL because it's possible to reproduce it via native Linux port.