Fix MicroPython badly handling unicode chars #2018

WebReflection · 2024-04-04T19:10:22Z

Description

Ready to land: this MR is the best we can do to offer unicode sequences in MicroPython.

The underlying issue is likely bigger than just the exposed replProcessChar API and it should be tackled upstream. Meanwhile, we can at least explain that typing emoji or non-ASCII chars might fail, while having these already in the output, in an external file, or provided through any requested input works.

Accordingly, if anyone would like to review it, I think this should land sooner rather than later.

P.S. this has already been published as the latest npm package.

This MR tries to tackle the fact that MicroPython doesn't really like unicode chars.

The irony is that neither the Linux REPL nor the PyScript one is able to handle a simple print("µpython") in any meaningful way:

on the Linux REPL native shell it just ignores that µ
on the Web (WASM) PyScript port it forwards unicode sequences only if I handle these manually
a simple test = input("unicode? ") can accept µpython because that's just returned as the '\xb5python' string from Readline, and one can use it and print(test) afterwards without any issue ... however ...
if the entry is manually typed, as in test = "µpython", something weird happens: replProcessChar(byte) passes along the whole string but that char gets ignored/discarded, so the test reference will contain only "python" without the µ

I am pinging @dpgeorge here because I don't really think this MR should ever land or, better, if it has to, I need replProcessChar to not mess with code points. Otherwise the input length in bytes mismatches the actual output produced by replProcessChar via stdout, which looks pretty ugly to me and is surely hard to explain.

/cc @ntoll too, as this might be the last thing to solve before having terminal parity with Pyodide, which suffers from none of these issues.
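To make the byte versus code point mismatch concrete, here is a minimal Python sketch. The `ascii_only_consumer` function is a hypothetical stand-in for a consumer that handles one byte at a time and drops anything above ASCII; it only reproduces the symptom described above and is not MicroPython's actual implementation.

```python
text = "µpython"
encoded = text.encode("utf-8")

# The same string has different lengths depending on what is counted.
print(len(text))     # number of code points
print(len(encoded))  # number of UTF-8 bytes (µ, U+00B5, takes two bytes)


def ascii_only_consumer(data: bytes) -> str:
    """Hypothetical consumer: keeps ASCII bytes, silently drops the rest."""
    return "".join(chr(byte) for byte in data if byte < 0x80)


# Feeding the encoded input through such a consumer loses the µ entirely.
print(ascii_only_consumer(encoded))
```

Any consumer that counts bytes on the way in but characters on the way out will disagree with itself about the length of this input, which is the mismatch described above.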

Update: if I explicitly enter paste mode (modified version of the terminal), type test = "µpython" and then leave paste mode, the test reference rightly points at the right thing ... now I wonder why paste mode is not the default, or why that would work while normal REPL mode wouldn't ... I could maybe circumvent this, but it would feel super awkward to \5 and \4 the terminal around users' inputs simply to let them ... well, input whatever they want ... I couldn't find the culprit in the MicroPython repository, but I suggest making the REPL "paste mode" by default, as that works!
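For illustration only, the awkward workaround mentioned above would amount to framing each user input between the two control bytes. This is a sketch, not what this MR does: `frame_for_paste_mode` is a hypothetical helper name, and the only facts relied upon are that \5 is Ctrl-E (enter paste mode) and \4 is Ctrl-D (finish paste mode) in the MicroPython REPL.

```python
PASTE_START = b"\x05"  # Ctrl-E: enter paste mode
PASTE_END = b"\x04"    # Ctrl-D: finish paste mode and run the pasted code


def frame_for_paste_mode(source: str) -> bytes:
    """Hypothetical helper: wrap user input so the REPL treats it as pasted."""
    return PASTE_START + source.encode("utf-8") + PASTE_END


framed = frame_for_paste_mode('test = "µpython"')
print(framed)
```

The framed bytes would then be sent to the REPL one by one, exactly as typed input is today, which is why this feels like a hack rather than a fix.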

Update

I have created a module that might help deal with these cases in the future. It is fully code covered, which is how I spotted an issue with the current code, but I'll fix that tomorrow: https://github.com/WebReflection/buffer-points

Changes

fixed an issue with the py-editor related to the new linebuffer directive
provide, in the worker hook scope, a simple callback that pre-buffers unicode sequences according to the standard, so that the buffer is sent to the terminal only once those sequences are complete
test with both µ and way more convoluted sequences such as 👩‍❤️‍👨 that the output works if either requested as input or already evaluated from the page ... in the latter case test = "👩‍❤️‍👨" completely messes up the program and the resulting string is empty
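The pre-buffering idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the callback shipped in this MR (which lives in the JavaScript worker hook scope): the names `sequence_length` and `SequenceBuffer` are hypothetical, and the sketch only waits for complete UTF-8 code points, it does not group joined emoji into a single cluster.

```python
def sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence starting with `lead` needs."""
    if lead >> 7 == 0b0:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # stray continuation or invalid lead byte: flush it on its own


class SequenceBuffer:
    """Hypothetical accumulator: forwards text only once a sequence is complete."""

    def __init__(self, write):
        self._write = write          # callback receiving decoded text
        self._pending = bytearray()  # bytes of the sequence in progress
        self._needed = 0             # total bytes the current sequence requires

    def push(self, byte: int) -> None:
        if not self._pending:
            self._needed = sequence_length(byte)
        self._pending.append(byte)
        if len(self._pending) >= self._needed:
            # Invalid input becomes U+FFFD instead of raising.
            self._write(self._pending.decode("utf-8", errors="replace"))
            self._pending.clear()


chunks = []
buffer = SequenceBuffer(chunks.append)
# µpython followed by the woman-heart-man emoji, spelled out as code points.
sample = "µpython \U0001F469\u200D\u2764\uFE0F\u200D\U0001F468"
for byte in sample.encode("utf-8"):
    buffer.push(byte)

print("".join(chunks) == sample)
```

Because the terminal only ever receives whole code points, in order, it can render the multi-code-point emoji correctly instead of showing one broken byte after the other.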

Checklist

create a function that accumulates chars and eventually writes whole sequences to the terminal, as opposed to "vomiting" one char after the other, which results in unreadable output
test that simple to convoluted unicode cases work

All tests pass locally
I have updated CHANGELOG.md
I have created documentation for this (if applicable)

WebReflection · 2024-04-09T12:50:15Z

As usual, thank you @FabioRosado 🙏

To whom it may concern: this is already published on npm. It fixes most issues around the linebuffer false now used in MicroPython, it fixes the output in the terminal, and it fixes a regression with the PyEditor. It only misses the ability for users to directly type unicode chars without explicitly asking for an input("...").

The original issue has been open since January 2017: micropython/micropython#2789

We are going to try to help fix the underlying gotcha as best we can, but this needs to land due to the broken state of the current MicroPython interpreter with the latest package published on npm.

WebReflection force-pushed the micropython-vs-unicode branch 4 times, most recently from edc35ff to 4c86c83 Compare April 5, 2024 05:28

WebReflection marked this pull request as ready for review April 5, 2024 09:35

WebReflection force-pushed the micropython-vs-unicode branch from 8de3c20 to 6ac0878 Compare April 5, 2024 09:37

WebReflection mentioned this pull request Apr 5, 2024

[shared] [webassembly] pyexec_event_repl_process_char unable to understand unicode micropython/micropython#14255

Open

2 tasks

WebReflection and others added 2 commits April 9, 2024 14:27

Fix MicroPython badly handling unicode chars

19db460

[pre-commit.ci] auto fixes from pre-commit.com hooks

7839d77

for more information, see https://pre-commit.ci

WebReflection force-pushed the micropython-vs-unicode branch from 6ac0878 to ca75b20 Compare April 9, 2024 12:28

[pre-commit.ci] auto fixes from pre-commit.com hooks

99f654f

for more information, see https://pre-commit.ci

WebReflection force-pushed the micropython-vs-unicode branch from ca75b20 to 99f654f Compare April 9, 2024 12:31

FabioRosado approved these changes Apr 9, 2024

WebReflection merged commit 2d5cf09 into pyscript:main Apr 9, 2024
2 checks passed
