Fix MicroPython badly handling unicode chars #2018

Merged (3 commits) on Apr 9, 2024


@WebReflection (Contributor) commented Apr 4, 2024

Description

Ready to land: this MR is the best we can currently do to support Unicode sequences in MicroPython.

The underlying issue is likely bigger than just the exposed replProcessChar API and should be tackled upstream. Meanwhile, we can at least document that typing emoji or non-ASCII chars directly might fail, while having them already in the output, in an external file, or entering them through input() works.

Accordingly, if anyone would like to review it, I think this should land sooner rather than later.

P.S. this has already been published as the latest npm package.


This MR tries to tackle the fact that MicroPython doesn't really like Unicode chars.

The irony: a simple print("µpython") is something neither the Linux REPL nor the PyScript one can handle in any meaningful way:

  • on the native Linux REPL shell, the µ is simply ignored
  • on the Web (WASM) PyScript port, Unicode sequences are forwarded only if I handle them manually
  • a simple test = input("unicode? ") accepts µpython, because Readline just returns it as the '\xb5python' string, and one can then use it and print(test) without any issue ... however ...
  • if the entry is typed manually, as in test = "µpython", something weird happens: replProcessChar(byte) passes along the whole string, but that char gets ignored/discarded and test ends up containing only "python", without the µ

I am pinging @dpgeorge here because I don't really think this MR should ever land, or rather, if it has to, I need replProcessChar not to mess up code points. Otherwise the input length in bytes mismatches the actual output produced by replProcessChar via stdout, which looks pretty ugly to me and is surely hard to explain.
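To make the byte/character mismatch concrete, here is a small illustration in CPython. It does not model replProcessChar itself (that is an assumption-free zone: nothing here claims how it works internally); it only shows that µ (U+00B5) is one code point but two UTF-8 bytes, and that decoding each byte in isolation loses it, which matches the "python" without the µ symptom described above.

```python
# "µ" is U+00B5 MICRO SIGN: one code point, but two bytes (0xC2 0xB5) in UTF-8.
source = "µpython"

char_count = len(source)                  # counted in code points
byte_count = len(source.encode("utf-8"))  # counted in UTF-8 bytes
print(char_count, byte_count)  # 7 8

# Decoding every byte on its own cannot recover the µ: 0xC2 is an incomplete
# sequence and 0xB5 is a lone continuation byte, so both are dropped here.
decoded_per_byte = "".join(
    bytes([b]).decode("utf-8", errors="ignore") for b in source.encode("utf-8")
)
print(decoded_per_byte)  # python
```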

/cc @ntoll too, as this might be the last thing to solve before reaching terminal parity with Pyodide, which suffers from none of these issues.


Update: if I explicitly enter paste mode (in a modified version of the terminal), type test = "µpython" and then leave paste mode, test correctly points at the right value ... now I wonder why paste mode is not the default, or why it works while the normal REPL mode doesn't. I could maybe work around this, but it would feel super awkward to wrap users' input in \5 and \4 just to let them ... well, input whatever they want. I couldn't find the culprit in the MicroPython repository, but I suggest making "paste mode" the REPL default, as that works!
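The paste-mode workaround amounts to simple byte framing: in the MicroPython REPL, Ctrl-E (0x05, the \5 above) enters paste mode and Ctrl-D (0x04, the \4 above) finishes it and runs the pasted code. The sketch below uses a hypothetical helper, wrap_in_paste_mode, purely for illustration; it is not part of the PR or of the PyScript terminal.

```python
CTRL_E = b"\x05"  # enter paste mode in the MicroPython REPL
CTRL_D = b"\x04"  # finish paste mode and execute the pasted code


def wrap_in_paste_mode(source: str) -> bytes:
    """Hypothetical helper: frame user input so the REPL receives it in paste mode."""
    # Assumes `source` itself contains no Ctrl-D, which would end paste mode early.
    return CTRL_E + source.encode("utf-8") + CTRL_D


payload = wrap_in_paste_mode('test = "µpython"')
print(payload)  # b'\x05test = "\xc2\xb5python"\x04'
```

The awkward part the comment points out is visible here: every user keystroke would have to be held back and re-sent inside this frame instead of being forwarded as typed.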

Update

I have created a module that might help deal with these cases in the future. It has full code coverage, which helped me spot an issue with the current code, but I'll fix that tomorrow: https://github.com/WebReflection/buffer-points


Changes

  • fixed an issue with the py-editor related to the new linebuffer directive
  • provided, in the worker hook scope, a simple callback that pre-buffers Unicode sequences according to the standard, so that the buffer is sent to the terminal only once those sequences are complete
  • tested with both µ and far more convoluted sequences such as 👩‍❤️‍👨 that the output works, whether requested as input or already evaluated from the page ... when typed directly, however, test = "👩‍❤️‍👨" completely messes up the program and the resulting string is empty

Checklist

  • create a function that accumulates chars and then writes whole sequences to the terminal, as opposed to "vomiting" one char after another, which results in unreadable output
  • test that simple to convoluted Unicode cases work
  • All tests pass locally
  • I have updated CHANGELOG.md
  • I have created documentation for this (if applicable)
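One way to exercise the "simple to convoluted" cases from the checklist is sketched below in Python rather than the PR's JavaScript. It assumes the emoji is the standard "couple with heart: woman, man" sequence (the one in the PR may be encoded slightly differently) and uses the standard library's incremental UTF-8 decoder to check that splitting the bytes at every possible chunk size still reassembles the original string. decode_in_chunks is a hypothetical test helper.

```python
import codecs
import unicodedata

# Assumed encoding of the emoji: woman, ZWJ, heavy black heart, VS-16, ZWJ, man.
couple = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F468"

# One visible glyph, several code points.
for ch in couple:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

encoded = couple.encode("utf-8")
print(len(couple), len(encoded))  # 6 20


def decode_in_chunks(data: bytes, size: int) -> str:
    """Hypothetical test helper: decode `data` fed in `size`-byte chunks."""
    # The incremental decoder keeps incomplete sequences internally and
    # only returns whole code points.
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(data[i:i + size]) for i in range(0, len(data), size))
    return text + decoder.decode(b"", final=True)


# Every chunk size, including ones that cut multi-byte sequences in half.
for sample in ("µpython", couple):
    data = sample.encode("utf-8")
    assert all(decode_in_chunks(data, n) == sample for n in range(1, len(data) + 1))
```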

@WebReflection (Contributor Author)

As usual, thank you @FabioRosado 🙏

To whom it may concern: this is already published on npm and it fixes most issues around the linebuffer false now used in MicroPython. It fixes the output in the terminal and a regression with the PyEditor; it only lacks the ability for users to type Unicode chars directly, without explicitly asking for them via input("...").

The original issue has been open since January 2017: micropython/micropython#2789

We are going to try to help fix the underlying gotcha as best we can, but this needs to land due to the broken state of the current MicroPython interpreter with the latest package published on npm.

@WebReflection merged commit 2d5cf09 into pyscript:main on Apr 9, 2024
2 checks passed