Fix MicroPython badly handling unicode chars #2018
Merged
Description
Ready to land: this MR is the best we can do to offer Unicode sequences in MicroPython.
The underlying issue is likely bigger than just the exposed `replProcessChar` API, and it should be tackled upstream. Meanwhile, we can at least explain that typing emoji or non-ASCII chars might fail, while having them already in the output or in the external file, or asking for them via any input, works. Accordingly, if anyone would like to review it, I think this should land sooner rather than later.
P.S. This has already been published as the latest npm package.
This MR tries to tackle the fact that MicroPython doesn't really like Unicode chars.
The irony lands in a simple `print("µpython")`, which neither the Linux REPL nor the PyScript one is able to handle in any meaningful way because of the `µ`.
`test = input("unicode? ")` can accept `µpython` because that's just returned as the `'\xb5python'` string from Readline, and one can use and `print(test)` afterwards without any issue... however, with `test = "µpython"` something weird happens: `replProcessChar(byte)` passes along the whole string, but that char gets ignored/discarded and the `test` reference contains only `"python"`, without the `µ`.
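To illustrate the symptom outside of MicroPython, here is a minimal Python sketch; it is an assumption about the mechanism, not PyScript's or MicroPython's actual code. `µ` is a single code point (U+00B5) but two UTF-8 bytes (`0xC2 0xB5`). If each byte is decoded in isolation, neither byte is valid on its own and both are dropped, leaving `"python"`; an incremental decoder that buffers partial sequences keeps the `µ`. The helpers `decode_per_byte` and `decode_incrementally` are hypothetical, and 👩❤️👨 is assumed to be the usual ZWJ sequence.

```python
import codecs


def decode_per_byte(data: bytes) -> str:
    """Hypothetical lossy consumer: decodes every byte on its own and drops invalid ones."""
    return "".join(bytes([b]).decode("utf-8", errors="ignore") for b in data)


def decode_incrementally(data: bytes) -> str:
    """Buffers incomplete UTF-8 sequences until all of their bytes have arrived."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = [decoder.decode(bytes([b])) for b in data]
    out.append(decoder.decode(b"", final=True))  # flush; raises if a sequence is left incomplete
    return "".join(out)


source = 'test = "µpython"'.encode("utf-8")
print(decode_per_byte(source))       # test = "python"
print(decode_incrementally(source))  # test = "µpython"

# '\xb5python' (what Readline returns) is the very same string as "µpython"
print("\xb5python" == "µpython")     # True

# Assumed ZWJ form of 👩❤️👨: every byte is non-ASCII, so the lossy consumer keeps nothing
couple = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F468"
print(repr(decode_per_byte(couple.encode("utf-8"))))  # ''
```

The last line mirrors the empty-string result reported for the emoji case.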
I am pinging @dpgeorge here because I don't really think this MR should ever land; or rather, if it has to, I need `replProcessChar` not to mess with code points, because otherwise the `input` length in bytes mismatches the actual output produced by `replProcessChar` via stdout, and this looks pretty ugly to me, or at least hard to explain.

/cc @ntoll too, as this might be the last thing to solve before reaching terminal parity with Pyodide, which suffers none of these issues.
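The length mismatch comes from the same text having a different length depending on the unit being counted. In the minimal Python sketch below, the `lengths` helper is hypothetical. JavaScript's `String.prototype.length` counts UTF-16 code units, Python's `len()` counts code points, and a byte-oriented REPL receives UTF-8 bytes. 👩❤️👨 is assumed to be the usual ZWJ sequence (woman, ZWJ, heavy black heart, VS16, ZWJ, man).

```python
def lengths(text: str) -> dict:
    """Length of `text` measured in three different units."""
    return {
        "code_points": len(text),                            # Python str length
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # JavaScript string .length
        "utf8_bytes": len(text.encode("utf-8")),             # what a byte-oriented REPL receives
    }


print(lengths("µpython"))
# {'code_points': 7, 'utf16_units': 7, 'utf8_bytes': 8}

couple = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F468"  # assumed ZWJ form of 👩❤️👨
print(lengths(couple))
# {'code_points': 6, 'utf16_units': 8, 'utf8_bytes': 20}
```

For pure ASCII all three numbers agree, which is why the drift only shows up once a non-ASCII char is typed.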
Update: if I explicitly enter paste mode (in a modified version of the terminal), type `test = "µpython"`, and then exit paste mode, the `test` reference points at the right value... now I wonder why paste mode is not the default, or why it works while normal REPL mode doesn't. I could maybe circumvent this, but it would feel super awkward to send `\5` and `\4` to the terminal around users' input just to let them... well, input whatever they want. I couldn't find the culprit in the MicroPython repository, but I suggest making paste mode the REPL default, as that works!

Update: I have created a module that might help deal with these cases in the future. Its tests cover all of the code, which is how I spotted an issue in the current code, but I'll fix that tomorrow: https://github.com/WebReflection/buffer-points
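The `\5`/`\4` workaround mentioned in the paste-mode update could be sketched in Python as follows. It assumes the terminal forwards raw bytes to the REPL and relies on MicroPython's paste-mode control characters: Ctrl-E (`0x05`) enters paste mode and Ctrl-D (`0x04`) finishes it and runs the pasted code. `wrap_in_paste_mode` is a hypothetical helper, not part of PyScript or MicroPython.

```python
CTRL_E = b"\x05"  # enter paste mode in the MicroPython REPL
CTRL_D = b"\x04"  # leave paste mode and execute what was pasted


def wrap_in_paste_mode(user_input: str) -> bytes:
    """Hypothetical helper: wrap a user's line so the REPL evaluates it in paste mode."""
    return CTRL_E + user_input.encode("utf-8") + CTRL_D


payload = wrap_in_paste_mode('test = "µpython"')
print(payload)  # b'\x05test = "\xc2\xb5python"\x04'
```

Doing this around every single user input is exactly the awkwardness described above, hence the suggestion to make paste mode the default instead.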
Changes
- `linebuffer` directive
- `µ` and way more convoluted sequences such as 👩❤️👨 work in the output, whether requested as input or already evaluated from the page; however, typing `test = "👩❤️👨"` in the REPL completely messes up the program and the resulting string is empty

Checklist

- CHANGELOG.md