urwid/widget.py: Edit: Do not crash on ISO-8859-15 with non utf-8 locale #138
base: master
On non-UTF-8 terminals, 7-bit ASCII is assumed as the input encoding. For extended-ASCII locales such as de_DE@euro, de_DE, en_US, fr_FR and so on, this is wrong. Python provides the correct input encoding of a terminal in the attribute sys.stdin.encoding, which should be used instead.

How to reproduce the bug:

(x1) [~] locale
LANG=C
LANGUAGE=en_US:en
LC_CTYPE=de_DE@euro
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
(x1) [~] cat test.py
import urwid

def exit_on_q(key):
    if key in ('q', 'Q'):
        raise urwid.ExitMainLoop()

class QuestionBox(urwid.Filler):
    def keypress(self, size, key):
        if key != 'enter':
            return super(QuestionBox, self).keypress(size, key)
        self.original_widget = urwid.Text(
            u"Nice to meet you,\n%s.\n\nPress Q to exit." %
            edit.edit_text)

edit = urwid.Edit(u"What is your name?\n")
fill = QuestionBox(edit)
loop = urwid.MainLoop(fill, unhandled_input=exit_on_q)
loop.run()
(x1) [~] python test.py
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    loop.run()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 274, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib/python2.7/dist-packages/urwid/raw_display.py", line 268, in run_wrapper
    return fn()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 339, in _run
    self.event_loop.run()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 669, in run
    self._loop()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 706, in _loop
    self._watch_files[fd]()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 390, in _update
    self.process_input(keys)
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 490, in process_input
    k = self._topmost_widget.keypress(self.screen_size, k)
  File "test.py", line 10, in keypress
    return super(QuestionBox, self).keypress(size, key)
  File "/usr/lib/python2.7/dist-packages/urwid/decoration.py", line 836, in keypress
    return self._original_widget.keypress((maxcol,), key)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1474, in keypress
    self.insert_text(key)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1398, in insert_text
    text = self._normalize_to_caption(text)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1415, in _normalize_to_caption
    return text.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)
@@ -1412,7 +1414,7 @@ def _normalize_to_caption(self, text):
             return text
         if tu:
             return text.encode('ascii') # follow python2's implicit conversion
-        return text.decode('ascii')
+        return text.decode(sys.stdin.encoding)
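The effect of this one-line change can be seen in a simplified, standalone model of the type-matching logic in the hunk. This is only a sketch for Python 3: `normalize_to_caption` and its `encoding` parameter are hypothetical stand-ins rather than urwid's API, and `bytes`/`str` take the place of Python 2's `str`/`unicode`.

```python
def normalize_to_caption(text, caption, encoding="ascii"):
    """Return text converted to the same type (bytes or str) as caption.

    Hypothetical model of the logic in the hunk; not urwid's actual method.
    """
    text_is_str = isinstance(text, str)
    caption_is_str = isinstance(caption, str)
    if text_is_str == caption_is_str:
        return text  # types already match, nothing to convert
    if text_is_str:
        return text.encode(encoding)
    return text.decode(encoding)


caption = "What is your name?\n"
key = b"\xfc"  # the byte from the traceback in the description

try:
    normalize_to_caption(key, caption)  # old behaviour: assumes ASCII
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the terminal's real encoding succeeds.
decoded = normalize_to_caption(key, caption, encoding="iso8859-15")

# With a bytes caption the types match, so no decoding is attempted.
passthrough = normalize_to_caption(key, b"What is your name?\n")
```

The model only shows that the crash comes from the hard-coded codec; it says nothing about where the encoding value should come from.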
We'll have to ask the screen for the encoding somehow, because sys.stdin isn't always the place that we're getting input. Maybe the suggestion of only sending unicode input from the screen is a better fix.
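A rough sketch of what "only sending unicode input from the screen" could look like, assuming the input source knows its own encoding. The helper `decode_keys` is hypothetical and not part of urwid; it only illustrates decoding once, at the source, so that widgets never see raw bytes.

```python
def decode_keys(raw_keys, encoding, errors="replace"):
    """Convert raw byte keys to str, leaving already-decoded keys alone.

    Hypothetical helper: it stands in for a screen-level decoding step and
    is not part of urwid.
    """
    decoded = []
    for key in raw_keys:
        if isinstance(key, bytes):
            # Only the input source knows which encoding these bytes use.
            key = key.decode(encoding, errors)
        decoded.append(key)
    return decoded


# Named keys such as "enter" are already text and pass through untouched.
keys = decode_keys([b"\xfc", "enter", b"q"], encoding="iso8859-15")
```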
See pimutils/khal#224 for discussion. IMO the encoding issue should be handled in keypress. If you agree, I could provide a patch.
No.
As I said, the screen is the only place that has a chance of knowing the right encoding.
If you want the current version of urwid not to crash on whatever encoding the user has selected, you can use a bytestring for the caption in the Edit widget, and the bytes will be passed through without any attempt to encode or decode them. (Admittedly, this guesses the type of the encoding, narrow or wide, based on the local console, for historical reasons.)
If you want to decode using the correct encoding, the only place that this can be done is the screen. Widgets have no idea where the user input came from and can't assume it came from sys.stdin. A future version of urwid should probably just use Unicode everywhere by having the screen decode and encode all characters. The reason I didn't do this originally was that some encodings don't round-trip to Unicode and back safely. That might not be a real concern these days; I'm not sure.
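The round-trip concern can be checked empirically for a given encoding. The following is a minimal sketch; `round_trips` is a hypothetical helper, and the check only covers the bytes it is given, so it does not establish which encodings are unsafe in general.

```python
def round_trips(data, encoding):
    """Return True if data survives bytes -> str -> bytes unchanged."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Undecodable input cannot round-trip at all.
        return False


# Every single byte value survives a trip through ISO-8859-15.
single_byte_ok = round_trips(bytes(range(256)), "iso8859-15")

# The byte from the traceback is not valid ASCII, so it cannot round-trip.
ascii_ok = round_trips(b"\xfc", "ascii")
```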
We do agree, though, that anything that is called by set_text / insert_text is not a good place to handle this?
Perhaps keypress is not the right place to deal with encodings, but it seemed like a fair choice given that there is already some encoding going on there. I'm not really familiar with urwid's codebase and thought keypress was already fairly low-level.
I faced this issue on Debian 9 (python-urwid 1.3.1-2+b1). How should we move forward on this issue?
I am not sure I correctly understand the complete context of this discussion, but AFAIK there is no way to tell the encoding, even once you have loaded the content in the console. At best you can guess the encoding using modules like chardet:
>>> import chardet
>>> s = '\xe2\x98\x83' # ☃
>>> chardet.detect(s)
{'confidence': 0.505, 'encoding': 'utf-8'}
or using the command line.
I think the best that can be done is to make a guess and, if the error still occurs, exit gracefully with a meaningful message.
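The "guess, then fail gracefully" idea might look roughly like the sketch below. It uses only the standard library instead of chardet; `decode_or_explain` and its default candidate list are assumptions made for illustration, not something proposed in this thread.

```python
import locale


def decode_or_explain(data, candidates=None):
    """Try each candidate encoding in turn; fail with a readable message.

    Hypothetical helper; the default candidate list is an assumption.
    """
    if candidates is None:
        candidates = ["utf-8", locale.getpreferredencoding(False)]
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name, try the next one
    raise ValueError(
        "Could not decode input %r; tried encodings: %s. "
        "Check your terminal's locale settings." % (data, ", ".join(candidates))
    )


# The second guess succeeds for the byte from the traceback.
text, used = decode_or_explain(b"\xfc", candidates=["utf-8", "iso8859-15"])

# When every guess fails, the caller gets a message it can show the user.
try:
    decode_or_explain(b"\xfc", candidates=["utf-8", "ascii"])
    message = None
except ValueError as exc:
    message = str(exc)
```

An application could catch the `ValueError` at the top level and print the message before exiting, rather than surfacing a raw traceback.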