urwid/widget.py: Edit: Do not crash on ISO-8859-15 with non utf-8 locale #138

Open
wants to merge 1 commit into
base: master

sithglan

On non-UTF-8 terminals, 7-bit ASCII is assumed as the input encoding. For 8-bit
("extended ASCII") locales such as de_DE@euro, de_DE, en_US, fr_FR, and so on,
this is wrong. Python exposes the input encoding of the terminal in the
attribute sys.stdin.encoding, which should be used instead.

How to reproduce the bug:

(x1) [~] locale
LANG=C
LANGUAGE=en_US:en
LC_CTYPE=de_DE@euro
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

(x1) [~] cat test.py
import urwid

def exit_on_q(key):
    if key in ('q', 'Q'):
        raise urwid.ExitMainLoop()

class QuestionBox(urwid.Filler):
    def keypress(self, size, key):
        if key != 'enter':
            return super(QuestionBox, self).keypress(size, key)
        self.original_widget = urwid.Text(
            u"Nice to meet you,\n%s.\n\nPress Q to exit." %
            edit.edit_text)

edit = urwid.Edit(u"What is your name?\n")
fill = QuestionBox(edit)
loop = urwid.MainLoop(fill, unhandled_input=exit_on_q)
loop.run()

(x1) [~] python test.py
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    loop.run()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 274, in run
    self.screen.run_wrapper(self._run)
  File "/usr/lib/python2.7/dist-packages/urwid/raw_display.py", line 268, in run_wrapper
    return fn()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 339, in _run
    self.event_loop.run()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 669, in run
    self._loop()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 706, in _loop
    self._watch_files[fd]()
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 390, in _update
    self.process_input(keys)
  File "/usr/lib/python2.7/dist-packages/urwid/main_loop.py", line 490, in process_input
    k = self._topmost_widget.keypress(self.screen_size, k)
  File "test.py", line 10, in keypress
    return super(QuestionBox, self).keypress(size, key)
  File "/usr/lib/python2.7/dist-packages/urwid/decoration.py", line 836, in keypress
    return self._original_widget.keypress((maxcol,), key)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1474, in keypress
    self.insert_text(key)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1398, in insert_text
    text = self._normalize_to_caption(text)
  File "/usr/lib/python2.7/dist-packages/urwid/widget.py", line 1415, in _normalize_to_caption
    return text.decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 0: ordinal not in range(128)
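
The failure on the last line of the traceback can be reproduced without urwid. The following is a minimal sketch, assuming the key arrives as the single byte 0xfc (as the error message reports) and that the terminal encoding is ISO-8859-15 (as in the PR title). `try_decode` is a hypothetical helper used only for illustration; it is not part of urwid.

```python
# Byte reported in the UnicodeDecodeError above.
raw_key = b"\xfc"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_result = try_decode(raw_key, "ascii")         # 0xfc is outside 7-bit ASCII
latin9_result = try_decode(raw_key, "iso-8859-15")  # 0xfc maps to U+00FC here
```

Here `ascii_result` is `None`, while `latin9_result` is the single character U+00FC (ü).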
@@ -1412,7 +1414,7 @@ def _normalize_to_caption(self, text):
             return text
         if tu:
             return text.encode('ascii') # follow python2's implicit conversion
-        return text.decode('ascii')
+        return text.decode(sys.stdin.encoding)
Collaborator

We'll have to ask the screen for the encoding somehow, because sys.stdin isn't always the place that we're getting input. Maybe the suggestion of only sending unicode input from the screen is a better fix.
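
To illustrate why sys.stdin alone is not a reliable source, here is a rough sketch of looking up the encoding on whatever object the input actually comes from. `input_encoding` and `FakeTerminal` are hypothetical names invented for this example, and the locale fallback is an assumption rather than anything urwid does. The sketch targets Python 3.

```python
import io
import locale


def input_encoding(stream):
    """Best-effort guess at the encoding of `stream` (hypothetical helper)."""
    # Text-mode file objects expose `.encoding`; it can be None or missing
    # for pipes, sockets, or raw byte streams.
    declared = getattr(stream, "encoding", None)
    if declared:
        return declared
    # Fall back to the locale's preferred encoding. This is a guess about
    # the user's environment, not a fact about the stream.
    return locale.getpreferredencoding(False)


class FakeTerminal(object):
    """Stand-in for an input source that declares its own encoding."""
    encoding = "iso-8859-15"


from_terminal = input_encoding(FakeTerminal())        # uses the declared value
from_bytes = input_encoding(io.BytesIO(b"\xfc"))      # no .encoding: locale guess
```

Only the object that reads the input can say which of the two branches applies, which is the argument for resolving the encoding in the screen rather than in a widget.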

See pimutils/khal#224 for discussion. IMO the encoding issue should be handled in keypress. If you agree, I could provide a patch.

Collaborator

No.

As I said, the screen is the only place that has a chance of knowing the right encoding.

If you want the current version of urwid to not crash on whatever encoding the user has selected, you can use a bytestring for the caption in the Edit widget, and the bytes will be passed through without any attempt to encode or decode them. (Admittedly, this guesses the type of the encoding, narrow or wide, based on the local console, for historical reasons.)
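
The pass-through behaviour described above can be sketched as a standalone function. This is a paraphrase for Python 3, not urwid's code: the diff hunk shows only the tail of `_normalize_to_caption`, so the type comparison at the top is an assumption, and `normalize_to_caption` is a hypothetical free function.

```python
def normalize_to_caption(caption, text):
    """Return `text` converted to the same type (bytes or str) as `caption`."""
    text_is_unicode = isinstance(text, str)
    caption_is_unicode = isinstance(caption, str)
    if text_is_unicode == caption_is_unicode:
        return text  # same type: passed through untouched
    if text_is_unicode:
        return text.encode("ascii")
    return text.decode("ascii")  # the call that raises in the traceback


key = b"\xfc"

# Bytes caption: the key bytes are returned unchanged.
passed_through = normalize_to_caption(b"What is your name?\n", key)

# Unicode caption: the ASCII decode fails, as in the report.
try:
    normalize_to_caption(u"What is your name?\n", key)
    unicode_caption_failed = False
except UnicodeDecodeError:
    unicode_caption_failed = True
```

With a bytes caption no codec is involved at all, so the workaround avoids the crash but leaves interpreting the bytes to the terminal.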

If you want to decode with the correct encoding, the only place this can be done is the screen. Widgets have no idea where the user input came from and can't assume it came from sys.stdin. A future version of urwid should probably just use Unicode everywhere by having the screen decode and encode all characters. The reason I didn't do this originally was that some encodings don't round-trip to Unicode and back safely. That might not be a real concern these days; I'm not sure.
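
The round-trip concern can be checked mechanically. The sketch below (Python 3) uses a hypothetical helper, `round_trips`. UTF-7 appears only because it is a convenient stdlib codec with more than one byte spelling per character; it is not suggested as a realistic terminal encoding.

```python
def round_trips(data, encoding):
    """True if decoding then re-encoding `data` gives back the same bytes."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False


# Every single byte value survives a round trip through ISO-8859-15.
latin9_safe = all(round_trips(bytes([b]), "iso-8859-15") for b in range(256))

# UTF-7 permits several byte spellings of one character: b"+AGE-" decodes
# to "a", which encodes back to the shorter b"a".
utf7_safe = round_trips(b"+AGE-", "utf-7")
```

For a single-byte locale such as the one in this report, decoding in the screen would therefore lose nothing; the risk is confined to encodings with redundant or stateful byte sequences.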

We do agree, though, that anything called by set_text/insert_text is not a good place to handle this?

Perhaps keypress is not the right place to deal with encodings, but it seemed like a fair choice given that there is already some encoding going on there. I'm not really familiar with urwid's codebase and thought keypress was already fairly low-level.

I faced this issue on Debian 9 (python-urwid 1.3.1-2+b1). How should we move forward on it?

@bhaskar-c

I am not sure if I correctly understand the complete context of this discussion. But AFAIK there is no way to tell the encoding even when you have loaded the content in the console.

At best, you can guess the encoding using modules like chardet:

>>> import chardet
>>> s = '\xe2\x98\x83' # ☃
>>> chardet.detect(s)
{'confidence': 0.505, 'encoding': 'utf-8'}

or using the command line:

chardetect somefile
somefile: windows-1252 with confidence 0.5

I think the best that can be done is to make a guess and, if the error still occurs, exit gracefully with a meaningful message.
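
One way to read "make a guess, then fail with a meaningful message" is sketched below using only the standard library. `decode_with_fallbacks` and its candidate list are hypothetical; a detector such as chardet could supply the candidates instead.

```python
def decode_with_fallbacks(data, candidates=("utf-8", "iso-8859-15")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # No candidate worked: report what was tried instead of a bare traceback.
    raise ValueError(
        "could not decode input %r with any of: %s"
        % (data, ", ".join(candidates))
    )


# 0xfc is not valid UTF-8, so the second candidate is used.
text, used = decode_with_fallbacks(b"\xfc")

# With only ASCII allowed, the helper raises the descriptive error.
try:
    decode_with_fallbacks(b"\xfc", candidates=("ascii",))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Order matters: a single-byte codec such as ISO-8859-15 accepts any byte sequence, so it only makes sense as the last candidate.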

@and3rson and3rson added this to the 2.0.2 milestone Mar 13, 2018
@tonycpsu tonycpsu removed this from the 2.0.2 milestone May 21, 2019

8 participants