pip issues UnicodeDecodeError on Windows 10 for Russian language #4251

Closed
xtipacko opened this issue Jan 24, 2017 · 16 comments
Labels
auto-locked Outdated issues that have been locked by automation C: encoding Related to text encoding and likely, UnicodeErrors

Comments

@xtipacko

xtipacko commented Jan 24, 2017

  • Pip version: 9.0.1
  • Python version: 3.6.0
  • Operating system: Microsoft Windows 10 Home Edition [Version 10.0.10586] for Russian language

Description:

pip raises a UnicodeDecodeError on byte 0x8d on the Russian-language edition of Windows 10.
The problem does not occur on the English-language edition of Windows 7 Ultimate SP1.
It probably has something to do with the default CMD encoding; please fix it.

What I've run:

C:\WINDOWS\system32>pip install pyyaml
Collecting pyyaml
  Using cached PyYAML-3.12.tar.gz
Building wheels for collected packages: pyyaml
  Running setup.py bdist_wheel for pyyaml ... error
  Failed building wheel for pyyaml
  Running setup.py clean for pyyaml
Failed to build pyyaml
Installing collected packages: pyyaml
  Running setup.py install for pyyaml ... error
Exception:
Traceback (most recent call last):
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte
@pfmoore
Member

pfmoore commented Jan 24, 2017

This is likely due to the fact that on Windows Python 3.6 switched to using UTF-8 for console IO. The code is running a subprocess, and then guessing the encoding of the subprocess output as being the same as the encoding of sys.stdout - which was true in Python <3.6 (arguably more by luck than anything else) but is no longer true in 3.6+

The simplest fix is probably to use locale.getpreferredencoding(False) for the encoding, as that's the default encoding used in io.TextIOWrapper and for subprocess when universal_newlines is True.
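
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the fix pip actually shipped: the function name console_to_str is borrowed from the traceback, while the fallback to errors="replace" is an assumption added here so that an unexpected byte degrades to a replacement character instead of aborting the install.

```python
import locale


def console_to_str(data: bytes) -> str:
    """Decode subprocess output using the locale's preferred encoding."""
    # The same default that io.TextIOWrapper uses when no encoding is given.
    encoding = locale.getpreferredencoding(False)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Assumption for this sketch: never raise on undecodable output,
        # substitute U+FFFD for the offending bytes instead.
        return data.decode(encoding, errors="replace")
```

Called as console_to_str(proc.stdout.readline()), this no longer depends on sys.__stdout__.encoding, which is the value that changed in Python 3.6 on Windows.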

@xtipacko
Author

I thought pip was supposed to be easy for users. Is it possible to hide these problems from us? :)

@pfmoore
Member

pfmoore commented Jan 24, 2017

Encodings are not easy for anyone :-) It's certainly possible to deal with this as I said. Just the first time it's come up (it's a Python 3.6 change).

@xtipacko
Author

xtipacko commented Jan 24, 2017

I easily reproduced the problem on my VM running Windows 6.1.7601 (Windows 7 SP1, Russian).
In cmd, chcp reports code page 866.
Switching to UTF-8 with chcp 65001 does not help.
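
For context, the byte 0x8d from the traceback is an ordinary Cyrillic letter in code page 866 but can never start a UTF-8 sequence. A small sketch showing the difference (the single-byte input is chosen here for illustration; the actual compiler output in the failing build is not shown in this thread):

```python
raw = b"\x8d"

# In cp866 (the OEM code page reported by chcp) this byte is a capital Cyrillic "En".
as_cp866 = raw.decode("cp866")

# In UTF-8, 0x8d is a continuation byte, so it is invalid as the first byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
```

This suggests the subprocess output was written in the OEM code page while pip tried to read it as UTF-8.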

@robinxb

robinxb commented Feb 11, 2017

Adding a workaround here:
Open a new cmd.exe console.
Run chcp.
It shows the system default code page, for example 936.
Open Lib/site-packages/pip/compat/__init__.py.
Around line 75, change return s.decode('utf_8') to return s.decode('cp936').

This is just a workaround. I think pip needs to solve this issue as soon as possible, because the solution is not easy to find.

There may be a general solution using cdll.
I'm not sure it is the best solution on Windows, but I have still made a PR for this issue.
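
One way the ctypes idea could look is sketched below. This is an assumption about the general approach, not the contents of the PR: it asks the Win32 functions GetConsoleOutputCP and GetOEMCP for the active code page instead of hard-coding cp936, and falls back to the locale encoding on other platforms. The helper name console_output_encoding is hypothetical.

```python
import codecs
import ctypes
import locale
import sys


def console_output_encoding() -> str:
    """Best-effort guess at the encoding console programs write in."""
    if sys.platform == "win32":
        kernel32 = ctypes.windll.kernel32
        # GetConsoleOutputCP returns 0 when no console is attached,
        # in which case the system OEM code page is used instead.
        code_page = kernel32.GetConsoleOutputCP() or kernel32.GetOEMCP()
        name = "cp%d" % code_page
        try:
            codecs.lookup(name)  # make sure Python ships a codec for it
            return name
        except LookupError:
            pass
    return locale.getpreferredencoding(False)
```

With this, the manual edit above would become s.decode(console_output_encoding()) rather than a value that is only correct for one locale.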

robinxb added a commit to robinxb/pip that referenced this issue Feb 11, 2017
robinxb added a commit to robinxb/pip that referenced this issue Feb 11, 2017
@xtipacko
Author

Actually, it was easier to use easy_install as a workaround...

@xavfernandez xavfernandez added the C: encoding Related to text encoding and likely, UnicodeErrors label Mar 24, 2017
@dstufft
Member

dstufft commented Mar 31, 2017

Closing as a duplicate of #4110.

@dstufft dstufft closed this as completed Mar 31, 2017
sakurai-youhei added a commit to sakurai-youhei/pip that referenced this issue Apr 2, 2017
@zed

zed commented Apr 23, 2017

What is the official workaround? How do you update pip if it is itself broken?

@xavfernandez
Member

@zed does #4280 fix this ?

@zed

zed commented Apr 25, 2017

@xavfernandez What do you mean? Are you suggesting to edit the installed pip/compat.py file manually? I meant something like: set PYTHONLEGACYWINDOWSIOENCODING=nonempty before running pip.
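
The environment-variable idea could be scripted roughly like this. PYTHONLEGACYWINDOWSIOENCODING is a real CPython setting introduced in 3.6; whether it actually avoids this particular error was not confirmed in the thread, so treat this as an experiment to try. The helper names legacy_io_env and run_pip are hypothetical.

```python
import os
import subprocess
import sys


def legacy_io_env(base=None):
    """Return a copy of the environment with legacy console IO enabled."""
    env = dict(os.environ if base is None else base)
    # Any non-empty value restores the pre-3.6 console encoding on Windows.
    env["PYTHONLEGACYWINDOWSIOENCODING"] = "1"
    return env


def run_pip(args):
    """Run pip in a child interpreter that sees the modified environment."""
    return subprocess.run([sys.executable, "-m", "pip", *args], env=legacy_io_env())


# Example (not executed here): run_pip(["install", "--upgrade", "pip"])
```

The variable has no effect on platforms other than Windows.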

@JoeVogel

JoeVogel commented Apr 6, 2018

What is the right way to fix this?

@pradyunsg
Member

Hey @JoeVogel!

pip 10 is currently in beta and has a fix for this. You can upgrade to it (if you don't mind using a beta version) by running pip install -U --pre pip

@changnet

changnet commented May 9, 2018

win10
E:>pip -V
pip 10.0.1 from d:\program files\python\python35\lib\site-packages\pip (python 3.5)

I still have the same problem when I install lupa 1.6 with "pip install lupa":

    Using bundled Lua
    building without Cython
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 308, in <module>
        for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 308, in <listcomp>
        for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])
      File "C:\Users\dell\AppData\Local\Temp\pip-req-build-zth2l84p\setup.py", line 298, in read_file
        return f.read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 1183: illegal multibyte sequence

I'm Chinese, so the system default encoding is cp936, which is 'gbk'. Switching the console encoding to UTF-8 (chcp 65001) doesn't make any difference.

So I downloaded the lupa 1.6 tarball from https://pypi.org/project/lupa/#files and found the code that raises the error:

# line 295
def read_file(filename):
    with open(os.path.join(basedir, filename)) as f:
        return f.read()


def write_file(filename, content):
    with open(os.path.join(basedir, filename), 'w') as f:
        f.write(content)


long_description = '\n\n'.join([
    read_file(text_file)
    for text_file in ['README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt"]])

write_file(os.path.join('lupa', 'version.py'), "__version__ = '%s'\n" % VERSION)

The files ('README.rst', 'INSTALL.rst', 'CHANGES.rst', "LICENSE.txt") are encoded in UTF-8, but the open calls do not specify an encoding argument, so Python falls back to the locale encoding (gbk here).
I added encoding='utf-8' and the problem was solved.

def read_file(filename):
    with open(os.path.join(basedir, filename), 'r', encoding='utf-8') as f:
        return f.read()


def write_file(filename, content):
    with open(os.path.join(basedir, filename), 'w', encoding='utf-8') as f:
        f.write(content)
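
To see the effect in isolation, here is a small self-contained sketch. The sample text and temporary file are made up for the demonstration and are not taken from lupa; the gbk read either raises UnicodeDecodeError or returns mojibake, depending on the bytes involved, so the sketch only records which of the two happened.

```python
import os
import tempfile


def read_file(path, encoding):
    """Read a text file with an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    # Made-up sample containing non-ASCII punctuation, stored as UTF-8.
    text = "Lupa \u201cfast\u201d Lua wrapper \u20ac"
    fd, path = tempfile.mkstemp(suffix=".rst")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        good = read_file(path, "utf-8")
        try:
            bad = read_file(path, "gbk")  # what a cp936 locale would do by default
        except UnicodeDecodeError:
            bad = None  # decoding failed outright
        return text, good, bad
    finally:
        os.remove(path)


text, good, bad = demo()
```

Only the read with the explicit UTF-8 encoding returns the original text, which is why passing encoding='utf-8' in setup.py makes the build independent of the user's locale.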

@pradyunsg
Member

@changnet Please open a new issue.

@pfmoore
Member

pfmoore commented May 9, 2018

@changnet However, this appears to be a problem with the setup.py for the lupa project, so you should probably raise it with them, rather than here.

@lock

lock bot commented Jun 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019

9 participants