Filename encoding error in some environments with PAX sdist #7667

Closed
ncoghlan opened this issue Jan 28, 2020 · 5 comments · Fixed by #9569
Labels
C: encoding Related to text encoding and likely, UnicodeErrors type: bug A confirmed bug or unintended behavior

Comments

@ncoghlan
Member

ncoghlan commented Jan 28, 2020

Environment

  • pip version: any
  • Python version: 2.7
  • OS: Windows, non-Windows in C locale

(pip Windows CI hits this)

Description
The PAX-format sdist for wheel 0.34.1 fails to install with a UnicodeEncodeError on Python 2.7 on Windows, and on non-Windows systems in a non-UTF-8 locale: pypa/wheel#331

Expected behavior
Unicode filenames from the PAX tarball are correctly encoded for the local filesystem.

How to Reproduce
Attempt to install a PAX formatted tarball containing a file name that cannot be encoded to the default code page (Windows) or the default locale encoding (non-Windows).

In GNU tar, the affected paths are pre-mangled to something ASCII compatible, but PAX tar preserves them correctly, so the installer needs to handle them itself.
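
A minimal sketch of the reproduction conditions, not pip's actual unpacking code: it builds a PAX-format tarball in memory with Python 3's `tarfile` module and then lists the member names that cannot be encoded in a given target encoding. The helper names (`build_pax_sdist`, `unencodable_members`) and the member path are hypothetical and chosen only for illustration.

```python
import io
import tarfile


def build_pax_sdist(member_name):
    """Return the bytes of a gzip-compressed PAX tarball holding one small file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        data = b"placeholder\n"
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unencodable_members(archive_bytes, fs_encoding):
    """List member names that cannot be encoded with fs_encoding."""
    bad = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            try:
                member.name.encode(fs_encoding)
            except UnicodeEncodeError:
                bad.append(member.name)
    return bad


# Hypothetical member path containing non-ASCII characters.
NAME = "demo-0.1/tests/\u00e5\u00e4\u00f6_\u65e5\u672c\u8a9e.py"
archive = build_pax_sdist(NAME)

print(unencodable_members(archive, "ascii"))   # stands in for the C locale
print(unencodable_members(archive, "utf-8"))
```

With `"ascii"` standing in for the C locale, the check reports the non-ASCII member; with `"utf-8"` it reports nothing. A Windows code page such as `"cp1252"` can be passed the same way to approximate the Windows case.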

Output

See
https://dev.azure.com/pypa/pip/_build/results?buildId=18040&view=logs&j=404e6841-f5ba-57d9-f2c8-8c5322057572&t=0219f6bf-240d-5b08-c877-377b12af5079&l=309 for a Windows example in the pip test suite.

The wheel issue linked above has some Linux examples.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Jan 28, 2020
@ncoghlan ncoghlan changed the title from "Filename encoding error on Windows with PAX sdist" to "Filename encoding error with PAX sdist" Jan 28, 2020
@ncoghlan ncoghlan changed the title from "Filename encoding error with PAX sdist" to "Filename encoding error on Python 2.7 with PAX sdist" Jan 28, 2020
@johnthagen
Contributor

@ncoghlan Just an FYI, the issue I noted on pypa/wheel#331 was using Python 3.6 (in case that has any bearing here).

@chrahunt chrahunt added the type: bug A confirmed bug or unintended behavior label Jan 29, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Jan 29, 2020
@chrahunt chrahunt added Python 2 only Python 2 specific S: needs triage Issues/PRs that need to be triaged labels Jan 29, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Jan 29, 2020
@chrahunt
Member

In the process of justifying not fixing this, I figured out enough to fix it. :( See #7668.

@ncoghlan
Member Author

@johnthagen Yeah, the non-universal locale encoding problem I mention in #7668 (comment) will apply to Python 3 as well.

However, 3.7+ mitigates it significantly, as it doesn't believe the OS when it claims to be using ASCII, and automatically switches to using UTF-8 instead.
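
To see which case a given interpreter falls into, something like the following can be run in the affected environment. This is only a diagnostic sketch with hypothetical helper names; `sys.flags.utf8_mode` exists only on Python 3.7+, so it is read defensively.

```python
import locale
import sys


def describe_filename_encoding():
    """Collect the settings that decide how filenames get encoded."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # Only present on Python 3.7+; None on older interpreters.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


def can_encode(name, encoding=None):
    """Return True if name can be encoded with the given (or filesystem) encoding."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for key, value in sorted(describe_filename_encoding().items()):
        print("%s: %s" % (key, value))
```

Running it under `LC_ALL=C` on 3.6 and on 3.7+ should make the difference described above visible in the reported filesystem encoding.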

@ncoghlan ncoghlan changed the title from "Filename encoding error on Python 2.7 with PAX sdist" to "Filename encoding error in some environments with PAX sdist" Jan 29, 2020
openstack-gerrit pushed a commit to openstack/ironic-python-agent-builder that referenced this issue Jan 30, 2020
As found recently, pip with Python 3.6 and forward has some issues
installing tarballs that contain files with non-ascii characters
in their names.
This is due mainly to the fact that the default locale in the
system is set to C [1].
As a workaround, we run the installation of the packages in the
virtualenv forcing C.UTF-8 locale.

[1] pypa/pip#7667

Change-Id: Idfb8b121a43a0bb74844fd63d5c2507d7b888b15
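
The workaround described in the commit message above (forcing a UTF-8 locale for the install step) can be sketched in Python as follows. This is an illustration of the idea, not the code from that commit; the helper names are hypothetical, and it assumes the `C.UTF-8` locale is available on the system, which is not the case everywhere.

```python
import os
import subprocess
import sys


def utf8_locale_env(base_env=None):
    """Return a copy of the environment with a UTF-8 locale forced."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the C.UTF-8 locale exists on this system.
    env["LC_ALL"] = "C.UTF-8"
    env["LANG"] = "C.UTF-8"
    return env


def pip_install(requirement, base_env=None):
    """Run `pip install` for one requirement under the forced UTF-8 locale."""
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    return subprocess.call(cmd, env=utf8_locale_env(base_env))
```

Calling `pip_install("wheel==0.34.1")` would then run the install with the overridden locale while leaving the parent process environment untouched.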
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Jan 30, 2020
* Update ironic-python-agent-builder from branch 'master'
  - Fix pip install pkgs with non-ascii characters in filenames
    
    As found recently, pip with Python 3.6 and forward has some issues
    installing tarballs that contain files with non-ascii characters
    in their names.
    This is due mainly to the fact that the default locale in the
    system is set to C [1].
    As a workaround, we run the installation of the packages in the
    virtualenv forcing C.UTF-8 locale.
    
    [1] pypa/pip#7667
    
    Change-Id: Idfb8b121a43a0bb74844fd63d5c2507d7b888b15
@hexagonrecursion
Contributor

This issue is marked as "python 2 only". pip 21.0 dropped support for Python 2. Should this be closed?

@uranusjr
Member

The encoding argument affects not just Python 2 but all versions prior to 3.8; see #7668 (comment).

I’ll remove the Python 2-only label.

@uranusjr uranusjr added C: encoding Related to text encoding and likely, UnicodeErrors and removed Python 2 only Python 2 specific labels Feb 10, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2021