OSError when reading file with accents in file path #15086

Closed
JGoutin opened this issue Jan 9, 2017 · 27 comments · Fixed by #24758 or #25769
Labels
Bug IO CSV read_csv, to_csv Unicode Unicode strings Windows Windows OS
Milestone

Comments

@JGoutin

JGoutin commented Jan 9, 2017

Code Sample, a copy-pastable example if possible

test.txt and test_é.txt are the same file; only the name differs:

pd.read_csv('test.txt')
Out[3]: 
   1 1 1
0  1 1 1
1  1 1 1

pd.read_csv('test_é.txt')
Traceback (most recent call last):

  File "<ipython-input-4-fd67679d1d17>", line 1, in <module>
    pd.read_csv('test_é.txt')

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)

  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)

  File "pandas\parser.pyx", line 669, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8471)

OSError: Initializing from file failed

Problem description

pandas raises OSError when trying to read a file with accents in its path.

The problem is new (since I upgraded to Python 3.6 and pandas 0.19.2).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.3
boto: None
pandas_datareader: None

@m-charlton
Contributor

Just my two pennies' worth. I quickly tried it out on Mac OS X and Ubuntu with no
problems. See below.

Could this be an environment/platform problem? I noticed that the LOCALE is
set to None.None. Unfortunately I do not have a Windows machine to try this
example on. Admittedly this would not explain why you've seen this after the
upgrade to Python 3.6 and pandas 0.19.2.

Note: I just set up a virtualenv with python3.6 and installed pandas 0.19.2 using pip.

>>> import pandas as pd
>>> pd.read_csv('test_é.txt')
   a  b  c
0  1  2  3
1  4  5  6

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-57-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jan 9, 2017

I believe 3.6 switches the file system encoding on Windows to UTF-8 (from mbcs). Apart from that, we don't have testing enabled yet on Windows for 3.6 (as some of the required packages are just now becoming available).

@jreback jreback added the Windows Windows OS label Jan 9, 2017
@jreback
Contributor

jreback commented Jan 9, 2017

@JGoutin

I just added build support on AppVeyor (Windows) for 3.6, so it would be great if you'd push up your tests to see if it works.

@z94624

z94624 commented Jul 16, 2017

I also faced the same problem: the program stopped at pd.read_csv(file_path). As in the original report, it started after I upgraded Python to 3.6 (I'm not sure exactly which version I had installed before, maybe 3.5).

@tpietruszka

@jreback what is the next step towards a fix here?
You have mentioned a PR that got 'blown away' - what does that mean?

While I do not use Windows, I could try to help (I just got a VM to debug a piece of my code that apparently does not work on Windows).

BTW, a workaround: pass a file handle instead of a name
pd.read_csv(open('test_é.txt', 'r'))
(there are several workarounds in related issues, but I have not seen this one)
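
A slightly fuller sketch of this workaround, assuming the file's contents are UTF-8 (an assumption, not something stated in this thread). The helper names `open_text_handle` and `read_csv_via_handle` are hypothetical. The idea is only that Python's own `open()` resolves the non-ASCII name, so the C parser receives a handle rather than a path, and the `with` block closes the handle that the one-liner above leaves open.

```python
def open_text_handle(path, encoding="utf-8"):
    """Open `path` with Python's own file API (hypothetical helper).

    The encoding of the file *contents* is an assumption; adjust as needed.
    """
    # newline="" leaves line endings untouched so the CSV parser can handle them.
    return open(path, "r", encoding=encoding, newline="")


def read_csv_via_handle(path, **kwargs):
    """Read a CSV by passing pandas an open handle instead of a file name."""
    import pandas as pd  # imported here so the helper above has no dependencies

    with open_text_handle(path) as handle:
        return pd.read_csv(handle, **kwargs)
```

Usage would look like `df = read_csv_via_handle('test_é.txt')`; extra keyword arguments such as `sep` are forwarded to `pd.read_csv`.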

@jreback
Contributor

jreback commented Aug 24, 2017

@tpietruszka see the comments on the PR: #15092 (it was removed from a private fork, but was pretty much there).

You basically need to encode the paths differently on Python 3.6 (vs. other Python versions) on Windows, i.e. implement: https://docs.python.org/3/whatsnew/3.6.html#pep-529-change-windows-filesystem-encoding-to-utf-8
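
To make the encoding mismatch concrete, here is a small illustration that runs on any platform. It assumes cp1252 as the Windows ANSI code page (an assumption; the actual code page depends on the system locale) and only shows how the same file name turns into different bytes, not how pandas itself encodes paths.

```python
name = "test_é.txt"

# Bytes produced when the name is encoded as UTF-8 (the Python 3.6+ default
# filesystem encoding on Windows, per PEP 529).
utf8_bytes = name.encode("utf-8")

# Bytes a cp1252-based (ANSI) API would expect for the same name.
# cp1252 is an assumed code page, used only for illustration.
ansi_bytes = name.encode("cp1252")

# What an ANSI API would "see" if it were handed the UTF-8 bytes.
misread_name = utf8_bytes.decode("cp1252")

# ascii() keeps the output printable on consoles that cannot show "é".
print(ascii(utf8_bytes), ascii(ansi_bytes), ascii(misread_name))
```

The two byte strings differ, so a C-level open call that interprets UTF-8 bytes in the ANSI code page looks for a file that does not exist.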

@dondon2475848

my old code (can't run):

import pandas as pd
import os
file_path='./dict/字典.csv'
df_name = pd.read_csv(file_path,sep=',' )

new code (successful):

import pandas as pd
import os
file_path='./dict/dict.csv'
df_name = pd.read_csv(file_path,sep=',' )

I think this bug is a filename problem.
After I changed the filename from Chinese to English, it runs.
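
When renaming the original file is not an option, the same idea can be sketched as a temporary copy with an ASCII-only name. The helper name `copy_to_ascii_path` is hypothetical, and the sketch assumes the system temporary directory itself has an ASCII-only path, which is not guaranteed (for example, when a Windows user name contains accents).

```python
import os
import shutil
import tempfile


def copy_to_ascii_path(path):
    """Copy `path` to a temporary file with an ASCII-only name (hypothetical helper).

    Returns the new path; the caller is responsible for deleting it.
    """
    suffix = os.path.splitext(path)[1]
    if not suffix.isascii():
        suffix = ""  # drop a non-ASCII extension rather than carry it over
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the name is needed; copyfile reopens the file itself
    shutil.copyfile(path, tmp_path)
    return tmp_path
```

The returned path can then be passed to `pd.read_csv` as usual and removed with `os.remove` afterwards.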

@jreback jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017
@jreback jreback added IO CSV read_csv, to_csv Unicode Unicode strings labels Oct 4, 2017
@fotisj

fotisj commented Jan 14, 2018

If anyone comes here, like me, because they hit the same problem, here is a solution until pandas is fixed to work with PEP 529 (basically, any non-ASCII characters in your path or filename will result in errors):

Insert the following two lines at the beginning of your code to revert back to the old way of handling paths on windows:

import sys
sys._enablelegacywindowsfsencoding()
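
A guarded variant of the same workaround, as a minimal sketch. `sys._enablelegacywindowsfsencoding()` only exists on Windows builds of Python 3.6+, so calling it unconditionally raises `AttributeError` elsewhere; the wrapper name `use_legacy_windows_fs_encoding` is hypothetical.

```python
import sys


def use_legacy_windows_fs_encoding():
    """Switch back to the pre-3.6 path encoding on Windows, if available.

    Returns True when the switch was made, False on other platforms or
    on Python builds that do not provide the function.
    """
    if sys.platform == "win32" and hasattr(sys, "_enablelegacywindowsfsencoding"):
        sys._enablelegacywindowsfsencoding()
        return True
    return False


# Call once at program start, before any file is opened.
switched = use_legacy_windows_fs_encoding()
```

This keeps a script portable: on Linux and macOS the call is simply a no-op.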

@ColdHumour

I used the solution above and it works. Thanks very much @fotisj!
However, I'm still confused about why DataFrame.to_csv() doesn't have the same problem. In other words, for a Unicode file path, writing is OK, while reading isn't.

@mmagnuski

Just pinging this - I have the same issue, I'm using a workaround but it would be great if that was not required.

@jreback
Contributor

jreback commented Nov 21, 2018

this needs a community patch

@kchawla-pi

I am encountering this issue. I want to try and contribute a patch. Any pointers on how to start fixing this?

@TomAugspurger
Contributor

I think none of the maintainers have access to a system that can reproduce this.

Perhaps some of the others in this issue can help put together a solution.

gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 13, 2019
Python 3.6+ changes the default encoding to
UTF8 (PEP 529), which conflicts with the
encoding of Windows (MBCS).

This fix checks if we're using Python 3.6+
and on Windows, after which we force the
encoding to "mbcs".

Closes pandas-devgh-15086.
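
The approach this commit message describes could be sketched roughly as follows. This is not the actual pandas patch: the helper name `encode_path_for_c_parser` is hypothetical, and the sketch only mirrors the stated rule (Python 3.6+ on Windows forces "mbcs"; everything else keeps the default filesystem encoding).

```python
import os
import sys


def encode_path_for_c_parser(path):
    """Encode a str path to bytes for a C-level open call (hypothetical helper)."""
    if sys.platform == "win32" and sys.version_info >= (3, 6):
        # The "mbcs" codec exists only on Windows; it maps to the ANSI code page.
        return path.encode("mbcs")
    # Elsewhere, defer to Python's filesystem encoding and error handler.
    return os.fsencode(path)
```

One limitation of this rule is that characters outside the active ANSI code page cannot be represented in "mbcs".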
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Jan 13, 2019
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 14, 2019
Python 3.6+ changes the default encoding to
UTF8 (PEP 529), which conflicts with the
encoding of Windows (MBCS).

This fix checks if we're using Python 3.6+
and on Windows, after which we force the
encoding to "mbcs".

Closes pandas-devgh-15086.
jreback pushed a commit that referenced this issue Jan 14, 2019
Python 3.6+ changes the default encoding to
UTF8 (PEP 529), which conflicts with the
encoding of Windows (MBCS).

This fix checks if we're using Python 3.6+
and on Windows, after which we force the
encoding to "mbcs".

Closes gh-15086.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
Python 3.6+ changes the default encoding to
UTF8 (PEP 529), which conflicts with the
encoding of Windows (MBCS).

This fix checks if we're using Python 3.6+
and on Windows, after which we force the
encoding to "mbcs".

Closes pandas-devgh-15086.
vnlitvinov added a commit to anmyachev/pandas that referenced this issue Mar 12, 2019
vnlitvinov added a commit to anmyachev/pandas that referenced this issue Mar 14, 2019
vnlitvinov added a commit to anmyachev/pandas that referenced this issue Mar 18, 2019
vnlitvinov added a commit to anmyachev/pandas that referenced this issue Mar 20, 2019
jreback pushed a commit that referenced this issue Mar 20, 2019
* Fix gh-15086 properly instead of making a workaround

* fix code style

* Make sure test_filename_with_special_chars properly tests combinations of chars
Updated whatsnew

* Address comments by @jreback

* Parametrize test_filename_with_special_chars

Use CP-1252 and CP-1251 filenames separately,
skip the test on Windows on < 3.6 as it won't pass
anmyachev pushed a commit to anmyachev/pandas that referenced this issue Apr 18, 2019
* Fix pandas-devgh-15086 properly instead of making a workaround

* fix code style

* Make sure test_filename_with_special_chars properly tests combinations of chars
Updated whatsnew

* Address comments by @jreback

* Parametrize test_filename_with_special_chars

Use CP-1252 and CP-1251 filenames separately,
skip the test on Windows on < 3.6 as it won't pass
@mmagnuski

mmagnuski commented Apr 26, 2020

Hi, I have this problem on pandas 1.0.3 now, and the sys._enablelegacywindowsfsencoding() workaround has stopped working. I have ą and ź in the file path.
I also get this error on pandas 0.25.3, but 0.23.4 seems to work fine when using the workaround (I didn't check other versions). I'd be happy to provide any additional information.

@pranjulknit

If your file is stored in a folder that has the same name as the file, move the file out of that folder.
Don't store the file in a folder with the same name.
Then it works.

@mmagnuski

mmagnuski commented Oct 19, 2020

@pranjulknit If I understand correctly, you suggest moving the file to a folder without these problematic characters in the path. This is not always possible. If you are suggesting that folder names and file names should be different, that is not the issue described here; I never had problems with that.

@pranjulknit

Actually, I have this problem while reading a CSV file from a Jupyter notebook.
