UnicodeDecodeError when using Python 2.7 #28

Open

jplu opened this issue Oct 19, 2017 · 4 comments

jplu commented Oct 19, 2017

Hello,

I'm using deepdish to save a dictionary whose keys are unicode strings and whose values are numpy arrays of computed embeddings. Here is a small example that reproduces the exception:

import deepdish as dd
import numpy as np

d = {
    'foo': np.ones((10, 20)),
    'sub': {'bar': 'a string', 'é': 1.23},
}
dd.io.save('test.h5', d)

This raises the following exception:

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/path.py:112: NaturalNameWarning: object name is not a valid Python identifier: '\xc3\xa9'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 584, in save
    filters=filters, idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 212, in _save_level
    idtable=idtable)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/deepdish/io/hdf5io.py", line 297, in _save_level
    setattr(group._v_attrs, name, level)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 481, in __setattr__
    self._g__setattr(name, value)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/attributeset.py", line 423, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "tables/hdf5extension.pyx", line 658, in tables.hdf5extension.AttributeSet._g_setattr (tables/hdf5extension.c:7458)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Any idea how to overcome this problem?

Thanks!

asanakoy commented Oct 19, 2017

I have the same problem. You have to encode every unicode key before saving, e.g. u'é'.encode('utf-8').
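
For a nested dict, the suggestion would look something like this (a sketch; encode_keys is just a hypothetical helper, and unicode is the Python 2 builtin):

def encode_keys(obj):
    # Recursively encode every unicode dict key to a UTF-8 byte string;
    # values and non-dict objects pass through unchanged.
    if isinstance(obj, dict):
        return {(k.encode('utf-8') if isinstance(k, unicode) else k): encode_keys(v)
                for k, v in obj.items()}
    return obj

dd.io.save('test.h5', encode_keys(d))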

jplu commented Oct 19, 2017

That does not work. Here is what I did:

import deepdish as dd
import numpy as np

d = {
    'foo': np.ones((10, 20)),
    'sub': {'bar': 'a string', u'é'.encode("utf-8"): 1.23},
}
dd.io.save('test.h5', d)

And I get the same exception.

gustavla commented

Thank you so much for finding this issue! I don't use Python 2 much, so I am happy that this has been identified.

So far, I have found two issues with unicode under Python 2. One was a bug in deepdish that prevented saving with unicode group names. This has just been fixed, so you can simply do pip install -U deepdish.

However, the other problem is reading files with unicode group names, and this seems to be an issue with PyTables, which is our HDF5 backend. I have filed an issue (PyTables/PyTables#652), so let's see what they say.

All of this seems to be working fine under Python 3, so that is a workaround for now.
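
For reference, here is what the round trip looks like under Python 3, where every str key is already unicode (a quick sketch using dd.io.load, the counterpart to dd.io.save):

import deepdish as dd
import numpy as np

d = {'foo': np.ones((10, 20)), 'sub': {'bar': 'a string', 'é': 1.23}}
dd.io.save('test.h5', d)
loaded = dd.io.load('test.h5')  # works under Python 3, including the unicode key
assert loaded['sub']['é'] == 1.23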

gustavla added the bug label Oct 21, 2017

jplu commented Oct 23, 2017

Thanks a lot for jumping on this issue so quickly, @gustavla. I will follow the PyTables issue closely. I'm using Python 3 as a workaround for now.
