
Crashes when dealing with large datasets #34

Open

juliaroquette opened this issue Aug 6, 2018 · 1 comment

@juliaroquette

I am trying to use deepdish to store and restore large datasets in the HDF5 format, but deepdish.io.save crashes whenever the dataset is larger than about 2 GB.

For example, suppose we have a very large array:

```python
import deepdish as dd

t = bytearray(8 * 1000 * 1000 * 400)  # ~3.2 GB buffer
dd.io.save('testeDeepdishLimit', t)
```

This raises the following error:

```
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-3-26ecd71b151a> in <module>()
----> 1 dd.io.save('testeDeepdishLimit',t)

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in save(path, data, compression)
    594         else:
    595             _save_level(h5file, group, data, name='data',
--> 596                         filters=filters, idtable=idtable)
    597             # Mark this to automatically unpack when loaded
    598             group._v_attrs[DEEPDISH_IO_UNPACK] = True

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
    302 
    303     else:
--> 304         _save_pickled(handler, group, level, name=name)
    305 
    306 

~/anaconda3/lib/python3.6/site-packages/deepdish/io/hdf5io.py in _save_pickled(handler, group, level, name)
    170                   DeprecationWarning)
    171     node = handler.create_vlarray(group, name, tables.ObjectAtom())
--> 172     node.append(level)
    173 
    174 

~/anaconda3/lib/python3.6/site-packages/tables/vlarray.py in append(self, sequence)
    535             nparr = None
    536 
--> 537         self._append(nparr, nobjects)
    538         self.nrows += 1

tables/hdf5extension.pyx in tables.hdf5extension.VLArray._append()

OverflowError: value too large to convert to int
```

Is there any workaround for this issue?

@twmacro
Contributor

twmacro commented Sep 6, 2018

I can confirm that I get the same error with your example (on both a Linux machine and a Windows machine). I think the error is within PyTables; see PyTables/PyTables#550.
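
Until that is fixed upstream, one possible workaround, a sketch based on the traceback above rather than a verified deepdish recipe, is to avoid the pickle fallback entirely. The traceback shows that `bytearray` is not a type deepdish stores natively, so it falls through to `_save_pickled`, which appends the whole pickled object as a single PyTables VLArray row; that row length appears to overflow a C int past ~2 GB. Handing deepdish a NumPy array instead should keep the data on the native array path:

```python
import numpy as np
import deepdish as dd

t = bytearray(8 * 1000 * 1000 * 400)  # ~3.2 GB buffer

# View the buffer as a uint8 ndarray (zero-copy). deepdish stores
# ndarrays as native HDF5 arrays rather than pickling them into a
# single VLArray row, so the >2 GB limit should not be hit.
# (Assumption: the native array path handles >2 GB arrays.)
arr = np.frombuffer(t, dtype=np.uint8)
dd.io.save('testeDeepdishLimit.h5', arr)  # filename is illustrative

# Round-trip: load the array back and rebuild the bytearray.
restored = bytearray(dd.io.load('testeDeepdishLimit.h5'))
assert restored == t
```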
