pandas Support #156

Open
gheber opened this issue Jun 30, 2022 · 5 comments

@gheber
Member

gheber commented Jun 30, 2022

import h5pyd as h5py -> Happiness
import pandahsds as pandas -> Sadness

@jreadey
Member

jreadey commented Jun 30, 2022

It's now pretty easy to read a NumPy array with h5pyd and convert it to a pandas DataFrame. See https://github.com/HDFGroup/hdflab_examples/blob/master/Tutorial/09-Queries.ipynb for an example.
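A minimal sketch of that read-then-convert pattern, assuming the dataset fits in memory. `dataset_to_frame` is a hypothetical helper name. It only relies on `dset[...]` returning a NumPy array, which both h5py and h5pyd datasets do, so the same function should work for a local file or an HSDS domain.

```python
import numpy as np
import pandas as pd


def dataset_to_frame(dset):
    """Load a whole dataset into memory and wrap it in a pandas DataFrame.

    Hypothetical helper: `dset` is anything where `dset[...]` returns a NumPy
    array, e.g. an h5py.Dataset or an h5pyd.Dataset.
    """
    arr = np.asarray(dset[...])
    if arr.dtype.names:            # compound type: one column per field
        return pd.DataFrame.from_records(arr)
    if arr.ndim == 1:              # plain 1-D array: a single "value" column
        return pd.DataFrame({"value": arr})
    return pd.DataFrame(arr)       # 2-D array: default integer column labels


# Usage against HSDS (domain path and dataset name are placeholders):
# import h5pyd
# with h5pyd.File("/home/myuser/mytable.h5", "r") as f:
#     df = dataset_to_frame(f["mytable"])
```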

Using HSDS as the basis for a distributed table package would be interesting. This idea is explored a bit in h5py/h5py#2095.

@gheber
Member Author

gheber commented Jun 30, 2022

Right, but I want to read an HDF5 file created via DataFrame.to_hdf.
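One possible route, sketched under assumptions. `DataFrame.to_hdf` writes through PyTables, and in the default `format="fixed"` a numeric frame stored under key `df` appears to be laid out as a group `/df` holding `axis0` (column labels), `axis1` (row index) and, per dtype block, `block<N>_items` plus `block<N>_values` (rows by block columns). Treat that layout as an assumption and check it with `h5ls -r` against your own file. Once the file is in HSDS (for example via h5pyd's `hsload`), the hypothetical `read_fixed_frame` below only uses `in` and `[...]`, so it should accept an h5py or h5pyd group. Pickled object (string) columns, `format="table"` files, MultiIndexes and datetime columns are not handled.

```python
import numpy as np
import pandas as pd


def _labels(arr):
    """Decode fixed-width byte labels (as stored by PyTables) to str."""
    return [x.decode("utf-8") if isinstance(x, bytes) else x for x in arr]


def read_fixed_frame(group):
    """Rebuild a numeric DataFrame from a group written by to_hdf(format="fixed").

    ASSUMED layout (verify against your file): axis0 = column labels,
    axis1 = row index, block<N>_items / block<N>_values = one dtype block,
    with values shaped (n_rows, n_block_columns). Pickled object columns,
    MultiIndexes and datetime columns are not handled.
    """
    columns = _labels(group["axis0"][...])
    index = _labels(group["axis1"][...])
    pieces = []
    n = 0
    while f"block{n}_values" in group:  # blocks are numbered 0, 1, 2, ...
        items = _labels(group[f"block{n}_items"][...])
        values = np.asarray(group[f"block{n}_values"][...])
        pieces.append(pd.DataFrame(values, index=index, columns=items))
        n += 1
    if not pieces:
        return pd.DataFrame(index=index, columns=columns)
    # Put the blocks side by side, then restore the original column order.
    return pd.concat(pieces, axis=1)[columns]


# Usage against HSDS (domain path is a placeholder; "df" is the to_hdf key):
# import h5pyd
# with h5pyd.File("/home/myuser/frames.h5", "r") as f:
#     df = read_fixed_frame(f["df"])
```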

@gheber
Member Author

gheber commented Jun 30, 2022

Or DataFrame.to_hsds 😄
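There is no `DataFrame.to_hsds`. As a hypothetical stand-in, a frame can be turned into a NumPy structured array and written with `create_dataset`, which h5py and h5pyd both provide. This sketch stores string columns as fixed-width UTF-8 bytes and drops the index (call `df.reset_index()` first to keep it). Both are illustrative choices, not a pandas or h5pyd convention, and the result is a plain compound dataset, not the PyTables layout that `to_hdf` produces.

```python
import numpy as np
import pandas as pd


def frame_to_records(df):
    """Convert a DataFrame to a NumPy structured array (one field per column).

    Object (string) columns become fixed-width UTF-8 bytes, because HDF5
    compound fields need a fixed size. The index is not included.
    """
    fields = {}
    for name in df.columns:
        col = df[name].to_numpy()
        if col.dtype == object:
            col = np.char.encode(col.astype(str), "utf-8")
        fields[str(name)] = col
    out = np.empty(len(df), dtype=[(k, v.dtype) for k, v in fields.items()])
    for k, v in fields.items():
        out[k] = v
    return out


def write_frame(parent, name, df):
    """Hypothetical 'to_hsds': write df as a compound dataset under an
    h5py/h5pyd File or Group."""
    return parent.create_dataset(name, data=frame_to_records(df))


# Usage against HSDS (domain path is a placeholder; "w" creates or truncates):
# import h5pyd
# with h5pyd.File("/home/myuser/frames.h5", "w") as f:
#     write_frame(f, "mytable", df)
```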

@ajelenak
Contributor

Perhaps this could be done by enabling the pandas HDF-related methods to accept an h5py.File object? Then that object could also be an h5pyd.File.
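A minimal sketch of that idea outside pandas (pandas' `read_hdf` currently expects a path or an `HDFStore`, not an h5py.File): a reader that accepts either a path or an already-open file object, so the caller decides whether that object comes from h5py or h5pyd. `read_key`, `reader` and `opener` are hypothetical names chosen for illustration.

```python
import contextlib
import os


def read_key(source, key, reader, opener=None):
    """Apply `reader` to source[key].

    `source` is either a path (opened with `opener`, e.g. h5py.File or
    h5pyd.File) or an already-open File-like object, which is left open
    because the caller owns it.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        if opener is None:
            raise TypeError("a path was given but no opener")
        ctx = opener(source, "r")
    else:
        ctx = contextlib.nullcontext(source)  # no-op context: does not close
    with ctx as f:
        return reader(f[key])


# Usage (domain path and key are placeholders):
# import h5pyd
# f = h5pyd.File("/home/myuser/frames.h5", "r")
# arr = read_key(f, "mytable", lambda d: d[...])
```

Passing an open object keeps the storage backend out of the reading code, which is the property that would let an h5pyd.File stand in for an h5py.File.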

@jreadey
Copy link
Member

jreadey commented Jul 7, 2022

Pandas is designed to work with in-memory data, which has led to several other projects that offer a pandas-like API but can handle larger data sets than pandas.
Something like https://github.com/vaexio/vaex already supports HDF5. Extend it to support h5pyd?
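Short of a vaex backend, one pattern for data that does not fit in memory is to read an h5pyd (or h5py) compound dataset in row slices and aggregate per slice, so only one DataFrame is held at a time. `iter_frames`, `column_mean` and the default chunk size are hypothetical. With h5pyd each slice is fetched from HSDS over HTTP, so larger chunks mean fewer round trips at the cost of more memory per step.

```python
import numpy as np
import pandas as pd


def iter_frames(dset, chunk_rows=100_000):
    """Yield successive row slices of a 1-D compound dataset as DataFrames."""
    n = dset.shape[0]
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        yield pd.DataFrame.from_records(np.asarray(dset[start:stop]))


def column_mean(dset, column, chunk_rows=100_000):
    """Mean of one field, computed slice by slice instead of all at once."""
    total = 0.0
    count = 0
    for frame in iter_frames(dset, chunk_rows):
        total += frame[column].sum()
        count += len(frame)
    return total / count if count else float("nan")
```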

3 participants