[Feature]: Make get_data_in_units not load entire array into memory #1881

Open
3 tasks done
rly opened this issue Apr 1, 2024 · 1 comment
Labels
category: proposal proposed enhancements or new features priority: medium non-critical problem and/or affecting only a small set of NWB users

Comments

@rly
Contributor

rly commented Apr 1, 2024

What would you like to see added to PyNWB?

As mentioned in #1880, get_data_in_units() loads the entire dataset into memory. For large datasets, that is impractical and can exhaust a user's RAM without warning.

Is your feature request related to a problem?

No response

What solution would you like?

What do you think about supporting the syntax timeseries.data_in_units[1000:2000, 5:10]? That is, we would add a simple wrapper class, WrappedArray, that defines __getitem__ and delegates the slice argument to the underlying list / numpy array / h5py.Dataset / zarr.Array object, so that only the requested slice is read and converted.

We can reuse this wrapper class elsewhere to help with addressing slicing differences between different array backends (#1702) and improving performance in h5py slicing (h5py/h5py#293). As mentioned in #1702, full unification of these libraries is outside the scope of this project, but I think providing this wrapper class with its few enhancements would only help.

If we do this, the wrapper class would probably live in HDMF.
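To make the idea concrete, here is a rough sketch of what such a wrapper might look like. This is not an existing PyNWB or HDMF class: the constructor arguments (conversion, offset) and the handling of plain lists are assumptions made for illustration, and the example uses an in-memory numpy array as a stand-in for an h5py.Dataset or zarr.Array.

```python
import numpy as np


class WrappedArray:
    """Hypothetical wrapper: scales only the slice that is requested."""

    def __init__(self, data, conversion=1.0, offset=0.0):
        # data may be a list, np.ndarray, h5py.Dataset, or zarr.Array
        self._data = data
        self._conversion = conversion
        self._offset = offset

    def __getitem__(self, key):
        data = self._data
        if isinstance(data, (list, tuple)):
            # Plain sequences reject tuple keys such as [1000:2000, 5:10],
            # so convert them first (they are already in memory anyway).
            data = np.asarray(data)
        # Delegate the key to the backend so that only this selection is
        # read, then apply the unit conversion to that selection alone.
        return np.asarray(data[key]) * self._conversion + self._offset


# Stand-in for a large on-disk dataset.
raw = np.arange(40_000, dtype=np.int32).reshape(4000, 10)
data_in_units = WrappedArray(raw, conversion=0.5, offset=1.0)

chunk = data_in_units[1000:2000, 5:10]
print(chunk.shape)
```

With an h5py.Dataset or zarr.Array as the backend, data[key] would read only the selected region from disk, which is the point of the proposal. Backend-specific differences in fancy indexing (#1702) are not handled in this sketch.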

Do you have any interest in helping implement the feature?

Yes.


@h-mayorquin
Contributor

h-mayorquin commented Apr 1, 2024

Interesting idea. I am personally curious about what the implementation of WrappedArray would look like.

An alternative is to pass a slice as an argument to get_data_in_units, but that way the expressiveness of __getitem__ that most people know from numpy is lost.
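For comparison, a minimal sketch of the argument-based alternative. This is a standalone function written for illustration, not the actual TimeSeries method; the selection parameter and the explicit conversion and offset arguments are assumptions.

```python
import numpy as np


def get_data_in_units(data, conversion=1.0, offset=0.0, selection=None):
    """Hypothetical variant that accepts the selection as an argument."""
    if selection is None:
        # No selection given: fall back to reading everything.
        selection = Ellipsis
    return np.asarray(data[selection]) * conversion + offset


raw = np.arange(40_000, dtype=np.int32).reshape(4000, 10)

# The caller has to build the key explicitly, e.g. with np.s_,
# instead of writing the familiar bracket syntax on the object itself.
part = get_data_in_units(raw, conversion=0.5, offset=1.0,
                         selection=np.s_[1000:2000, 5:10])
print(part.shape)
```

This reads the same region as the bracket syntax would, but the np.s_ step is the kind of extra ceremony the comment above refers to.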

@stephprince stephprince added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users category: proposal proposed enhancements or new features and removed category: enhancement improvements of code or code behavior labels Apr 12, 2024