
a bug fix for the general ragged array #112

Open
kazuakiyama opened this issue Apr 6, 2023 · 3 comments

Comments

@kazuakiyama

kazuakiyama commented Apr 6, 2023

Thanks a lot for developing this package! I want to use Zarr's ragged arrays for some tables that contain multi-dimensional arrays as elements, but I run into issues when I try to form a ragged array of two- or higher-dimensional arrays. The problem occurs when running code like the following:

```julia
using Zarr

arr = zcreate(Array{Float64, 2}, 10)
arr[:] .= [zeros(Float64, 2, 2) for i in 1:10]
```

This throws an error because there is no `_zero` method for general multidimensional arrays (around line 133 of ZArray.jl). I think adding a line like

```julia
_zero(::Type{<:Array{T,N}}) where {T,N} = zeros(T, ntuple(_ -> 0, N))
```

should fix the issue, since we can follow the pattern already defined for Vector{T}. I would really appreciate a fix, as this is probably easy to address. Thanks!
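For illustration, here is a standalone version of the proposed method. It lives in a throwaway module (`ZeroSketch` is a hypothetical name) rather than inside Zarr.jl, so it does not reflect how Zarr.jl's own `_zero` is actually declared; it only shows that a single method covers vectors, matrices and higher-dimensional arrays alike.

```julia
module ZeroSketch

# Hypothetical standalone copy of the proposed fallback: the "zero" fill
# value for an array element type is an empty array with the same element
# type and the same number of dimensions.
_zero(::Type{<:Array{T,N}}) where {T,N} = zeros(T, ntuple(_ -> 0, N))

end # module

using .ZeroSketch: _zero

v = _zero(Vector{Float64})   # 0-element Vector{Float64}
m = _zero(Matrix{Float64})   # 0×0 Matrix{Float64}
a = _zero(Array{Int32,3})    # 0×0×0 Array{Int32, 3}
```

Because `Vector{T}` is just `Array{T,1}`, the same method also returns an empty vector for the 1-D case.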

@kazuakiyama
Author

I tried to fix it in a fork and found that it may also need a change in the readblock! function. The code snippet above worked after adding the proposed line to ZArray.jl (plus a couple of lines defining storageratio and nobytes methods so that other methods work). On the other hand, methods that involve readblock!, for instance getindex and setindex!, still don't work. As I'm not familiar with DiskArrays.jl, I hope someone will chime in.

@meggart
Collaborator

meggart commented Apr 14, 2023

I think the main issue here is that you are trying to store a list of matrices instead of vectors, and I am not sure this is a trivial change, because the Zarr specs currently have no mechanism for it. For storing a vector of vectors there is the VLenArrayFilter defined in the numcodecs Python package, which is currently the reference for available filters. However, it works only for 1D arrays. There is already a proposal to extend it to n-dimensional arrays, zarr-developers/numcodecs#200, but it seems to have stalled in the meantime.

The main difference is that, in addition to the number of elements, the shape of each array also needs to be encoded in a special field of the stored data. I am happy to implement this feature once there is some consensus on how to do it. If you want, you can bump the PR linked above; it looks as if it is in a reasonable state...
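To make the idea of storing a shape per item concrete, below is a minimal Julia sketch. The byte layout and the function names `encode_ragged`/`decode_ragged` are hypothetical: this is neither the actual numcodecs wire format nor the layout proposed in numcodecs#200. The layout is an item count, then for each item its N dimensions (N fixed for the whole column) followed by its element data, so arrays of different shapes but equal dimensionality round-trip.

```julia
# Hypothetical layout (NOT the numcodecs format), all little-endian:
#   Int32 item count, then per item: N Int32 dims, then prod(dims) elements.

function encode_ragged(items::Vector{Array{T,N}}) where {T,N}
    io = IOBuffer()
    write(io, htol(Int32(length(items))))
    for a in items
        for d in size(a)
            write(io, htol(Int32(d)))   # per-item shape header
        end
        for x in a                      # elements in column-major order
            write(io, htol(x))
        end
    end
    return take!(io)
end

function decode_ragged(::Type{T}, ::Val{N}, bytes::Vector{UInt8}) where {T,N}
    io = IOBuffer(bytes)
    n = Int(ltoh(read(io, Int32)))
    items = Vector{Array{T,N}}(undef, n)
    for i in 1:n
        # Read the N dims in order, then allocate and fill the item
        dims = [Int(ltoh(read(io, Int32))) for _ in 1:N]
        a = Array{T,N}(undef, dims...)
        for j in eachindex(a)
            a[j] = ltoh(read(io, T))
        end
        items[i] = a
    end
    return items
end

# Usage: three matrices with different shapes, including an empty one
rows = [rand(2, 3), rand(4, 1), zeros(0, 5)]
bytes = encode_ragged(rows)
back = decode_ragged(Float64, Val(2), bytes)
```

Fixing N for the whole column keeps the header to exactly N integers per item; supporting a varying number of dimensions would additionally require storing N per item.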

@kazuakiyama
Author

Thanks for the reply. Our use case is storing a vector of arrays that may have different shapes (so their lengths can differ) but share the same number of dimensions. Such a vector can serve as a column in, for instance, DataFrames.jl (or perhaps a Dagger-based table such as DTables.jl). We thought this could be an excellent way to use a stack of multi-dimensional arrays in Zarr that share a common "row" axis, giving them a table-like interpretation, especially because Zarr.jl has an excellent frontend that can switch between storage backends depending on the situation.

So I don't need the more general case where a ragged array handles a varying number of dimensions, but it would probably be good to hear a wider range of opinions.
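For the table use case described here, one possible in-memory representation keeps all elements in a single flat vector plus a per-row shape and offset. This is an assumption for illustration, not something Zarr.jl provides; `RaggedColumn` is a hypothetical type. Because it subtypes `AbstractVector`, each row comes back as an N-dimensional array, which is the kind of column a table package such as DataFrames.jl can hold.

```julia
# Hypothetical column type: rows share N but may differ in shape.
struct RaggedColumn{T,N} <: AbstractVector{Array{T,N}}
    data::Vector{T}                # all rows' elements, concatenated
    shapes::Vector{NTuple{N,Int}}  # shape of each row
    offsets::Vector{Int}           # number of elements stored before row i
end

function RaggedColumn(rows::Vector{Array{T,N}}) where {T,N}
    data = T[]
    shapes = NTuple{N,Int}[]
    offsets = Int[]
    for r in rows
        push!(offsets, length(data))
        push!(shapes, size(r))
        append!(data, vec(r))
    end
    return RaggedColumn{T,N}(data, shapes, offsets)
end

Base.size(c::RaggedColumn) = (length(c.shapes),)

function Base.getindex(c::RaggedColumn{T,N}, i::Int) where {T,N}
    start = c.offsets[i]
    len = prod(c.shapes[i])
    # Copy the row's slice out of the flat buffer and restore its shape
    return reshape(c.data[start+1:start+len], c.shapes[i])
end

col = RaggedColumn([ones(2, 2), fill(2.0, 3, 1)])
r2 = col[2]   # 3×1 Matrix{Float64} filled with 2.0
```

The flat-buffer-plus-offsets design mirrors how the rows would be laid out along a shared "row" axis, while `getindex` hides that detail from table code.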
