Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should dimnames -> axes names work? #86

Open
Tokazama opened this issue Dec 12, 2019 · 11 comments
Open

How should dimnames -> axes names work? #86

Tokazama opened this issue Dec 12, 2019 · 11 comments
Labels
RequiresThinking For issues / feature requests that need detailed and careful consideration

Comments

@Tokazama
Copy link
Contributor

Tokazama commented Dec 12, 2019

I've mentioned this in a more rambly post here, but I thought it would be good to specifically address how to propagate names to axes. Specifically, how should one get the axes (e.g., axes(NamedDimsArray)) and access the names in a concise and meaningful way?

As I see it there are 3 possible ways for this to happen:

  1. Return a NamedTuple
  2. Incorporate names into the individual elements of the Tuple returned by axes.
  3. Make a unique structure like NamedTuple that permits repeated names.
  4. Make names and axes completely separate and users sort out how to keep track of them

The first one would only work if each name is completely unique, so there would have to be defaults for wild card names. We'd also have to address a lot of the instances in base where indexing is accomplished by iterating through a tuple (e.g. to_indices).
The second one would require making a novel type for a named index.
The third is what is currently happening and would mean we do nothing for this particular issue, but is pretty obnoxious.

@mcabbott
Copy link
Collaborator

I think that when last I tried, making axes or size return a NamedTuple broke all sorts of things which Base expects. And, as you say, this demand unique names. And I'm not so sure what problem it solves anyway. If you want to retrieve an axis, axes(nda, :x) already works fine, instead of axes(nda).x.

For point 2. the problem this aims to solve is the creation of new arrays which carry some of the same names? You may enjoy some experiments in NamedPlus.jl which do this for size, because I realised I had a lot of functions which took the size of their input, and made a new array to write into. (All axes were just OneTo(n).) With this modified size, the new array automatically gets the names:

julia> using NamedPlus

julia> nda = named(rand(Int8, 3,7), :x, :y)
3×7 NamedDimsArray(::Array{Int8,2}, (:x, :y)):
      → :y
↓ :x  -79  -80  111  -70  110  -122  52
      -36  -57  109   94   20    -4  21
      123    2   65  -64  -30    -2  72

julia> d,n = size(nda)
(3ᵅ, 7ᵅ)

julia> zed = zeros(2,n)
2×7 NamedDimsArray(::Array{Float64,2}, (:_, :y)):
      → :y
↓ :_  0.0  0.0  0.0  0.0  0.0  0.0  0.0
      0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> d'
NamedInt(x′ = 3)

@Tokazama
Copy link
Contributor Author

I hadn't even thought of size returning names. The main reason I'm interested in this is because it's used a lot in ImageCore.jl there's a whole packages, ImageAxes that built on this feature of AxisArrays.

@mcabbott
Copy link
Collaborator

I see, but this is about AxisArrays's nontrivial "axes", not Base.axes. There was a PR to make those accessible as A.x, which I copied in AxisRanges.jl.

@Tokazama
Copy link
Contributor Author

I see, but this is about AxisArrays's nontrivial "axes", not Base.axes.

To be clear, I'm not saying there shouldn't be a comparable functionality for size. I just haven't thought of it and actually think it raises a good point. That is, how far should the names be propagated through other methods? I'm assuming it wouldn't be too difficult to settle on just the "axes" and "size" type methods because those are really all that could propagate the names meaningfully.

My personal preference would be that size or axes return something that has both the size/axes information and names. I realize that might not be a realistic option though and I'm ultimately open to any solution that is performant and maintainable. I have more detailed opinions about each solution, but I wanted to see if anyone else had more solutions/thoughts first.

@mcabbott
Copy link
Collaborator

mcabbott commented Dec 12, 2019

Sure. My point with the size example was mostly that it's a working implementation you can play with, of something like your point 2. (With a few rough edges!) One could surely do something similar for axes, although I guess it would be a bit more work. If you have lots of OffsetArrays, then this would be useful to make things like zeros(Int, axes(nda, 1)) also propagate names.

@Tokazama
Copy link
Contributor Author

I really like the idea of approaching it from point 2, but then what does NamedDims.jl offer that couldn't be more conveniently incorporated into NamedIndices.jl? Just use a struct that wraps an array and it's accompanying axes like NamedIndices is trying to do. This would avoid rewriting all of the array interface again.

@oxinabox oxinabox added the RequiresThinking For issues / feature requests that need detailed and careful consideration label Dec 12, 2019
@oxinabox
Copy link
Member

oxinabox commented Jan 18, 2020

I forgot about this issue. I was at a conference.

Note: all the following also apply to size:

I think that when last I tried, making axes or size return a NamedTuple broke all sorts of things which Base expects.
...
I see, but this is about AxisArrays's nontrivial "axes", not Base.axes.

AxisArrays and Base both defining slightly different semantics for different functions call axes is annoying.

One fairly easy solution would be for NamedDims.jl to define named_axes(::NamedDimsArray) which returns a NamedTuple.
Something like:

function named_axes(nda::NamedDimsArray{L}) where L
    return (; (name=>axes(nda, name) for name in L)...)
end

(That should nest happily with IndexedDims overloading axes to retun iterators of the dimensions indexes)

Then if a package is transitioning from the use of AxisArrays.axes to something NamedDims based,
they can do find and replace, or const axes = named_axes

@oxinabox
Copy link
Member

I really like the idea of approaching it from point 2, but then what does NamedDims.jl offer that couldn't be more conveniently incorporated into NamedIndices.jl? Just use a struct that wraps an array and it's accompanying axes like NamedIndices is trying to do. This would avoid rewriting all of the array interface again.

I have no idea what this comment is talking about sorry.
I can't find this NamedIndices.jl package.

In general NamedDims takes care of everything to do with the names.
So all the things that define that they take a dim argument, NamedDims takes care of converting those into integers before passing it on the other underlying type.
For a bunch of those functions (e.g. sum) almost all underling types have nothing to do and don't need to overload it again.
For some functions, like getindex then the backing array tyope does still have more things to do; but at least when writing it they don't have to think about name lookup anymore.

@Tokazama
Copy link
Contributor Author

I implemented some glue code very similar to this elsewhere https://github.com/JuliaImages/ImageCore.jl/blob/master/src/traits.jl#L42. I think it's probably the right solution, but at some point we need to figure out how our names will by compatible with the keys of a NamedTuple. Right now we'd have to convert wild card names to something unique.

@oxinabox
Copy link
Member

at some point we need to figure out how our names will by compatible with the keys of a NamedTuple. Right now we'd have to convert wild card names to something unique.

To my mind when people end up in circumstances like that, it is a sign that they should have resolved all wildcards first -- since if they want to access something by name, they had better know what that name is before they do so.
So generally, either erroring or making picking one arbitarily is kind of OK -- we can assume they never check the value for the duplicated one anyway?

With your trait based approach, could detect the duplicate named and if so say that it doesn't have names.

@Tokazama
Copy link
Contributor Author

Tokazama commented Jan 18, 2020

I have no idea what this comment is talking about sorry.

Sorry, I meant IndexDims. It's a fair point though that this functions could be simpler to implement separately.

since if they want to access something by name, they had better know what that name is before they do so.

I think that logically makes sense, but it would change how we do some things with images right now. Maybe I should make a PR and see if I can figure out how to connect all the dots on paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RequiresThinking For issues / feature requests that need detailed and careful consideration
Projects
None yet
Development

No branches or pull requests

3 participants