deprecate (then remove) generalized linear indexing #14770

IainNZ · 2016-01-23T00:04:11Z

I'm sure this has been discussed elsewhere, but maybe not recently or in a focussed/isolated fashion. If this shouldn't be an issue report, I'll move it to julia-users.

Essentially, I would say this behavior is surprising:

julia> x = reshape(1:3^3, (3,3,3));

julia> size(x)
(3,3,3)

julia> x[3,4]
12

This was a cause of a hard-to-find bug for me recently, where I accidentally forgot an index.
I suppose this is in some way a generalization of the fact that x[12] works, but I'm not really sure why or where that behavior is useful for 2D-or-higher indices. Is there some logic for the current behavior?
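
For readers puzzling over where the 12 comes from, here is a minimal sketch of the arithmetic, assuming the last supplied index is treated as a column-major linear index over the remaining dimensions (the names `i`, `j` and `k` are illustrative only):

```julia
x = reshape(1:3^3, (3, 3, 3))
i, j = 3, 4

# Treat dimensions 2 and 3 as one flattened dimension of length 9 and
# compute the column-major position of (i, j) in the resulting 3×9 layout.
k = i + (j - 1) * size(x, 1)

@show k      # k = 12
@show x[k]   # x[k] = 12
```

Because `x` holds the values `1:27` in storage order, the value returned equals the linear position, which is why `x[3,4]` printed 12 above.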

EDIT: most similar discussion: #5396

timholy · 2016-01-23T01:19:10Z

The last index is effectively interpreted as a linear index if the dimensionality is insufficient. (More properly, for LinearFast AbstractArrays, everything gets converted to a linear index in the end, although there is intermediate bounds-checking.) Extra 1s are also dropped. It's been this way "forever," but I won't close this issue if you're hoping for reconsideration. Suffice it to say that I've sometimes found this behavior useful, but I also understand how it could be a source of bugs.

Relevant code:
LinearFast (converts all to a linear index):

julia/base/abstractarray.jl

Line 508 in 0c20e64

unsafe_getindex(A, sub2ind(size(A), J...))

LinearSlow (expands to full dimensionality):

julia/base/abstractarray.jl

Lines 525 to 551 in 0c20e64

    
    elseif N > AN
        # Drop trailing ones
        Isplat = Expr[:(I[$d]) for d = 1:AN]
        Osplat = Expr[:(to_index(I[$d]) == 1) for d = AN+1:N]
        quote
            $(Expr(:meta, :inline, :propagate_inbounds))
            (&)($(Osplat...)) || throw_boundserror(A, I)
            getindex(A, $(Isplat...))
        end
    else
        # Expand the last index into the appropriate number of indices
        Isplat = Expr[:(I[$d]) for d = 1:N-1]
        i = 0
        for d=N:AN
            push!(Isplat, :(s[$(i+=1)]))
        end
        sz = Expr(:tuple)
        sz.args = Expr[:(size(A, $d)) for d=N:AN]
        szcheck = Expr[:(size(A, $d) > 0) for d=N:AN]
        quote
            $(Expr(:meta, :inline, :propagate_inbounds))
            # ind2sub requires all dimensions to be > 0:
            (&)($(szcheck...)) || throw_boundserror(A, I)
            s = ind2sub($sz, to_index(I[$N]))
            getindex(A, $(Isplat...))
        end
    end
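
The "expand the last index" branch can be traced by hand. The following is only a sketch for current Julia, where `CartesianIndices` and `LinearIndices` stand in for the older `ind2sub` and `sub2ind`; the variable names (`idx`, `trailing`, `full`) are illustrative and do not appear in Base:

```julia
A = reshape(1:27, (3, 3, 3))
idx = (3, 4)                      # the indices from x[3,4] above
N, AN = length(idx), ndims(A)

# Sizes of the dimensions covered by the last supplied index (here dims 2:3).
trailing = size(A)[N:AN]          # (3, 3)

# Expand the last index into one subscript per covered dimension.
s = Tuple(CartesianIndices(trailing)[idx[N]])   # (1, 2)

# Leading indices are kept as they are; the expanded ones are appended.
full = (idx[1:N-1]..., s...)      # (3, 1, 2)

@show A[full...]                        # 12
@show LinearIndices(size(A))[full...]   # 12
```

The last line corresponds to the LinearFast path, which collapses the full set of subscripts back into a single linear index.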

eschnett · 2016-01-23T01:38:29Z

Whether the functionality is useful is not quite the same question as whether this should be the default behaviour. There could be a function linearindexing with this behaviour, or one could create a subarray...

johnmyleswhite · 2016-01-23T01:40:58Z

FWIW, I tend to feel like this behavior is too odd to be the default.

IainNZ · 2016-01-23T02:07:26Z

I would say that I'm looking for reconsideration, yes. I think it's got a high surprising-ness to it, relative to its (minimal-but-nonzero) utility. I guess my main initial goal was understanding how much utility it does have, and whether "fixing it" would have nontrivial performance implications.

In terms of options for "fixing it", after (I admit) relatively little consideration, it doesn't seem like anything but requiring all dimensions would be consistent. E.g., consider

julia> x = reshape(1:3^3, (3,3,3));
julia> x[3,3]
9

In some ways I find that even more surprising than my other example (but would have found it returning x[3,3,:] less so).

BobPortmann · 2016-01-23T06:31:28Z

I agree and feel strongly that it should be an error to index an array with fewer indices than the rank of the array. It just makes it too easy to introduce hard-to-find errors into code when using higher-rank arrays (say 4-10 dimensional). It is easy enough to use reshape to make a copy-free reference if one wants to use the trailing linear indexing feature (and this leaves the intent clear in the code). And one could easily write some helper functions to make trailing linear indexing easier to use. I would also note that other languages (e.g., IDL and Fortran) make this an error, so the present behavior will be surprising to many.
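
A rough sketch of what such a helper might look like, assuming a plain `Array` (for which `reshape` shares memory with its argument). The name `flatten_trailing` is hypothetical and not part of Base:

```julia
# Hypothetical helper: keep the first n-1 dimensions of A and flatten all
# remaining dimensions into a single trailing dimension.
flatten_trailing(A::AbstractArray, n::Integer) =
    reshape(A, size(A)[1:n-1]..., :)

A = zeros(3, 3, 3)
B = flatten_trailing(A, 2)   # 3×9 array sharing memory with A

B[3, 4] = 1.0                # explicit "trailing linear" write
@show size(B)                # (3, 9)
@show A[3, 1, 2]             # 1.0
```

Written this way, the intent to linearize the trailing dimensions is visible at the call site rather than implied by a missing index.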

The case of 1-D linear indexing is perhaps OK since indexing as a one-dimensional array is visually distinct (compared with, e.g., indexing an 8-D array as 7-D). However, since it is so easy to use vec, I am not sure even this is necessary.

This was discussed a bit in #4774 but I don't know how to link to specific comments and that is a long issue.

rfourquet · 2016-01-23T06:44:22Z

@BobPortmann you can link to specific comments by right clicking on the timestamp (next to the author name in the header of the comment), and selecting "copy link location" or equivalent.

toivoh · 2016-01-23T07:27:58Z

+1 to reconsidering this, it just seems so much more likely that this would be used by mistake than on purpose. Even for linear indexing I think that we should consider if it should have its own method instead of being overloaded onto regular indexing.

johnmyleswhite · 2016-01-23T17:46:24Z

I kind of love the idea of giving linear indexing a special method.

timholy · 2016-01-23T19:02:41Z

What about A[i, LinearIndex(j)]?

johnmyleswhite · 2016-01-23T19:06:00Z

I'd be ok with that.

JeffBezanson · 2016-01-23T19:44:27Z

+1 to dropping this behavior and having separate syntax of some kind for linear indexing. This behavior was just taken as the default choice since it's what some other environments do. I would describe it as an over-generalization of a premature optimization.

StefanKarpinski · 2016-01-23T20:31:58Z

"I would describe it as an over-generalization of a premature optimization."

Precisely. Let's ditch this for sure.

tknopp · 2016-01-24T07:20:16Z

-1 for dropping the 1D version. It's very natural to loop over all elements in a multidimensional array using

for n=1:length(x)
  x[n] = ...
end

I know this can be achieved in other ways (e.g. enumerate) but the above is quite easy to remember.

lobingera · 2016-01-24T13:23:15Z

@JeffBezanson, I disagree with the statement about over-generalization of a premature optimization.

Linear indexing is the natural indexing scheme, and anything providing multidimensional access is actually some infrastructure to make it easier for the person writing code or designing an algorithm.
Taking linear indexing away completely just feels wrong to me, and introducing additional syntax (which means to me that I have to hit more keys on the REPL before getting output) seems to me an additional burden; a[i] is just linear, a[i,j] is dimensional; where is the problem?

Now to the interesting problem that started this: having an a[i,j] access when a is actually defined as 3D. I was not aware that this existed, but actually this is something that I'd like to have in a language.

The initial problem for me (and I think the title of the issue is misleading) is rather in the area of a compiler warning or some code checking (like lint).

toivoh · 2016-01-24T15:51:16Z

The way I see it, linear indexing is the natural indexing scheme in the same sense that machine code is the natural programming language: it is the fundamental one.

But Julia is in the business of providing abstractions. In fact, it provides abstractions such as subarrays, where linear indexing is not the natural indexing scheme.

I don't think that linear indexing should be removed completely, especially not for the array types where it is the fundamental scheme. But seeing as subarray performance is going to be increasingly important, it might make sense to make it a bit more cumbersome to use linear indexing, in order to nudge people into using eachindex etc to get good performance across a greater number of array types.
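
As a small illustration of the `eachindex` style, here is a sketch that assumes current Julia's `view`; the loop body is the same whichever kind of index `eachindex` chooses for the array type:

```julia
A = zeros(4, 4)
V = view(A, 2:3, 2:3)     # a non-contiguous 2×2 block inside A

# eachindex picks an index type suited to V, so the loop never has to
# convert a linear position back into subscripts itself.
for I in eachindex(V)
    V[I] += 1
end

@show sum(A)     # 4.0
@show A[2, 2]    # 1.0
@show A[1, 1]    # 0.0
```

The same loop works unchanged on a plain `Array`, where `eachindex` can hand back simple linear indices instead.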

eschnett · 2016-01-24T18:04:23Z

Just thinking out loud: In addition to eachindex, there could also be an eachelement function that returns a reference to an array element, as in

for xn in eachelement(x)
    xn = ...
end

This reference could be something like a zero-dimensional subarray.

timholy · 2016-01-24T18:14:40Z

Wouldn't that have to be xn[] = ...?

To just access the value, we already have that from for x in X....

eschnett · 2016-01-24T19:24:07Z

Right, should have been xn[].

Yes, this is about avoiding having to handle indices at all if one wants to iterate over a whole array, assigning to elements.
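
One possible shape for this idea, sketched with zero-dimensional views. `eachelement` here is a hypothetical helper written for illustration, not an existing Base function, and nothing is claimed about its performance:

```julia
# Hypothetical helper: yield a zero-dimensional view for every element of x,
# so that callers can assign through xn[] without handling indices.
eachelement(x::AbstractArray) =
    (view(x, Tuple(I)...) for I in CartesianIndices(x))

x = zeros(Int, 2, 3)
for (n, xn) in enumerate(eachelement(x))
    xn[] = n          # writes through to the parent array
end

@show x   # x = [1 3 5; 2 4 6]
```

Elements are visited in column-major order, so the counter fills the array column by column.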

toivoh · 2016-01-24T19:49:39Z

That could be useful.

mbauman · 2016-01-24T20:08:53Z

👍 to deprecating this. I've called this "partial" linear indexing before — not sure where I picked up that terminology. It's been discussed in #13015 and #5396, and it's a bullet point that I listed as a possibility in #13157.

JaredCrean2 · 2016-01-24T22:16:44Z

That kind of iterator would be very useful for sparse matrices where accessing non-structural locations is not allowed (for example, PETSc matrices).

lobingera · 2016-01-25T09:33:56Z

I'm just wondering how and why this was brought in, anyway. Someone found it helpful, and I guess it happened before v0.2.

Maybe I'm biased because my Matlab coding very often includes linear indexing although the data is organized in 2 or 3D, and I have a mental model to deal with it.

KristofferC · 2016-01-25T09:41:19Z

Another -1 for dropping the 1D version for same reason as #14770 (comment).

Just keep the 1D version of linear indexing and remove the "partial" linear indexing imo.

ViralBShah · 2016-01-25T10:02:54Z

I too would love to have the 1d linear indexing version and getting rid of the partial linear indexing. That is quite a sane solution.

toivoh · 2016-01-25T12:15:27Z

Sorry for hijacking the thread, no matter what happens to linear indexing, it seems that almost everyone is for getting rid of partial linear indexing. That should also be a much smaller change, since there shouldn't be much (if any) code that depends on it.

lobingera · 2016-01-25T12:18:14Z

@toivoh, No, i'm not for getting rid of this partial linear indexing.

toivoh · 2016-01-25T13:34:07Z

That's why I said almost everyone :)
Is there anyone else who wants to keep partial linear indexing?

There is of course always a tension between catching bugs and convenience,
personally I just think that this would catch a lot more bugs than the
times the functionality would be needed. I don't even know when I would use
partial linear indexing.

toivoh · 2016-01-25T13:35:02Z

s/linear/partial linear/ in the last sentence.

tknopp · 2016-01-25T13:46:27Z

Well, although I do not want to vote in a particular direction, there are use cases and I have even used it in some situations. Imagine you have some plotting tool that is capable of displaying slices within a 3D dataset c. So it will display

c[:,:,k]

where k is the slice number. Now I am confronted with 4D data (3D+time) and would like to reuse my existing infrastructure. Currently, when everything is properly duck typed, it would be very simple to push in a 4D dataset and allow k to be beyond size(c,3).

mbauman · 2016-12-09T17:55:36Z

For what it's worth, the most straight-forward implementation here requires #18457 since it fixes a method sorting bug with Varargs of a bound length.

StefanKarpinski · 2016-12-09T19:05:37Z

Since #18457 is almost ready (can we merge it yet?), let's wait for that and then do that.

mbauman · 2017-01-18T21:45:20Z

An update is in order here. We've been talking about two completely different things in this thread, with vastly different ramifications:

1. Deprecate the linearization of any dimension beyond the first. This is what I've done in RFC: Deprecate partial linear indexing #20079. Note that it still allows indexing into an N dimensional array with fewer than N indices. Once this change goes through, we will be able to simplify the lowering of A[i, end] to simply use size(A, 2) — and I think we can ditch Base.trailingsize altogether. Similarly, we'll no longer need to fuss with special cases for trailing :s beyond the single A[:] case. Notably, this only impacted test code that was specifically crafted to test this behavior.
2. Deprecate indexing into N dimensional arrays with anything but 1 or N indices. This is what I attempted in WIP/RFH: Deprecate generalized linear indexing #20040, and it is massively disruptive because it removes the ability to index into vectors with trailing 1s. This change requires bending over backwards to ensure that we only define indexing with 1 or N dimensions. While I suppose we could theoretically deprecate indexing with 1 < n < N indices but keep trailing singleton dimensions, specifying that with dispatch would require support for a new signature: getindex{N}(A::AbstractArray{T,N} where T, I::Vararg{Int, N}, trailing_singletons::Int…).

I'm for the first change, but against the second.

Sacha0 · 2017-01-22T19:31:31Z

Re. indexing with trailing singletons: Though indexing with trailing singletons is sometimes convenient when writing code, that style is often confusing / obfuscating when reading code. Code being read more than written, the benefit of disallowing indexing with trailing singletons (clarity) seems worth the cost (minor convenience reduction).

Deprecating indexing with anything other than 1 or N dimensions received broad support in this thread. The primary obstacle to completing this deprecation seems to be the volume of work involved in removing indexing with trailing singletons from existing base code. Particularly, the linear algebra code seems to be the primary consumer of indexing with trailing singletons. (Reading the linear algebra code is what convinced me of the above clarity point :).) If that is correct, and there otherwise remains broad support for deprecating indexing with anything other than 1 or N dimensions, I would be happy to primarily shoulder the necessary linear algebra related work over the next release cycle. Best!

mbauman · 2017-01-23T01:15:26Z

I'm not as certain on the consensus since the thread is a little meandering… and I know I have been imprecise in my language when talking about these two options. It'd be great to hear updated opinions now that we have two very concrete (and implemented) options. You can try them out! CC @andreasnoack, who is conspicuously absent here.

My hesitancy about only-1-or-N-indices isn't just that it requires a lot of changes to existing code, but that it is also more difficult to implement and enforce. Were this more consistently a net simplification (on either side), then I think I'd find it more attractive.

BobPortmann · 2017-01-23T17:50:16Z

@mbauman Either of your 2 options above will be a big improvement, but I think having stricter rules would be better (i.e., deprecating indexing with anything other than 1 or N dimensions). It is not clear to me why this would be "more difficult to implement and enforce". Naively, it seems that it would be easier, not harder. In any case, in my experience this distinction only becomes an issue when working with higher-dimensional arrays (say larger than 5 or so) where being off by one index is less visually distinctive and getting yelled at by the compiler really helps in the long run. I suppose the linear-algebra folks usually use lower-dimensional arrays and thus have no issue.

mlubin · 2017-01-23T17:55:09Z

I can't speak to the difficulty of implementation, but if we're prioritizing user experience, unintentionally providing the wrong number of indices is currently an unfortunate trap. We get complaints about this in JuMP: jump-dev/JuMP.jl#937

JaredCrean2 · 2017-01-23T18:42:51Z

I'm also in favor of 1 or N. One of the other people in my lab spent nearly a week tracking down a bug caused by this behavior.

mbauman · 2017-01-25T22:47:13Z

Moving milestone now that #20079 is merged.

StefanKarpinski · 2017-07-20T16:47:29Z

What remains to be done here? Now is probably a good time to take this out entirely and try to simplify all the array indexing infrastructure.

mbauman · 2017-07-20T17:07:24Z

I need to spend some time to get #21750 through. Then we can talk about the next steps here. I'd be in favor of one final round of bounds check tightening here, such that we only allow omitted trailing dimensions when the sizes of those omitted dimensions are all 1.
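
To make that rule concrete, here is a sketch of which indexing forms would be accepted and which rejected under it. It assumes the behavior of current Julia 1.x releases, and `throws_bounds` is a helper defined only for this example:

```julia
# Returns true if calling f throws a BoundsError, false if it returns normally.
function throws_bounds(f)
    try
        f()
        return false
    catch err
        err isa BoundsError || rethrow()
        return true
    end
end

A = reshape(1:27, (3, 3, 3))
B = reshape(1:9, (3, 3, 1))

@show A[12]                          # single index: linear indexing
@show A[3, 1, 2]                     # exactly N indices
@show A[3, 1, 2, 1]                  # extra trailing 1s are accepted
@show B[3, 3]                        # omitted dimension has size 1
@show throws_bounds(() -> A[3, 4])   # partial linear indexing: rejected
@show throws_bounds(() -> A[3, 3])   # omitted dimension has size 3: rejected
```

Under these assumptions, the `x[3,4]` example that opened this issue now raises an error instead of silently returning 12.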

StefanKarpinski · 2017-08-31T19:55:54Z

With #21750 merged, I believe there are only a few small indexing cases that still need to be deprecated here.

kshyatt added the domain:arrays label Jan 23, 2016

StefanKarpinski removed the needs decision label Sep 13, 2016

StefanKarpinski assigned mbauman Sep 13, 2016

tkelman mentioned this issue Dec 15, 2016

things we should deprecate, 0.6 edition #19598

Closed

22 tasks

This was referenced Jan 10, 2017

Simplify scalar indexing #19958

Merged

Arraypocalypse Now and Then #13157

Closed

mbauman mentioned this issue Jan 24, 2017

RFC: Deprecate partial linear indexing #20079

Merged

mbauman modified the milestones: 1.0, 0.6.0 Jan 25, 2017

mbauman removed their assignment Jan 25, 2017

mbauman mentioned this issue Feb 2, 2017

0.6: vec*mat throws "Cannot left-multiply a matrix by a vector" even when mat is 1 x n #20389

Closed

timholy added a commit that referenced this issue Feb 13, 2017

Fully deprecate partial linear indexing. Fixes #14770.

a99f379

mbauman mentioned this issue May 22, 2017

(Row)Vector equality with Matrices #21998

Closed

StefanKarpinski assigned mbauman Aug 24, 2017

mbauman mentioned this issue Sep 7, 2017

Deprecate the omission of trailing indices over non-singleton dimensions #23628

Merged

mbauman closed this as completed in #23628 Sep 22, 2017

mbauman mentioned this issue Sep 22, 2017

Express reshape to a certain dimensionality as conversion instead? #23821

Open

cormullion mentioned this issue Mar 18, 2018

NEWS.md is getting a bit untidy #26508

Closed

jd-foster mentioned this issue Mar 5, 2020

Non-explicit index placed without warning jump-dev/JuMP.jl#2190

Closed
