
make I(n) behave as Diagonal / eye? #23156

Closed
StefanKarpinski opened this issue Aug 6, 2017 · 33 comments

@StefanKarpinski (Sponsor Member) commented Aug 6, 2017

It might be nice if we made I callable so that you can write I(10) and get a 10x10 Diagonal matrix object. Of course it's a bit weird to call one kind of object and get another, but we do that with functions all the time so maybe that's no problem. Possible definition:

(J::UniformScaling)(n::Integer) = Diagonal(collect(Iterators.repeated(J.λ, n)))

With this definition, we could potentially deprecate eye(n) (which is usually quite inefficient) in favor of I(n) when one doesn't need to mutate any off-diagonals, or full(I(n)) in the cases where one does need to mutate an off-diagonal.
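For concreteness, here is a hedged sketch of how the proposed definition might behave (nothing below is existing behavior at the time of this issue; the UniformScaling method is the proposal itself, and the binding name J is just a placeholder):

using LinearAlgebra  # the sketch assumes the post-0.7 stdlib layout

# Proposed: calling a UniformScaling with a size yields a Diagonal matrix.
(J::UniformScaling)(n::Integer) = Diagonal(collect(Iterators.repeated(J.λ, n)))

I(10)         # 10×10 Diagonal with I.λ on the diagonal
(2.5I)(3)     # 3×3 Diagonal with 2.5 on the diagonal
Matrix(I(4))  # dense identity, for the rare case where off-diagonals must be mutated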

StefanKarpinski added the domain:linear algebra and kind:speculative labels Aug 6, 2017
@yuyichao (Contributor) commented Aug 6, 2017

What advantage does this have? Functions are pretty special (sure, they are objects, but they are very special ones), so I don't think functions being callable is a good argument for this. If anything, it's a good argument for not implementing algebra or other operations on functions. The syntax will also be really confusing with juxtaposed multiplication when you have both I(2) and 2I in the code.

eye is also a well-known function name from Python and MATLAB, so I don't think removing it is a good idea. We should certainly make the more efficient things easier to do (x * I sounds simple enough), but I don't think we should make the less efficient but sometimes necessary things much harder. It's also unclear to me what the new syntax would be to get an eye(Int8, 3) or eye(Bool, 2).

@JeffBezanson (Sponsor Member) commented:

You can do copy!(Matrix{Int8}(n,n), I). eye is a pun that I've never really liked that much.
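For reference, a hedged sketch of Jeff's pattern next to the Matrix{T}(I, m, n) constructors discussed later in this thread, written with post-0.7 spellings (copyto! and undef; this assumes the UniformScaling copy method carried over from 0.6's copy!):

using LinearAlgebra

n = 3

# Jeff's pattern: allocate uninitialized storage, then fill it from I.
A = copyto!(Matrix{Int8}(undef, n, n), I)

# Constructor form, covering yuyichao's eye(Int8, 3) / eye(Bool, 2) cases.
B = Matrix{Int8}(I, 3, 3)
C = Matrix{Bool}(I, 2, 2)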

@lobingera commented:

As in similar places: What problem is solved by this?

@KlausC (Contributor) commented Aug 7, 2017

When I first heard eye in the MATLAB context, I typed I. I thought somebody was kidding when they said: <Type "e", "why", "e">. So I am conditioned for I.

@StefanKarpinski (Sponsor Member, Author) commented:

We currently have eye, I, and Diagonal (not to mention speye); it would be nice to have one less function for making a linear-algebra identity operator. There's almost never any reason to construct a dense one, so it's actively bad to have eye around, since using I would be better in 99% of cases. Since one wants to use it in situations like A + I and [A I; I B] already, it seems much more natural to use I(n) for constructing a particular size of identity matrix instead of explaining "oh, there's this other thing with a really punny name to do that, but don't actually use it since it's not very efficient; instead do Diagonal(collect(Iterators.repeated(1, n)))". I'm proposing that we provide a more convenient spelling of that last expression – and I is a pretty natural name for it.

@StefanKarpinski (Sponsor Member, Author) commented:

I kind of think we should double down on the whole I thing, allow it to produce rectangular matrices, and make Z a binding for 0I that lets us write things like [A Z; Z B]. Either that, or allow a scalar to implicitly broadcast in concatenation expressions so you can write [A 0; 0 B]. Writing [A zeros(size(A,1), size(B,2)); zeros(size(B,1), size(A,2)) B] is such a drag.

@tkelman (Contributor) commented Aug 7, 2017

Diagonal(ones(n)) isn't that long. It's also a different type than UniformScaling; it's odd to use a callable I to construct something of a different type.

@JeffBezanson (Sponsor Member) commented:

I'm also fine with Diagonal(ones(n)) and Array(Diagonal(ones(n))) (for a full eye replacement). It's pretty clear, and as Stefan said you don't really want to be constructing big mutable dense identity matrices if you can avoid it.

@RalphAS commented Aug 10, 2017

@StefanKarpinski "I kind of think we should double down … and make Z a binding for 0I "
Please, please, don't do this. Exporting single-character variables from the base library is rarely wise. (In fact, it's almost always foolish, but I'll save that for another day.) Those of us who actually use linear algebra employ Z for lots of obvious purposes, among which your proposal does not figure.

@TotalVerb (Contributor) commented:

0I already seems like a reasonable way to express that zero block, so there's no need to spell zeros in full.
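A hedged sketch of the two block-matrix spellings being weighed here. It assumes the UniformScaling concatenation support that later Julia versions provide, which materializes each 0I block as a square zero block sized from its block-row; the rectangular off-diagonal case is exactly what the Z / rectangular-identity idea above would have to cover:

using LinearAlgebra

A = rand(3, 3)
B = rand(3, 3)

# Verbose spelling with explicit zero blocks (works for any block sizes):
M1 = [A zeros(size(A, 1), size(B, 2)); zeros(size(B, 1), size(A, 2)) B]

# UniformScaling spelling (works here because the off-diagonal blocks are square):
M2 = [A 0I; 0I B]

M1 == M2  # true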

ViralBShah added the status:triage label Oct 21, 2017
ViralBShah added this to the 1.0 milestone Oct 21, 2017
@StefanKarpinski (Sponsor Member, Author) commented:

This issue as stated is not breaking, so it doesn't belong on the milestone. However, it has been brought up previously that we may want to get rid of eye since it is inefficient and never really the best alternative. Deprecating it would force people to consider better alternatives. So perhaps that should be the real question at hand here: should we deprecate eye, and if so, do we have sufficient facilities to cover all of its use cases?

@JeffBezanson (Sponsor Member) commented:

I would deprecate eye to something like Array(Diagonal(ones(n))) --- is there any way to do that without allocating the vector of ones? I see copy!(zeros(3,3), I) works, but that depends on mutation so isn't ideal.
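A hedged sketch of the three alternatives Jeff weighs here, written with the constructors that eventually landed; only the last avoids both the intermediate ones(n) vector and the reliance on mutation (copy! is spelled copyto! post-0.7):

using LinearAlgebra

n = 3

E1 = Array(Diagonal(ones(n)))   # allocates an intermediate ones(n) vector
E2 = copyto!(zeros(n, n), I)    # no intermediate vector, but relies on mutation
E3 = Matrix{Float64}(I, n, n)   # no intermediate vector, no mutation

E1 == E2 == E3  # true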

@StefanKarpinski (Sponsor Member, Author) commented:

Note that eye definitely belongs to the LinAlg module, so it could go in the stdlib rather than Base.

@StefanKarpinski (Sponsor Member, Author) commented:

@Sacha0 has agreed to give replacing eye with better things a try and see what we need.

StefanKarpinski removed the status:triage label Oct 26, 2017
@Sacha0 (Member) commented Oct 27, 2017

@Sacha0 has agreed to give replacing eye with better things a try and see what we need.

To start I skimmed through all eye/speye calls in base, and then sketched a pull request deprecating speye (#24356). Observations from that deprecation sketch follow below, most of which also apply to eye.

speye takes the following forms: [sp]eye([T::Type], m::Integer[, n::Integer]) and speye(A::AbstractMatrix).

Where speye(m::Integer[, n::Integer]) cannot be replaced with something more clever (e.g. I alone), sparse(1.0I, m[, n]) works, as would SparseMatrixCSC{Float64}(I, m, n) (when that constructor exists). Fine so far, though having to specify the Float64 in SparseMatrixCSC{Float64} due to eltype(I) defaulting to Int may be slightly cumbersome.

Replacing speye(T, m[, n]) is trickier: Where typeof(one(T) * eltype(I)) == T, sparse(one(T)I, m[, n]) or some equivalent works reasonably well. But where typeof(one(T) * eltype(I)) != T, e.g. in speye(Bool, m), life is a bit trickier: Seemingly one must either write sparse(UniformScaling(one(T)), m[, n]), which is a bit cumbersome due to the UniformScaling(one(T)), or SparseMatrixCSC{T}(I, m, n) (when that constructor exists). A compact way to construct a UniformScaling with specified eltype might smooth that wrinkle out. Similarly, e.g. speye(Real, m) is tricky, as sparse(UniformScaling{Real}(1), m) yields a SparseMatrixCSC{Int} rather than SparseMatrixCSC{Real}. That either seems like a bug in sparse that should be fixed, or calls for a method to handle SparseMatrixCSC{Real}(I, m, n).

speye(A) is similarly tricky: The best rewrite I have so far is sparse(UniformScaling(one(eltype(A))), size(A)...), which is a bit cumbersome. Here again a compact way to construct a UniformScaling with specified eltype might be nice. Supporting array constructors that accept an iterable as their first argument, and another AbstractArray from which eltype and shape are taken as the second argument, might also smooth this bump out (e.g. SparseMatrixCSC(I, A)).

Methods for comparison against identity (foo == I, foo ≈ I) could be useful here and there too. Best!
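To make those replacement patterns concrete, a hedged sketch using the forms that later became available in SparseArrays/LinearAlgebra (the SparseMatrixCSC{T}(I, m, n) constructor is the "when that constructor exists" case above):

using LinearAlgebra, SparseArrays

m, n = 4, 4

S1 = sparse(1.0I, m, n)                       # replaces speye(m, n)
S2 = sparse(UniformScaling(one(Bool)), m, n)  # replaces speye(Bool, m, n)
S3 = SparseMatrixCSC{Bool}(I, m, n)           # constructor form of the same

A = sprand(5, 5, 0.3)
S4 = sparse(UniformScaling(one(eltype(A))), size(A)...)  # replaces speye(A)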

@fredrikekre (Member) commented:

To start I skimmed through all eye/speye calls in base

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m !== n? Since it does not create an identity matrix, it is rather weird that it is even supported. Perhaps we can just discontinue that construction? That would make this simpler, I imagine.

@Sacha0 (Member) commented Oct 27, 2017

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m !== n? Since it does not create an identity matrix, it is rather weird that it is even supported. Perhaps we can just discontinue that construction? That would make this simpler, I imagine.

Fantastic question! In the speye deprecation and excluding tests specific to the corresponding methods, I found only one instance where m != n (and in that instance, the matrix's contents could just as well be zero or junk). In other words, at least the speye([T, ]m, n) (and so sparse(I, m, n)) methods are likely superfluous.

Similarly, I found only three uses of speye(A). Those three uses are essentially identical, occurring in definitions of arithmetic between SparseMatrixCSCs and UniformScalings. From the speye deprecation diff:

-(+)(A::SparseMatrixCSC, J::UniformScaling) = A + J.λ * speye(A)
-(-)(A::SparseMatrixCSC, J::UniformScaling) = A - J.λ * speye(A)
-(-)(J::UniformScaling, A::SparseMatrixCSC) = J.λ * speye(A) - A
+(+)(A::SparseMatrixCSC, J::UniformScaling) = A + sparse(J, size(A)...)
+(-)(A::SparseMatrixCSC, J::UniformScaling) = A - sparse(J, size(A)...)
+(-)(J::UniformScaling, A::SparseMatrixCSC) = sparse(J, size(A)...) - A

As shown, those uses are perhaps better written sparse(J, size(A)...). But these methods could be written more efficiently without the sparse(J, size(A)...)/speye(A) altogether. Moreover, the same arithmetic operations between AbstractMatrixs and UniformScalings throw if the AbstractMatrix is not square. So one could argue that the uses in the copied methods are incorrect. Altogether, the speye(A) methods might also be superfluous.

So perhaps deprecating sparse(I, m, n) (as opposed to sparse(I, n)) alongside speye is reasonable, which would simplify life a bit.

I will keep this question in mind while reviewing uses of eye again! Thank you! :)

@Sacha0 (Member) commented Oct 27, 2017

On second thought, with the future SparseMatrixCSC(I, m, n) around, perhaps sparse(I, m, n) should remain around as well? Thoughts? Best!

@Sacha0 (Member) commented Oct 28, 2017

Another observation:

About a third of eye invocations in test/ are in spirit either res == eye(...) or res ≈ eye(...). Where the intention of such tests is to check that the contents of res match identity, res == I and res ≈ I are superior rewrites. But such tests also implicitly test res's shape, and whether that shape check was intended / is an important part of the test is rarely (if ever) clear.

So while most of those tests could be nicely rewritten as res == I or res ≈ I, strictly they should be (res = ...; res == I && size(res) == ...). On the one hand, where necessary, the latter rewrite might sometimes be a bit verbose. On the other hand, the latter rewrite makes what the test actually tests explicit, which is of tremendous benefit to parties reading/modifying the tests. Best!
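A small sketch of the two rewrites being contrasted, with res as a stand-in result (the == comparison against I assumes the UniformScaling equality methods this thread asks for, which later Julia versions provide):

using Test, LinearAlgebra

res = Matrix{Float64}(I, 3, 3)   # stand-in for some computed result

# Content-only rewrite: checks that res matches identity.
@test res == I

# Explicit rewrite: also pins down the shape, making the test's intent clear.
@test res == I && size(res) == (3, 3)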

@Sacha0 (Member) commented Oct 29, 2017

Additional observations. Most eye calls serve one of the following purposes, ordered roughly by prevalence:

  1. Comparison (==, ≈) against identity, discussed in the preceding comment.
  2. Concise array construction, where frequently the contents do not much matter.
  3. Construction of a dense identity matrix for subsequent in-place multiplication.
  4. Construction of a dense identity for subsequent out-of-place multiplication (with some implicitly represented operator).

For purposes (3) and (4), Matrix(I, m, n) for eye(m[, n]) seems like a good rewrite. The catch: (3) requires eltype(Matrix(I, m, n)) <: AbstractFloat and (4) usually expects the same, while eltype(Matrix(I, m, n)) would be Int64 presently (or Bool with #24396) due to the corresponding eltype(I). Matrix(1.0I, m, n) and Matrix{Float64}(I, m, n) do the trick but are somewhat less pleasant. Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result. (And likewise with future Array([zeros|ones], dims) methods, matching existing zeros/ones behavior.) Best!

@andreasnoack (Member) commented:

Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result.

For computations of eigen and singular vectors you'd like to be able to specify the element type. It could be Float32, BigFloat, or Complex{<:AbstractFloat} as well so you'd typically use the generic version anyway. Hence it might not be too important to avoid Int elements by default. I think Matrix{T}(I, n) would be fine in those applications.

@Sacha0 (Member) commented Oct 30, 2017

Sounds good on this end! :)

@ViralBShah (Member) commented:

Alternate issue title: An I for an eye.

@Sacha0 (Member) commented Nov 20, 2017

With #24438 and the other pull requests linked above in, this issue should be set :).

Sacha0 closed this as completed Nov 20, 2017
@andreasnoack (Member) commented:

I think we should consider having I(n) alongside Matrix(I, n, n). It would be a nice simplification for a common teaching situation where you'd like the identity to have a size, and it shouldn't really get in the way of anybody. In practice I doubt that 2I vs I(2) will be a real concern.

@alanedelman (Contributor) commented Oct 11, 2018

I agree. (says the teacher of large linear algebra classes)

@StefanKarpinski (Sponsor Member, Author) commented:

It's a little weird that calling an instance of UniformScaling creates an instance of Matrix, but 🤷‍♂️. It does seem quite convenient and intuitive, so 👍.

andreasnoack reopened this Nov 15, 2018
@StefanKarpinski (Sponsor Member, Author) commented:

This keeps coming up—let's do it.

@jebej (Contributor) commented Nov 16, 2018

Also useful for tensor products, e.g. kron(A, I(4)), where the size of the identity isn't determined from A.
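A quick sketch of that use; the I(4) call assumes the callable-I proposal (or a definition like the one sketched at the top of this thread), and the second line is a spelling that works without it:

using LinearAlgebra

A = rand(2, 2)

K1 = kron(A, I(4))                 # identity size given explicitly, not taken from A
K2 = kron(A, Matrix(1.0I, 4, 4))   # equivalent without the callable-I proposal

K1 ≈ K2  # true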

@tkf (Member) commented Nov 16, 2018

Re: implementation, how about adding a lazy non-allocating representation of x * ones(n) (say, UniformVector; something similar to FillArrays.Ones), and defining (λ * I)(n) via

(J::UniformScaling)(n::Integer) = Diagonal(UniformVector(J.λ, n))

A UniformVector would also be useful in other places, like the nzval of sparse arrays.
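A minimal sketch of what such a lazy vector might look like; UniformVector here is hypothetical (FillArrays.Fill is the closest existing package type), and the last definition is the proposal above spelled out:

using LinearAlgebra

# Hypothetical lazy constant vector: stores one value and a length,
# instead of allocating n copies.
struct UniformVector{T} <: AbstractVector{T}
    λ::T
    n::Int
end
Base.size(v::UniformVector) = (v.n,)
Base.getindex(v::UniformVector, i::Int) = (checkbounds(v, i); v.λ)

(J::UniformScaling)(n::Integer) = Diagonal(UniformVector(J.λ, n))

I(5)   # 5×5 Diagonal backed by a UniformVector, with no length-5 allocation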

@jlapeyre (Contributor) commented:

Some comments:

eye may be a bad pun, but it is an excellent mnemonic.

My guess is that having eye taken away without a clearly documented (documented anywhere?), simple replacement is upsetting to many users. I have seen this anecdotally several times. I don't think they care whether the identity matrix is lazy or not, just that they have something that is not too much more difficult to type and to remember than eye, and that they don't have to think about it anymore. This may be related to Julia becoming less of an application and more of a language (i.e. moving away from MATLAB), and the need for a more application-like layer.

Allocating a dense identity is not always slower than using a lazily evaluated one (BBN-Q/QuantumInfo.jl#3):

import FillArrays
import LinearAlgebra
using BenchmarkTools

eye(n::Integer) = Matrix{Float64}(LinearAlgebra.I, (n, n))

ikron_dense(n, mat) = kron(eye(n), mat)
ikron_lazy(n, mat) = kron(FillArrays.Eye(n), mat)

rikron_dense(n, mat) = kron(mat, eye(n))
rikron_lazy(n, mat) = kron(mat, FillArrays.Eye(n))

const nsmall = 2
const nbig = 100

const mat_small = rand(nsmall, nsmall)
const mat_big = rand(nbig, nbig);

macro sbtime(expr)
    quote
        println($(expr |> string))
        @btime $expr
    end
end

@sbtime ikron_dense(nbig, mat_big);
@sbtime ikron_lazy(nbig, mat_big);

println()

@sbtime rikron_dense(nbig, mat_big);
@sbtime rikron_lazy(nbig, mat_big);

println()

@sbtime ikron_dense(nsmall, mat_small);
@sbtime ikron_lazy(nsmall, mat_small);

println()

@sbtime ikron_dense(nsmall, mat_big);
@sbtime ikron_lazy(nsmall, mat_big);

println()

@sbtime rikron_dense(nsmall, mat_small);
@sbtime rikron_lazy(nsmall, mat_small);

println()

@sbtime rikron_dense(nsmall, mat_big);
@sbtime rikron_lazy(nsmall, mat_big);

println()

@sbtime ikron_dense(nbig, mat_small);
@sbtime ikron_lazy(nbig, mat_small);

Results:

ikron_dense(nbig, mat_big)
 157.041 ms (4 allocations: 763.02 MiB)
ikron_lazy(nbig, mat_big)
  209.153 ms (2 allocations: 762.94 MiB)

rikron_dense(nbig, mat_big)
  304.093 ms (4 allocations: 763.02 MiB)
rikron_lazy(nbig, mat_big)
  358.267 ms (2 allocations: 762.94 MiB)

ikron_dense(nsmall, mat_small)
  101.824 ns (2 allocations: 320 bytes)
ikron_lazy(nsmall, mat_small)
  61.975 ns (1 allocation: 208 bytes)

ikron_dense(nsmall, mat_big)
  41.494 μs (3 allocations: 312.69 KiB)
ikron_lazy(nsmall, mat_big)
  41.486 μs (2 allocations: 312.58 KiB)

rikron_dense(nsmall, mat_small)
  105.991 ns (2 allocations: 320 bytes)
rikron_lazy(nsmall, mat_small)
  68.820 ns (1 allocation: 208 bytes)

rikron_dense(nsmall, mat_big)
  60.869 μs (3 allocations: 312.69 KiB)
rikron_lazy(nsmall, mat_big)
  76.319 μs (2 allocations: 312.58 KiB)

ikron_dense(nbig, mat_small)
  64.835 μs (4 allocations: 390.78 KiB)
ikron_lazy(nbig, mat_small)
  59.712 μs (2 allocations: 312.58 KiB)

vtjnash modified the milestones: 0.7, 1.1 Dec 6, 2018
@vtjnash (Sponsor Member) commented Dec 6, 2018

Stefan said above to do this, so I fixed the milestone.

@JeffBezanson (Sponsor Member) commented:

Since there is no PR for this yet, I think it needs to be moved to the next milestone.
