
Add +/- for Number and AbstractMatrix to LinearAlgebra #29951

Closed
wants to merge 2 commits

Conversation

@dalum (Contributor) commented Nov 7, 2018

This is a follow-up to: #22880.

With the deprecation of automatic broadcasting and things having seemingly settled, it is now possible to introduce automatic promotion of Number to UniformScaling. The arguments for this can be found in the issue linked above, but they distil to:

  • a desire for + to obey the common distributive property of (A + B)*C == A*C + B*C for all mathematical objects in Julia where this is generally valid; this does not currently hold for (matrix + scalar)*vector == matrix*vector + scalar*vector
  • a desire to be able to write generic code like f(x, y) = x^2 + y^2 + 2*x*y which does not require special casing on scalars and matrices; the "generic" definition f(x, y) = (I*x)^2 + (I*y)^2 + 2*(I*x)*(I*y) will promote f(::Number, ::Number) to a UniformScaling.
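Both points can be illustrated with what works today, by lifting the scalar to a UniformScaling by hand. This is a sketch of the status quo, not of the methods this PR adds:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
c = 2.0
v = [1.0, 1.0]

# Distributivity holds once the scalar is written explicitly as c*I:
(A + c*I) * v == A*v + c*v   # true

# The "generic" definition promotes two Numbers to a UniformScaling
# and still works on matrices:
f(x, y) = (I*x)^2 + (I*y)^2 + 2*(I*x)*(I*y)
f(1.0, 2.0)   # a UniformScaling (λ = 9.0)
f(A, A)       # an ordinary 2×2 matrix
```

With the PR, `f(x, y) = x^2 + y^2 + 2*x*y` itself would accept mixed scalar/matrix arguments without the `I*` wrappers.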

The type piracy done here currently has the unfortunate side-effect of adding these methods to the method table of + and - without the user doing using LinearAlgebra, similar to how matrix multiplication is also exported.

Ideally, I think the feature implemented in this PR should only be available after using LinearAlgebra, but I have not been able to find a good way to do it.

@antoine-levitt (Contributor)

I think this would be a wonderful change, and would love to be able to do A - z. I also think it should not be made, because the ratio of potential confusion to added convenience is just too high.

1. It would be very idiosyncratic to Julia; I don't know of any other language that does this, and in MATLAB A + x means A .+ x.
2. It would be the only instance (I think?) where x + y and x .+ y would both be valid but do different things.
3. It does not generalize to anything other than 2D + 0D: a + b would be valid if dim a == dim b, or if dim a == 2 and dim b == 0, which feels a bit weird.
4. It makes little sense for 2D arrays that are not matrices (e.g. an image).
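For reference, the current Julia 1.x behavior that the list contrasts against (the commented-out line is the case this PR would change):

```julia
A = [1 2; 3 4]

A .+ 10      # elementwise broadcast: [11 12; 13 14]
# A + 10     # currently a MethodError; under this PR it would instead
#            # add 10 to the diagonal: [11 2; 3 14]
```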

@andreasnoack (Member) commented Nov 7, 2018

I also tend to think that the possible confusion this could cause outweighs the benefits here, although I'm sympathetic to this change. Coming up with a compelling example might tip the balance, though. As it stands, I'm not convinced by f(x, y) = x^2 + y^2 + 2*x*y. In a generic setting, I'd expect x and y to be either both Numbers or both Matrixes, and those cases already work.

Regarding @antoine-levitt's comments:

it would be very idiosyncratic to julia

Probably true for most languages but PARI/GP actually does this

? matid(3) + 1
%2 = [2, 0, 0; 0, 2, 0; 0, 0, 2]

It does not generalize to anything else than 2D + 0D:

This is also the case for * and \ so I don't think this, by itself, is disqualifying.

It makes little sense for 2D arrays that are not matrices (eg an image)

I suspect this is also true for many of the existing linear algebra operations and is something we'll have to live with to allow that Matrix is both a general 2D array and a matrix.

@antoine-levitt (Contributor)

I suspect this is also true for many of the existing linear algebra operations and is something we'll have to live with to allow that Matrix is both a general 2D array and a matrix.

Agreed, which is why I think we should strive to minimize the potential confusion, and that anyone not familiar with the specific idiosyncrasies of Julia should be able to know what the code does without reading the docs. I tend to think * and \ are the only exceptions, because it's so common to write A*B, and writing numpy-style dot(A,B) is such a pain, but maybe that's just MATLAB Stockholm syndrome. Julia has moved away from this by having exp/log/sin/cos mean the matrix functions, but there the argument could be made that not doing so would make these features significantly more cumbersome (expm/logm/etc), and that those operations are rare enough for the user to know what they're doing anyway.

(+)(a::Number, A::AbstractMatrix) = UniformScaling(a) + A
(+)(A::AbstractMatrix, a::Number) = A + UniformScaling(a)
(-)(a::Number, A::AbstractMatrix) = UniformScaling(a) - A
(-)(A::AbstractMatrix, a::Number) = A - UniformScaling(a)
Sponsor Member (review comment on the diff above):

I wonder if this could/should be done via the promotion mechanism. Interestingly, that would address the behavior of scaling a matrix by a scalar as well.
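A purely hypothetical sketch of what such a promotion hook might look like (`promote_operator` is an invented name, not a real function, and defining methods on `Base.:+` this way is exactly the type piracy discussed above):

```julia
using LinearAlgebra

# Hypothetical promotion step: lift the Number operand to a
# UniformScaling before dispatching to the existing methods.
promote_operator(x::Number, A::AbstractMatrix) = (UniformScaling(x), A)
promote_operator(A::AbstractMatrix, x::Number) = (A, UniformScaling(x))

for op in (:+, :-)
    @eval Base.$op(x::Number, A::AbstractMatrix) = Base.$op(promote_operator(x, A)...)
    @eval Base.$op(A::AbstractMatrix, x::Number) = Base.$op(promote_operator(A, x)...)
end
```

A hook like this would centralize the scalar-to-operator lifting in one place, which is what would also let scalar-times-matrix scaling be expressed through the same mechanism.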

@StefanKarpinski (Sponsor Member)

I'm not 100% sure about whether we should do this but I like it and think that we should definitely have the discussion now that we have room for the feature. I wonder if it would be best to leave room for it and make it a pirate package, say, MatrixScalarOps.jl that defines these operations.

@andyferris (Member)

I’m in favour. We’ve always strived for a syntax in common with mathematical language (and yes, most languages screw this up, even MATLAB). IMO gaining the distributive law would plug a relatively major inconsistency.

@mschauer (Contributor) commented Nov 7, 2018

@andyferris I was writing a statement along the same lines:

I think Julia has, in total, done well with the fundamental approach of having algebraic reasoning be a major motivation for the meaning of +, -, * and /, one and zero. This has to be communicated to newcomers in any case, e.g. one(::Array), A*B, "a"*"b", A == B and others.

@dalum (Contributor, Author) commented Nov 8, 2018

Agreed, which is why I think we should strive to minimize the potential confusion, and that anyone not familiar with the specific idiosyncrasies of julia should be able to know what the code does without reading the docs.

Doesn't this go both ways? My initial expectation, without reading the docs, was that array + scalar would promote scalar to a UniformScaling, because Julia generally obeys the principle of looking and behaving like math. At least this was one of my first impressions of Julia: this is the language where I can basically write LaTeX and have it run.

The way I see it, the argument for the current behaviour (erroring) is that people who think like me repeatedly forgot that array + scalar did not add to the diagonal, because the automatic broadcasting behaviour was inconsistent with how much of the rest of Julia felt. Thus, I am very sympathetic to the view that it makes little sense to add to the diagonal of a 2D array, if thought of in terms of an image, and that introducing a change like this could result in bugs for people who want to add a background to a square image or a similar dataset. I have no personal experience with it, though, and I don't know how many people currently get the MethodError: no method matching +(::Float64, ::Array{Float64,2}) error when attempting to do this. Does this happen to people?

EDIT: And if it does, how does it compare to, say, potential errors due to exp, sin, cos, log also not erroring on square matrices?

@stevengj (Member) commented Nov 8, 2018

This change seems like an invitation for bugs and confusion. Since array + scalar is mathematically ambiguous, I think it is better to require the user to be explicit. It's not that hard to write matrix + scalar*I if that is what is desired.
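The explicit form works today, and keeps the two meanings visually distinct:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]

A + 2.0I     # scalar on the diagonal: [3.0 2.0; 3.0 6.0]
A .+ 2.0     # elementwise, for contrast: [3.0 4.0; 5.0 6.0]
```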

It also seems weird to me to allow array + scalar only when array is a matrix. I agree with @antoine-levitt that it seems awfully idiosyncratic in over-emphasizing linear-algebra applications.

@StefanKarpinski (Sponsor Member)

Why can't these definitions live in an opt-in package?

@dalum (Contributor, Author) commented Nov 8, 2018

I think the opt-in package with a license to pirate is the best solution. If/when LinearAlgebra gets completely pulled out from base, then I think it would be reasonable to put the definitions in there, though.

@StefanKarpinski (Sponsor Member)

license to pirate

😂 ❤️

@antoine-levitt (Contributor)

Privateer.jl? :)

If the opt in package option is chosen, it'd be great to move the matrix functions (exp/log/etc) there eventually (although that would be a breaking change)

@stevengj (Member) commented Nov 8, 2018

license to pirate

Technically, this should be called a letter of marque.

@mschauer (Contributor) commented Nov 8, 2018

As I understand it, this would not be trivial: any package which uses the pirate package would become a pirate.

@bramtayl (Contributor) commented Nov 8, 2018

So it should be a copy-left marque?

@dalum (Contributor, Author) commented Nov 9, 2018

I have been thinking about the piracy issue, and I think it would be a lot simpler if exported functions from Base, etc. could be overridden by other exports without qualification. To prevent unintended exports, this should only be allowed from special pirate modules, i.e. a module could be granted the letter of marque by declaring it as a piratemodule:

piratemodule LinearAlgebraOperators
export +
using LinearAlgebra
@inline +(args...) = Base.:+(args...)
@inline +(x::Number, A::AbstractMatrix) = Base.:+(UniformScaling(x), A)
end

The splatting here doesn't seem to have any performance issues and produces the exact same @code_llvm.

It would also resolve this issue on Discourse: https://discourse.julialang.org/t/option-to-start-repl-in-baremodule-mode/17244/3

As an afterthought, I guess it wouldn't really be piracy anymore, just intrusion.

EDIT: It would work really badly with things that rely on dispatching on typeof(Base.:+), though, so it would require something like subtyped functions, i.e. typeof(LinearAlgebraOperators.:+) <: typeof(Base.:+).

@fredrikekre (Member)

That is already possible:

julia> module LinearAlgebraOperators
           using LinearAlgebra
           +(args...) = Base.:+(args...)
           +(x::Number, A::AbstractMatrix) = Base.:+(x*I, A)
       end
Main.LinearAlgebraOperators

julia> using .LinearAlgebraOperators: +

julia> 1.0 + 0.0
1.0

julia> 1.0 + zeros(2, 2)
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0

@dalum (Contributor, Author) commented Nov 9, 2018

That still requires the explicit import of +, though, and it works pretty badly if you have already referenced +:

julia> module LinearAlgebraOperators
           using LinearAlgebra
           +(args...) = Base.:+(args...)
           +(x::Number, A::AbstractMatrix) = Base.:+(x*I, A)
       end
Main.LinearAlgebraOperators

julia> 1 + 2
3

julia> using .LinearAlgebraOperators: +
WARNING: ignoring conflicting import of LinearAlgebraOperators.+ into Main

@andyferris (Member) commented Nov 9, 2018

Well, in general, yes, I'm in favour of developing as many features as possible outside of Base and porting in the ones that make sense. Even vector transposes were developed/prototyped in a package. (Matrix transpose in a package quickly proved impossible...) But this is a tiny four-method change.

We've strived very hard for consistent and reliable mathematical syntax where +, -, *, / etc. follow the usual rules for rings and fields and so on. Hell, the parser seems to assume associativity of +, so we're definitely opinionated about these operators and their relationships to each other! I'd point out that with this PR we get

  • (+, *, Union{Number, SquareMatrix}) is a ring
  • (.+, .*, Union{Number, SquareMatrix}) is a ring

Of course there's no real type SquareMatrix but you get what I mean. I feel the above pairings are kind-of beautiful. @stevengj when I read

Since array + scalar is mathematically ambiguous

I didn't particularly agree, since if we use the old (broadcasting) meaning of array + scalar then (+, *) doesn't act together as a ring. While the old system wasn't ill-posed or anything nasty like that, it wasn't nearly as consistent as the usual written usage of these operators in typical algebraic contexts (where, reading equations that others had written by hand, one would strongly presume that + and * follow a distributive law). Similarly

over-emphasizing linear-algebra applications

I'd note that in taking matrix transposes seriously it seemed to me that there was an expectation that we would be attempting to take a very linear-algebraic view of things, beyond Number and Array{Number} (including Array{Array{Number}} amongst other, more abstract linear algebraic types), and that we'd be attempting to use conventional linear algebra notation around these operations (including +, *, and '). My summary of that issue is that we would be trying hard in LinearAlgebra to support abstract linear algebra, and I'm personally surprised that this line of thinking hasn't been followed through in this issue.

(I also feel that there won't be a huge wave of confusion if we implemented this... people dealt with all the transpose and adjoint and permutedims changes quite well, as well as the switch away from auto-broadcasting operations, and the community has whole-heartedly embraced the beautiful generalization of .op syntax throughout.)

@StefanKarpinski (Sponsor Member)

While I ❤️ this view very much, to play devil's advocate, there are a lot of Matlab converts who are already kind of bent out of shape that A + 1 doesn't work the way they expect it to; how mad do you think they'll be when it does work but silently does something quite different from what they expect? The whole v' business still works the way people expect it to or fails if not.

@andyferris (Member) commented Nov 10, 2018

OK, let's address that. If I understand, the main reason people are against this is due to a fear of people expecting different behavior and creating bugs in their code.

I may be wrong, but I'd actually expect ex-MATLAB users to be the least likely to get actual production bugs out of this. By and large, MATLAB users will come with at least a basic understanding of linear algebra. They are already familiar with the difference between * and .*. They are used to playing at the REPL, and they are likely to try things like 1 + [1 2; 3 4] interactively (especially if they aren't getting the results they expect). They will learn that expm has been replaced by exp, and that fewer operations auto-broadcast (and they would be likely to embrace dot-broadcast syntax because they already understand the difference between * and .*). In academia and industry, I've worked with a bunch of people transitioning from MATLAB to Julia, and I've never seen them complain about the way we've strived to clean up the corner cases, being more rigorous both mathematically and as a tool for software engineering. Surprised at some differences at times, for sure, but not terribly confused, and certainly not annoyed. Generally, they actively like that we get these things right - that's why they switched in the first place.

I'd be more worried about people from a more pure software background who have used systems with auto-broadcasting in the past, and for whom the concept of adding a scalar to the diagonal of a matrix would never have occurred as a likely operation (because they haven't had much linear algebra exposure). Of course, a pure software engineer will frequently be solving problems using non-square arrays (where the error message will set them on the right track), they know they should at least skim-read the docs of the interfaces they employ, they may have some unit tests around their production code, and so on. They will notice that other operations don't auto-broadcast, and that * does something interesting with matrices and vectors. I'd have faith that, given we document this well, they'll learn pretty quickly.

Then of course, there is the occasional typo, where one types + instead of .+, or whatever. Again, this will error for everything that isn't a square matrix, and even in that case unit tests and/or verifying outputs will indicate there is a problem somewhere. And typos can happen anywhere in code - we've never worried about * and .* being similar to type, either. This argument was gone over in #23233 (as was the expectations of MATLAB and numpy users) and was overruled. I don't know about other people, but once I got used to fused-dot-broadcast syntax, my eyes intuitively scan for .s and groups of broadcasted operations everywhere in my Julia code.

The one thing we can say about software engineers and scientists is that we are good at pattern recognition. We've got a pretty consistent system in Julia. All elementwise operations use broadcasting, all the time. Linear algebra is typically done with + and * (like matrix * vector) and other un-dotted operations. There is a distinction between exp(matrix) and exp.(matrix). Honestly - our users are attracted to Julia because of these things! I'd be surprised to hear of people coming to learn "a fresh approach to technical computing" and then being annoyed because it is not 100% like the previous programming environments they used, or becoming angry because something is different. If a new user happens to shoot themselves in the foot with squarematrix + scalar, they are going to think "damn, I shouldn't have made assumptions", they will read some more docs and talk to other users, and everything will click. I feel that the discussion around this change (and equally the change from expm to exp in #23233, etc.) is surrounded by a lot of fear and uncertainty on behalf of the imagined new user. Well, my imagination is different. When I learn a new programming language, I am generally excited and willing to learn how things work, drawn to it for one reason or another. I can't blame the language for my assumptions - it is what it is, before I came along. I make mistakes, I learn, and I move on.

To explain why I'm writing so much about this: to me, ultimately, this change is the final stage in an evolution of our array and linear algebra system involving many steps, such as #4774, the removal of auto-broadcasting ops in favour of dot notation, the replacement of expm with exp, and so on. The argument for this is precisely the same one for expm -> exp, with precisely the same potential for missing .s, so it seems inconsistent to me to merge #23233 and not this. With this change, we can express everything we normally express on paper with linear algebra. Taylor expansions will just work. Everything that should be a ring, is a ring. The linear algebra system we have built for Number, AbstractVector and AbstractMatrix will be... complete. Whole. Self-consistent. I hope I'm not being too poetic or dramatic, but there it is - that's what I believe.
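The Taylor-expansion point can be made concrete. Today the constant term has to be spelled as I; under this PR the literal textbook form would mean the same thing (a sketch, not merged behavior):

```julia
using LinearAlgebra

A = [0.0 1.0; -1.0 0.0]

# Today: the constant term of the series must be written as I
approx = I + A + A^2/2 + A^3/6   # truncated Taylor series for exp(A)

# With this PR, the on-paper form would also work:
# approx = 1 + A + A^2/2 + A^3/6
```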

@perrutquist (Contributor) commented Nov 13, 2018

Personally, I would love it if Julia 2.0 established equivalence between scalar numbers and UniformScaling via conversion and promotion. However, this would imply 1 == [1 0; 0 1] returning true. (It currently evaluates to false.) So it would be a big change.

Going half-way and only overloading + and - means that iszero(a - b) can be true at the same time as a == b is false, and that is a bit inconsistent.

@StefanKarpinski (Sponsor Member)

Frankly, 1 == [1 0; 0 1] seems like a very bad idea.

@perrutquist (Contributor)

I agree that 1 == [1 0; 0 1] may seem bad, but IMO the current behavior, (0 == [0 0; 0 0]) === false, seems worse. It implies that there is a way of comparing a scalar to a matrix, but that comparison doesn't fit any reasonable mathematical definition.

Unless 1 == [1 0; 0 1], this PR violates the principle that two things whose difference can be computed and is zero are considered equal. I think that is at least as important as the distributive property given as motivation.

@andyferris (Member) commented Nov 14, 2018

@perrutquist In my view, promotion and conversion (which AFAICT kind of define equivalence between different Julia types) are quite different things. The proposal of this PR is to have numbers promote to UniformScaling in the context of a certain limited set of linear algebra operations. There is no desire to somehow say they are equivalent. For example, for the purposes of some linear algebra operations we might consider 1 and [1] to be equivalent. They're both rank-1 vector spaces, they're both linear operators, they're both elements of fields, etc. But Julia is a programming language, not just a maths program - an array is a container with its own interface, which numbers don't necessarily share, and for software engineering tasks one of the best things about Julia compared to, say, MATLAB is the way things you interact with differently as a programmer are not considered equivalent. And yes, we do have a fallback definition ==(::Any, ::Any) = false so that by default objects of different types are considered not equal (rather than throwing an error). Basically - == is a Julia programming language concept, and ==(::Number, ::Number) is overloaded to make sense for particular mathematical comparisons. Not the other way around.

@dalum (Contributor, Author) commented Nov 14, 2018

I think the reason why I feel resistance towards setting 1 == [1 0; 0 1] is that it violates the transitive property of equivalence, i.e. if a == b and b == c then a == c. If b is a scalar and a, c are square matrices of different sizes, this property would be broken. This property still holds with this PR, i.e. if a == c is false, a + b == c + b is also false.

I think the transitive property is more important than the implied comparative relationship between scalars and matrices. I don't think == should error when given objects of types that could never compare equal.
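The counterexample behind this argument, written out under today's semantics (the hypothetical premise is that 1 == a and 1 == c would both hold):

```julia
using LinearAlgebra

a = Matrix(1.0I, 2, 2)   # 2×2 identity matrix
c = Matrix(1.0I, 3, 3)   # 3×3 identity matrix

# If both 1 == a and 1 == c held, transitivity would force a == c,
# but a == c is false because the sizes differ:
a == c   # false
```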

@dalum (Contributor, Author) commented Nov 14, 2018

W.r.t. MATLAB users, I asked one of the MATLAB users in my local circle what they would expect array + scalar to do in Julia. They said they would expect it to add to all of the elements, and that you have to write eye if you want it on the diagonal. They added, however, that this was annoying and that they would prefer it added to the diagonal by default.

This is an anecdote, of course, and since I'm obviously biased, I cannot rule out the possibility that my framing of the question had some influence. However, I think that if the argument against this PR is that MATLAB converts will get bugs in their code, then it should be backed up by stronger data than feelings and anecdotes.

Does anyone know if a related community survey has been conducted?

@perrutquist (Contributor)

Very good point about the transitive property of equivalence. I guess that means it is impossible to define == in a way that satisfies all basic principles. So then the question instead becomes where to draw the line.

(Note that this transitive property is already broken if a is a UniformScaling and b and c are different-size matrices.)

Somewhat related: the fact that the == function errs on the side of not throwing an exception is unusual for Julia, which usually avoids "do what I mean". I'd personally prefer e.g. [0 0] == [0 0 0] to throw a DimensionMismatch rather than silently returning false.
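The parenthetical about UniformScaling can be checked directly with today's == methods (introduced in #24380):

```julia
using LinearAlgebra

b = Matrix(1I, 2, 2)   # 2×2 identity matrix
c = Matrix(1I, 3, 3)   # 3×3 identity matrix

# I compares equal to both, yet the matrices are not equal to each
# other, so == is already non-transitive here:
(I == b, I == c, b == c)   # (true, true, false)
```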

@andyferris (Member)

I'd personally prefer e.g. [0 0] == [0 0 0] to throw a DimensionMissmatch rather than silently returning false.

Again, this won't work for software engineering tasks. Arrays are also our lists, and it's perfectly valid to ask whether two lists are the same, even if they have different lengths. (Also note that when you want this comparison on Vector{Any}s to work by recursively calling == on the elements, then you need == defined between any two Julia types).

(Note that this transitive property is already broken is a is a UniformScaling and b and c are different size matrices.)

This is unfortunate, yes...

I can't remember where (or what the conclusion was), but it was once discussed whether UniformScaling is even needed anymore after the functionality in this PR is merged. I'm trying to think of what other contexts I use I and UniformScaling in that I mightn't do just as well with a Number.

@dalum (Contributor, Author) commented Nov 14, 2018

I think this is the discussion you are thinking of, @andyferris? #22880 (comment)

But perhaps it would be better to figure out if the automatic promotion should happen first or not, and then use that as a springboard to see if UniformScaling could be deprecated. 🙂

EDIT: It seems like there was very little discussion surrounding the introduction of I == matrix (#24380), not to mention the fact that it isn't transitive.

@perrutquist (Contributor)

Here's another idea: Let LinearAlgebra introduce a new abstract type AbstractSquareMatrix <: AbstractMatrix. Then allow +(::Number, ::AbstractSquareMatrix) while still disallowing adding scalars to general matrices. This makes the behavior opt-in, as one is not obliged to use a SquareMatrix type just because a matrix happens to be square.

The documentation should state that these types refer to matrices that are square in the linear algebra sense. For example, an image that happens to be square should probably not use a SquareMatrix type.

Hermitian, Symmetric and Diagonal could be subtypes of AbstractSquareMatrix and hence be subject to the new behavior by default, I guess. (I don't think there's much risk of confusion with these.)
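A sketch of what this opt-in design could look like. Everything here is hypothetical: neither AbstractSquareMatrix nor SquareMatrix exists in LinearAlgebra, and the method definitions are illustrative only:

```julia
using LinearAlgebra

# Hypothetical abstract type for matrices that are square in the
# linear-algebra sense (not merely square-shaped data).
abstract type AbstractSquareMatrix{T} <: AbstractMatrix{T} end

struct SquareMatrix{T} <: AbstractSquareMatrix{T}
    data::Matrix{T}
    function SquareMatrix(data::Matrix{T}) where {T}
        size(data, 1) == size(data, 2) ||
            throw(DimensionMismatch("matrix must be square"))
        new{T}(data)
    end
end

# Minimal AbstractArray interface
Base.size(A::SquareMatrix) = size(A.data)
Base.getindex(A::SquareMatrix, i::Int, j::Int) = A.data[i, j]

# Scalars add to the diagonal, but only for the opt-in square types,
# so plain Matrix arithmetic is untouched.
Base.:+(x::Number, A::AbstractSquareMatrix) = x * I + A
Base.:+(A::AbstractSquareMatrix, x::Number) = A + x * I
```

Because the behavior attaches to the abstract type, Hermitian, Symmetric and Diagonal could opt in simply by subtyping AbstractSquareMatrix.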

@vtjnash (Sponsor Member) commented Apr 12, 2021

Seems to be a lot of contention here, with more downvotes than upvotes.
