Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new syntax for transpose #21037

Closed
StefanKarpinski opened this issue Mar 15, 2017 · 103 comments · Fixed by #38062
Closed

new syntax for transpose #21037

StefanKarpinski opened this issue Mar 15, 2017 · 103 comments · Fixed by #38062
Labels
linear algebra Linear algebra parser Language parsing and surface syntax

Comments

@StefanKarpinski
Copy link
Sponsor Member

Now that .op is generally the vectorized form of op, it's very confusing that .' means transpose rather than the vectorized form of ' (adjoint, aka ctranspose). This issue is for discussing alternative syntaxes for transpose and/or adjoint.

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 15, 2017

Andreas tried Aᵀ (and maybe Aᴴ) in #19344, but it wasn't very well received. We could similarly pun on ^ with special exponent types T (and maybe H) such that A^T is transpose, but that's rather shady, too. Not sure there are many other good options that still kinda/sorta look like math notation.

@StefanKarpinski
Copy link
Sponsor Member Author

I kind of think that t(A) might be the best, but it's unfortunate to "steal" another one-letter name.

@nalimilan
Copy link
Member

Moving my comment from the other issue (not that it solves anything, but...):

+1 for using something else than .'.

I couldn't find languages with a special syntax for transposition, except for APL which uses the not-so-obvious , and Python which uses *X (which would be confusing for Julia). Several languages use transpose(X); R uses t(X). That's not pretty, but it's not worse than .'. At least you're less tempted to use ' by confusing it with .': it would be clear that these are very different operations.

See Rosetta code. (BTW, the Julia example actually illustrates conjugate transpose...)

@mauro3
Copy link
Contributor

mauro3 commented Mar 15, 2017

Could one of the other ticks be used? ` or "

@ararslan
Copy link
Member

-100 to changing adjoint, since it's one of the awesome things that makes writing Julia code as clear as writing math, plus conjugate transpose is usually what you want anyway so it makes sense to have an abbreviated syntax for it.

As long as we have the nice syntax for conjugate transpose, a postfix operator for regular transpose seems mostly unnecessary, so just having it be a regular function call seems fine to me. transpose already works; couldn't we just use that? I find the t(x) R-ism unfortunate, as it's not clear from the name what it's actually supposed to do.

Using a different tick would be kind of weird, e.g. A` can look a lot like A' depending on the font, and A" looks too much like A''.

@ararslan ararslan added design Design of APIs or of the language itself linear algebra Linear algebra speculative Whether the change will be implemented is speculative labels Mar 15, 2017
@stevengj
Copy link
Member

stevengj commented Mar 16, 2017

If we make the change in #20978, then a postfix transpose actually becomes more useful than it is now. e.g. if you have two vectors x and y and you want to apply f pairwise on them, you can do e.g. f.(x, y.') ... with #20978, this will be applicable to arrays of arbitrary types.

Honestly, I think our best option is still to leave it as-is. None of the suggestions seem like a clear improvement to me. .' has the advantage of familiarity from Matlab. The . actually is somewhat congruent with dot-call syntax in examples like f.(x, y.'), and suggests (somewhat correctly) that the transpose "fuses" (it doesn't produce a temporary copy thanks to RowVector and future generalizations thereof).

In fact, we could even take it further, and make f.(x, g.(y).') a fusing operation. i.e. we change .' transpose to be non-recursive ala #20978 and we extend its semantics to include fusion with other nested dot calls. (If you want the non-fusing version, you would call transpose.)

@StefanKarpinski
Copy link
Sponsor Member Author

I like that plan a lot, @stevengj.

@StefanKarpinski
Copy link
Sponsor Member Author

One wrinkle: presumably the @. macro does not turn y' into y.' (since that would be wrong). It could, however, turn y' into some kind of fused adjoint operation.

@stevengj
Copy link
Member

stevengj commented Mar 16, 2017

The main problem is finding a clean way to make f.(x, g.(y).') have fusing semantics. One possibility would be to transform it to f.(x, g.(y.')) and hence to broadcast(x,y -> f(x, g(y)), x, y.')?

Note that, for this to work properly, we might need to restore the fallback transpose(x) = x method, in which case we might as well let transpose remain recursive.

@StefanKarpinski
Copy link
Sponsor Member Author

I think deciding whether transpose should be recursive or not is orthogonal to whether we make it participate in dot syntax fusion. The choice of making it non-recursive is not motivated by that.

@stevengj
Copy link
Member

@StefanKarpinski, if restore a fallback transpose(x) = x, then most of the motivation for changing it to be non-recursive goes away.

@jebej
Copy link
Contributor

jebej commented Jun 5, 2017

What's the problem if the fallback is restored but we still have the transpose be non-recursive?

@stevengj
Copy link
Member

stevengj commented Jun 5, 2017

@jebej, recursive transpose is more correct when it is used as a mathematical operation on linear operators. If I remember correctly, the main reason for making it non-recursive was so that we don't have to define the transpose(x) = x fallback, rather than throwing a MethodError.

But it would not be terrible to have the fallback but still be non-recursive.

mkborregaard referenced this issue in JuliaPlots/Plots.jl Jun 10, 2017
@bkamins
Copy link
Member

bkamins commented Jun 28, 2017

Let me add two comments (I have looked through the earlier discussion and did not notice them - sorry if I have omitted something):

  • documentation for permutedims says This is a generalization of transpose for multi-dimensional arrays. Transpose is equivalent to permutedims(A, [2,1]).; it is a bit misleading as it suggests on the first read a generalization of transpose function, which it is not.
  • How one is supposed to make a transpose of a vector x=["a", "b"]? Actually y=x.' works and creates a new variable but getindex fails on it. AFAIK you have to use reshape(x, 1, :) or much slower hcat(x...) to achieve it but it is unnatural to have a different syntax for Vector (permutedims does not work here).

@andreasnoack
Copy link
Member

What is your use case for transposing a vector of strings?

@bkamins
Copy link
Member

bkamins commented Jun 28, 2017

Consider the following scenario for instance:

x = ["$(j+i)" for j in 1:3, i in 1:5]
y = ["$i" for i in 5:9]

and I want to append y after the last row of x. And the simplest way is to vcat a transpose of y.

Comes up in practice when incrementally logging text data to a Matrix{String} (I could use Vector{Vector{String}}), but often matrix is more useful (or then again there is a question how to convert Vector{Vector{String}} to Matrix{String} by vertically concatenating consecutive elements).

@mbauman
Copy link
Sponsor Member

mbauman commented Jun 28, 2017

Another use-case: transposing is the simplest way to make two vectors orthogonal to each other in order to broadcast a function over the cartesian product (f.(v, w.')).

@Sacha0
Copy link
Member

Sacha0 commented Sep 21, 2017

Data point: Yesterday I encountered a party confused by the postfix "broadcast-adjoint" operator and why it behaves like transpose. Best!

@ttparker
Copy link

ttparker commented Nov 8, 2017

FWIW, I strongly feel that we should get rid of the .' syntax. As someone more familiar with Julia than with Matlab, I expected it to mean vectorized adjoint and I got really tripped up when it didn't. Julia isn't Matlab and shouldn't be bound by Matlab's conventions - if in Julia, a dot means vectorization of the adjacent function, then this should be consistent across the language and shouldn't randomly have the one horrible exception that .' is formally unrelated to '.

I think it's fine to just have transpose without any special "tick" notation, since the vast majority of the time, it's called on a matrix of real numbers, so ' would be equivalent if you really want to save typing. If we want to make a fusing version of transpose, then I really don't think that .' is the right syntax.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Nov 8, 2017
@JeffBezanson
Copy link
Sponsor Member

That's a good point. Arguably only the adjoint needs super-compact syntax.

@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label Nov 8, 2017
@StefanKarpinski
Copy link
Sponsor Member Author

Let's just call this transpose and deprecate .'. In the future, we can consider if we want .' as pointwise adjoint or if we just want to leave it perma-deprecated to avoid trapping Matlab users.

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Nov 9, 2017
@StefanKarpinski
Copy link
Sponsor Member Author

It seems confusing that we would have a'' === a but ''(a) === a'. It seems better to use Base.apostrophe as the name instead (or something like that).

@ttparker
Copy link

ttparker commented Aug 5, 2018

Might it be better to split this discussion off into a new Github issue, since it's about ' syntax that's not directly related to matrix transposition?

@perrutquist
Copy link
Contributor

Is there an automated way of splitting issues, or should I simply open a new one and link to the discussion here?

@ararslan
Copy link
Member

ararslan commented Aug 7, 2018

The latter

@briochemc
Copy link
Contributor

briochemc commented Aug 28, 2018

The only situation where a superscript-T notation would really be appropriate is if you have a numerical matrix whose indices you want to permute, but you really don't want the adjoint linear operator. Such situations certainly exist, but may be too rare to justify introducing new syntax.

I guess I'm way too late for the discussion, but I'd like to point one use that I think is worth mentioning: Applying the complex-step differentiation to a real-valued function which has transpose inside of it. (I personally figured out that I needed .' in MATLAB and julia for this particular reason.)

I'll give an example with multiple occurences of transpose (maybe I could avoid doing it this way?)

using LinearAlgebra

# f : Rⁿ → R
#     x  ↦ f(x) = xᵀ * x / 2
f(x) = 0.5 * transpose(x) * x

# Fréchet derivative of f
# Df : Rⁿ → L(Rⁿ, R)
#      x  ↦ Df(x) : Rⁿ → R (linear, so expressed via multiplication)
#                   h  ↦ Df(x)(h) = Df(x) * h
Df(x) = transpose(x) 

# Complex-step method version of Df
function CSDf(x) 
    out = zeros(eltype(x), 1, length(x))
        for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i] = imag(f(x2)) / h
    end
    return out
end

# 2nd Fréchet derivative
# D2f : Rⁿ → L(Rⁿ ⊗ Rⁿ, R)
#       x  ↦ D2f(x) : Rⁿ ⊗ Rⁿ → R (linear, so expressed via multiplication)
#                     h₁ ⊗ h₂ ↦ D2f(x)(h₁ ⊗ h₂) = h₁ᵀ * D2f(x) * h₂
D2f(x) = Matrix{eltype(x)}(I, length(x), length(x))

# Complex-step method version of D2f
function CSD2f(x)
    out = zeros(eltype(x), length(x), length(x))
    for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i, :] .= transpose(imag(Df(x2)) / h)
    end
    return out
end 

# Test on random vector x of size n
n = 5
x = rand(n)
Df(x)  CSDf(x)
D2f(x)  CSD2f(x)

# test that the 1st derivative is correct Fréchet derivative= eps(norm(x))
for i = 1:10
    h =* randn(n) # random small y
    println(norm(f(x + h) - f(x) - Df(x) * h) / norm(h)) # Fréchet check
end

# test that the 2nd derivative is correct 2nd Fréchet derivative
for i = 1:10
    h₁ = randn(n) # random h₁
    h₂ =* randn(n) # random small h₂
    println(norm(Df(x + h₂) * h₁ - Df(x) * h₁ - transpose(h₁) * D2f(x) * h₂) / norm(h₂)) # Fréchet check
end
# Because f is quadratic, we can even check that f is equal to its Taylor expansion
h = rand(n)
f(x + h)  f(x) + Df(x) * h + 0.5 * transpose(h) * D2f(x) * h

The point being that f and Df must be defined using transpose and must not use the adjoint.

@c42f
Copy link
Member

c42f commented Aug 29, 2018

I don't think the complex step method is super relevant in julia. Isn't it a neat hack/workaround to get automatic differentiation in cases where a language supports efficient builtin complex numbers, but an equivalently efficient Dual number type can't be defined? That's not the case in julia, which has really nice automatic differentiation libraries.

@briochemc
Copy link
Contributor

briochemc commented Aug 29, 2018

I agree about using Dual numbers instead of the complex-step method and that's a very good point you are making (I personally have already replaced all my complex-step-method evaluations with dual-number ones in julia). However, I do think that this is still a valid use case, for demonstration purposes, teaching tricks (see, e.g., Nick Higham talking about the complex-step method at Julia Con 2018), and portability (in other words, I worry that MATLAB's version of the code above using complex numbers would be cleaner).

@mattcbro
Copy link

Coming from the world of Engineers and possibly Physicists who use complex arrays more than real arrays, not having a transpose operator is a bit of a pain. (Complex phasor representation for a harmonic time dependency is ubiquitous in our field.) I personally would favor the numpy syntax of x.H and x.T, though my only consideration is conciseness .

The density of the transpose operator relative to Hermitian transpose is about 1 to 1 in my code. So the unconjugated transpose is equally important to me. A lot of the use of transpose is to create outer products and to size arrays correctly for interfacing to other code or for matrix multiplication.

I intend for now to simply provide a macro or one character function for the operation, however what is the proper equivalent to the old functionality, transpose() or permutedims()?

@andyferris
Copy link
Member

andyferris commented Aug 31, 2018

transpose is intended for linear algebra and is recursive, and permutedims is for non-recursive arrangement of data of any type.

It’s interesting you say you use transpose as much as adjoint. I used to be the same, but mostly because I tended to make mistakes where my data was real so I tended to transpose but actually the adjoint was the correct operation (generalized to the complex case - adjoint was the right operation for my algorithm). There are (many) valid exceptions, of course.

@Jutho
Copy link
Contributor

Jutho commented Aug 31, 2018

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform. It seems like @mattcbro is talking about this kind of applications.

That being said, when reading up on syntax discussions, I am often contemplating that for me, personally, I could not imagine that a slightly more verbose syntax is the thing that slows down my programming speed or efficiency. Thinking about the algorithm itself and the most natural/efficient way of implementing it takes much more time.

@stevengj
Copy link
Member

stevengj commented Aug 31, 2018

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform.

Not necessarily. Often you want time-average quantities from the Fourier amplitudes, in which case you use the complex dot product, e.g. ½ℜ[𝐄*×𝐇] is the time-average Poynting flux from the complex Fourier components and ¼ε₀|𝐄|² is a time-average vacuum energy density. On the other hand, since the Maxwell operator is (typically) a complex-symmetric operator ("reciprocal"), you often use an unconjugated "inner product" for (infinite-dimensional) algebra on the fields 𝐄(𝐱) etc. over all space.

@Jutho
Copy link
Contributor

Jutho commented Aug 31, 2018

That's true, I had the word often in the first sentence, but removed it apparently :-).

@mattcbro
Copy link

Well if you want to go there, Electromagnetic quantities are even more concisely written in a Clifford Algebraic formulation, often called Geometric algebra. Those algebras have multiple automorphisms and antiautomorphisms that play a critical role in the formulation of the theory, especially when considering scattering problems.

These algebras typically have a concise matrix representation and those morphisms are often easily computed via complex transpose, Hermitian transpose and conjugation.

Nevertheless as I stated earlier my primary use of transpose is often to arrange my arrays to interface with other arrays, other code, and to get matrix multiply to work against the correct dimension of a flattened array.

@c42f
Copy link
Member

c42f commented Aug 31, 2018

I personally would favor the numpy syntax of x.H and x.T

Easy to implement now in 1.0, and should be efficient:

function Base.getproperty(x::AbstractMatrix, name::Symbol)
    if name === :T
        return transpose(x) 
    #elseif name === :H # can also do this, though not sure why we'd want to overload with `'`
    #    return adjoint(x)
    else
        return getfield(x, name)
    end
end 

This is surprisingly easy and kind of neat. The downside seems to be that orthogonal uses of getproperty don't compose with each other. So anybody implementing getproperty on their specific matrix type will need to implement the generic behavior by hand.

@c42f
Copy link
Member

c42f commented Aug 31, 2018

orthogonal uses of getproperty don't compose

Hmm. I wonder if that implies x.T "should" have been lowered to getproperty(x, Val(:T)). I shudder to think what that would do to the poor compiler though.

@andyferris
Copy link
Member

I’m sure everyone has their opinion - but to me it’s almost a feature that it’s hard to build a generic interface out of dot syntax. Don’t get me wrong, it’s a really great feature and wonderful for defining named tuple-like structs, and so-on.

(It’s also possible to add a Val dispatch layer to your types pretty easily).

@mattcbro
Copy link

mattcbro commented Sep 1, 2018

@c42f 's code works like a charm. Unfortunately for me I'm trying to write code that works on versions 0.64 and up, which forces me to use either transpose or my own defined function T(A) = transpose(A). Perhaps a macro would have been a little cleaner and slightly more efficient.

@c42f
Copy link
Member

c42f commented Sep 1, 2018

To be clear, I'm not suggesting defining this particular getproperty is a good idea for user code. It's just likely to break things in the longer term ;-) Though perhaps one day we'll have a good enough feel for the consequences that we could have x.T defined in Base.

But in a general sense, I do wonder why this kind of property use for defining "getters" in generic interfaces is actually bad. For example, generic field getter functions currently have a whopping big namespace problem which is simply solved by judicious use of getproperty. It's far nicer to write x.A than to write MyModule.A(x), some longer ugly function name like get_my_A(x), or to export the extremely generic name A from a user module. The only problem as I see it, is the expected ability to override the meaning of .B for subtypes independently of .A being defined generically on a super type. Hence the half serious comment about Val.

@StefanKarpinski
Copy link
Sponsor Member Author

Funny idea:

julia> x'̄
ERROR: syntax: invalid character "̄"

The character looks kind of like a T but it's actually a ' with a bar over it. Not sure if serious...

@KristofferC
Copy link
Sponsor Member

screen shot 2018-09-10 at 11 29 56

@mbauman
Copy link
Sponsor Member

mbauman commented Sep 10, 2018

Yeah, looks like that on GitHub to me, too. But it's an overbar. Copy and paste into my terminal shows:

screen shot 2018-09-10 at 10 31 24 am

Too clever and cute. I still do like the combining characters, though, and I think 'ᵀ is nice.

@johnalx
Copy link

johnalx commented Apr 28, 2020

-100 to changing adjoint, since it's one of the awesome things that makes writing Julia code as clear as writing math, plus conjugate transpose is usually what you want anyway so it makes sense to have an abbreviated syntax for it.

There is a certain arrogance to a statement like this. Consider that some finite proportion of developers explicitly do not want adjoint() but need transpose().

Case and point for us working with symbolic calculations for modeling the default ' operator would cause for example the pseudo-inverse (A'*A)\(A*b) or a quadratic form v'*A*v to return erroneously long and complex results that cannot be reduced.

Maybe the solution is some kind of compiler directive declaring the meaning of '.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra parser Language parsing and surface syntax
Projects
None yet
Development

Successfully merging a pull request may close this issue.