Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe replace Matern12, Matern32 and Matern52 by Matern{nu} #518

Open
FelixBenning opened this issue Jun 6, 2023 · 14 comments
Open

Maybe replace Matern12, Matern32 and Matern52 by Matern{nu} #518

FelixBenning opened this issue Jun 6, 2023 · 14 comments

Comments

@FelixBenning
Copy link

I looked through existing issues and didn't find someone suggesting this before. But you could in principle replace the special case matern functions by a parametrized type. The advantage is that 1.5 and 3/2 is equivalent:

julia> struct Matern{T} end

julia> Matern{1.5}() isa Matern{3/2}
true

And you would automatically get a common supertype. You can also have a fallback for p/2 (if you were to use some sort of trait system to determine if p is an integer presumably) and a fallback for general nu. The only reason why you wouldn't use such a Matern implementation is, since you can then not differentiate w.r.t. nu. But this is not supported anyway at the moment.

@theogf
Copy link
Member

theogf commented Jun 6, 2023

With this setup, how would distinguish with the free \nu parameter Matern kernel? That would unfortunate to have Matern{T} and MaternKernel.
And we cannot use T as a free parameter as it would explode the number of compiled methods.

@devmotion
Copy link
Member

One problem is the following:

julia> struct Matern{T} end

julia> Matern{1.5}() isa Matern{3/2}
true

julia> Matern{1.5}() isa Matern{3//2}
false

julia> Matern{1.5f0}() isa Matern{3/2}
false

Floating point numbers of type Float32 are common e.g. in ML and when working with GPUs, so the examples are not merely artificial. Additionally, the number of specializations increases massively by making kernels of each order a separate type.

Hence IMO encoding the order in the type system is the wrong approach. The better approach, I think, to unify these implementations is to have a common supertype and a trait that returns the order of the kernel (which is hard-coded for kernels with fixed order).

@FelixBenning
Copy link
Author

FelixBenning commented Jun 6, 2023

I mean you could precompile just 1/2, 3/2, 5/2 and let the compiler worry about the other nu when a user actually uses it. Then the number of precompiled types doesn't really increase and I don't know if that matters for the JIT compiler. You could work around not 1.5 === 3//2 issue with the use of traits:

abstract type Nu end
struct Nu32 <: Nu end
struct Nu52 <: Nu end
struct Nup2 <: Nu end
struct NuGeneral <: Nu end

function  NuOrder(::Matern{T})
    if T == 3/2:
            return Nu32()
    elseif T == 5/2:
            return Nu52()
    elseif isint(T/2) # placeholder
            return Nup2()
    else
          return NuGeneral()
end

(k::Matern)(x,y) = kernelCall(NuOrder(k), k, x,y)

or you just have users remember to pass Float64. Right now they also have to remember to call Matern32 and not Matern15 I guess. In both cases the errors could be made to look similar.

The issue of differentiating Matern{T} from MaternKernel is difficult. I don't know what a good solution would be.

@theogf
Copy link
Member

theogf commented Jun 6, 2023

I am also on the side of @devmotion . Even with your solution, someone optimizing nu, would end up having a new compiled version every time. Also I don't really see the advantages of your proposal. As remembering if it's Matern32 or Matern15 (I would never think of this one personally) does not seem like a large problem. by itself.

The simplification that would make sense is indeed what @devmotion proposes. This would save the implementation of Base.show and metric

@willtebbutt
Copy link
Member

willtebbutt commented Jun 6, 2023

Background

To my mind, the two important facts about the Matern family of kernels are:

  1. the half-integer Matern kernels have a lot of useful properties that make them especially convenient to work with (e.g. simplified expressions for the kernel, and there being a linear SDE to which a GP with a half-integer Matern kernel exactly corresponds to -- central to TemporalGPs.jl).
  2. Matern kernels of any other order are cool, but don't have any special properties that makes knowing the order statically especially useful, so you might as well store the order as a field in a generic Matern kernel.

Our current implementation is a response to these two points -- we have special-case implementations for the commonly-used half-integer Matern kernels, and a generic Matern kernel for everything else. I believe this is a simple-to-understand approach, and allows us to make good use of the above properties.

One possible criticism of the current approach is that we only support special implementations of the first three half-integer Matern kernels (1/2, 3/2, 5/2). I'm not aware of a general formula that would allow us to usefully provide a clever implementation that supports general half-integer Matern kernels, so I'm not sure that there's much utility in offering general half-integer Matern kernels. If you were to tell me that such a formula exists, I would be very keen to modify our implementation to ensure that we support all half-integer Matern kernels.

Moreover, if statically knowing the order of Matern kernels which aren't half-integer offers some useful path to optimisation, I would like to know about it, and I believe would warrant a re-think of our current design.

My thoughts on this proposal

I don't especially mind if we change our implementation, but it's not obvious to me that the aesthetic value of being able to write Matern{1.5} or Matern{3//2} is sufficient to warrant changing the implementation, and I suspect that the proposals will increase our code-complexity slightly.

edit: I do very much appreciate you taking the time to open this issue and discuss though @FelixBenning -- I'm very happy to be convinced of your position -- I'm just not swayed yet!

@FelixBenning
Copy link
Author

FelixBenning commented Jun 6, 2023

@willtebbutt there is a general formula for half integers (at least according to wikipedia):

$$C_{p+1/2}(d) = \sigma^2\exp\left(-\frac{\sqrt{2p+1}d}{\rho}\right)\frac{p!}{(2p)!}\sum_{i=0}^p\frac{(p+i)!}{i!(p-i)!}\left(\frac{2\sqrt{2p+1}d}{\rho}\right)^{p-i}$$

https://en.wikipedia.org/wiki/Mat%C3%A9rn_covariance_function#Simplification_for_%CE%BD_half_integer

it might make sense to dig up a more reliable source before implementation, but in principle this would be possible (Wikipedia cites Abramowitz and Stegun (1972)).

@devmotion
Copy link
Member

I did these calculations a few weeks ago and wrote about it in my thesis 😅😅😅 It's quite simple and there exist multiple references (eg., the GPML book).

But I assumed that @willtebbutt was not asking about a formula per se but whether there exists a formula that can be evaluated accurately and efficiently - and I don't think that this formulation as a higher-order polynomial satisfies these properties apart from the first few half-integers.

@willtebbutt
Copy link
Member

Hahaha -- I should definitely have been aware of this / thought to check before I wrote something. As promised, I'll concede that maybe we should consider expanding our current collection of half-integer kernels to the general half-integer case, even if the above isn't super-stable (I'm assuming what we have implemented is just special cases of the above formula).

I feel like this suggests producing a specialised half-integer matern kernel, and a general kernel though, rather than allowing a type parameter in the general kernel which specialises, since I'd like to restrict static-specialisation to half-integer kernels.

Does this make sense, or is there something that I'm missing?

@devmotion
Copy link
Member

Can't we just add a branch in the general kernel, in case we want to exploit this formula (I'm still not convinced that it's better than just directly evaluating the general formula)? That seems the simplest option to me.

@willtebbutt
Copy link
Member

willtebbutt commented Jun 6, 2023

My view is that it's essential to encode the order statically when it's half-integer, and moreover that it's only a good idea to do it when it's half-integer, because when it's not half-integer you're almost certainly going to want to optimise it (albeit at the moment we have to do this in a gradient-free manner), and when you optimise it you don't want the type to change for each new value you try.

@devmotion
Copy link
Member

My view is that it's essential to encode the order statically when it's half-integer

That's the point that's not clear to me - why is it essential? To dispatch to the different formula?

@willtebbutt
Copy link
Member

To dispatch to the different formula, and to dispatch to different linear SDEs in the low-dimensional setting (1D is implemented in TemporalGPs.jl -- the non-half-integer Matern kernels don't have SDEs to which they correspond precisely).

@devmotion
Copy link
Member

IMO the formula is not a completely convincing argument since it can be implemented with simple branches (and I still assume generally you don't want to use it anyway). I think the more interesting argument are the SDEs. From a quick look at TemporalGPs it seems that types would help with type stability because TemporalGPs.to_sde is implemented with static arrays (which would be of different dimensions for different half-integer orders). I'm not sure where the cut-off is in this specific application, but based on StaticArrays's rule of thumb (normal arrays for arrays with > 100 elements) only a few additional half-integer kernels beyond Matern12Kernel, Matern32Kernel, and Matern52Kernel would benefit from the static optimizations. Are any such higher-order half-integer kernels relevant in practice?

@willtebbutt
Copy link
Member

I agree with you re the TemporalGPs assessment. I don't think we're really interested in much beyond 5/2. Maybe 7/2, and potentially 9/2, but I've never worried about them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants