
Additive kernels #59

Open · femtomc opened this issue Nov 10, 2019 · 15 comments

femtomc commented Nov 10, 2019

I started writing code to implement additive kernels as in Additive Gaussian Processes.

Here's how I imagine usage (similar to the Stretched kernel, where the struct has a field to wrap a kernel).

# Additive kernel: applies `ker` to a single input dimension.
struct Additive{Ker <: Kernel} <: Kernel
    dim::Int
    ker::Ker

    # Inner constructor keeps the concrete kernel type as the parameter.
    Additive(dim::Int, ker::K) where {K <: Kernel} = new{K}(dim, ker)
end

# Binary methods: select dimension `dim`, then apply the wrapped kernel.
ew(k::Additive, x::AbstractVector, x′::AbstractVector) = ew(k.ker, getindex.(x, k.dim), getindex.(x′, k.dim))
pw(k::Additive, x::AbstractVector, x′::AbstractVector) = pw(k.ker, getindex.(x, k.dim), getindex.(x′, k.dim))
ew(k::Additive, x::ColVecs, x′::ColVecs) = ew(k.ker, x.X[k.dim, :], x′.X[k.dim, :])
pw(k::Additive, x::ColVecs, x′::ColVecs) = pw(k.ker, x.X[k.dim, :], x′.X[k.dim, :])

# Unary methods (the AbstractVector methods assume each element is itself indexable)
ew(k::Additive, x::AbstractVector) = ew(k.ker, getindex.(x, k.dim))
pw(k::Additive, x::AbstractVector) = pw(k.ker, getindex.(x, k.dim))
ew(k::Additive, x::ColVecs) = ew(k.ker, x.X[k.dim, :])
pw(k::Additive, x::ColVecs) = pw(k.ker, x.X[k.dim, :])
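
# For example (a quick sketch, assuming `ew`/`pw` are imported from Stheno
# as above): restrict an EQ kernel to dimension 2 of 5-dimensional inputs.
k_demo = Additive(2, EQ())
X_demo = ColVecs(randn(5, 10))
K_demo = pw(k_demo, X_demo, X_demo)  # 10×10 covariance from dimension 2 only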

# Here's an additive model implementation: a Gaussian process whose kernel is a sum of per-dimension squared-exponential kernels, each scaled by a random coefficient.
@model function AdditiveGPModel(dim_features)
    params = [(randn(rng), randn(rng)) for i in 1:dim_features]
    ker_array = [params[i][1] * Additive(i, EQ()) for i in 1:dim_features]
    k = foldl(+, ker_array)
    additive_gp = GP(k, GPC())
    return additive_gp, ker_array, params
end

# Make model.
x = ColVecs(randn(rng, (10, 1)))
y = [3.0]
f₁, collection, params = AdditiveGPModel(10)

# Try inference.
fx = f₁(x)
post = f₁ | Obs(fx, y)

This code doesn't work yet - the error is:
MethodError: objects of type Stheno.GPC are not callable

Would this be an interesting enhancement?

willtebbutt (Member) commented Nov 10, 2019

Hi @McCoyBecker, thanks for giving this a go. Additive models are definitely something I'm keen on supporting, but we should already be able to do them with the select transform.

select(f, d)

produces a new GP that depends only on the dth dimension of whatever input data is provided. You can use this to build an additive GP model as follows:

using Stheno
using Stheno: @model

@model function additive_gp(D::Int, ls::AbstractVector{<:Real}, σs::AbstractVector{<:Real})
    fs = [σs[d] * stretch(select(GP(eq()), d), 1 / ls[d]) for d in 1:D]
    f = foldl(+, fs)
    return fs, f
end

D = 2
N = 100
length_scales = exp.(0.1 .* randn(D))
process_std_devs = exp.(0.1 .* randn(D))
fs, f = additive_gp(D, length_scales, process_std_devs)

# Sample from the sum
x = ColVecs(randn(D, N))
y = rand(f(x, 1e-9))

# Compute logpdf (because we can)
logpdf(f(x, 1e-9), y)

Hopefully this does what you've got in mind without introducing an extra kernel. Additionally, you can look at the posterior processes associated with each of the components you're adding:

fs_posteriors = [fs[d] | Obs(f(x, 1e-9), y) for d in 1:D]
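
You could then, for example, inspect their marginal statistics at the training inputs (a sketch, assuming Stheno's marginals is in scope):

# Posterior marginals of the first additive component.
ms = marginals(fs_posteriors[1](x, 1e-9))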

Does this cover the use case you're interested in?

femtomc (Author) commented Nov 10, 2019

Ah, I see - I had seen Select but didn't quite use it correctly.

For my use case, I really just need the posterior over the full additive GP:

post = f | Obs(f(x), y)

I think that's why I tried to write a kernel first, instead of composing a bunch of GPs together. I'll try this out!

willtebbutt (Member) commented

Great, please let me know how you get on 🙂. I'll leave this issue open for now in case you run into any difficulties.

willtebbutt (Member) commented

@McCoyBecker just checking in. How's this going?

femtomc (Author) commented Nov 19, 2019

I think I got it working without the Select - but then I got slammed with work on other projects. I'll pick it up again soon, try out my implementation (I have it fitting on synthetic data but haven't tested it on the actual set), and then use the Select version.

willtebbutt (Member) commented

Good to know. It dawned on me that, at the minute, there will probably be some scaling issues if you want to sum a large number of processes. If this even vaguely looks like a bottleneck with the select implementation, please let me know and I'll fix it - it should only be a very minor alteration to the codebase.

femtomc (Author) commented Nov 24, 2019

Hi Will - just an update: this week we'll be scaling up to a larger dataset. Will report back.

willtebbutt (Member) commented

Hi @McCoyBecker, any word on how things went with the larger dataset?

femtomc (Author) commented Mar 27, 2020

Many months later - I'm back working on this now. I'm running into a lot of issues with failing Cholesky tests with the kernels I implemented, so I'm going to use your select suggestion and report back.

femtomc (Author) commented Mar 27, 2020

I'm running into a weird bottleneck here with Optim (specifically Nelder-Mead). Here is my set of fitting utility functions for hyperparameter optimization:

# Split θ = [ls; σs] back into its two halves.
unpack(D, θ) = [θ[i] for i in 1:D], [θ[i] for i in D+1:2*D]

function nlml(D::Int64, 
              θ::Array{Float64, 1},
              x::ColVecs, 
              y::Array{Float64, 1}
             )
    ls, σs = unpack(D, θ)
    fs, f = additive_gp(D, ls, σs)
    println(ls)
    return -logpdf(f(x, 1e-9), y)
end

function hyper_fit_nm(
                      D::Int64, 
                      ls::AbstractVector{<:Real},
                      σs::AbstractVector{<:Real},
                      x::ColVecs, 
                      y::AbstractArray
                     )
    params = Float64[]
    append!(params, ls)
    append!(params, σs)
    results = Optim.optimize(θ -> nlml(D, θ, x, y),
                             params,
                             NelderMead(),
                             Optim.Options(g_tol = 1e-12,
                             iterations = 10,
                             store_trace = false,
                             show_trace = true)
                            )
    fit_params = results.minimizer
    return fit_params
end
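
As a quick sanity check that unpack splits the parameter vector as intended (θ = [ls; σs]):

# The two halves of θ should round-trip back out of unpack.
θ_test = [1.0, 2.0, 3.0, 4.0]
ls_test, σs_test = unpack(2, θ_test)
@assert ls_test == [1.0, 2.0] && σs_test == [3.0, 4.0]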

Then I'm trying to run the model:

function test()
    # Select test dims.
    D = 50
    dim_labels = 1

    # Make up data - x is a ColVecs, y is an Array.
    x = ColVecs(randn(rng, (D, dim_labels)))
    y = [randn()*100 for i in 1:dim_labels]

    # Make sure you can instantiate a model.
    length_scales = exp.(0.1 .* randn(D))
    process_std_devs = exp.(0.1 .* randn(D))
    fs, f = additive_gp(D, length_scales, process_std_devs)

    # Try Nelder-Mead.
    fit_params = hyper_fit_nm(D, length_scales, process_std_devs, x, y)

end

This is just scratch code, but I think it's relevant because the custom kernels never hit this issue:

  1. Optim never starts showing the trace.
  2. I'm getting these insane compiler errors? I'll try and grab one and post it here.

After a few more tries, I can't get the optimization to start. I'm not sure whether this is an issue in the @model macro or in select. I'll take a poke around the code.

femtomc (Author) commented Mar 27, 2020

The runtime scales with the dimension D in the scratch code above - it appears to grow exponentially in D.

Edit: the offending call is the call to rand on the GP, so the call to cholesky in rand is taking a very long time.

module SelectGP

using Stheno
using Stheno: @model

using Random
Random.seed!(314159)

@model function additive_gp(D::Int, ls::AbstractVector{<:Real}, σs::AbstractVector{<:Real})
    fs = [σs[d] * stretch(select(GP(eq()), d), 1 / ls[d]) for d in 1:D]
    f = foldl(+, fs)
    return fs, f
end

D = 22
N = 100
length_scales = exp.(0.1 .* randn(D))
process_std_devs = exp.(0.1 .* randn(D))
fs, f = additive_gp(D, length_scales, process_std_devs)
println(f)

# Sample from the sum
x = ColVecs(randn(D, N))
y = rand(f(x, 1e-9))

# Compute logpdf (because we can)
#println(logpdf(f(x, 1e-9), y))

end #module

So, it's actually the call to cov inside the Cholesky decomposition. I think it's a tuple type thing? It has to unpack the nested compositions through the recursive call to cov in compose.jl, line 16.
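
For reference, one way to time the two stages separately (a sketch; it assumes cov can be called on f(x, 1e-9) as above):

using LinearAlgebra

# Time covariance assembly separately from the factorisation as D grows.
for D in (5, 10, 15, 20)
    fs, f = additive_gp(D, ones(D), ones(D))
    x = ColVecs(randn(D, 100))
    fx = f(x, 1e-9)
    @time K = cov(fx)             # suspected bottleneck: nested kernel recursion
    @time cholesky(Symmetric(K))  # the factorisation itself is cheap at N = 100
end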

willtebbutt (Member) commented Apr 7, 2020

Thanks for digging down into this.

It seems plausible to me that compile times would scale very poorly in this case. I imagine that the run-time complexity is quadratic in D, but the distinction doesn't really matter in this case.

The run-time issue likely stems from Stheno's inability to "know" whether or not two processes are independent without doing some work. The current implementation of the Sum kernel doesn't exploit independence, so it will perform badly.

One way to resolve this would be to implement a special IndependentGPs object that wraps a collection of GP objects known to be independent by construction. You could then define various operations on them that exploit this knowledge, such as sum. I was thinking of doing something like this anyway to provide better multi-output GP support. It should deal with both the compile- and run-time issues.
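
To illustrate the kind of saving (a standalone sketch, not Stheno code - additive_cov and se are made-up names): if the components are known to act independently on single dimensions, the covariance is a flat accumulation over D one-dimensional kernels, with no recursion through nested Sum types.

# Standalone sketch (not Stheno code): flat accumulation over D independent
# one-dimensional kernels, avoiding the deeply nested foldl(+, fs) structure.
function additive_cov(ks, X::AbstractMatrix)
    D, N = size(X)
    K = zeros(N, N)
    for d in 1:D
        xd = view(X, d, :)
        for j in 1:N, i in 1:N
            K[i, j] += ks[d](xd[i], xd[j])
        end
    end
    return K
end

# e.g. D one-dimensional squared-exponential kernels.
se(l) = (x, y) -> exp(-abs2(x - y) / (2 * l^2))
K = additive_cov([se(1.0) for _ in 1:22], randn(22, 100))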

Is there any obvious reason that you can think of why this wouldn't work in your case, or shall I make a PR for you to try out?

femtomc (Author) commented Apr 7, 2020

There is nothing I can think of. Yes please!

andreaskoher commented

If you are working with time series data, this bottleneck may be greatly reduced once TemporalGPs plays nicely with Stheno

willtebbutt (Member) commented Apr 15, 2021

Good to know that it would be helpful. I should say that it's some way off at the minute, as it's going to be a bit involved to make it work.

Do you need Stheno's ability to decompose processes during training, or do you just want to decompose stuff once after you've trained the model?

edit: @andreaskoher I misread your comment to mean that you were in need of TemporalGPs + Stheno. I'll leave my comment here anyway though.
