
Additive kernels #59

Open · femtomc opened this issue Nov 10, 2019 · 15 comments

femtomc commented Nov 10, 2019

I started writing code to implement additive kernels as in Additive Gaussian Processes.

Here's how I imagine usage (similar to the Stretched kernel, where the struct has a field to wrap a kernel).

# Additive kernel: applies `ker` to a single input dimension.
struct Additive{Ker <: Kernel} <: Kernel
    dim::Int
    ker::Ker

    # Inner constructor keeps the concrete kernel type as the parameter.
    Additive(dim::Int, ker::K) where {K <: Kernel} = new{K}(dim, ker)
end

# Binary methods: select dimension `dim`, then apply the wrapped kernel.
ew(k::Additive, x::AbstractVector, x′::AbstractVector) = ew(k.ker, getindex.(x, k.dim), getindex.(x′, k.dim))
pw(k::Additive, x::AbstractVector, x′::AbstractVector) = pw(k.ker, getindex.(x, k.dim), getindex.(x′, k.dim))
ew(k::Additive, x::ColVecs, x′::ColVecs) = ew(k.ker, x.X[k.dim, :], x′.X[k.dim, :])
pw(k::Additive, x::ColVecs, x′::ColVecs) = pw(k.ker, x.X[k.dim, :], x′.X[k.dim, :])

# Unary methods (the AbstractVector methods assume each element is itself indexable)
ew(k::Additive, x::AbstractVector) = ew(k.ker, getindex.(x, k.dim))
pw(k::Additive, x::AbstractVector) = pw(k.ker, getindex.(x, k.dim))
ew(k::Additive, x::ColVecs) = ew(k.ker, x.X[k.dim, :])
pw(k::Additive, x::ColVecs) = pw(k.ker, x.X[k.dim, :])
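
# For example (a quick sketch, assuming `ew`/`pw` are imported from Stheno
# as above): restrict an EQ kernel to dimension 2 of 5-dimensional inputs.
k_demo = Additive(2, EQ())
X_demo = ColVecs(randn(5, 10))
K_demo = pw(k_demo, X_demo, X_demo)  # 10×10 covariance from dimension 2 only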

# Here's an additive model implementation: a Gaussian process whose kernel is a sum of per-dimension squared-exponential kernels, each scaled by a random coefficient.
@model function AdditiveGPModel(dim_features)
    params = [(randn(rng), randn(rng)) for i in 1:dim_features]
    ker_array = [params[i][1] * Additive(i, EQ()) for i in 1:dim_features]
    k = foldl(+, ker_array)
    additive_gp = GP(k, GPC())
    return additive_gp, ker_array, params
end

# Make model.
x = ColVecs(randn(rng, (10, 1)))
y = [3.0]
f₁, collection, params = AdditiveGPModel(10)

# Try inference.
fx = f₁(x)
post = f₁ | Obs(fx, y)

This code doesn't work yet - the error is:
MethodError: objects of type Stheno.GPC are not callable

Would this be an interesting enhancement?

willtebbutt (Member) commented Nov 10, 2019

Hi @McCoyBecker, thanks for giving this a go. Additive models are definitely something I'm keen on supporting, but we should already be able to do them with the select transform.

select(f, d)

produces a new GP that depends only on the dth dimension of whatever input data is provided. You can use this to build an additive GP model as follows:

using Stheno
using Stheno: @model

@model function additive_gp(D::Int, ls::AbstractVector{<:Real}, σs::AbstractVector{<:Real})
    fs = [σs[d] * stretch(select(GP(eq()), d), 1 / ls[d]) for d in 1:D]
    f = foldl(+, fs)
    return fs, f
end

D = 2
N = 100
length_scales = exp.(0.1 .* randn(D))
process_std_devs = exp.(0.1 .* randn(D))
fs, f = additive_gp(D, length_scales, process_std_devs)

# Sample from the sum
x = ColVecs(randn(D, N))
y = rand(f(x, 1e-9))

# Compute logpdf (because we can)
logpdf(f(x, 1e-9), y)

Hopefully this does what you've got in mind without introducing an extra kernel. Additionally, you can look at the posterior processes associated with each of the components you're adding:

fs_posteriors = [fs[d] | Obs(f(x, 1e-9), y) for d in 1:D]
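
You could then, for example, inspect their marginal statistics at the training inputs (a sketch, assuming Stheno's marginals is in scope):

# Posterior marginals of the first additive component.
ms = marginals(fs_posteriors[1](x, 1e-9))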

Does this cover the use case you're interested in?

femtomc (Author) commented Nov 10, 2019

Ah, I see - I had seen Select but didn't quite use it correctly.

For my use case, I really just need the posterior over the full additive GP:

post = f | Obs(f(x), y)

I think that's why I tried to write a kernel first, instead of composing a bunch of GPs together. I'll try this out!

willtebbutt (Member) commented

Great, please let me know how you get on 🙂. I'll leave this issue open for now in case you run into any difficulties.

willtebbutt (Member) commented

@McCoyBecker just checking in. How's this going?

femtomc (Author) commented Nov 19, 2019

I think I got it working without the Select - but then I got slammed with work on other projects. I'll pick it up again soon, try out my implementation (I have it fitting on synthetic data but haven't tested it on the actual set), and then use the Select version.

willtebbutt (Member) commented

Good to know. It dawned on me that, at the minute, there will probably be some scaling issues if you want to sum a large number of processes. If this even vaguely looks like a bottleneck with the select implementation, please let me know and I'll fix it - it should only be a very minor alteration to the codebase.

femtomc (Author) commented Nov 24, 2019

Hi Will - just an update: this week we'll be scaling up to a larger dataset. Will report back.

willtebbutt (Member) commented

Hi @McCoyBecker, any word on how things went with the larger dataset?

femtomc (Author) commented Mar 27, 2020

Many months later - I'm back working on this now. I'm running into a lot of issues with failing Cholesky tests with the kernels I implemented, so I'm going to use your select suggestion and report back.

femtomc (Author) commented Mar 27, 2020

I'm running into a weird bottleneck here with Optim (specifically Nelder-Mead). Here is my set of fitting utility functions for hyperparameter optimization:

# Split θ = [ls; σs] back into its two halves.
unpack(D, θ) = [θ[i] for i in 1:D], [θ[i] for i in D+1:2*D]

function nlml(D::Int64, 
              θ::Array{Float64, 1},
              x::ColVecs, 
              y::Array{Float64, 1}
             )
    ls, σs = unpack(D, θ)
    fs, f = additive_gp(D, ls, σs)
    println(ls)
    return -logpdf(f(x, 1e-9), y)
end

function hyper_fit_nm(
                      D::Int64, 
                      ls::AbstractVector{<:Real},
                      σs::AbstractVector{<:Real},
                      x::ColVecs, 
                      y::AbstractArray
                     )
    params = Float64[]
    append!(params, ls)
    append!(params, σs)
    results = Optim.optimize(θ -> nlml(D, θ, x, y),
                             params,
                             NelderMead(),
                             Optim.Options(g_tol = 1e-12,
                             iterations = 10,
                             store_trace = false,
                             show_trace = true)
                            )
    fit_params = results.minimizer
    return fit_params
end
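
As a quick sanity check that unpack splits the parameter vector as intended (θ = [ls; σs]):

# The two halves of θ should round-trip back out of unpack.
θ_test = [1.0, 2.0, 3.0, 4.0]
ls_test, σs_test = unpack(2, θ_test)
@assert ls_test == [1.0, 2.0] && σs_test == [3.0, 4.0]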

Then I'm trying to run the model:

function test()
    # Select test dims.
    D = 50
    dim_labels = 1

    # Make up data - x is a ColVecs, y is an Array.
    x = ColVecs(randn(rng, (D, dim_labels)))
    y = [randn()*100 for i in 1:dim_labels]

    # Make sure you can instantiate a model.
    length_scales = exp.(0.1 .* randn(D))
    process_std_devs = exp.(0.1 .* randn(D))
    fs, f = additive_gp(D, length_scales, process_std_devs)

    # Try Nelder-Mead.
    fit_params = hyper_fit_nm(D, length_scales, process_std_devs, x, y)

end

This is just scratch code, but I think it's relevant because the custom kernels never hit this issue:

  1. Optim never starts showing the trace.
  2. I'm getting these insane compiler errors? I'll try and grab one and post it here.

After a few more tries, I can't get the optimization to start. I'm not sure whether this is an issue in the @model macro or in select. I'll take a poke around the code.

femtomc (Author) commented Mar 27, 2020

The runtime scales with the dimension D in the scratch code above - it appears to grow exponentially in D.

Edit: the offending call is the call to rand on the GP, so the call to cholesky in rand is taking a very long time.

module SelectGP

using Stheno
using Stheno: @model

using Random
Random.seed!(314159)

@model function additive_gp(D::Int, ls::AbstractVector{<:Real}, σs::AbstractVector{<:Real})
    fs = [σs[d] * stretch(select(GP(eq()), d), 1 / ls[d]) for d in 1:D]
    f = foldl(+, fs)
    return fs, f
end

D = 22
N = 100
length_scales = exp.(0.1 .* randn(D))
process_std_devs = exp.(0.1 .* randn(D))
fs, f = additive_gp(D, length_scales, process_std_devs)
println(f)

# Sample from the sum
x = ColVecs(randn(D, N))
y = rand(f(x, 1e-9))

# Compute logpdf (because we can)
#println(logpdf(f(x, 1e-9), y))

end #module

So, it's actually the call to cov inside the Cholesky decomposition. I think it's a tuple type thing? It has to unpack the nested compositions through the recursive call to cov in compose.jl, line 16.
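
For reference, one way to time the two stages separately (a sketch; it assumes cov can be called on f(x, 1e-9) as above):

using LinearAlgebra

# Time covariance assembly separately from the factorisation as D grows.
for D in (5, 10, 15, 20)
    fs, f = additive_gp(D, ones(D), ones(D))
    x = ColVecs(randn(D, 100))
    fx = f(x, 1e-9)
    @time K = cov(fx)             # suspected bottleneck: nested kernel recursion
    @time cholesky(Symmetric(K))  # the factorisation itself is cheap at N = 100
end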

willtebbutt (Member) commented Apr 7, 2020

Thanks for digging down into this.

It seems plausible to me that compile times would scale very poorly in this case. I imagine that the run-time complexity is quadratic in D, but the distinction doesn't really matter in this case.

The run-time issue likely stems from Stheno's inability to "know" whether or not two processes are independent without doing some work. The current implementation of the Sum kernel doesn't exploit independence, so it will perform badly.

One way to resolve this would be to implement a special IndependentGPs object that wraps a collection of GP objects known to be independent by construction. You could then define various operations on them that exploit this knowledge, such as sum. I was thinking of doing something like this anyway to provide better multi-output GP support. It should deal with both the compile- and run-time issues.
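
To illustrate the kind of saving (a standalone sketch, not Stheno code - additive_cov and se are made-up names): if the components are known to act independently on single dimensions, the covariance is a flat accumulation over D one-dimensional kernels, with no recursion through nested Sum types.

# Standalone sketch (not Stheno code): flat accumulation over D independent
# one-dimensional kernels, avoiding the deeply nested foldl(+, fs) structure.
function additive_cov(ks, X::AbstractMatrix)
    D, N = size(X)
    K = zeros(N, N)
    for d in 1:D
        xd = view(X, d, :)
        for j in 1:N, i in 1:N
            K[i, j] += ks[d](xd[i], xd[j])
        end
    end
    return K
end

# e.g. D one-dimensional squared-exponential kernels.
se(l) = (x, y) -> exp(-abs2(x - y) / (2 * l^2))
K = additive_cov([se(1.0) for _ in 1:22], randn(22, 100))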

Is there any obvious reason that you can think of why this wouldn't work in your case, or shall I make a PR for you to try out?

femtomc (Author) commented Apr 7, 2020

There is nothing I can think of. Yes please!

andreaskoher commented

If you are working with time series data, this bottleneck may be greatly reduced once TemporalGPs plays nicely with Stheno

willtebbutt (Member) commented Apr 15, 2021

Good to know that it would be helpful. I should say that it's some way off at the minute, as it's going to be a bit involved to make it work.

Do you need Stheno's ability to decompose processes during training, or do you just want to decompose stuff once after you've trained the model?

edit: @andreaskoher I misread your comment to mean that you were in need of TemporalGPs + Stheno. I'll leave my comment here anyway though.
