Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restrict observations to a specific (sub-) process #176

Open
andreaskoher opened this issue Apr 15, 2021 · 7 comments
Open

restrict observations to a specific (sub-) process #176

andreaskoher opened this issue Apr 15, 2021 · 7 comments

Comments

@andreaskoher
Copy link

andreaskoher commented Apr 15, 2021

Hey @willtebbutt ,

again a really cool package and sorry for bothering you a lot these days. I am trying to estimate the day of week effect in a time series. For the sake of an example let's assume the data was generated from the process below:

x_₃ = x_₁ = 1. : 365.
x_₂ = zeros(length(x_₃))
x_₂[1:7:end] .= 1 # there is some effect every, say Monday

effect = 1.

f = @gppp let
    f₁ = GP( Matern52Kernel()  ScaleTransform(1/100) )
    f₂ = GP( effect * LinearKernel() )
    f₃ = f₁ + f₂
end

x₁ = GPPPInput(:f₁, x_₁)
x₂ = GPPPInput(:f₂, x_₂)
x  = BlockData(x₁, x₂)

fx = f(x, 1e-3)
y₁, y₂ = split(x, rand(fx))
y₃ = y₁ + y₂
plot(x_₃, y₃)

dayofweekeffect

In order to fit hyper parameters, I have to evaluate logpdf(fx, y₃). This fails of course because it expects observations for the latent process f₂ as well as f₃. Is there some way to restrict observations to f₃ only?

@willtebbutt
Copy link
Member

willtebbutt commented Apr 15, 2021

again a really cool package and sorry for bothering you a lot these days

Thanks. It's no bother at all -- I'm keen to be told all of the things that don't work as intended / about anything that's unclear!

Is there some way to restrict observations to f₃ only?

Yes :) If you just replace fx = f(x, 1e-6) with fx = f(x₃), that should do it.

You'll not need the split function anymore, as you're only returning samples from a single process from f. Soy = rand(fx) should be fine.

The over-arching point here is that both GPPPInputs and BlockData are acceptable input vectors for a GPPP. a GPPPInput will just get you points in a single process from f, whereas a BlockData lets you get multiple processes at once.

Would I be correct in assuming that it would be helpful if I clarified this in the example / docs?

@andreaskoher
Copy link
Author

Many thanks for the quick response!
and sorry for the badly explained example (I updated it slightly, especially ConstantKernel -> LinearKernel).
I also think the docs are good and I feel that I understood GPPPInput and BlockData, but lets see :)

So assuming that I have only (x_₃, y₃) as an observable time series and I further assume that it can be decomposed into a smooth underlying part y₁ and a constant (unknown) effect for one particular day of the week, say Monday. As covariates, I have the day of year x_₃ = x_₁ and a boolean vector that indicates each Monday x_₂. Now, my goal is to decompose y₃ into y₁ and y₂, i.e. estimate the Monday effect and find the underlying smooth function.

Feeding only x_₃ into f without the covariates x_₂ that indicate the special day, would not give the correct posterior:

x₃ = GPPPInput(:f₃, x_₃)
fx = f(x₃, 1e-3)
f′ = posterior(fx, y₃)
plot(x_₃, rand( f′(x₃, 1e-9) ))

dayofweekeffect_posterior

@willtebbutt
Copy link
Member

willtebbutt commented Apr 15, 2021

Ahhh I see. Sorry, I missed the detail about x_1 and x_2!

To make sure that I'm understanding properly, would it be correct to say that the function you're after is something like:

f(day_of_year, is_monday) = f_1(day_of_year) + f_2(is_monday)

?

@andreaskoher
Copy link
Author

Yes that's it :) (and I just realized that I haven't updated the example)
So it is two input and one output dimensions in this example. It is also similar to cover example from the textbook Bayesian Data Analysis 3 (p. 505-509) where Aki Vehtari used GPStuff to decompose a time series and here x_₂ would indicate special holidays.

@willtebbutt
Copy link
Member

willtebbutt commented Apr 16, 2021

Just taken a look at the example -- looks like he's chopping it up by multipling with another function.

The way I would think to do this would be to use the multiply-by-the-function formulation that's use in BDA, and to write something like

using AbstractGPs
using Plots
using Stheno

# The points in time under consideration.
x = 1:365

# Create is_monday function. Let's pretend these actually correspond to Mondays.
is_monday(t) = rem(t, 7) == 0 ? 1 : 0

effect_size = 0.1

f = @gppp let
    f₁ = GP(Matern52Kernel()  ScaleTransform(1/100))
    f₂ = is_monday * (effect_size * GP( ConstantKernel() ))
    f₃ = f₁ + f₂
end;

# Compute the posterior given observations only of f₃.
x₃ = GPPPInput(:f₃, x)
fx = f(x₃, 0.01)
y₃ = rand(fx)
f_post = posterior(fx, y₃)

# Compute the posterior marginals over the sum of the two processes.
x3 = GPPPInput(:f₃, x)
marginals(f_post(x3))

# Compute the posterior marginals over the effect size, f₂.
x2 = GPPPInput(:f₂, x)
marginals(f_post(x2))

# Compute the posterior marginals over the other latent process.
x1 = GPPPInput(:f₁, x)
marginals(f_post(x1))

# Plot everything.
plot(x, f_post(x3, 1e-6); color=:blue, ribbon_scale=3)
plot!(x, f_post(x2, 1e-6); color=:green, ribbon_scale=3)
plot!(x, f_post(x1, 1e-6); color=:red, ribbon_scale=3)
scatter!(x, y₃; color=:black, markersize=1)

Does this do what you need?

Actually, if you're doing this, would you be up for contributing it to the examples section when you're done, with a reference to those pages of BDA?

edit: updated kernel + effect size + observation noise variance to make the visuals nicer.

@andreaskoher
Copy link
Author

andreaskoher commented Apr 18, 2021

Yes that's it, really nice!
More abstractly the solution is of the form

f(x) = f₁(x) + I(x)f₂(x) where I(x) is an indicator function and f₂ is a constant kernel GP.

This directly corresponds to the text book example. Interestingly, however, the authors implementation with GPStuff (Part b) is more like your comment above, i.e.

f(x₁,x₂) = f₁(x₁) + f₂(x₂) where f₂ is a GP with a linear kernel and x₂ is a boolean vector.

The latter approach seems somewhat more general but, in any case, your approach solves my use case and the text book example.

Thanks for that and I will be happy to add an example to Stheno.

@willtebbutt
Copy link
Member

The latter approach seems somewhat more general but, in any case, your approach solves my use case and the text book example.

Excellent. Yeah, it's an interesting one. On some level, the additive implementation I previously proposed / that is used in GPStuff feels like what you would do if you could not, for some reason or another, implement it using the indicator function route. I guess they're equivalent though, 🤷

Thanks for that and I will be happy to add an example to Stheno.

Excellent, I look forward to it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants