
Flux integration + Neural Kernel Network #78

Open
willtebbutt opened this issue Feb 10, 2020 · 22 comments · Fixed by #83

Comments

@willtebbutt
Member

Relates to this issue.

An MWE can be found in the examples directory on this branch. It contains a basic demo of

  • how to compose a Flux model with a Stheno model (in the literal sense of composition)
  • a very basic, but working, NKN implementation. It's missing a load of stuff, but it gives us something to talk around.

The examples are all packaged nicely in a project, so it should be straightforward to get everything up and running.

The questions now are

  • have I covered all of the functionality of interest to GPFlux?
  • does this miss any important interface requirements?

@HamletWantToCode what do you think?

@HamletWantToCode
Contributor

Hi @willtebbutt, the examples work well, thanks ;)

have I covered all of the functionality of interest to GPFlux?

Not all; I'm still extending it. The next main feature is to make the GP in GPFlux work as a normal neural network layer (I'm currently working on it), which I think may support e.g. Deep Gaussian Processes (Neil Lawrence) and Variational Gaussian Processes (David Blei). (I hope I understand the idea of these papers correctly.)

does this miss any important interface requirements?

For these two pieces of functionality, I think you have provided enough interfaces; it looks good.

For Flux integration, I personally like the second implementation you provided:

# x, y, σ² and g (the Flux model) are defined as in the example on the branch.
dσ², dg = Zygote.gradient(
    function(σ², g)

        # Manually transform the data.
        gx = ColVecs(g(x.X))

        # Construct the GP and compute the log marginal likelihood of the transformed data.
        f = σ² * GP(eq(), GPC())
        fx = f(gx, 0.1)
        return logpdf(fx, y)
    end,
    σ², g,
)

I think it's clearer and more straightforward; this is also the approach used in GPFlux.

@HamletWantToCode
Contributor

I found that there are related issues here; this can be realized with the NKN. The sparse GP mentioned here is also being considered for addition to GPFlux in the future.

@willtebbutt
Member Author

The next main feature is to make the GP in GPFlux work as a normal neural network layer

Could you elaborate a little on what this will look like?

I found that there are related issues here; this can be realized with the NKN. The sparse GP mentioned here is also being considered for addition to GPFlux in the future.

I actually don't think that the linear combination of kernels is the right way to go about implementing the NKN. As I showed on the branch linked above, I think the right way is probably to have a custom NKN kernel that accepts a collection of primitive kernels, and a Chain (or some other Flux construct) that combines them.

I think it's clearer and more straightforward; this is also the approach used in GPFlux.

Good to know -- this approach just works out of the box, so there's literally no need to provide explicit integration in Flux to make this work -- I just need to implement worked examples :)

@HamletWantToCode
Contributor

I think the right way is probably to have a custom NKN kernel that accepts a collection of primitive kernels, and a Chain (or some other Flux construct) that combines them.

I agree with that. In fact, GPFlux implements a NeuralKernelNetwork type to support the NKN, which slightly modifies Flux's Chain.

struct NeuralKernelNetwork{T<:Tuple} <: AbstractKernel
    layers::T
    NeuralKernelNetwork{T}(ls...) where {T} = new{T}(ls)
end
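
An outer constructor of the usual form (sketched here) lets the layers be passed directly:

# Sketch of the usual companion outer constructor.
NeuralKernelNetwork(ls...) = NeuralKernelNetwork{typeof(ls)}(ls...)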

I actually don't think that the linear combination of kernels thing is the right way to go about implementing the NKN

Maybe I didn't make myself clear. Here I mean that most composite kernels can be viewed as special cases of an NKN (a linear combination of kernels can be viewed as an NKN that only has a linear layer, where the weights of the linear layer are the coefficients in front of the kernels). In GPFlux, I use the NKN as the backend for the addition kernel and the product kernel.

const ProductCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allProduct)}}
const AddCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allSum)}}
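
To make the correspondence concrete, here is a tiny self-contained sketch (illustrative only, not Stheno/GPFlux code) of a positive-weighted sum of primitive kernels written as an NKN with a single linear layer:

# Toy illustration: a weighted sum of primitive kernels is an NKN with one linear layer.
k1(x, y) = exp(-abs2(x - y) / 2)   # EQ-style primitive kernel
k2(x, y) = exp(-abs(x - y))        # Exponential-style primitive kernel
w = [0.3, 0.7]                     # linear-layer weights = mixture coefficients

nkn_sum_kernel(x, y) = sum(w .* [k1(x, y), k2(x, y)])

nkn_sum_kernel(0.1, 0.4) ≈ 0.3 * k1(0.1, 0.4) + 0.7 * k2(0.1, 0.4)   # true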

Could you elaborate a little on what this will look like?

This idea comes from the fact that a GP is equivalent to a neural network layer with infinitely many neurons (some constraints are needed on the weights of this layer); this was shown by Neal in 1994, and a proof can be found here in section 2. Give me some time and I will try to provide you with an example this week :)
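
A tiny numerical illustration of the idea (self-contained, not GPFlux code): the covariance of a single-hidden-layer network with i.i.d. weights and 1/√H-scaled output weights stabilises to a fixed kernel value as the width H grows.

using Statistics

relu(z) = max(z, zero(z))

# Cov[f(x1), f(x2)] for a width-H ReLU layer with N(0, 1) weights/biases and output
# weights of variance 1/H is the average of the hidden-unit products.
function nn_kernel(H, x1, x2)
    W, b = randn(H), randn(H)
    return mean(relu.(W .* x1 .+ b) .* relu.(W .* x2 .+ b))
end

nn_kernel(100, 0.3, -0.7)        # noisy
nn_kernel(1_000_000, 0.3, -0.7)  # essentially converged: behaves like a fixed kernel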

@willtebbutt
Member Author

Maybe I didn't make myself clear. Here I mean that most composite kernels can be viewed as special cases of an NKN (a linear combination of kernels can be viewed as an NKN that only has a linear layer, where the weights of the linear layer are the coefficients in front of the kernels). In GPFlux, I use the NKN as the backend for the addition kernel and the product kernel.

Good points. I wonder whether there are any performance implications associated with this though... hmmm.

This idea comes from the fact that a GP is equivalent to a neural network layer with infinitely many neurons (some constraints are needed on the weights of this layer); this was shown by Neal in 1994, and a proof can be found here in section 2. Give me some time and I will try to provide you with an example this week :)

Ah, I see -- this line of work is slightly different from the Deep GP or variational GP stuff, so I'll be interested to see what you come up with :) It's not clear to me how this will play with the usual Flux way of doing things, in particular how it handles distributions over functions rather than the deterministic objects that Flux works with.

@HamletWantToCode
Contributor

Hi @willtebbutt, I just finished some initial work on the Gaussian process layer we discussed last week; the implementation can be found in this notebook on this branch. I strongly suggest you run this notebook as follows:

  1. git clone git@github.com:HamletWantToCode/GPFlux.jl.git
  2. git checkout develop
  3. In the Julia REPL's package mode, run add with the location of the GPFlux folder (see the sketch after this list)
  4. Then run the notebook
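
For step 3, this is roughly what it looks like (a sketch, assuming the repository was cloned into ./GPFlux.jl and the develop branch is checked out):

using Pkg
Pkg.add(PackageSpec(path="./GPFlux.jl"))   # or, in package mode: ] add ./GPFlux.jl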

@willtebbutt
Member Author

Ah I see. Looks like a nice API to aim for -- it definitely needs some way to perform inference though, as you're currently just sampling from the prior each time the function is evaluated.

As regards testing with finite differencing -- you just have to be really careful to use exactly the same seed each time you evaluate the function. In particular, you should deterministically set the seed inside the function.
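
For example (a minimal sketch using FiniteDifferences.jl; the objective and the seed are made up):

using Random, FiniteDifferences

# The seed is set *inside* the function, so every evaluation made by the
# finite-differencing rule sees exactly the same random draws.
function stochastic_objective(θ)
    Random.seed!(1234)
    ε = randn(length(θ))
    return sum(abs2, θ .+ ε)
end

grad(central_fdm(5, 1), stochastic_objective, ones(3))[1]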

I've added a Stheno.jl version of this proposal here for reference -- the point being that Stheno.jl can do all of this stuff with minimal modification.

@HamletWantToCode
Contributor

Excellent, it's great to know that Stheno has built-in support for this :)

Based on our previous discussion, I think we both agree that this Flux integration should include:

  1. Using a neural network for feature extraction, then classifying/regressing on the extracted features with a GP, training the neural network and the GP jointly (also known as deep kernel learning).
  2. A new kernel type, the Neural Kernel Network; composite kernels can be built on top of the NKN.
  3. Using a GP as a non-parametric layer inside a neural network; this is parameter-efficient, robust to overfitting, and able to propagate uncertainty.

AFAIK, this functionality isn't available elsewhere in the Julia ecosystem. In Python, GPyTorch supports the first one, and TensorFlow recently added support for the last one in an extension (Pyro may support both). I'd like to have these APIs integrated into Stheno.

@willtebbutt
Member Author

Excellent. Let's tackle point number 1 first then, as I think it's likely the most straightforward in the sense that there's no real integration to be done; we just need example code and good documentation. Do you agree / what kinds of resources do you think would be helpful to address this?

@HamletWantToCode
Contributor

Do you agree / what kinds of resources do you think would be helpful to address this?

I think reproducing some experiments from the deep kernel learning paper is a good starting point; GPyTorch also uses it as a demo. I will be working on this over the next few days.

@willtebbutt
Member Author

Sounds good. Probably best to start with a small toy dataset where exact inference is tractable. You could also do something with pseudo-points quite straightforwardly -- see Stheno's elbo function :)
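
For reference, a rough sketch of the pseudo-point route (assuming the elbo(f(x, σ²), y, f(z, jitter)) form used in Stheno's examples; check the docs for the exact signature):

using Stheno

x = randn(100); y = sin.(x) .+ 0.1 .* randn(100);
z = collect(range(-3, 3; length=10));         # pseudo-input locations
f = GP(EQ(), GPC());
approx_lml = elbo(f(x, 0.1), y, f(z, 1e-9))   # maximise w.r.t. kernel parameters and z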

@HamletWantToCode
Contributor

Hi @willtebbutt, I have implemented a simple step-function fitting example here; it uses a feedforward neural network plus a GP with an ARD kernel. I have also written a binary classification example that uses Stheno and Turing. The examples are written in a Jupyter notebook and packaged in a project, so it should be straightforward to get everything up and running.

I noticed Stheno has a model zoo, but it seems outdated now, so where should these examples go?

@HamletWantToCode
Contributor

I have run into a problem when trying to make Stheno's output data type Float32 (in Flux, the default data type for neural network parameters is Float32, which can greatly reduce the computational cost when we have a large dataset). Below is an example that reproduces my problem:

using Stheno

X = rand(Float32, 4, 10);
y = rand(Float32, 10);
l = rand(Float32, 4);
σ² = rand(Float32);
kernel = σ²*stretch(EQ(), 1.0f0 ./ l);
pw(kernel, ColVecs(X)) |> eltype    # Float32

gp = GP(0.0f0, kernel, GPC());
noisy_prior = gp(ColVecs(X), 0.01f0);

rand(noisy_prior) |> eltype   # Float64 !!!
logpdf(noisy_prior, y) |> eltype   # Float64 !!!

Although I convert all the input parameters to Float32, the last two lines still give me Float64.

@willtebbutt
Member Author

The example looks really good @HamletWantToCode . Will take a proper look once we're satisfied that the type stability issues have been resolved.

I'm creating a new examples folder at the minute. I've got a branch with them all on that I'll aim to get on to master in the next day or so. It'll be best if you just add a new sub-directory in there once it's available.

@willtebbutt
Member Author

Right, I would say that the first item has been more-or-less completed.

Are you up for getting on with adding the Neural-Kernel-Network Kernel @HamletWantToCode ?

@willtebbutt willtebbutt reopened this Feb 26, 2020
@HamletWantToCode
Contributor

Yeah, here are my ideas about the implementation of the Neural-Kernel-Network kernel:

  1. The NKN should be a subtype of Stheno's Kernel type, and should implement the ew & pw interfaces.
  2. Its construction is similar to Flux's Chain; three types of layer can be added to it: a primitive layer (contains the basic kernels), a linear layer (superposition of kernels) and a product layer (products of kernels). We also have to include activation functions (currently the exp function). See the toy sketch after this list.
  3. Its parameters (including the kernel hyperparameters & the linear layers' weights and biases) can be efficiently extracted (here I mean we may need a function like Flux's params) and redistributed (this is because the Optim.jl package requires us to pack all parameters into a 1-D array).
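
Here is a purely illustrative toy of the layer structure in point 2 (made-up helper names on scalar inputs, not the proposed Stheno/GPFlux API):

softplus(z) = log1p(exp(z))                     # keeps the linear-layer weights positive

k_eq(x, y)  = exp(-abs2(x - y) / 2)             # primitive kernel 1
k_per(x, y) = exp(-2 * sin(π * abs(x - y))^2)   # primitive kernel 2

primitive_layer(x, y) = [k_eq(x, y), k_per(x, y)]
linear_layer(W) = v -> softplus.(W) * v         # positive-weighted superposition
product_layer(v) = [v[1] * v[2]]                # kernel product

W1, W2 = randn(2, 2), randn(1, 1)
nkn(x, y) = first(linear_layer(W2)(product_layer(linear_layer(W1)(primitive_layer(x, y)))))

nkn(0.1, 0.7)   # a valid kernel value produced by the layer stack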

Here is an example implementation of the parameter redistribution function; I hope I have made it clear:

# `params` below is Flux's parameter collection (`using Flux: params`).
function dispatch!(model, xs::AbstractVector)
    loc = 1
    for p in params(model)
        lp = length(p)
        x = reshape(xs[loc:loc+lp-1], size(p))
        copyto!(p, x)
        loc += lp
    end
    return model
end
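
And a sketch of the matching "extract" direction (assuming Flux's params), which flattens all trainable parameters into the single 1-D vector that Optim.jl expects, in the same order that dispatch! reads them back:

using Flux: params

flatten_params(model) = reduce(vcat, [vec(copy(p)) for p in params(model)])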

What do you think, @willtebbutt?

@willtebbutt
Member Author

I think this plan sounds very reasonable.

  1. Completely agree
  2. I wonder whether we could just directly use Flux's chain type here?
  3. This also sounds completely reasonable. It's hard to know exactly how we want this to look until there's a PR though.

Maybe open a PR with your plan from above and we can discuss further on that?

@willtebbutt
Member Author

Again, didn't mean to close...

@willtebbutt willtebbutt reopened this Feb 26, 2020
@HamletWantToCode
Contributor

I wonder whether we could just directly use Flux's chain type here?

I also considered using it, but there may be some performance-related problems. I'll write something first and then we can discuss it explicitly.

Maybe open a PR with your plan from above and we can discuss further on that?

I'll open a PR once I finish it :)

@willtebbutt
Member Author

Great job with the Neural Kernel Network implementation @HamletWantToCode. I noticed that there aren't currently any activation function layers implemented. Would you be up for quickly adding the basics before we move on to the third item on the list?

@HamletWantToCode
Contributor

The reasons I didn't include activation functions in the PR are:

  1. Only a limited set of functions is allowed, currently just polynomials with positive coefficients & the exp function, which can easily be implemented by users.
  2. (Personal opinion) In practice (for time-series data), I found that activation functions aren't as useful as they are in a normal neural network (since the kernels already provide the nonlinearity).

Personally, I'd like to include activation functions once more of them are found and proven to be useful.

@willtebbutt
Member Author

Fair enough. I'm happy not to worry about them for the time being -- as you say, we can always add them later if the need arises.
