I'm fitting data (2D input, 2D output) with scikit-learn's GaussianProcessRegressor, but when I ask for the covariance matrix I get only 2 values per pair of query points, not the 4 of a 2x2 covariance matrix.
The documentation says that this is the correct output shape, so I have some questions about it:
predict(X, return_std=False, return_cov=False)
Returns:
y_mean : ndarray of shape (n_samples,) or (n_samples, n_targets)
    Mean of predictive distribution at query points.
y_std : ndarray of shape (n_samples,) or (n_samples, n_targets), optional
    Standard deviation of predictive distribution at query points. Only returned when return_std is True.
y_cov : ndarray of shape (n_samples, n_samples) or (n_samples, n_samples, n_targets), optional
    Covariance of joint predictive distribution at query points. Only returned when return_cov is True.
Should I interpret the two values returned for each sample as the diagonal entries of a diagonal covariance matrix?
What about the off-diagonal elements? Does the GPR assume independence between output dimensions? Is it then equivalent to fitting two separate single-output GPR instances, one per target dimension?
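Here is a quick way I tried to check the independence question myself (my own sketch, not anything from the docs). I pass optimizer=None to keep the kernel hyperparameters fixed, so that differences coming from hyperparameter optimization can't muddy the comparison; with that caveat, a single two-output fit and two one-output fits give the same predictions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
y = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0]])
X_test = np.array([[0.25, 0.0], [0.75, 0.25]])

# One GPR fitted on both target columns at once (hyperparameters fixed)
multi = GaussianProcessRegressor(kernel=RBF(), optimizer=None).fit(X, y)
pred_multi = multi.predict(X_test)

# Two GPRs, each fitted on a single target column
pred_cols = []
for d in range(y.shape[1]):
    single = GaussianProcessRegressor(kernel=RBF(), optimizer=None).fit(X, y[:, d])
    pred_cols.append(single.predict(X_test))
pred_split = np.column_stack(pred_cols)

# The multi-output fit matches the column-by-column fits
print(np.allclose(pred_multi, pred_split))
```

Note that with the optimizer enabled the comparison is less clean: the multi-output fit optimizes one shared set of kernel hyperparameters for both columns, while separate fits can land on different hyperparameters per column.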
And lastly, why am I always getting the same two values in the returned covariance matrix? I tried real-world noisy data and some toy problems, but the covariance values returned by the GPR are always equal for both dimensions (the same values as the returned std but squared, since they correspond to the variance). Maybe I'm wrong to expect something else, but what would be the point of returning two values if they are always the same?
This is a simple example where I expect to have different covariance values for both target dimensions as I move only the first dimension of x:
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

# Simple dataset
X = np.array([[0, 0], [0.5, 0.5], [1, 1]])
y = np.array([[0, 0], [0, 0.5], [0, 1]])

kernel = RationalQuadratic()
gpr = GaussianProcessRegressor(kernel=kernel, random_state=42)
gpr.fit(X, y)

# Move x in the first dimension from 0 to 1
n = 5
x_steps_d0 = np.linspace(0, 1, n)
x_steps_d1 = [0] * n
new_xs = np.array(list(zip(x_steps_d0, x_steps_d1)))

# Predict covariance at each point in this walk from a fitted point to an unfitted point
y_pred, y_cov = gpr.predict(new_xs, return_cov=True)
print("All the same:", np.all(y_cov[:, :, 0] == y_cov[:, :, 1]))
print("y_pred:\n", y_pred.round(3))
print("y_cov:\n", y_cov.round(3))
```