Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

predict_f and predict_y as NamedTuple #1657

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

antonykamp
Copy link

@antonykamp antonykamp commented Mar 28, 2021

PR type: enhancement

Related issue: #1567

Summary

Proposed changes

  • By requesting predict_f or predict_y, GPflow returns an instance of MeanAndVariance, a subclass of NamedTuple. By that, the user can access mean and variance individually.
  • You can find a hint to this enhancement in the example "Basic (Gaussian likelihood) GP regression model".

What alternatives have you considered?

I've considered shorter names for.mean and .variance but rejected this idea because the current resulting code is a) more readable b) already shorter than before.

Minimal working example

import gpflow
import numpy as np

rng = np.random.RandomState(0)

data = rng.randn(100, 2), rng.randn(100, 1)
Xtest, _ = rng.randn(Ntest, D), rng.randn(Ntest, 1)
kernel = Matern32() + gpflow.kernels.White()
model_gp = gpflow.models.GPR(data, kernel=kernel)

mu_f = model_gp.predict_f(Xtest).mean # instead of 'model.predict_f(Xtest)[0]' or 'mu_f, _ = model.predict_f(Xtest)'
var_y = model_gp.predict_y(Xtest).variance # instead of 'model.predict_y(Xtest)[1]' or '_, var_y = model.predict_y(Xtest)'

PR checklist

  • New features: code is well-documented
    • detailed docstrings (API documentation)
    • notebook examples (usage demonstration)
  • The bug case / new feature is covered by unit tests
  • Code has type annotations
  • I ran the black+isort formatter (make format)
  • I locally tested that the tests pass (make check-all)

Release notes

Fully backwards compatible: yes

Commit message (for release notes):

  • MeanAndVariance as NamedTuple
  • Added test about backwards compatibility
  • Added hint in docs, format

(Is likely to be rebased)

@antonykamp antonykamp changed the title Antonykamp/predict as named tuple predict_f and predict_y as NamedTuple Mar 28, 2021
Copy link
Member

@st-- st-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution - I think this will be really helpful! I'd just like to hear from some of the other people involved with the GPflow project what their thoughts on the naming of the variance field are. :)

@@ -179,6 +179,9 @@
_ = plt.xlim(-0.1, 1.1)


# %% [markdown]
# Moreover, you can get the mean and variance data individually for example as `m.predict_f(xx).mean` instead of `m.predict_f(xx)[0]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Moreover, you can get the mean and variance data individually for example as `m.predict_f(xx).mean` instead of `m.predict_f(xx)[0]`.
# Moreover, you can get the mean and variance predictions individually, using for `m.predict_f(xx).mean` instead of `m.predict_f(xx)[0]` and `m.predict_f(xx).variance` instead of `m.predict_f(xx)[1]`.

(note- if we rename the variance field we'll have to update this)

MeanAndVariance = Tuple[tf.Tensor, tf.Tensor]

class MeanAndVariance(NamedTuple):
""" NamedTuple to access mean- and variance-function separately """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
""" NamedTuple to access mean- and variance-function separately """
""" NamedTuple that holds mean and variance as named fields """

""" NamedTuple to access mean- and variance-function separately """

mean: tf.Tensor
variance: tf.Tensor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a general question at other maintainers (@vdutor @markvdw @awav) & anyone else ...: what should the name of this field be? In code we often abbreviate the variance to "var", but also it sometimes represents the covariance... or maybe that should be a different type? MeanAndVariance and MeanAndCov (and return a different one depending on full_cov etc)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the idea of the two types. Haven't thought about it in detail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you always know for sure if you've got the cov or var?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass in full_cov=False and full_output_cov=False, you get the marginals back. If one of full_cov or full_output_cov is True, you get the covariance over inputs or outputs, respectively. If both are True, you should get the N P x N P covariance matrix (though this combination isn't actually implemented in several cases, I believe). So the output type is solely determined by the full_cov and full_output_cov arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw you can use typing.overload and typing.Literal to give more information than just MeanAndVariance | MeanAndCov, if you wanted to do that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a small example, which should work as wanted:
on_hover
on_space

But I' not comfortable with the if-else statement :/

from typing import Literal, NamedTuple, overload

class MeanAndVariance(NamedTuple):
    mean: int
    variance: int

class MeanAndCovariance(NamedTuple):
    mean: int
    covariance: int

@overload
def predict_f(auto_cov: Literal[False]) -> MeanAndVariance:
    ...

@overload
def predict_f(auto_cov: Literal[True]) -> MeanAndCovariance:
    ...

def predict_f(auto_cov: bool = False) -> (MeanAndVariance | MeanAndCovariance):
    # calculations
    return  MeanAndCovariance(1, 2) if auto_cov else MeanAndVariance(1, 2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a good way around the if-else, and I think it's better to be explicit than the ambiguity of having to remember whether it's a [N, Q] or [Q, N, N] tensor ...:) I'd be happy with this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it pointed out, typing.Literal is only available from Python 3.8 and up :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonykamp the typing_extensions module provides backports for older versions of Python, it does seem to include Literal. :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this pattern with overload and Tuple to the abstract method predict_f in gpflow/models/models.py. Should the overloads of predict_f be marked as abstract too? In any case I see no chance to test the Ellipsis operator of each overloaded funciton :/

Also, I wanted to ask if the parameters of the model constructor should be listed in the parametrization?

@st--
Copy link
Member

st-- commented Apr 12, 2021

Hi @antonykamp could you make sure to update RELEASE.md and CONTRIBUTORS.md as part of this PR? (See #1660 and #1661).

@antonykamp
Copy link
Author

Hi @antonykamp could you make sure to update RELEASE.md and CONTRIBUTORS.md as part of this PR? (See #1660 and #1661).

I'll take care of it after the "literal-overload-naming"-question is resolved. Otherwise, I would have to change it multiple times probably :)

@antonykamp
Copy link
Author

Closed because it's old :)

@antonykamp antonykamp closed this Dec 13, 2022
@sc336 sc336 reopened this Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants