
Implement FitStatistics / Priors ideas. #4237

Draft: wants to merge 10 commits into base: main

Conversation

nbiederbeck
Contributor

This is a very preliminary proposal for implementing (1) more flexible FitStatistics, and (2) priors on Parameters.

With this implementation, both can have the same base class, are equally serializable, can have non-breaking defaults, and can be provided by users.
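The shared-base-class idea could be sketched roughly as follows. This is a hypothetical illustration; the class names, the `tag` attribute, and the `to_dict` serialization are assumptions, not the actual Gammapy API.

```python
class FitStatisticBase:
    """Common base for fit statistics and priors (illustrative sketch)."""

    tag = "base"

    def stat_sum(self, *args, **kwargs):
        raise NotImplementedError

    def to_dict(self):
        # both fit statistics and priors serialize the same way:
        # a type tag plus their public attributes
        return {"type": self.tag, **vars(self)}


class GaussianPrior(FitStatisticBase):
    """-2 log of a Gaussian prior on a single parameter value."""

    tag = "gaussian-prior"

    def __init__(self, mu=0.0, sigma=1.0):
        self.mu = mu
        self.sigma = sigma

    def stat_sum(self, value):
        # quadratic penalty, i.e. -2 log of a Gaussian up to a constant
        return ((value - self.mu) / self.sigma) ** 2
```

A user-provided statistic would subclass the same base, so serialization and defaults work identically for both kinds of objects.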

Dear reviewer,

I am certainly missing something obvious. I am opening this PR so we can look at the same code at the same time.

Please see also my slides from the coding sprint and Axel's earlier notebook.

Contributor

@registerrier registerrier left a comment


Thanks @nbiederbeck.

I have left inline comments. Maybe adding docstrings would help.

My main comment is that FitStatistic should be independent of the Dataset structure but only expect array-like inputs.

@@ -163,6 +163,9 @@ def parameters(self):
[getattr(self, name) for name in self.default_parameters.names]
)

def stat_sum(self):
Contributor


It seems to me that the stat_sum should rather be on the Models or Parameters object. That's where we have the global information on the models used.
Is there a use case for computing the priors associated with a single model?

Contributor


Note that priors might inherit from ModelBase, in which case the evaluate or __call__ methods would be used to compute the associated statistic.

Contributor Author


I think you are right. To me it seems that a single model is a special case of Models (e.g. len(Models) == 1), right? Then this would surely be better placed there. Please forgive my ignorance about the Gammapy internals and where best to put this.

Member

@adonath adonath Mar 15, 2023


I think we should figure out a clean API here. Models.stat_sum() is a bit confusing as well. Long-term we would like a more separated API between MapDataset, Models and Fit. I am quite sure we will evolve in a direction where models are passed explicitly, e.g. MapDataset.stat_sum(models=models). So I would propose to move towards e.g. a PriorFitStatistic class, which takes the Models object and evaluates the priors, like PriorFitStatistic(beta=1.).stat_sum(models=models). For now it seems this PriorFitStatistic will only need to be supported for Datasets.
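The direction sketched in this comment could look roughly like the following. PriorFitStatistic here is hypothetical, and the minimal Parameter/Model stand-ins are illustrative substitutes for the real Gammapy classes.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    # stand-in for gammapy.modeling.Parameter
    value: float
    prior: object = None  # object with a stat_sum(value) method, or None


@dataclass
class Model:
    # stand-in for a model exposing its parameters
    parameters: list = field(default_factory=list)


class PriorFitStatistic:
    """Hypothetical statistic that evaluates the priors of all models."""

    def __init__(self, beta=1.0):
        self.beta = beta  # global weight applied to the prior term

    def stat_sum(self, models):
        # loop over all parameters of all models and sum their priors
        total = 0.0
        for model in models:
            for par in model.parameters:
                if par.prior is not None:
                    total += par.prior.stat_sum(par.value)
        return self.beta * total


class SquaredPrior:
    """Toy prior: quadratic penalty on the parameter value."""

    def stat_sum(self, value):
        return value**2
```

Usage would then follow the proposed pattern, e.g. `PriorFitStatistic(beta=1.0).stat_sum(models=models)`, keeping the prior evaluation outside of both the Dataset and the Models classes.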

@@ -145,6 +147,12 @@ def __init__(
self.interp = interp
self.scale_method = scale_method

def stat_sum(self):
Contributor


This does very little in practice. It is probably not needed. Everything could be calculated in the global stat_sum where the loop over parameters is performed.

class FitStatistic:
"""Calculate -2 * log(L)."""

def stat_sum(self, dataset):
Contributor


I think it does not make sense to bind FitStatistic to the Dataset classes. They should only expect array-like input (see CountsStatistic).
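A dataset-independent statistic along these lines could be sketched as below. This is an illustrative Cash-statistic example, not the actual CountsStatistic or CashFitStatistic implementation; the method names are assumptions.

```python
import math


class CashFitStatistic:
    """Cash statistic: -2 log(L) for Poisson-distributed counts,
    up to a constant term (illustrative sketch)."""

    @staticmethod
    def stat_array(counts, npred):
        # per-bin contribution: 2 * (mu - n * log(mu))
        return [2.0 * (mu - n * math.log(mu)) for n, mu in zip(counts, npred)]

    def stat_sum(self, counts, npred):
        # no knowledge of the Dataset structure: only array-like inputs
        return sum(self.stat_array(counts, npred))
```

The Dataset would then be responsible for extracting `counts` and `npred` as arrays and passing them in, which keeps the statistic reusable across dataset types.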

Contributor Author


Yes, you are totally right about that!

@@ -180,6 +200,7 @@ class MapDataset(Dataset):
psf = LazyFitsData(cache=True)
mask_fit = LazyFitsData(cache=True)
mask_safe = LazyFitsData(cache=True)
fit_statistic = CashFitStatistic()
Contributor


This will likely require a setter method because we will have to check that a FitStatistic is applicable to a Dataset.
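Such a setter check could be sketched as follows. The `tag` attribute and `_allowed_stat_tags` list are hypothetical; the actual validation mechanism in Gammapy may differ.

```python
class CashFitStatistic:
    tag = "cash"  # illustrative type tag


class MapDataset:
    """Minimal stand-in showing a validating fit_statistic setter."""

    # statistics this dataset type is assumed to work with
    _allowed_stat_tags = ("cash", "wstat")

    @property
    def fit_statistic(self):
        return self._fit_statistic

    @fit_statistic.setter
    def fit_statistic(self, statistic):
        # reject statistics that are not applicable to this dataset type
        if statistic.tag not in self._allowed_stat_tags:
            raise ValueError(
                f"FitStatistic {statistic.tag!r} is not applicable to MapDataset"
            )
        self._fit_statistic = statistic
```

Assigning `dataset.fit_statistic = CashFitStatistic()` would then succeed, while an incompatible statistic would raise a ValueError at assignment time rather than failing later during the fit.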

@registerrier registerrier marked this pull request as draft December 7, 2022 08:02
Comment on lines +488 to +500
if par.prior is not None:
args = {}
# this is almost duplicated code from gammapy.dataset.map
for key in signature(par.prior.stat_sum).parameters.keys():
Contributor Author


I think currently this might evaluate all Priors for all Parameters. This would of course have to be fixed properly. I was thinking of

parameters.x
# but need to do
parameters['x'].value

since each parameter has the attribute .value.
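The per-parameter evaluation described here could be sketched as below, with each prior seeing only the value of the parameter it is attached to. The Parameter/Parameters classes are minimal hypothetical stand-ins, not the real Gammapy ones.

```python
class Parameter:
    """Stand-in parameter with a name, a value, and an optional prior."""

    def __init__(self, name, value, prior=None):
        self.name = name
        self.value = value
        self.prior = prior


class Parameters:
    """Stand-in container supporting parameters['x'].value access."""

    def __init__(self, parameters):
        self._pars = {p.name: p for p in parameters}

    def __getitem__(self, name):
        # parameters["x"] returns the Parameter, which has .value
        return self._pars[name]

    def prior_stat_sum(self):
        # each prior is evaluated only on its own parameter's value,
        # so no prior is applied to an unrelated parameter
        return sum(
            p.prior.stat_sum(p.value)
            for p in self._pars.values()
            if p.prior is not None
        )
```

This keeps the signature-inspection logic out of the loop entirely: the loop over parameters just asks each attached prior for its contribution.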

@@ -319,3 +322,41 @@ def test_stat_contour():

# Check that original value state wasn't changed
assert_allclose(dataset.models.parameters["y"].value, 300)


def test_with_prior():
Contributor Author


This test is certainly in the wrong place, but I added it to show how setting and evaluating priors can work.

@nbiederbeck
Contributor Author

While doing this, I was thinking about how to implement #3955. I need to further investigate this, but I assume it is going in a good direction. Please let me know what you think. Thank you.

@codecov

codecov bot commented Dec 7, 2022

Codecov Report

Merging #4237 (1032771) into main (ec9df57) will decrease coverage by 0.12%.
The diff coverage is 61.68%.

@@            Coverage Diff             @@
##             main    #4237      +/-   ##
==========================================
- Coverage   94.83%   94.71%   -0.12%     
==========================================
  Files         216      216              
  Lines       30501    30597      +96     
==========================================
+ Hits        28925    28980      +55     
- Misses       1576     1617      +41     
Impacted Files                    Coverage Δ
gammapy/irf/background.py         97.00% <ø> (ø)
gammapy/irf/edisp/map.py          96.47% <ø> (ø)
gammapy/modeling/models/core.py   94.50% <35.71%> (-1.66%) ⬇️
gammapy/datasets/map.py           92.17% <59.61%> (-1.69%) ⬇️
gammapy/modeling/parameter.py     94.43% <66.66%> (-2.42%) ⬇️
gammapy/data/data_store.py        94.42% <100.00%> (ø)
gammapy/datasets/core.py          92.83% <100.00%> (+0.05%) ⬆️
gammapy/irf/core.py               93.11% <100.00%> (ø)
gammapy/irf/io.py                 92.42% <100.00%> (ø)


@nbiederbeck nbiederbeck force-pushed the fit-statistics-and-priors branch 2 times, most recently from e5d7932 to 6d23f1c Compare February 14, 2023 09:56
@@ -1080,13 +1179,18 @@ def plot_residuals(
return ax_spatial, ax_spectral

 def stat_sum(self):
-    """Total statistic function value given the current model parameters."""
+    """Total likelihood given the current model parameters."""
     counts, npred = self.counts.data.astype(float), self.npred().data
Member


Is it really the likelihood or -2log(L)?

This is a very preliminary proposal for implementing
(1) more flexible FitStatistics
(2) priors on Parameters

With this implementation, both can have the same base class,
are equally serializable, can have non-breaking defaults, and be
provided by users.

Signed-off-by: Noah Biederbeck <noah.biederbeck@tu-dortmund.de>
@registerrier registerrier added this to the 1.2 milestone Apr 21, 2023
@bkhelifi
Member

@nbiederbeck can you update your branch from main so that the tests pass? Thanks!

@nbiederbeck
Contributor Author

Hi @bkhelifi, thanks for reaching out. My plate is full this month; I can start looking into this in November. If someone steps in earlier and takes over, I'm happy to assist.

@registerrier registerrier modified the milestones: 1.2, wishlist Jan 24, 2024

4 participants