Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oslo method errors #187

Draft
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

vetlewi
Copy link
Collaborator

@vetlewi vetlewi commented Sep 27, 2021

This PR introduces a new method for estimating the relative errors of the individual points of the NLD and gSF. A more detailed description on how it works, etc. will be made available once ready for merge.

The PR is still work in progress

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@fzeiser fzeiser added needs release note Suggestion Suggestion for new feature/changes labels Sep 29, 2021
ompy/dist/fermi_dirac.py Outdated Show resolved Hide resolved
setup.py Outdated
@@ -171,7 +171,8 @@ def write_version_py(filename='ompy/version_setup.py'):
"uncertainties>=3.0.3",
"tqdm",
"pathos",
"pybind11>=2.6.0"
"pybind11>=2.6.0",
"pymc3>=3.11.2,<4.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making pymc3, pymulintest (...) optional later on? -> So one could have a leightweight version of ompy reading and plotting spectra.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about the same. I'm not sure exactly how this is best solved. Maybe that could be the goal for next major version of the OMpy project?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an issue. See #195.

release_note.md Outdated Show resolved Hide resolved
Comment on lines +51 to +52
error_estimator (ErrorFinder): The algorithm used to estimate the
relative errors of the extracted NLD and γSF.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistent naming: error_estimator? -> ErrorEstimator class...?

Comment on lines +64 to +65
rel_err_missing (float): Relative error used for points that cannot be
estimated by error_estimator object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would that be the case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes if the statistics are low some of the realizations may have different number of points in the extracted NLD and GSF (values that are not nan).

regenerate: Optional[bool] = None):
"""Decompose each first generation matrix in an Ensemble

If `regenerate` is `True` it saves the extracted nld and gsf to file,
or loads them if already generated. Exposes the vectors in the
attributes self.nld and self.gsf.

If the error_estimator attribute or argument is set then relative
errors will be estimated using the provided algorithm. Points in the
nld and gsf that the algorithm are unable to estimate will be set to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: /Relative error/ of points (...) will be set to self.rel_err_missing.

Comment on lines +189 to +194
nld.save(self.path / f'nld_{i}.npy') # Overwrite with errors!
gsf.save(self.path / f'gsf_{i}.npy') # Overwrite with errors!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the meaning of the inline comment? is this supposed to happen here (but not yet...)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of attribute error_estimator is set or not, the extracted NLD/GSF will be written to disk here: https://github.com/vetlewi/ompy/blob/1ab2618ec0c2b2e7b8c5935242183587e3451f00/ompy/extractor.py#L169-L170

If error_estimator is set then these two lines will overwrite the generated files on the disk. Could change the comment to Update files on disk with errors or something similar

ompy/error_finder.py Outdated Show resolved Hide resolved
This class uses pyMC3 to calculate the relative errors of the data points
in the extracted NLDs and γSFs. The class implements two different models,
one logarithmic and one linear. The logarithmic model will usually be more
stable and should be used if the linear doesn't converge.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add something about the article (to be submitted)

Comment on lines 291 to 293
Tuple of nld energy points, gsf energy points and the observed
matrix for the NLD and γSF.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the observed matrix for the nld and gsf? Outdated? The return in definitively different now

return vec_all_common

def condition_data(self, _nlds: List[Vector], _gsfs: List[Vector],
) -> Tuple[ndarray, ndarray, ndarray, ndarray]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return signature inconsistent (see below)?

ompy/error_finder.py Outdated Show resolved Hide resolved
ompy/error_finder.py Outdated Show resolved Hide resolved
else:
return nld_rel_err, gsf_rel_err

def keep_only(self, vecs: List[Vector]) -> List[Vector]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to below condition_data?

ompy/error_finder.py Outdated Show resolved Hide resolved
Comment on lines 364 to 377
# Make a copy of the values arrays
nld_array = [nld.values.copy() for nld in nlds]
gsf_array = [gsf.values.copy() for gsf in gsfs]

del nld_array[n]
del gsf_array[n]

nld_array = np.array(nld_array)
gsf_array = np.array(gsf_array)

# Set the observed data
nld_obs.append(nld.values[idx_nld]/nld_array)
gsf_obs.append(gsf.values[idx_gsf]/gsf_array)
Copy link
Collaborator

@fzeiser fzeiser Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This is somewhat funky. Why do you need to manually delete the array -- but after setting it by a copy?
    I if I understand this correctly, it is because you want to get rid of the reference dataset. Maybe some comments would be appreciated here.
  • I'm very confused about this section:
              # Set the observed data
              nld_obs.append(nld.values[idx_nld]/nld_array)
              gsf_obs.append(gsf.values[idx_gsf]/gsf_array)
    Why/ what does it do? Why is it a devision (...)? Is it q in the article, and not the nld (gsf)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored this entire part of the code. Should be much easier to follow now. Please see the format_data function (https://github.com/vetlewi/ompy/blob/621c95c4797bf11fefd6a95a032c9296fc0c7054/ompy/error_finder.py#L282-L322)

Comment on lines 123 to 120
nld_obs (ndarray): The observation tensor of the NLD
gsf_obs (ndarray): The observation tensor of the γSF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where and everywhere else: Would it be better to say _ref and the reference tensor, instead of observed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored. Should be more understandable.

nld_obs = np.array(nld_obs)
gsf_obs = np.array(gsf_obs)

idx_coef_nld = np.repeat(np.arange(N-1), M_nld).reshape(N-1, M_nld)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have got no idea what idx_coef_nld is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idx_coef_nld is just a masking/indexing that are used in the pyMC3 model to broadcast the $D$, $F$ and $\alpha$ parameters to the correct shape. I've renamed these as coef_mask_nld, values_mask_nld, coef_mask_gsf and values_mask_gsf.

Comment on lines +129 to +126
median (bool, optional): If the resulting relative errors should be
the mean or the median.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm falling out here: What do you mean by mean or median? This does not seem to be described in the article?

Copy link
Collaborator Author

@vetlewi vetlewi Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What to adopt as the value for the relative uncertainty. The mean (average) or the median (50% percentile). This can probably be removed. If the model works as expected these should coincide, I think. If anyone wants to dive deeper into the results they should probably look at the samples directly.

ompy/error_finder.py Outdated Show resolved Hide resolved
Comment on lines +166 to +171
mid = np.median if median else np.mean
nld_rel_err = Vector(E=E_nld, values=mid(trace['σ_ρ'], axis=0),
std=np.std(trace['σ_ρ'], axis=0), units='MeV')
gsf_rel_err = Vector(E=E_gsf, values=mid(trace['σ_f'], axis=0),
std=np.std(trace['σ_f'], axis=0), units='MeV')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, not quite sure about this step, yet.

ompy/error_finder.py Outdated Show resolved Hide resolved
vetlewi and others added 20 commits October 28, 2021 14:58
Co-authored-by: Fabio Zeiser <fzeiser@users.noreply.github.com>
Co-authored-by: Fabio Zeiser <fzeiser@users.noreply.github.com>
Co-authored-by: Fabio Zeiser <fzeiser@users.noreply.github.com>
The prior parameters for σ_ρ and σ_f (`lam` & `mu`) can now be modified by the user.
I've added an attribute to make it simple to suppress warnings
A slightly different expression is used as it gives slightly better numerical accuracy at the extremes (q=0 and q=1).
I've removed the linear version for now. I don't think I want to spend time on making sure everything works correctly just yet. Might come back at a later time when I'm confident that everything else are as it should be.
I've refactored the `ErrorFinder` class significantly to make it much less complex.  It should be easier to follow the code and the paper.
The normalizers will automatically remove any nan's in the input GSF or NLD. This will ensure that the normalizers actually gives useful results. It should be considered if the user should be informed if any nan's were removed. Maybe add best solved by adding an optional `logger` argument to the `cut_nan()` method that will be used to inform the user if there is nans in the vector?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Suggestion Suggestion for new feature/changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants