
Feature/corr matrix and inverse cov matrix as input in least squares function for correlated fits #223

Open · wants to merge 3 commits into base: develop
Conversation

@PiaLJP (Contributor) commented Jan 10, 2024

Added kwargs corr_matrix and inv_chol_cov_matrix to the least_squares function.
Note: For the kwargs to take effect, correlated_fit=True needs to be set as well.
(1) kwarg corr_matrix
It is now possible to hand over a correlation matrix of your own choosing for a correlated fit.
Inside the least_squares function a Cholesky decomposition is applied to the correlation matrix,
and the resulting lower triangular matrix is combined with the y errors and inverted, yielding
a lower triangular matrix that corresponds to the inverse covariance matrix (and is later used
in the definition of the correlated chi squared function).
(2) kwarg inv_chol_cov_matrix
It is also possible to hand over the inverse covariance matrix directly (as a lower triangular matrix),
provided it has been constructed from a Cholesky decomposition of the correlation matrix in the same way
as detailed in (1).
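For reference, a minimal sketch of the construction described in (1) and (2); the small example matrices are made up for illustration, and the two steps mirror what least_squares does internally:

```python
import numpy as np
import scipy.linalg

corr = np.array([[1.0, 0.3],      # correlation matrix of the observables in y
                 [0.3, 1.0]])     # (symmetric, positive definite)
dy_f = np.array([0.1, 0.2])       # standard errors of the y values

# (1) Cholesky decomposition of the correlation matrix: corr = chol @ chol.T
chol = np.linalg.cholesky(corr)
covdiag = np.diag(1 / dy_f)       # inverse y errors on the diagonal

# (2) Combine with the y errors and invert: chol_inv is the lower triangular
# matrix with chol_inv.T @ chol_inv equal to the inverse covariance matrix.
chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True)
```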

@PiaLJP PiaLJP requested a review from fjosw as a code owner January 10, 2024 17:52
@fjosw fjosw requested a review from s-kuberski January 11, 2024 07:55
@s-kuberski (Collaborator)

Hi Pia,
thanks a lot for adding this feature; I think it will be quite useful. I'll take some time to test your implementation before I provide further feedback.

@s-kuberski (Collaborator) left a comment

Hi,
I have written a lot of text, not because I don't like your code but because I think we need a little bit of discussion concerning the capabilities of the routine with respect to the new parameters. Thanks again for starting this!

```diff
@@ -151,6 +151,12 @@ def func_b(a, x):
     For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
     In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
     This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
+corr_matrix: array, optional
```
s-kuberski (Collaborator):

I'd raise the general question of whether we want to allow passing the correlation matrix or the covariance matrix. The latter would allow working with modified weights for single data points without having to mess around with the uncertainties. Imagine the situation where you want to give more weight to points close to the continuum in an extrapolation. Together with the expected chi^2, this can have a well-defined meaning.
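For illustration, one way such per-point weights could enter, assuming the covdiag construction quoted later in this thread (the weight vector is made up):

```python
import numpy as np

dy_f = np.array([0.10, 0.10, 0.10])   # nominal y errors
weights = np.array([4.0, 1.0, 1.0])   # up-weight the point closest to the continuum

# Scaling the inverse errors by sqrt(weights) multiplies each point's
# chi^2 contribution by its weight.
covdiag = np.diag(np.sqrt(weights) / dy_f)
```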

s-kuberski (Collaborator):

Concerning the expected chisquare: this part of the code has not yet been touched. If we want to allow people to pass any kind of covariance/correlation matrix, it has to be clear whether this is meant to be a better estimator than the internal one or just some modified version for use in the chisquare function. The computation and interpretation of the expected chisquare depend on this. This requires some thought.

PiaLJP (Contributor, Author):

(1) I think it's nice to be able to just hand over the correlation matrix, because the y errors are already in the code, so you don't have to construct them again outside of the least_squares function. I also thought it would be the safer option, where the least_squares function does most of the work for you (inverting and including the errors).
(2) Additionally, I thought it would be nice to have one less restrictive option, so that modified weights for single data points, like you say, can also be incorporated (more on that below).

I suppose it would also be useful to be able to hand over a covariance/correlation matrix for the expected chisquare. Sometimes the matrix constructed in the code is ill-defined and you do not get a reliable expected chisquare. It would also be interesting to see the effect of a covariance/correlation matrix of your own choosing, especially when it is compared with the correlated fit using that same matrix.

```diff
@@ -151,6 +151,12 @@ def func_b(a, x):
     For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
     In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
     This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
+corr_matrix: array, optional
+    if correlated_fit=True is set as well, can provide a correlation matrix (without y errors!) of your own choosing for a correlated fit
```
s-kuberski (Collaborator):

I would specify exactly what is needed here, i.e., a square matrix of floats with dimension N, where N is the length of 'y'. One could also write that this is the correlation matrix of the Observables in y.

PiaLJP (Contributor, Author):

Yes, that's a good idea! It even has to be positive definite, otherwise it's not invertible and the Cholesky decomposition is not applicable either, right? The way the correlation matrix is constructed by the covariance function in pyerrors (used in least_squares for a correlated fit without any cov/corr matrix input) only guarantees a positive semi-definite matrix, so you can run into a problem there too, or am I missing something? Of course, the condition number of the correlation matrix can also simply be bad even if the matrix is positive definite (and therefore in principle invertible). In both cases the code produces an error, but wouldn't it still be good to stress that these two conditions (positive definite and well-conditioned) have to be met by the correlation matrix handed over as corr_matrix?

s-kuberski (Collaborator):

That's true. It has to be a symmetric, positive definite matrix. The condition number is an implicit problem, I would say.
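A sketch of what such explicit checks on a user-supplied corr_matrix could look like (check_corr_matrix is a hypothetical helper; the thresholds mirror the existing ones in fits.py, quoted further down in this thread):

```python
import numpy as np

def check_corr_matrix(corr):
    """Hypothetical validation of a user-supplied correlation matrix."""
    if not np.allclose(corr, corr.T):
        raise ValueError("Correlation matrix has to be symmetric.")
    condn = np.linalg.cond(corr)
    if condn > 0.1 / np.finfo(float).eps:
        raise ValueError(f"Condition number exceeds machine precision ({condn:1.2e}).")
    # raises np.linalg.LinAlgError if corr is only positive semi-definite
    np.linalg.cholesky(corr)
```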

```diff
@@ -151,6 +151,12 @@ def func_b(a, x):
     For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
     In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
     This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
+corr_matrix: array, optional
+    if correlated_fit=True is set as well, can provide a correlation matrix (without y errors!) of your own choosing for a correlated fit
+inv_chol_cov_matrix
```
s-kuberski (Collaborator):

This piece is a bit tricky. In principle, the code currently expects a matrix that is based on the algorithm performed inside our routine. It is very easy to do something wrong when you try to come up with the correct inverted Cholesky matrix without copying what is done in the code.

One could then ask whether this parameter is needed at all. It could be helpful when performing many fits to the same set of data, because it saves some time if the inversion does not have to be done every time. However, providing the covariance matrix would already help a lot in reducing the cost, and the inversion alone is probably not too expensive.

I did think that it would be helpful to provide the inverse, but now I cannot really come up with a good example where this would be the case. Do you have something in mind?

PiaLJP (Contributor, Author):

Originally I thought it would be good to have one option where you are completely free to hand over whatever type of inverse covariance matrix you want. But the code relies on the Cholesky decomposition to access the 'square root' of the chi squared function (in general_chisqfun()), which is cheaper and is later also needed to define the chisqfunc_residuals function, so I thought it would be good to keep this form of the matrix.
I also didn't see a cheap way to access this if the inverse covariance matrix is handed over as a whole and only the full chi squared function is evaluated; essentially, I did not see a nice way of defining the general_chisqfun() function (l. 331). Do you know a way around that?
What I had in mind was your case above of using modified weights for different data points (see above).
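Schematically, the role of chol_inv as the 'square root' of the chi squared function (a simplified sketch, not the exact code of general_chisqfun in fits.py):

```python
import numpy as np

def correlated_chisq(chol_inv, y_values, model_values):
    # r = chol_inv @ (y - model) is the 'square root' of chi^2:
    # chi^2 = r @ r = (y - model)^T C^{-1} (y - model),
    # since chol_inv.T @ chol_inv equals the inverse covariance matrix C^{-1}.
    # The residual vector r is also what a residuals-based function needs.
    r = chol_inv @ (np.asarray(y_values) - np.asarray(model_values))
    return r @ r
```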

```diff
+inv_chol_cov_matrix
+    if correlated_fit=True is set as well, can provide an inverse covariance matrix (y errors, dy_f included!) of your own choosing for a correlated fit
+    The matrix must be a lower triangular matrix constructed from a Cholesky decomposed correlation matrix (e.g. chol = np.linalg.cholesky(corr)
+    so that the inverse covariance matrix is defined by chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True) with covdiag = np.diag(1 / np.asarray(dy_f)))
```
s-kuberski (Collaborator):

This is connected to the comment above: the current description is very hard to understand if the user does not look into the code but just uses the package, because variables like dy_f are not introduced.

```python
chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True)
if 'inv_chol_cov_matrix' in kwargs:
    chol_inv = kwargs.get('inv_chol_cov_matrix')
    print('Warning: The inverse covariance matrix handed over must be a lower triangular matrix constructed from a Cholesky decomposed correlation matrix.')
```
s-kuberski (Collaborator):

I think putting a warning into a print statement is dangerous because it is easily overlooked. However, I would not print anything here at all: if the user is doing the correct thing, the code should silently do its job.
[Let's first discuss the question of whether the parameter is needed before we go into further details of the internal checks.]
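For reference, if a message were kept at all, the standard alternative to a print would be the warnings module (a sketch of a hypothetical drop-in for the snippet quoted above; the suggestion here, though, is to emit nothing when the input is valid):

```python
import warnings

def _select_chol_inv(chol_inv, kwargs):
    # Hypothetical refactoring of the quoted block, warning via warnings
    # instead of print so that the message is not easily overlooked.
    if 'inv_chol_cov_matrix' in kwargs:
        chol_inv = kwargs.get('inv_chol_cov_matrix')
        warnings.warn("The inverse covariance matrix handed over must be a lower "
                      "triangular matrix constructed from a Cholesky decomposed "
                      "correlation matrix.", RuntimeWarning)
    return chol_inv
```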

```python
if 'corr_matrix' in kwargs:
    corr = kwargs.get('corr_matrix')
    if (corr.shape[0] != len(dy_f)):
        raise TypeError('The number of columns of the correlation matrix needs to be equal to the number of y errors.')
```
s-kuberski (Collaborator):

Maybe replace "y errors" by "number of observables in y"? Or "length of y"?

```python
diff_inv_cov = fitp_inv_cov[i] - fitpc[i]
diff_inv_cov.gamma_method()
assert(diff_inv_cov.is_zero(atol=0.0))
```

s-kuberski (Collaborator):

You could also test if an uncorrelated fit gives the exact same result as a correlated fit with a diagonal covariance matrix. This should be a quite strong check.
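A sketch of what such a test could look like (data and fit function are made up; it assumes the corr_matrix kwarg proposed in this PR):

```python
import numpy as np
import pyerrors as pe

def func(a, x):
    return a[0] + a[1] * x

x = np.arange(1, 6)
y = [pe.pseudo_Obs(2.0 + 0.5 * xi, 0.05, 'ensemble') for xi in x]
[o.gamma_method() for o in y]

fit_uncorr = pe.fits.least_squares(x, y, func)
# The identity as correlation matrix corresponds to a diagonal covariance
# matrix, so the correlated fit should reproduce the uncorrelated result.
fit_diag = pe.fits.least_squares(x, y, func, correlated_fit=True,
                                 corr_matrix=np.identity(len(y)))

for p_u, p_d in zip(fit_uncorr.fit_parameters, fit_diag.fit_parameters):
    diff = p_u - p_d
    diff.gamma_method()
    assert diff.is_zero(atol=0.0)  # exact agreement, as suggested above
```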

@s-kuberski (Collaborator)

Hi,
thanks for your explanation. I started out with these scattered inline comments, but I'll try to condense everything into a single message now...
Based on what I understand from our discussion, we would provide the user with two possibilities:

  • Passing a covariance matrix that is assumed to be a proper estimator of the true covariance matrix. This would allow the user to use a different definition than the one that is used inside the current routine or to use some sort of shrinkage algorithm which would also deliver a proper estimator.
  • Passing an inverted covariance matrix (in proper shape for the fit). This would allow the use of modified weights which, together with the expected chisquare, could still be useful for fits.

I think that this is a good solution (I'll try to think more if I can come up with cases that are not covered).

I have a few proposals for this strategy:

  • The correlation matrix that is provided by the user should then also be used in the expected chisquare, because we assume that it is a good estimator for the true correlation matrix.
  • To enable the user to provide the covariance matrix in the proper form, we could think of putting everything in pyerrors/fits.py, lines 300 to 308 in df1873b:

```python
corr = covariance(y_all, correlation=True, **kwargs)
covdiag = np.diag(1 / np.asarray(dy_f))
condn = np.linalg.cond(corr)
if condn > 0.1 / np.finfo(float).eps:
    raise Exception(f"Cannot invert correlation matrix as its condition number exceeds machine precision ({condn:1.2e})")
if condn > 1e13:
    warnings.warn("Correlation matrix may be ill-conditioned, condition number: {%1.2e}" % (condn), RuntimeWarning)
chol = np.linalg.cholesky(corr)
chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True)
```

    into a separate function, e.g., def invert_corr_cov_cholesky(corr, covdiag). This could then be called inside the fit routine but also from outside, so that a user can first come up with a correlation matrix of their own choosing and then invert it before passing it to the fit routine (a sketch follows after this list). We would have to explain to the user how corr and covdiag are defined. In principle it could also be a possibility to have def invert_cov_cholesky(cov) where the weights are computed inside the routine. (Is it clear what I mean?)
  • I think all checks regarding positive definiteness and the condition number would still be in place if the user used the (newly provided) function to invert the covariance matrix of their choosing.
  • It would certainly be very useful to demonstrate all of the possibilities in the examples and to write very accurate doc strings.
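To make the proposal concrete, a possible sketch of the factored-out helper (name and signature as suggested above; the body simply wraps the lines quoted from fits.py):

```python
import warnings
import numpy as np
import scipy.linalg

def invert_corr_cov_cholesky(corr, covdiag):
    """Cholesky-decompose the correlation matrix `corr` and combine it with
    `covdiag` = np.diag(1 / np.asarray(dy_f)); the returned lower triangular
    matrix chol_inv satisfies chol_inv.T @ chol_inv = inverse covariance."""
    condn = np.linalg.cond(corr)
    if condn > 0.1 / np.finfo(float).eps:
        raise Exception(f"Cannot invert correlation matrix as its condition number exceeds machine precision ({condn:1.2e})")
    if condn > 1e13:
        warnings.warn("Correlation matrix may be ill-conditioned, condition number: {%1.2e}" % (condn), RuntimeWarning)
    chol = np.linalg.cholesky(corr)
    return scipy.linalg.solve_triangular(chol, covdiag, lower=True)
```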

@PiaLJP (Contributor, Author) commented Jan 19, 2024

Hi Simon,
thanks for the summary and suggestions! I basically agree with all your points; I just have a few additional comments.

> Passing a covariance matrix that is assumed to be a proper estimator of the true covariance matrix. This would allow the user to use a different definition than the one that is used inside the current routine or to use some sort of shrinkage algorithm which would also deliver a proper estimator.

Just to be sure: you meant to say correlation matrix? So you are happy with the kwarg options corr_matrix and inv_chol_cov_matrix as they are now?

I think it's a good idea to define def invert_corr_cov_cholesky(corr, covdiag) outside the least_squares function. But I have noticed that it is not necessary to construct the inverse covariance matrix in exactly that way. The inverse covariance matrix (kwarg inv_chol_cov_matrix) only has to be a lower triangular matrix from a Cholesky decomposition; whether the decomposition is carried out only after the covariance matrix has been inverted does not matter. (The result can of course differ slightly if the condition number is not ideal, which might be a reason to favour the way the inverse covariance is calculated in the code at the moment, since the 'size' (fewer non-zero elements) of the matrix is reduced early on...) So the kwarg is more flexible than I originally thought, and a Cholesky decomposition always exists (if the matrix is symmetric and positive definite, which it should be anyway). Then def invert_corr_cov_cholesky(corr, covdiag) would of course not have to be used, so

> I think all checks regarding positive definiteness and the condition number would still be in place if the user used the (newly provided) function to invert the covariance matrix of their choosing.

would not hold in the way you state it. The Cholesky decomposition only works if positive definiteness holds, but the condition number could still be large. I would still like to allow this, so that if you are provided with an inverse covariance matrix from someone else (not unusual for cross-checking results), you can still use it. We could just add to the description that the condition number should not be too large, as one can also see from the function def invert_corr_cov_cholesky(corr, covdiag).

> In principle it could also be a possibility to have def invert_cov_cholesky(cov) where the weights are computed inside the routine. (Is it clear what I mean?)

Here you're referring to modified weights for single data points? So you would like to be able to hand over a list/dict of weights to this function?

> The correlation matrix that is provided by the user should then also be used in the expected chisquare, because we assume that it is a good estimator for the true correlation matrix.

OK, so this would be the same kwarg corr_matrix, or the kwarg inv_chol_cov_matrix, but the expected chisquare would of course only be calculated if the kwarg expected_chisquare is set in addition (so in an uncorrelated fit). Is that what you had in mind?

@s-kuberski (Collaborator) commented Jan 26, 2024

Hi,
sorry for the delay...

> Just to be sure: you meant to say correlation matrix? So you are happy with the kwarg options corr_matrix and inv_chol_cov_matrix as they are now?

Yes, sorry, that should have been "correlation matrix".

I didn't fully get your next paragraph. Do you mean that one could also provide an inverted correlation matrix and then, in the fit routine, do the Cholesky decomposition? I think the two steps do not necessarily commute but, as you write, this is probably only an issue if the condition number is bad.

If we provide the routine to invert and decompose a matrix, which can then be passed to the fit routine, we remain fully flexible, right? People could come up with a matrix of their choosing and use pyerrors to Cholesky-decompose and invert it before passing it to the fit routine. In this case, all of the checks would still be in place (one could allow errors to be explicitly turned into warnings, if you'd like). The other case would be to use an inverse matrix from somewhere else (e.g., to cross-check someone else's work). In that case (which I would think is an unusual one), the user would have to do the Cholesky decomposition themselves before passing the matrix to the code.

> Here you're referring to modified weights for single data points? So you would like to be able to hand over a list/dict of weights to this function?

I was wondering how to allow for modified weights for single data points, exactly. These would not go into the correlation matrix but into the "error vector". I am still a bit confused by all the possibilities... If we do not think about the expected chi^2, it would be fine to just be able to pass a correlation matrix, the "error vector" (something related to covdiag in the current implementation), or an inverted correlation matrix. As soon as one takes the expected chi^2 into account, it matters whether these matrices and weights are considered to be "better estimators of the true covariance matrix" or "modified covariance matrices" which can be made useful together with the expected chi^2.

@fjosw (Owner) commented May 24, 2024

Hey @PiaLJP & @s-kuberski, any updates on this PR? Do you still want to proceed or should we close it for now?

@s-kuberski (Collaborator)

Hi,
we discussed a bit offline to be more efficient. As I understood, @PiaLJP will get back to implementing the changes that we discussed in the near future.
