Feature/corr matrix and inverse cov matrix as input in least squares function for correlated fits #223
base: develop
Conversation
Hi Pia,
Hi,
I have written a lot of text, not because I don't like your code, but because I think we need a bit of discussion concerning the capabilities of the routine with respect to the new parameters. Thanks again for starting this!
@@ -151,6 +151,12 @@ def func_b(a, x):
For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
corr_matrix: array, optional
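The conditioning claim in the docstring above can be illustrated with a small numpy sketch (all numbers are made up for illustration): when the y errors span very different scales, the covariance matrix is badly conditioned, while the correlation matrix, which is exactly the Jacobi-rescaled covariance, stays tame.

```python
import numpy as np

rng = np.random.default_rng(7)

# Build some positive-definite correlation matrix (illustrative data,
# not from pyerrors): normalize a random SPD matrix to unit diagonal.
A = rng.normal(size=(4, 4))
M = A @ A.T + 4 * np.eye(4)            # symmetric positive definite
d = np.sqrt(np.diag(M))
corr = M / np.outer(d, d)              # unit diagonal correlation matrix

dy = np.array([1e-4, 1e-2, 1.0, 5.0])  # widely different y errors
cov = np.outer(dy, dy) * corr          # cov = D @ corr @ D with D = diag(dy)

# corr = D^-1 cov D^-1 is precisely the Jacobi preconditioning of cov:
print('cond(cov)  =', np.linalg.cond(cov))
print('cond(corr) =', np.linalg.cond(corr))
```

With error scales spanning four orders of magnitude, `cond(cov)` exceeds `cond(corr)` by many orders of magnitude, which is why decomposing the correlation matrix is the numerically safer route.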
I'd raise the general question of whether we want to allow passing the correlation matrix or the covariance matrix. The latter would allow working with modified weights for single data points without having to mess around with the uncertainties. Imagine a situation where you want to give more weight to points close to the continuum in an extrapolation. Together with the expected chi^2, this can have a well-defined meaning.
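The up-weighting idea can be sketched at the numpy level (all matrices and scale factors here are illustrative, not pyerrors code): rescaling the covariance matrix by a diagonal congruence shrinks the effective error of chosen points while preserving symmetry, positive definiteness, and the correlation structure.

```python
import numpy as np

# Illustrative errors and a mildly correlated correlation matrix:
dy = np.array([0.5, 0.4, 0.3, 0.2])
corr = np.eye(4) + 0.2 * (np.ones((4, 4)) - np.eye(4))
cov = np.outer(dy, dy) * corr

# Give the last point (closest to the continuum, say) more weight by
# scaling its row/column of the covariance; w < 1 means more weight.
w = np.array([1.0, 1.0, 1.0, 0.5])
cov_mod = np.outer(w, w) * cov   # congruence D_w cov D_w keeps cov_mod SPD
```

Passing such a `cov_mod` instead of the internally estimated covariance is exactly the kind of use case that a covariance-matrix parameter would enable.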
Concerning the expected chisquare: this part of the code has not yet been touched. If we want to allow people to pass any kind of covariance/correlation matrix, it has to be clear whether this is meant to be a better estimator than the internal one or just some modified version for use in the chisquare function. The computation and interpretation of the expected chisquare would depend on this. This requires some thought.
(1) I think it's nice to be able to just hand over the correlation matrix, because the y errors are already in the code, so one doesn't have to construct them again outside of the `least_squares` function. I also thought it would be the safer option, where the `least_squares` function does most of the work for you (inverting the matrix and including the errors).
(2) Additionally, I thought it would be nice to have one less restrictive option, so as to also incorporate modified weights for single data points, like you say (more on that below).
I suppose it would be useful to be able to hand over a covariance/correlation matrix for the expected chisquare. Sometimes the matrix constructed in the code is ill-conditioned and you do not get a reliable expected chisquare. It would also be interesting to see the effect of a covariance/correlation matrix of your own choosing, especially when comparing with the correlated fit using that same matrix.
@@ -151,6 +151,12 @@ def func_b(a, x):
For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
corr_matrix: array, optional
    if correlated_fit=True is set as well, can provide a correlation matrix (without y errors!) of your own choosing for a correlated fit
I would specify exactly what is needed here, i.e., a square matrix of floats with dimension N, where N is the length of `y`. One could also write that this is the correlation matrix of the Observables in `y`.
Yes, that's a good idea! It even has to be positive definite, otherwise it's not invertible and the Cholesky decomposition is not applicable either, right? The way the correlation matrix is constructed by the `covariance` function in pyerrors (used in `least_squares` for a correlated fit without any cov/corr matrix input) only guarantees a positive semi-definite matrix, so you can run into that problem there too, or am I missing something? Of course, the condition number of the correlation matrix can also just be bad, even if the matrix is positive definite (and therefore in principle invertible). In both cases the code produces an error, but wouldn't it still be good to stress that these two conditions (positive definite and well-conditioned) have to be met by the correlation matrix handed over as `corr_matrix`?
That's true. It has to be a symmetric, positive definite matrix. The condition number is an implicit problem, I would say.
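The checks discussed here could be sketched as an input-validation helper (the function name and the condition-number threshold are illustrative, not pyerrors API): symmetry is checked directly, positive definiteness via an attempted Cholesky decomposition, and conditioning via `np.linalg.cond`.

```python
import numpy as np

def check_corr_matrix(corr, cond_limit=1e8):
    """Sketch of a check for a user-supplied correlation matrix.
    Helper name and threshold are illustrative."""
    if not np.allclose(corr, corr.T):
        raise ValueError('Correlation matrix must be symmetric.')
    try:
        # np.linalg.cholesky raises LinAlgError iff corr is not
        # (numerically) positive definite.
        np.linalg.cholesky(corr)
    except np.linalg.LinAlgError:
        raise ValueError('Correlation matrix must be positive definite.')
    cond = np.linalg.cond(corr)
    if cond > cond_limit:
        raise ValueError(f'Correlation matrix is ill-conditioned (cond={cond:.1e}).')

check_corr_matrix(np.eye(3))   # a valid (trivial) correlation matrix passes
```

Note that the Cholesky attempt catches indefinite input, while the condition-number check only flags the "implicit" problem; where to draw the threshold would be a design decision.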
@@ -151,6 +151,12 @@ def func_b(a, x):
For details about how the covariance matrix is estimated see `pyerrors.obs.covariance`.
In practice the correlation matrix is Cholesky decomposed and inverted (instead of the covariance matrix).
This procedure should be numerically more stable as the correlation matrix is typically better conditioned (Jacobi preconditioning).
corr_matrix: array, optional
    if correlated_fit=True is set as well, can provide a correlation matrix (without y errors!) of your own choosing for a correlated fit
inv_chol_cov_matrix
This piece is a bit tricky. In principle, the code currently expects a matrix that is based on the algorithm performed inside our routine. It is very easy to do something wrong when you try to come up with the correct inverted Cholesky matrix without copying what is done in the code.
One could then ask whether this parameter is needed at all. It could be helpful when performing many fits on the same set of data, because it saves some time when the inversion does not have to be redone every time. However, providing the covariance matrix would already help a lot in reducing the cost, and the inversion alone is probably not too expensive.
I did think that it would be helpful to provide the inverse, but now I cannot really come up with a good example where this would be the case. Do you have something in mind?
Originally I thought it would be good to have one option where you are completely free to hand over whatever type of inverse covariance matrix you want. But the code relies on the Cholesky decomposition to access the 'square root' of the chi squared function (in `general_chisqfun()`), which is cheaper and is later also needed to define the `chisqfunc_residuals` function, so I thought it would be good to keep this form of the matrix.
I also didn't see a nice, cheap way to access this if the inverse covariance matrix is handed over as a whole and only the full chi squared function is evaluated. So I essentially did not see a nice way of defining the `general_chisqfun()` function (l. 331). Do you know a way around that?
What I had in mind was your case above of using modified weights for different data points.
inv_chol_cov_matrix
    if correlated_fit=True is set as well, can provide an inverse covariance matrix (y errors, dy_f included!) of your own choosing for a correlated fit
    The matrix must be a lower triangular matrix constructed from a Cholesky decomposed correlation matrix (e.g. chol = np.linalg.cholesky(corr)),
    so that the inverse covariance matrix is defined by chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True) with covdiag = np.diag(1 / np.asarray(dy_f))
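For concreteness, the construction described in this docstring can be reproduced and verified with a small standalone sketch (the numbers are illustrative): the resulting factor `chol_inv` satisfies `chol_inv.T @ chol_inv == inv(cov)`, where `cov` is the covariance built from the errors and the correlation matrix.

```python
import numpy as np
import scipy.linalg

# Illustrative y errors and correlation matrix:
dy_f = np.array([0.1, 0.2, 0.3])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])

chol = np.linalg.cholesky(corr)                  # corr = chol @ chol.T
covdiag = np.diag(1 / np.asarray(dy_f))
chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True)

# chol_inv = L^-1 D^-1, so chol_inv.T @ chol_inv = (D corr D)^-1 = inv(cov):
cov = np.outer(dy_f, dy_f) * corr
print(np.allclose(chol_inv.T @ chol_inv, np.linalg.inv(cov)))
```

This also shows why the parameter is easy to get wrong: `chol_inv` is neither the Cholesky factor of the covariance nor of its inverse, but the inverse Cholesky factor of the correlation matrix rescaled by the errors.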
This is connected to the comment above: the current description is very hard to understand if the user does not look into the code but just uses the package, because some variables like `dy_f` are not introduced.
chol_inv = scipy.linalg.solve_triangular(chol, covdiag, lower=True)
if 'inv_chol_cov_matrix' in kwargs:
    chol_inv = kwargs.get('inv_chol_cov_matrix')
    print('Warning: The inverse covariance matrix handed over must be a lower triangular matrix constructed from a Cholesky decomposed correlation matrix.')
I think putting a warning into a print statement is dangerous because it is easily overlooked. However, I think I would not print anything here. If the user is doing the correct thing, the code should silently do its job.
[Let's first discuss the question of whether the parameter is needed before we go into further details for the internal checks.]
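If some runtime hint is kept at all, one alternative to a bare `print` would be the standard `warnings` module, which is filterable and testable (the function name and message below are illustrative, not pyerrors code):

```python
import warnings

def use_inv_chol_cov_matrix(matrix):
    """Illustrative sketch: emit a proper warning instead of printing."""
    warnings.warn(
        'inv_chol_cov_matrix must be a lower triangular matrix obtained '
        'from a Cholesky decomposed correlation matrix.',
        RuntimeWarning,
    )
    return matrix
```

Users can then silence or escalate the message via `warnings.filterwarnings`, which a `print` does not allow.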
if 'corr_matrix' in kwargs:
    corr = kwargs.get('corr_matrix')
    if corr.shape[0] != len(dy_f):
        raise TypeError('The number of columns of the correlation matrix needs to be equal to the number of y errors.')
Maybe replace "y errors" by "number of observables in `y`"? Or "length of `y`"?
diff_inv_cov = fitp_inv_cov[i] - fitpc[i]
diff_inv_cov.gamma_method()
assert diff_inv_cov.is_zero(atol=0.0)
You could also test if an uncorrelated fit gives the exact same result as a correlated fit with a diagonal covariance matrix. This should be a quite strong check.
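At the chi^2 level, the suggested consistency check can be sketched with plain numpy (illustrative errors and residuals, no pyerrors objects): with a diagonal correlation matrix, the correlated chi^2 built from the Cholesky pipeline must reduce to the ordinary uncorrelated sum of squared normalized residuals.

```python
import numpy as np
import scipy.linalg

# Illustrative y errors and residuals y - f(x, p):
dy_f = np.array([0.1, 0.2, 0.3])
residuals = np.array([0.05, -0.1, 0.2])

corr = np.eye(3)   # diagonal correlation matrix: no correlations
chol = np.linalg.cholesky(corr)
chol_inv = scipy.linalg.solve_triangular(chol, np.diag(1 / dy_f), lower=True)

chisq_corr = float(np.sum((chol_inv @ residuals) ** 2))     # correlated form
chisq_uncorr = float(np.sum((residuals / dy_f) ** 2))       # uncorrelated form
print(np.isclose(chisq_corr, chisq_uncorr))
```

The full test in the suite would of course compare the fit parameters returned by `least_squares` with and without `correlated_fit=True`, but the identity above is what makes that comparison a strong check.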
Hi,
I think that this is a good solution (I'll try to think more about whether I can come up with cases that are not covered). I have a few proposals for this strategy:
Hi Simon,
Just to be sure, you meant to say correlation matrix? So you are happy with the kwargs options; I think it's a good idea to define
would not hold in the way you say. The Cholesky decomposition only works if positive definiteness holds, but the condition number could still be large. I would still like to allow this, so that if you are provided with an inverse covariance matrix from someone else (not unusual for cross-checking results), you can still use it. We could just add in the description that the condition number should not be too large, as you can also see from the function
Here you're referring to modified weights for single data points? So you would like to be able to hand over a list/dict of weights to this function?
Ok, so this would be the same kwarg
Hi,
I didn't fully get your next paragraph. Do you mean that one could also provide an inverted correlation matrix and then, in the fit routine, do the Cholesky decomposition? I think the two steps do not necessarily commute but, as you write, this is probably more of an issue if the condition number is bad. If we would provide a routine to invert and decompose a matrix, which can then be passed to the fit routine, we would remain fully flexible, right? People could come up with a matrix of their choosing and use
I was wondering how to allow for modified weights for single data points, exactly. This would not go into the correlation matrix but into the "error vector". I am still a bit confused by all the possibilities... If we do not think about the expected chi^2, it would be fine to just be able to pass a correlation matrix, the "error vector" (something related to
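On the question of whether inversion and decomposition commute: the two orders indeed give different triangular factors, but both factors reproduce the same chi^2, because each satisfies W W-transposed (or its mirror image) equal to the inverse covariance. A numpy sketch under an illustrative positive definite covariance:

```python
import numpy as np

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])   # illustrative SPD covariance
r = np.array([0.1, -0.2])        # some residual vector

# (a) decompose first, then invert the factor:
L = np.linalg.cholesky(cov)      # cov = L @ L.T
W_a = np.linalg.inv(L)           # W_a.T @ W_a == inv(cov)

# (b) invert first, then decompose the inverse:
M = np.linalg.cholesky(np.linalg.inv(cov))   # M @ M.T == inv(cov)

chisq_a = float(r @ W_a.T @ W_a @ r)
chisq_b = float(r @ M @ M.T @ r)
print(np.isclose(chisq_a, chisq_b))   # factors differ, chi^2 agrees
```

So a public helper that inverts and decomposes an arbitrary user matrix would indeed keep full flexibility; numerical differences between the two orders only become relevant when the condition number is bad.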
Hey @PiaLJP & @s-kuberski, any updates on this PR? Do you still want to proceed or should we close it for now?
Hi,
Added kwargs `corr_matrix` and `inv_chol_cov_matrix` to the `least_squares` function.

Note: For the kwargs to work, `correlated_fit=True` needs to be set too.

(1) kwarg `corr_matrix`
Now it is possible to hand over a correlation matrix of your own choosing for a correlated fit.
In the `least_squares` function a Cholesky decomposition is then applied to the correlation matrix, and the resulting lower triangular matrix is combined with the y errors and inverted, such that a lower triangular matrix corresponding to the inverse covariance matrix is extracted (and later used for the definition of the correlated chi squared function).

(2) kwarg `inv_chol_cov_matrix`
It is also possible to hand over the inverse covariance matrix directly (as a lower triangular matrix) if it has been constructed from a Cholesky decomposition of the correlation matrix in the same way as detailed in (1).