Add a score function for SRM #478
base: master
Conversation
This PR partially deals with #69 by adding cross-validated logp for SRM.
@qihongl, could you please review?
Thanks @qihongl! It looks like we did not add. I will try to work on it this weekend or next week.
@lcnature Ah, do you mean there are some new updates that haven't been added to this PR?
@lcnature, please remember about this pull request. It seems close to completion.
    ll_score : Log-likelihood of test-subject's data
    """
Might be better to have input validation here to make sure all `d` in `data` have the same number of samples?
I noticed that later, on line 616, you are skipping any `d` in `data` that is `None`. I was wondering if this can be done earlier, such as removing `None` entries from the very beginning, or warning the user that there is `None` in the `data` list? Feel free to decide what's more user-friendly.
Ah, and this function should probably also assert `len(data) >= 2`.
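A minimal sketch of what such validation could look like, combining the suggestions above (the helper name `_check_data` and the exact messages are illustrative, not code from this PR):

```python
import warnings
import numpy as np

def _check_data(data):
    """Illustrative input validation for a list of per-subject arrays.

    Drops None entries with a warning, requires at least two subjects,
    and checks that all subjects have the same number of samples (columns).
    """
    if any(d is None for d in data):
        warnings.warn("Dropping None entries from the data list.")
        data = [d for d in data if d is not None]
    if len(data) < 2:
        raise ValueError("At least two subjects are required.")
    samples = data[0].shape[1]
    if any(d.shape[1] != samples for d in data):
        raise ValueError("All subjects must have the same number of samples.")
    return data

# Example: one None entry is dropped, two valid subjects remain.
clean = _check_data([np.ones((5, 10)), None, np.ones((4, 10))])
```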
w = []
for subject in range(subjects):
    _w = self._update_transform_subject(data[subject], self.s_)
    w.append(_w)
Not necessary, but I think the following is slightly faster:
w = [self._update_transform_subject(data[subject], self.s_) for subject in range(subjects)]
    w.append(_w)

x, mu, rho2, trace_xtx = self._init_structures(data, subjects)
sigma_s = self.sigma_s_
Not necessary, but how about just using `self.sigma_s_` directly on line 597?
# Invert Sigma_s using Cholesky factorization
(chol_sigma_s, lower_sigma_s) = scipy.linalg.cho_factor(
    sigma_s, check_finite=False)
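As context for the Cholesky approach used here, a minimal sketch (with a made-up positive-definite `sigma_s`, not the PR's actual matrix) showing how `scipy.linalg.cho_factor`/`cho_solve` apply the inverse without forming it by a general-purpose `inv`:

```python
import numpy as np
import scipy.linalg

# Stand-in for sigma_s: any symmetric positive-definite matrix.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3))
sigma_s = a @ a.T + 3 * np.eye(3)

# Factor once, then reuse the factorization for every solve.
chol_sigma_s, lower_sigma_s = scipy.linalg.cho_factor(sigma_s, check_finite=False)

# Solving against the identity recovers the inverse (for illustration only;
# in practice you solve directly against the right-hand side you need).
inv_sigma_s = scipy.linalg.cho_solve((chol_sigma_s, lower_sigma_s), np.eye(3))
```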
Referred to in a comment above.
wt_invpsi_x = np.zeros((self.features, samples))
trace_xt_invsigma2_x = 0.0
for subject in range(subjects):
    if data[subject] is not None:
Referred to in a comment above.
Hi, I'm not familiar with MPI, but I think the code makes sense (in terms of matching the formulas in the paper)!
I think some input validation for `data` at the beginning of this function would be useful. And I have two other very minor comments. Ah, and a test would be nice. Perhaps a test that computes the score for some toy data?
Thanks!
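For the suggested test, toy data following the SRM generative model (each subject's data is an orthonormal map times a shared response, plus noise) could be generated roughly like this; sizes and noise level are arbitrary, and the fit/score calls shown in the trailing comment are the PR's proposed API, included only as assumed usage:

```python
import numpy as np

rng = np.random.default_rng(42)
voxels, samples, features, subjects = 30, 50, 5, 3

# Shared response s (features x samples), common to all subjects.
s = rng.standard_normal((features, samples))

# Per-subject data x_i = w_i @ s + noise, with orthonormal columns in w_i.
data = []
for _ in range(subjects):
    w, _ = np.linalg.qr(rng.standard_normal((voxels, features)))
    data.append(w @ s + 0.1 * rng.standard_normal((voxels, samples)))

# With brainiak installed, the test could then do something like:
#   srm = SRM(n_iter=10, features=features)
#   srm.fit(train_half)
#   ll = srm.score(test_half)  # the PR's proposed score function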
This adds a function on SRM which calculates the log-likelihood of test subjects' data, for model selection.
We followed the style of scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.score
Co-authored-by: @lcnature.
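For reference, the scikit-learn convention being followed: `score` returns the average per-sample log-likelihood of held-out data under the fitted model, as in this `GaussianMixture` example (the data here are synthetic):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x_train = rng.standard_normal((200, 2))
x_test = rng.standard_normal((50, 2))

gm = GaussianMixture(n_components=2, random_state=0).fit(x_train)
ll = gm.score(x_test)  # average per-sample log-likelihood of held-out data
```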