
Add a score function for SRM #478

Open
wants to merge 4 commits into master
Conversation

@Kajiyu commented Aug 13, 2020

This adds a function to SRM that calculates the log-likelihood of test subjects' data, for use in model selection.
We followed the style of scikit-learn's GaussianMixture.score: https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.score

Co-authored-by: @lcnature.
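For reference, a minimal usage sketch of the proposed API, mirroring the linked scikit-learn convention. The toy data, the constructor arguments, and the assumption that score() returns a single log-likelihood value are illustrative only, not the final brainiak API:

    # Hypothetical usage sketch of the proposed SRM.score() from this PR.
    # Toy data and parameter values are made up for illustration.
    import numpy as np
    from brainiak.funcalign.srm import SRM

    rng = np.random.RandomState(0)
    voxels, samples, subjects, k = 30, 40, 3, 5

    # One (voxels x samples) array per subject; held-out data for the same subjects.
    train = [rng.randn(voxels, samples) for _ in range(subjects)]
    test = [rng.randn(voxels, samples) for _ in range(subjects)]

    model = SRM(n_iter=5, features=k)
    model.fit(train)          # learn per-subject bases W_i and the shared response
    ll = model.score(test)    # proposed: log-likelihood of the held-out data
    print(ll)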

@lcnature (Contributor) commented Aug 14, 2020

This PR partially deals with #69 by adding cross-validated logp for SRM.

@mihaic (Member) commented Aug 24, 2020

@qihongl, could you please review?

@qihongl (Member) commented Feb 23, 2021

Oops, sorry about the delayed reply, @lcnature @Kajiyu, I didn't notice it! I will take a look in 2 weeks.

Btw, is there a test added for this module?

@lcnature (Contributor) commented
Thanks @qihongl! It looks like we did not add one. I may work on it this weekend or next week.

@qihongl (Member) commented Mar 2, 2021

@lcnature Ah, do you mean there are some new updates that haven't been added to this PR?

@mihaic (Member) commented Mar 8, 2021

@lcnature, please don't forget about this pull request. It seems close to completion.

-------

Review comments on the diff

    ll_score : Log-likelihood of test-subject's data
    """
Review comment (Member):
Might it be better to have input validation here to make sure all d in data have the same number of samples?

I noticed that later, on line 616, you are skipping any d in data that is None. I was wondering if this can be done earlier, such as removing None entries at the very beginning, or warning the user that there is None in the data list. Feel free to decide what's more user-friendly.

Review comment (Member):

Ah, and this function probably wants to assert len(data) >= 2.
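A minimal sketch combining the validation suggested in the two comments above; the helper name _check_score_input, its placement, and the exact messages are hypothetical:

    import warnings

    def _check_score_input(data):
        """Hypothetical validation sketch based on the review comments above:
        require at least 2 subjects, warn about None entries, and require the
        same number of samples for every non-None subject."""
        if len(data) < 2:
            raise ValueError("score() requires data from at least 2 subjects.")
        if any(d is None for d in data):
            warnings.warn("`data` contains None entries; they will be skipped.")
        n_samples = {d.shape[1] for d in data if d is not None}
        if len(n_samples) > 1:
            raise ValueError("All subjects must have the same number of samples.")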

Comment on lines +584 to +587
    w = []
    for subject in range(subjects):
        _w = self._update_transform_subject(data[subject], self.s_)
        w.append(_w)
Review comment (Member):
Not necessary, but I think the following is slightly faster:

    w = [self._update_transform_subject(data[subject], self.s_) for subject in range(subjects)]

        w.append(_w)

    x, mu, rho2, trace_xtx = self._init_structures(data, subjects)
    sigma_s = self.sigma_s_
Review comment (Member):
Not necessary, but how about just using self.sigma_s_ directly on line 597?


    # Invert Sigma_s using Cholesky factorization
    (chol_sigma_s, lower_sigma_s) = scipy.linalg.cho_factor(
        sigma_s, check_finite=False)
Review comment (Member):
Referred to in a comment above.

    wt_invpsi_x = np.zeros((self.features, samples))
    trace_xt_invsigma2_x = 0.0
    for subject in range(subjects):
        if data[subject] is not None:
Review comment (Member):

Referred to in a comment above.

@qihongl left a comment

Hi, I'm not familiar with MPI, but I think the code makes sense (in terms of matching the formulas in the paper)!
I think some input validation for data at the beginning of this function would be useful. I also have two other very minor comments. And a test would be nice, perhaps one that computes the score for some toy data?

Thanks!
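For reference, a sketch of the kind of toy-data test suggested above. The test name, the toy-data shapes, and the assumption that score() returns a finite scalar are illustrative, not part of the PR:

    # Hypothetical test sketch for the proposed SRM.score().
    import numpy as np
    from brainiak.funcalign.srm import SRM

    def test_srm_score_on_toy_data():
        rng = np.random.RandomState(0)
        voxels, samples, subjects, k = 20, 30, 3, 4
        train = [rng.randn(voxels, samples) for _ in range(subjects)]
        test = [rng.randn(voxels, samples) for _ in range(subjects)]

        model = SRM(n_iter=5, features=k)
        model.fit(train)

        ll = model.score(test)
        assert np.ndim(ll) == 0      # a single log-likelihood value
        assert np.isfinite(ll)       # no NaN/inf on well-formed input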
