Added new classifier ShrinkageLDA to gda.py #565
base: master
Conversation
Interested in us guiding you @Twitterfax to polish up the PR (make your code tested, not failing as it is now, etc.), or do you just want us to do that (might take longer ;) )?
Please do guide me, it would be much appreciated :) For example, one thing I wasn't sure about was whether scikit-learn is installed along with PyMVPA by default. If it is not available, I'd have to integrate the shrinkage estimators manually; I'm guessing that could be a problem. How can I inspect the Travis CI details?
Great:
I would also recommend installing the codecov browser extension, which helps you see which lines of your code are covered by the tests and which are not, right in your browser while reviewing differences in that "Files changed" tab: https://github.com/codecov/browser-extension#choose-your-browser-below
Basic quick review #1: changes to be done to get testing going, etc.
mvpa2/clfs/gda.py (Outdated)

    __tags__ = GDA.__tags__ + ['linear', 'lda', 'shrinkage']

    def __init__(self, shrinkage_estimator='LedoitWolf', block_size=1000, \
                 **kwargs):
have a look at some other classifiers, e.g. GDA or others, to see how to ideally define parameters (via those Parameter definitions at the class level)

no need for the trailing \ here, since the statement is within ()
mvpa2/clfs/gda.py (Outdated)

    @@ -35,12 +35,12 @@
    from mvpa2.base.constraints import EnsureChoice
    from mvpa2.base.state import ConditionalAttribute
    #from mvpa2.measures.base import Sensitivity

    from sklearn.covariance import LedoitWolf, OAS
since this code depends on the presence of sklearn, it might be better to move it into a separate file (e.g. shrinklda.py or alike), which would allow importing it conditionally on sklearn's presence within the warehouse
mvpa2/clfs/gda.py
Outdated
if shrinkage_estimator is 'LedoitWolf': | ||
self.shrinkage_estimator = LedoitWolf(block_size=block_size) | ||
elif shrinkage_estimator is 'OAS': | ||
self.shrinkage_estimator = OAS() |
better to delay this instantiation until _train, and not even assign that instance to any ShrinkageLDA attribute. That would allow changing that parameter on a classifier instance and retraining, and would keep its repr correct
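The delayed-instantiation pattern suggested here can be sketched in plain Python. All names below are hypothetical stand-ins (the estimator "factories" are tuples instead of the real LedoitWolf/OAS objects), not the actual PyMVPA classes:

```python
class ShrinkageLDASketch:
    """Sketch: keep only the parameter value on the instance and build the
    estimator inside _train(), so changing the parameter and retraining
    picks up the new value and repr stays truthful."""

    # hypothetical stand-ins for the real LedoitWolf/OAS constructors
    _ESTIMATORS = {
        'LedoitWolf': lambda block_size: ('LedoitWolf', block_size),
        'OAS': lambda block_size: ('OAS', None),
    }

    def __init__(self, shrinkage_estimator='LedoitWolf', block_size=1000):
        # store only the parameter values, not an estimator instance
        self.shrinkage_estimator = shrinkage_estimator
        self.block_size = block_size

    def _train(self, dataset):
        # instantiate lazily, at training time
        factory = self._ESTIMATORS[self.shrinkage_estimator]
        return factory(self.block_size)

clf = ShrinkageLDASketch()
assert clf._train(None) == ('LedoitWolf', 1000)
clf.shrinkage_estimator = 'OAS'   # change the parameter, then "retrain"
assert clf._train(None) == ('OAS', None)
```

Because nothing estimator-specific is cached in `__init__`, every retrain honors the current parameter value.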
mvpa2/clfs/gda.py
Outdated
# because it only computes the class-dependent SSCP matrices. The | ||
# shrinkage coefficient, however, needs to be computed from the whole | ||
# time series (with samples centered on their respective class means) | ||
params = self.params |
here we go -- this is where you would find those Parameters after you move them out to be defined at the class level, e.g. params.shrinkage_estimator
mvpa2/clfs/gda.py
Outdated
# Store prior probabilities | ||
self.priors = self._get_priors(nlabels, nsamples, nsamples_per_class) | ||
|
||
if __debug__ and 'ShrinkageLDA' in debug.active: |
you would need to add ShrinkageLDA among the debug targets, alongside GDA etc.:

    $> git grep GDA -- mvpa2/base/
    mvpa2/base/__init__.py:    debug.register('GDA', "GDA - Gaussian Discriminant Analyses")
mvpa2/clfs/warehouse.py (Outdated)

    @@ -18,7 +18,7 @@
        MulticlassClassifier, RegressionAsClassifier
    from mvpa2.clfs.smlr import SMLR
    from mvpa2.clfs.knn import kNN
    -from mvpa2.clfs.gda import LDA, QDA
    +from mvpa2.clfs.gda import LDA, QDA, ShrinkageLDA
after you move that definition into a dedicated file, you could place the import within the if externals.exists('skl') block below, so it would be imported only when sklearn is installed.

Then add an actual instance of the class, similarly to how it is done for LDA, GDA etc. in this file. This should get its testing going, and we will continue from there
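The conditional-import idea can be illustrated with a plain-Python sketch (the helper name and registry are hypothetical; in PyMVPA itself the gate is `externals.exists('skl')`):

```python
import importlib.util

def maybe_register(module_name, clf_name, registry):
    """Append clf_name to the registry only if module_name is importable,
    so optional-dependency classifiers never break the warehouse import."""
    if importlib.util.find_spec(module_name) is not None:
        # in the real warehouse this is where the conditional
        # 'from mvpa2.clfs.shrinklda import ShrinkageLDA' would go
        registry.append(clf_name)
    return registry

# stdlib 'json' is always importable; the bogus name never is
assert maybe_register("json", "ShrinkageLDA", []) == ["ShrinkageLDA"]
assert maybe_register("no_such_module_xyz", "ShrinkageLDA", []) == []
```

The key point is that the import happens inside the availability check, so users without sklearn still get a working warehouse, just without the sklearn-dependent entries.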
Awesome! Thanks for your time and help. I will try to implement the suggested changes.
Codecov Report

    @@            Coverage Diff            @@
    ##           master    #565    +/-   ##
    =========================================
    + Coverage   75.71%   75.8%   +0.08%
    =========================================
      Files         364     366       +2
      Lines       41453   41555     +102
      Branches     6683    6694      +11
    =========================================
    + Hits        31386   31500     +114
    + Misses       8213    8194     -19
    - Partials     1854    1861      +7

Continue to review the full report at Codecov.
…nal on availability of sklearn
Hi Yaroslav. Thanks again for your feedback and patience. Things look ok now... I suppose(?)
Good. Now it would be great to add a dedicated unit test which would compare results against the sklearn-wrapped implementation you mentioned - they should be close to identical, right? I can provide a bit more detail later, or you could check out other tests to see how it could be done.
Ok, I have used test_gnb.py etc. as a template to write the test. Let me know if it needs improvement. How would such a test typically be run?
…the wrapped shrinkage LDA from sklearn.
Look into .travis.yml (which is a bit of a mouthful) for how we invoke testing using nose. To test locally, just run nosetests limited to your added tests only.
Travis CI doesn't complain about my coding anymore. I take it as a positive sign ;)
… of 20 on my machine)
Hi Yaroslav,
sorry for the delay. Almost there -- what about the actual CV error? (see my comments)
mvpa2/tests/test_shrinklda.py (Outdated)

    # First check if scikit-learn is available at all
    if not externals.exists('skl') or externals.versions['skl'] < '0.17':
        self.skipTest('sklearn (0.17 or higher) unavailable')
ha -- never used it this way... according to the docs this should indeed raise an exception, so there is no need for an explicit else: block indentation -- the code could be dedented, since it would not run if there is an old sklearn and this exception is raised.

anyways -- you could just use the line

    skip_if_no_external('skl', min_version='0.17')
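The mechanism behind such a helper can be sketched with a minimal stand-in (hypothetical simplification: the real skip_if_no_external lives in PyMVPA's testing utilities and consults externals, not a dict, and does proper version parsing):

```python
import unittest

def skip_if_no_external(name, min_version, available):
    """Minimal stand-in: raise SkipTest when the dependency is missing
    or too old, so the test is reported as skipped rather than failed.
    (String comparison of versions is a simplification for this sketch.)"""
    version = available.get(name)
    if version is None or version < min_version:
        raise unittest.SkipTest(
            '%s (%s or higher) unavailable' % (name, min_version))

# a too-old sklearn leads to a skip, not a failure
try:
    skip_if_no_external('skl', '0.17', {'skl': '0.16'})
    skipped = False
except unittest.SkipTest:
    skipped = True
assert skipped

# a recent enough version passes through silently
skip_if_no_external('skl', '0.17', {'skl': '0.18'})
```

Because unittest treats a raised SkipTest as "skip this test", any code after the call simply never runs on machines without the dependency, which is why the explicit else: block is unnecessary.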
mvpa2/tests/test_shrinklda.py
Outdated
delta_t = [] | ||
for clf in [pymvpa_clf, skl_clf]: | ||
cv = CrossValidation(clf, partitioner) | ||
_ = cv(ds) |
well, the test could compare those results -- your implementation should be not just faster but also provide a similar (better?) result, right? ;-) otherwise -- what would be the point of having a faster but notably worse implementation?
Thanks for the feedback!

1.) Yes, that's a much cleaner solution with skip_if_no_external.

2.) I thought about this, but I didn't know how to define the similarity of results. One idea I had was to show that the CV error of the new LDA implementation is more similar to the CV error of the old LDA version than to that of SVM classification. This is indeed the case in - say - 95% of the cases. I figured this would not be acceptable for a test which would probably be run routinely together with a bunch of other tests: you don't want to risk the whole test run failing just because you're in the last 5% (or so) of this little test. Do you have a solution for this?

Otherwise, I suggest that I elaborate a bit more on the documentation of the new ShrinkageLDA class to explain what it does. The code is relatively simple, and with a bit more explanation the user could easily convince himself/herself that it indeed does what it's supposed to do: basic LDA with shrinkage on the within-class covariance matrix. In the end, this is the point, right? In this way the "validity" of the new algorithm would result directly from its implementation rather than from a comparison with other implementations.

Cheers,
Sorry @Twitterfax again for the delay with the feedback, I will try to do better ;)

re unstable test(s): we do allow for tests which only sometimes fail. It is somewhat of a curse of ours, but we do have such tests, and we do not fix the seed to guarantee that it is always the same data/result. you could use either
Dear PyMVPA users,
Since I couldn't find an implementation of "shrinkage LDA" (e.g., Pereira & Botvinick, NeuroImage, 2011) in PyMVPA, I implemented my own based on Gaussian Discriminant Analysis (gda.py). I figured it could be helpful for other users as well.
It is considerably faster than using the scikit-learn implementation through the SKLLearnerAdapter, which I have been using until now:

    SKLLearnerAdapter(LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto'))
It is also a little more flexible, because you can choose between the Ledoit-Wolf estimator and Oracle Approximating Shrinkage (OAS).
May it be useful.
Cheers,
Michael
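For reference, both estimators mentioned above share the same shrinkage target and differ only in how the coefficient lambda is chosen: the sample covariance S is shrunk toward a scaled identity, (1 - lambda) * S + lambda * (tr(S)/p) * I. A minimal pure-Python sketch of that shrinkage step (an illustration, not the PyMVPA or sklearn code):

```python
def shrink_covariance(S, lam):
    """Shrink a sample covariance matrix S (list of lists) toward a
    scaled-identity target: (1 - lam) * S + lam * (tr(S)/p) * I."""
    p = len(S)
    mu = sum(S[i][i] for i in range(p)) / p   # average variance, tr(S)/p
    return [[(1 - lam) * S[i][j] + (lam * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]

S = [[4.0, 1.0],
     [1.0, 2.0]]
# with lam = 0.5 the diagonal is pulled toward the mean variance (3.0)
# and the off-diagonal entries are halved
assert shrink_covariance(S, 0.5) == [[3.5, 0.5], [0.5, 2.5]]
```

With lam = 0 the sample covariance is returned unchanged, and lam = 1 gives the pure identity target; Ledoit-Wolf and OAS differ only in the formula used to pick lam from the data.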