[ENH]: Sensitivity Analyzer for sklearn's SGD classifier #596

Open
wants to merge 9 commits into
base: master
from


adswa (Contributor) commented Jan 3, 2019

This PR contains a number of changes, all aimed at providing a sensitivity analyzer for sklearn's SGD classifier (SGDClassifier).
It...

  • implements the subclass SKLLearnerAdapterWeights(Sensitivity) to get the feature weights from sklearn's SGDClassifier when it is wrapped in SKLLearnerAdapter
  • implements the subclass MulticlassClassifierSensitivity(BoostedClassifierSensitivityAnalyzer) to store the labels of the pairwise comparisons in a SampleAttribute of the resulting weights Dataset
  • implements the necessary get_sensitivity_analyzer() methods in clfs/skl/base.py and clfs/meta.py
  • adds the classifier to the warehouse
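To illustrate what the two new subclasses are meant to produce together, here is a minimal sketch in plain scikit-learn and NumPy, not the PyMVPA classes from this PR. It assumes the multiclass wrapper trains one binary SGDClassifier per pair of labels (one-vs-one), reads each binary classifier's `coef_` row as its feature weights, and records the (positive, negative) label pair for every row. The helper name `pairwise_sgd_sensitivities` is hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import SGDClassifier


def pairwise_sgd_sensitivities(X, y, random_state=0):
    """Hypothetical helper: one weight vector per label pair (1-vs-1).

    Returns (weights, pairs): weights has shape (n_pairs, n_features) and
    pairs[i] is the (positive, negative) label pair for weights[i].
    """
    weights, pairs = [], []
    for a, b in combinations(np.unique(y), 2):
        # Train a binary SGD classifier on the samples of these two labels only.
        mask = (y == a) | (y == b)
        clf = SGDClassifier(random_state=random_state)
        clf.fit(X[mask], y[mask])
        # For a binary problem coef_ has shape (1, n_features); positive
        # decision values correspond to clf.classes_[1].
        weights.append(clf.coef_[0])
        pairs.append((clf.classes_[1], clf.classes_[0]))
    return np.vstack(weights), pairs


# Usage: 3 classes -> 3 pairwise classifiers with 5 weights each.
rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = np.repeat([0, 1, 2], 10)
W, pairs = pairwise_sgd_sensitivities(X, y)
print(W.shape)  # (3, 5)
print([(int(p), int(n)) for p, n in pairs])  # [(1, 0), (2, 0), (2, 1)]
```

With k labels this yields k(k-1)/2 weight rows, which is also the number of label pairs the sample attribute has to hold.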

Current problems:

  • the check for whether the classifier wrapped with SKLLearnerAdapter() is actually an SGDClassifier is crude: it relies on if 'SGDClassifier' in str(self._skl_learner). I resorted to this because I couldn't find a way to determine the classifier type before training; there is probably a cleaner solution.
  • 2 tests fail:
======================================================================
FAIL: mvpa2.tests.test_hdf5_clf.test_h5py_clfs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/tools.py", line 291, in newfunc
    func(*(arg + (filename,)), **kw)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_hdf5_clf.py", line 115, in test_h5py_clfs
    cmp_(error, error_)
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/numpy/testing/_private/utils.py", line 865, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/numpy/testing/_private/utils.py", line 789, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
 Single scenario lead to failures of unittest test_h5py_clfs:
  on
    lrn=<MulticlassClassifier> :
     
Arrays are not equal

(mismatch 100.0%)
 x: array([0.022222])
 y: array([0.033333])


======================================================================
FAIL: Some really basic testing for match_distribution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_stats_sp.py", line 211, in test_match_distribution
    self.assertTrue('norm' in names)
AssertionError: False is not true

----------------------------------------------------------------------
  • two tests raise errors:
======================================================================
ERROR: Test analyzers in split classifier
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/sweep.py", line 69, in do_sweep
    method(*args_, **kwargs_)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_datameasure.py", line 119, in test_analyzer_with_split_classifier
    sens = sana(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/learner.py", line 258, in __call__
    return super(Learner, self).__call__(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/node.py", line 137, in __call__
    result = self._call(ds, **(_call_kwargs or self._get_call_kwargs(ds)))
  File "/home/adina/Repos/PyMVPA/mvpa2/measures/base.py", line 1053, in _call
    return self.__combined_analyzer._call(dataset)
  File "/home/adina/Repos/PyMVPA/mvpa2/measures/base.py", line 909, in _call
    sensitivity = analyzer(dataset)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/learner.py", line 258, in __call__
    return super(Learner, self).__call__(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/node.py", line 137, in __call__
    result = self._call(ds, **(_call_kwargs or self._get_call_kwargs(ds)))
  File "/home/adina/Repos/PyMVPA/mvpa2/clfs/meta.py", line 1214, in _call
    senses.sa[clf.get_space()] = [(clf.clfs[i].poslabels[0], clf.clfs[i].neglabels[0]) for i in range(len(clf.clfs))]
  File "/home/adina/Repos/PyMVPA/mvpa2/base/collections.py", line 599, in __setitem__
    str(self)))
ValueError: Collectable 'targets' with length [6] does not match the required length [12] of collection '<SampleAttributesCollection: biases,lrn_index,targets>'.
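The ValueError above says the sensitivity dataset has 12 rows, while the list comprehension in clfs/meta.py produces one label pair per binary classifier (6 here). One possible way to make the lengths agree is to expand the per-classifier pairs to per-row pairs. The sketch below assumes that the `lrn_index` sample attribute records, for each row, the index of the binary classifier in `clf.clfs` that produced it; that meaning has not been verified. `pair_labels_per_row` is a hypothetical helper, not existing PyMVPA code.

```python
import numpy as np


def pair_labels_per_row(lrn_index, pairs):
    """Hypothetical helper: map one (pos, neg) pair per classifier to every row.

    lrn_index -- for each row of the stacked sensitivities, the index of the
                 binary classifier that produced it (assumed meaning).
    pairs     -- list of (poslabel, neglabel), one entry per binary classifier.
    """
    lrn_index = np.asarray(lrn_index, dtype=int)
    if lrn_index.size and lrn_index.max() >= len(pairs):
        raise ValueError("lrn_index refers to a classifier without a label pair")
    return [pairs[i] for i in lrn_index]


# Usage (illustrative only): 6 binary classifiers, assumed to contribute
# 2 rows each, giving 12 rows in total.
pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
lrn_index = np.repeat(np.arange(6), 2)
labels = pair_labels_per_row(lrn_index, pairs)
print(len(labels))  # 12
```

Indexing by `lrn_index` rather than repeating the list keeps the labels correct regardless of how the rows are ordered.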

======================================================================
ERROR: mvpa2.tests.test_emp_null.test_efdr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/tools.py", line 344, in newfunc
    return func(*arg, **kwargs)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_emp_null.py", line 36, in test_efdr
    np.testing.assert_array_less(efdr.fdr(2.9), 0.15)
  File "/home/adina/Repos/PyMVPA/mvpa2/support/_emp_null.py", line 332, in fdr
    self.learn()
  File "/home/adina/Repos/PyMVPA/mvpa2/support/_emp_null.py", line 245, in learn
    medge = medge[whist]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 133 but corresponding boolean dimension is 132
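The sizes in this IndexError (133 vs. 132) match the relationship between the outputs of `np.histogram`: for n bins it returns n counts but n + 1 bin edges. If `medge` holds bin edges and `whist` is a boolean mask computed over the counts (an assumption; this has not been checked against `_emp_null.py`), then indexing the edges with that mask fails on NumPy versions that reject mismatched boolean indices, as the traceback shows. A minimal sketch of one way to keep the lengths aligned is to mask bin centers (or `edges[:-1]`) instead; `nonempty_bins` is a hypothetical name.

```python
import numpy as np


def nonempty_bins(x, bins):
    """Hypothetical helper: centers and counts of the non-empty histogram bins."""
    counts, edges = np.histogram(x, bins=bins)
    # counts has `bins` entries and edges has `bins + 1`, so a mask built
    # from counts cannot index edges directly.
    keep = counts > 0
    centers = 0.5 * (edges[:-1] + edges[1:])  # one center per bin
    return centers[keep], counts[keep]


rng = np.random.RandomState(0)
x = rng.randn(1000)
centers, counts = nonempty_bins(x, bins=132)
print(centers.shape == counts.shape)  # True
```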

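Regarding the first current problem (detecting the SGD classifier via 'SGDClassifier' in str(self._skl_learner)): if, as the string check suggests, `self._skl_learner` holds the estimator instance passed to `SKLLearnerAdapter`, its type is already known before training, so an `isinstance` check is a possible alternative. It also avoids false positives from wrappers such as `sklearn.pipeline.Pipeline`, whose string representation includes the names of the estimators they contain. `is_sgd_classifier` is a hypothetical helper, sketched under that assumption.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


def is_sgd_classifier(skl_learner):
    """Hypothetical helper: True if the (possibly untrained) estimator is an SGDClassifier.

    isinstance() only inspects the object's class, so it works before fit()
    and also accepts subclasses of SGDClassifier.
    """
    return isinstance(skl_learner, SGDClassifier)


untrained = SGDClassifier()
wrapped = Pipeline([("sgd", SGDClassifier())])
print(is_sgd_classifier(untrained))  # True
print(is_sgd_classifier(wrapped))  # False
print("SGDClassifier" in str(wrapped))  # True: the string check would match
```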
adswa (Contributor, Author) commented Jan 3, 2019

Forgot to tag you, @yarikoptic

@yarikoptic yarikoptic self-assigned this Jan 31, 2019