[ENH]: Sensitivity Analyzer for sklearn's SGD classifier #596

adswa · 2019-01-03T00:46:40Z

This PR contains a number of changes, all aimed at providing a sensitivity analyzer for an SGD classifier.
It:

implements the subclass SKLLearnerAdapterWeights(Sensitivity) to get the feature weights from sklearn's SGDClassifier, wrapped with the SKLLearnerAdapter
implements the subclass MulticlassClassifierSensitivity(BoostedClassifierSensitivityAnalyzer) to store the labels of the pairwise comparisons in a SampleAttribute of the weights Dataset.
implements the necessary get_sensitivity_analyzer() functions in clfs/skl/base.py and clfs/meta.py
adds the classifier to the warehouse
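
To illustrate the pairwise labels mentioned above: in a 1-vs-1 scheme, one binary classifier is trained per unordered pair of labels. The sketch below enumerates such pairs with the standard library only. The label names and the helper pairwise_labels are made up for illustration and are not part of PyMVPA, whose own ordering of pairs may differ.

```python
from itertools import combinations


def pairwise_labels(labels):
    """Return one (label_a, label_b) tuple per 1-vs-1 comparison."""
    # sorted(set(...)) drops duplicates and fixes a deterministic order
    return list(combinations(sorted(set(labels)), 2))


# Hypothetical labels, for illustration only
pairs = pairwise_labels(["L0", "L1", "L2", "L3"])
print(len(pairs))  # 4 labels give 4 * 3 / 2 = 6 pairs
```

Storing such tuples alongside the weights lets a reader of the sensitivity dataset tell which comparison each row of weights belongs to.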

Current problems:

The check for whether the classifier wrapped with SKLLearnerAdapter() is indeed an SGD is crude: if 'SGDClassifier' in str(self._skl_learner): I used it because I couldn't find a way to get the type of the classifier before training. There is probably a prettier solution.
Two tests fail:
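
One possible alternative to the string match, sketched under assumptions: inspect the class hierarchy of the wrapped object instead of its repr. This works on an untrained instance and does not give a false positive when the name merely appears in the repr. The helper wraps_class and the stand-in classes below are hypothetical and exist only so the sketch runs without scikit-learn. If scikit-learn is importable, isinstance(self._skl_learner, SGDClassifier) with SGDClassifier imported from sklearn.linear_model should be the direct equivalent.

```python
def wraps_class(learner, class_name):
    """True if `learner` is an instance of a class named `class_name`
    or of one of its subclasses; no training is required."""
    return any(cls.__name__ == class_name for cls in type(learner).__mro__)


# Stand-in classes (NOT the real scikit-learn ones) so the sketch is self-contained
class SGDClassifier(object):
    pass


class MySGD(SGDClassifier):
    pass


class Other(object):
    def __repr__(self):
        # The repr mentions the name, which would fool a string match
        return "Pipeline(steps=[SGDClassifier()])"


print(wraps_class(SGDClassifier(), "SGDClassifier"))
print(wraps_class(MySGD(), "SGDClassifier"))
print(wraps_class(Other(), "SGDClassifier"))
```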

======================================================================
FAIL: mvpa2.tests.test_hdf5_clf.test_h5py_clfs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/tools.py", line 291, in newfunc
    func(*(arg + (filename,)), **kw)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_hdf5_clf.py", line 115, in test_h5py_clfs
    cmp_(error, error_)
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/numpy/testing/_private/utils.py", line 865, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/numpy/testing/_private/utils.py", line 789, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
 Single scenario lead to failures of unittest test_h5py_clfs:
  on
    lrn=<MulticlassClassifier> :
     
Arrays are not equal

(mismatch 100.0%)
 x: array([0.022222])
 y: array([0.033333])


======================================================================
FAIL: Some really basic testing for match_distribution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_stats_sp.py", line 211, in test_match_distribution
    self.assertTrue('norm' in names)
AssertionError: False is not true

----------------------------------------------------------------------

Two tests raise errors:

======================================================================
ERROR: Test analyzers in split classifier
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/sweep.py", line 69, in do_sweep
    method(*args_, **kwargs_)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_datameasure.py", line 119, in test_analyzer_with_split_classifier
    sens = sana(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/learner.py", line 258, in __call__
    return super(Learner, self).__call__(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/node.py", line 137, in __call__
    result = self._call(ds, **(_call_kwargs or self._get_call_kwargs(ds)))
  File "/home/adina/Repos/PyMVPA/mvpa2/measures/base.py", line 1053, in _call
    return self.__combined_analyzer._call(dataset)
  File "/home/adina/Repos/PyMVPA/mvpa2/measures/base.py", line 909, in _call
    sensitivity = analyzer(dataset)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/learner.py", line 258, in __call__
    return super(Learner, self).__call__(ds)
  File "/home/adina/Repos/PyMVPA/mvpa2/base/node.py", line 137, in __call__
    result = self._call(ds, **(_call_kwargs or self._get_call_kwargs(ds)))
  File "/home/adina/Repos/PyMVPA/mvpa2/clfs/meta.py", line 1214, in _call
    senses.sa[clf.get_space()] = [(clf.clfs[i].poslabels[0], clf.clfs[i].neglabels[0]) for i in range(len(clf.clfs))]
  File "/home/adina/Repos/PyMVPA/mvpa2/base/collections.py", line 599, in __setitem__
    str(self)))
ValueError: Collectable 'targets' with length [6] does not match the required length [12] of collection '<SampleAttributesCollection: biases,lrn_index,targets>'.

======================================================================
ERROR: mvpa2.tests.test_emp_null.test_efdr
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/adina/env/USA/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/adina/Repos/PyMVPA/mvpa2/testing/tools.py", line 344, in newfunc
    return func(*arg, **kwargs)
  File "/home/adina/Repos/PyMVPA/mvpa2/tests/test_emp_null.py", line 36, in test_efdr
    np.testing.assert_array_less(efdr.fdr(2.9), 0.15)
  File "/home/adina/Repos/PyMVPA/mvpa2/support/_emp_null.py", line 332, in fdr
    self.learn()
  File "/home/adina/Repos/PyMVPA/mvpa2/support/_emp_null.py", line 245, in learn
    medge = medge[whist]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 133 but corresponding boolean dimension is 132
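
The ValueError in the split-classifier test above reports 6 pair labels against a sensitivity dataset with 12 samples. Below is a minimal sketch of a defensive check that could run before the assignment, using plain lists. check_pair_labels is a hypothetical helper, not PyMVPA API. It only detects the mismatch and does not decide how labels should map onto the extra rows.

```python
def check_pair_labels(n_samples, pair_labels):
    """Return pair_labels as a list if there is exactly one label pair
    per sample, otherwise None so the caller can skip the assignment."""
    if len(pair_labels) != n_samples:
        return None
    return list(pair_labels)


# Hypothetical label pairs, six in total as in the traceback
pairs = [("L0", "L1"), ("L0", "L2"), ("L0", "L3"),
         ("L1", "L2"), ("L1", "L3"), ("L2", "L3")]

print(check_pair_labels(6, pairs) is not None)
print(check_pair_labels(12, pairs) is not None)
```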

adswa · 2019-01-03T01:27:41Z

Forgot to tag you, @yarikoptic

adswa added 9 commits January 2, 2019 19:34

necessary imports for sensitivity computation in linear SGD

3dea2fb

add 'linear' and 'has_sensitivity' tags to __tags__ if the classifier of choice is an SGD. The way this is implemented is rather crude...

ef99e81

add 'sgd' to the known_labels of the warehouse

f11f82b

perform necessary imports for sklearns SGDClassifier in warehouse

f9230c5

add the MulticlassClassifier-version of sklearns SGDClassifier to the warehouse

2ac9387

Return a sensitivity analyzer with compared pairs in SampleAttributes if the MulticlassClassifier performs 1-vs-1 classification, else it just returns the feature weights

1fcac20

Implement MulticlassClassifierSensitivity class to attach the compared pairs as sampleattributes to the sensitivity dataset

4f48e0b

Implement sensitivity analyzer for linear skl classifier

6914512

Implement SKLLearnerWeights class to return feature weights

53310d9

yarikoptic self-assigned this Jan 31, 2019
