Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX]: First attempt to get a classifiers plot function also build confusion matrices from a subset of labels #595

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

adswa
Copy link
Contributor

@adswa adswa commented Dec 14, 2018

This PR:

  • allows plotting of confusion matrices with a subset of labels of the original classification, if they are specified as strings within the labels parameter
  • disables plot() from blowing up if the length of provided labels does not match the dimensions of the matrix
  • add a small smoketest to see whether plotting of full or subsets of the labels blows up

…tring. Disable error message for dimension mismatch between matrix and label lengths
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.2%) to 80.035% when pulling 4452807 on AdinaWagner:confusion into fe0b765 on PyMVPA:master.

@adswa
Copy link
Contributor Author

adswa commented Dec 14, 2018

(@yarikoptic)

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ultimate goal is somewhat reverse in this PR -- you are trying to plot subportion of the matrix (by providing subset of labels known to ConfusionMatrix), instead of needed for use "order" being a super set defining the order of the labels potentially with some which dataset at hands didn't have

@@ -969,6 +969,10 @@ def plot(self, labels=None, numbers=False, origin='upper',
v = None
if l is not None: v = labels_map_full[l]
labels_plot += [v]
elif (len(labels) != labels_order_filtered_set) and \
all(isinstance(x, str) for x in labels_order_filtered_set):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type shouldn't matter really - I might be having numeric labels
also, why not just to use sets operation to check exactly that labels_order_filtered_set includes all we would like to plot like

Suggested change
all(isinstance(x, str) for x in labels_order_filtered_set):
elif not (set(labels) - labels_order_filtered_set):

? because len is too weak of a check -- I could have completely different labels from what is provided to labels_order and condition would still pass, but then you would end up with no labels_plot

"Number of labels %d doesn't correspond the size" + \
" of a confusion matrix %s" % (NlabelsNN, matrix.shape)
# I believe this Error is superfluous -- or rather: wouldn't let
# subsets of matrices be printed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want subsets of matrices printed? that is related to the comment above. It could be a feature indeed, but I think if desired, it should be explicit, like adding an additional parameter allow_partial=False, so by default we would blow (so not to miss errors in specification) and only if explicitly instructed -- then to be allowed.

@@ -673,6 +673,26 @@ def test_confusionmatrix_nulldist(self):
cvnp[(np.array([0,1]), np.array([1,0]))])


def test_plotting_subset():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if we do allow (optionally) the subset to be visualized -- then yeah, worth having this test

our use case though is different -- it is when we provide labels (to order) which has more labels than it is known to the ca.stats.labels (like we include "brain" in providing the order, but then we had no "brain" label within our dataset we worked on)

res = cv(ds)
# get all labels
labels = cv.ca.stats.labels
cv.ca.stats.plot(labels=labels, numbers=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for unit testing ConfusionMatrix there is no need to do full blown classification. Just create an instance of ConfusionMatrix (see other tests) and work with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants