Decision function for LinearSVC #51

Open
mrshanth opened this issue Jul 23, 2015 · 2 comments

Comments

@mrshanth

Hi,

Can we get a confidence score, as we do in scikit-learn with the decision_function method?
I get the following error when I run this code:

svm_model.decision_function(Z[:,'X'])

error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/sklearn/linear_model/base.py", line 199, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/usr/lib64/python2.6/site-packages/sklearn/utils/validation.py", line 344, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

Thanks

@kszucs
Contributor

kszucs commented Jul 27, 2015

The decision_function method is not implemented yet. BTW, it's pretty straightforward. This is scikit-learn's version:

class LinearClassifierMixin(ClassifierMixin):
    """Mixin for linear classifiers.

    Handles prediction for sparse and dense X.
    """

    def decision_function(self, X):
        """Predict confidence scores for samples.

        The confidence score for a sample is the signed distance of that
        sample to the hyperplane.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = (n_samples, n_features)
            Samples.

        Returns
        -------
        array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
            Confidence scores per (sample, class) combination. In the binary
            case, confidence score for self.classes_[1] where >0 means this
            class would be predicted.
        """
        if not hasattr(self, 'coef_') or self.coef_ is None:
            raise NotFittedError("This %(name)s instance is not fitted "
                                 "yet" % {'name': type(self).__name__})

        X = check_array(X, accept_sparse='csr')

        n_features = self.coef_.shape[1]
        if X.shape[1] != n_features:
            raise ValueError("X has %d features per sample; expecting %d"
                             % (X.shape[1], n_features))

        scores = safe_sparse_dot(X, self.coef_.T,
                                 dense_output=True) + self.intercept_
        return scores.ravel() if scores.shape[1] == 1 else scores
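
As a rough illustration of what the method above computes, here is a minimal NumPy-only sketch. It is not part of scikit-learn or sparkit-learn: the helper `linear_scores` and the weights and samples are made up for the example. The score is a dot product with the coefficients plus the intercept, flattened to one value per sample in the binary case.

```python
import numpy as np


def linear_scores(X, coef, intercept):
    """Compute X . coef^T + intercept, mirroring the method above."""
    X = np.asarray(X, dtype=float)
    coef = np.atleast_2d(coef)
    if X.shape[1] != coef.shape[1]:
        raise ValueError("X has %d features per sample; expecting %d"
                         % (X.shape[1], coef.shape[1]))
    scores = X.dot(coef.T) + intercept
    # A binary model has a single coefficient row, so flatten to (n_samples,)
    return scores.ravel() if scores.shape[1] == 1 else scores


# Hypothetical binary model whose hyperplane is x0 - x1 = 0
coef = np.array([[1.0, -1.0]])
intercept = np.array([0.0])
X = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])

scores = linear_scores(X, coef, intercept)

# Hypothetical 3-class model: one coefficient row and one intercept per class
multi_scores = linear_scores(X, np.eye(3, 2), np.zeros(3))
```

With these made-up values the three samples score 2, -3 and 0, so only the first one falls on the positive side of the hyperplane. The multi-class call keeps the (n_samples, n_classes) shape.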

We need to create a Spark version of LinearClassifierMixin that simply maps scikit-learn's decision_function method over the blocks of the RDD, something like this:

class SparkLinearClassifierMixin(LinearClassifierMixin, SparkBroadcasterMixin):
    """Mixin for linear classifiers.

    Handles prediction for sparse and dense X.
    """

    __transient__ = ['coef_', 'intercept_']  # broadcastable variables, possibly large arrays

    def decision_function(self, X):
        check_rdd(X, (sp.spmatrix, np.ndarray))

        mapper = self.broadcast(
            super(SparkLinearClassifierMixin, self).decision_function, X.context)
        return X.map(mapper)
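
To see why a plain map is enough, here is a Spark-free sketch under stated assumptions: a Python list of NumPy blocks stands in for the RDD, the built-in `map` stands in for `X.map`, and `decision_function_block`, `coef` and `intercept` are hypothetical names with made-up values. Each row's score depends only on that row, so scoring block by block should match scoring the whole matrix at once.

```python
import numpy as np

# Made-up fitted parameters for a binary linear model with 3 features
coef = np.array([[0.5, -2.0, 1.0]])
intercept = np.array([0.25])


def decision_function_block(block):
    """Score one block of rows; plays the role of the mapped function."""
    scores = block.dot(coef.T) + intercept
    return scores.ravel() if scores.shape[1] == 1 else scores


X = np.arange(18, dtype=float).reshape(6, 3)

# Stand-in for the blocks of a distributed array: three blocks of two rows
blocks = [X[0:2], X[2:4], X[4:6]]

# Apply the function to every block, then stitch the results back together
blockwise = np.concatenate(list(map(decision_function_block, blocks)))

# Reference result: score the full matrix in one call
whole = decision_function_block(X)
```

In the real mixin the fitted arrays would be shipped to the workers via broadcast rather than captured as module-level globals, which is what the `__transient__` list above is for.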

Finally extend SparkLinearSVC to support the functionality above:

class SparkLinearSVC(LinearSVC, SparkLinearClassifierMixin, SparkLinearModelMixin):

We plan to implement it in the next few weeks, but as always, contributions are appreciated :)

@kszucs
Contributor

kszucs commented Nov 4, 2015

@mrshanth I saw you've implemented decision function support. Would you open a pull request, please? :)
