
pip version 0.6 does not support sparse feature vectors #1456

Closed
WladimirSidorenko opened this issue Aug 10, 2016 · 2 comments
@WladimirSidorenko

Problem Description

In contrast to version 0.4, the pip release 0.6 of the package does not support sparse feature vectors. This makes the new version backward-incompatible with the previous one, so you should either roll this change back or increase the major version number.

0.4

import xgboost
xgboost.__version__
# '0.4'

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline

model = Pipeline([("vect", DictVectorizer()), ("clf", xgboost.XGBClassifier())])

x = [{"feat{:d}".format(x_i): 1} for x_i in xrange(10)]
y = [y_i for y_i in reversed(xrange(10))]

model.fit(x, y)
# Pipeline(steps=[('vect', DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True,
#        sparse=True)), ('clf', XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
#       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
#       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
#       objective='multi:softprob', reg_alpha=0, reg_lambda=1,
#       scale_pos_weight=1, seed=0, silent=True, subsample=1))])
for x_i in x:
    model.predict_proba(x_i)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)
# array([[ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]], dtype=float32)

0.6

import xgboost
xgboost.__version__
# '0.6'

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline

model = Pipeline([("vect", DictVectorizer()), ("clf", xgboost.XGBClassifier())])

x = [{"feat{:d}".format(x_i): 1} for x_i in xrange(10)]
y = [y_i for y_i in reversed(xrange(10))]

model.fit(x, y)
# Pipeline(steps=[('vect', DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True,
#        sparse=True)), ('clf', XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
#       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
#       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
#       objective='multi:softprob', reg_alpha=0, reg_lambda=1,
#       scale_pos_weight=1, seed=0, silent=True, subsample=1))])
for x_i in x:
    model.predict_proba(x_i)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/sidorenko/Projects/DiscourseSenser/venv/local/lib/python2.7/site-packages/sklearn/utils/metaestimators.py", line 37, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/home/sidorenko/Projects/DiscourseSenser/venv/local/lib/python2.7/site-packages/sklearn/pipeline.py", line 240, in predict_proba
    return self.steps[-1][-1].predict_proba(Xt)
  File "/home/sidorenko/Projects/DiscourseSenser/venv/local/lib/python2.7/site-packages/xgboost/sklearn.py", line 477, in predict_proba
    ntree_limit=ntree_limit)
  File "/home/sidorenko/Projects/DiscourseSenser/venv/local/lib/python2.7/site-packages/xgboost/core.py", line 939, in predict
    self._validate_features(data)
  File "/home/sidorenko/Projects/DiscourseSenser/venv/local/lib/python2.7/site-packages/xgboost/core.py", line 1179, in _validate_features
    data.feature_names))
ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9'] ['f0']
expected f1, f2, f3, f4, f5, f6, f7, f8, f9 in input data
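
One workaround sketch (an editor's assumption, not part of the original report): if the mismatch comes from single-row sparse CSR input, making the vectorizer emit dense arrays sidesteps the sparse handling entirely, since every row then carries all columns explicitly. `sparse=False` is a documented `DictVectorizer` parameter:

```python
from sklearn.feature_extraction import DictVectorizer

# sparse=False makes DictVectorizer return dense numpy arrays, so each
# row carries all 10 columns explicitly instead of a 1-nonzero CSR row.
vect = DictVectorizer(sparse=False)
x = [{"feat{:d}".format(i): 1} for i in range(10)]
X = vect.fit_transform(x)
print(X.shape)  # (10, 10)
```

Whether this restores the original pipeline's behavior under 0.6 is untested here; it trades memory for compatibility, since the dense matrix materializes every zero.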

System Specification

uname -srm
Linux 3.19.0-42-generic x86_64

pip --version
pip 8.1.1

python --version
Python 2.7.6
@phunterlau
Contributor

@WladimirSidorenko This is a change in xgboost 0.6 itself, not in the pip installation. Please refer to #1238. In short, from @abhishekkrthakur:

It seems that this works only if the sparse matrix is CSC. It doesn't work for CSR or COO matrices as it did in earlier versions.

However, if you want, you can always install a previous version with pip install xgboost==0.4a30, since I haven't hidden that release, in case someone still prefers the 0.4 version.
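
A minimal sketch of the CSR-to-CSC conversion with scipy (illustrative only; that CSC satisfies 0.6's feature validation is per the observation quoted above, not verified here):

```python
import scipy.sparse as sp

# A single-row CSR matrix like DictVectorizer emits for {"feat0": 1}
# against a fitted 10-feature vocabulary: shape (1, 10), one nonzero.
row = sp.csr_matrix(([1.0], ([0], [0])), shape=(1, 10))

# tocsc() keeps the explicit 10-column shape; CSC is the format the
# quoted comment reports as working with xgboost 0.6.
row_csc = row.tocsc()
print(row.shape, row_csc.shape)  # (1, 10) (1, 10)
```

In a pipeline, this conversion would have to happen between the vectorizer's transform and the classifier's predict, e.g. via a small transformer step that calls tocsc() on its input.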

@WladimirSidorenko
Author

Duplicate of #1238.

lock bot locked as resolved and limited conversation to collaborators on Oct 26, 2018