Sparse matrices w/ sklearn api #843

dagley11 · 2016-02-18T20:37:05Z

I'm getting very different scores when using a scipy.sparse.csr_matrix as input vs. a pandas DataFrame for XGBRegressor.fit(). I checked this behavior against scikit-learn using random forests and my scores did not deviate, so the issue is not with the sparse matrix itself. Any ideas?

tqchen · 2016-02-23T16:57:28Z

xgboost is optimized for sparse matrices and treats entries that are not present as missing, while sklearn does not. It also really depends on what preprocessing was done to the pandas data frame.
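To make the point above concrete, here is a minimal sketch using only numpy and scipy (no xgboost call is made). The toy 1/0 matrix is hypothetical, and the NaN view is just an illustration of the "absent means missing" reading described in this comment, not xgboost's actual internals.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: 3 users x 4 sites, 1.0 = visited, 0.0 = not visited.
dense = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])

# CSR stores only the nonzero entries; the zeros are simply not present.
csr = sparse.csr_matrix(dense)
stored = csr.nnz      # number of explicitly stored values
total = dense.size    # number of cells in the dense array

# Dense view under the "absent means missing" reading:
# stored entries keep their value, every other cell becomes NaN.
coo = csr.tocoo()
as_missing = np.full(dense.shape, np.nan)
as_missing[coo.row, coo.col] = coo.data

print(f"stored entries: {stored} of {total}")
print(as_missing)
```

With the dense array every zero is an observed value, while with the CSR input those same cells carry no value at all, so the two inputs need not describe the same data to the learner. If that is the cause, one thing to check is the `missing` argument of the xgboost estimator or `DMatrix`, which controls which value in dense input is treated as missing.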

hlbkin · 2016-04-24T21:59:12Z

Having the same issue. I have a sparse dataset of user clicks on websites, and when I fit XGBClassifier (the xgboost sklearn wrapper) with a CSR matrix I get far worse results than with a standard numpy array.

My data is basically 1/0 (visited/nonvisited).

I still do not understand why the results differ so much.

bryan-woods · 2016-09-16T17:54:50Z

This was also noted in #1238. There is definitely a problem with sparse matrices in the latest sklearn API.

tqchen · 2016-09-17T16:40:41Z

#1583

tqchen closed this as completed Sep 17, 2016

lock bot locked as resolved and limited conversation to collaborators Oct 26, 2018
