Sparse matrices w/ sklearn api #843

Closed
dagley11 opened this issue Feb 18, 2016 · 4 comments

Comments

@dagley11

I'm getting very different scores when using a scipy.sparse.csr_matrix as input vs. a pandas DataFrame for XGBRegressor.fit(). I checked this behavior against scikit-learn using random forests and my scores did not deviate, so the issue is not with the sparse matrix itself. Any ideas?

@tqchen
Member

tqchen commented Feb 23, 2016

xgboost is optimized for sparse matrices and treats entries that are not present as missing, while sklearn does not. It also really depends on what preprocessing was done to the pandas DataFrame.
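
To make the point above concrete, here is a minimal sketch using only numpy and scipy. It shows that a CSR matrix stores only the non-zero entries, and what the same data looks like if every unstored entry is read as missing (NaN) instead of 0. The toy data and the helper `absent_as_missing` are hypothetical, made up for illustration. This is not xgboost's internal code.

```python
import numpy as np
from scipy import sparse

# Toy 1/0 feature matrix (hypothetical data, for illustration only).
dense = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])

csr = sparse.csr_matrix(dense)

# CSR keeps only the non-zero entries; the zeros are simply not stored.
print("stored entries:", csr.nnz)
print("stored values: ", csr.data)


def absent_as_missing(matrix):
    """Hypothetical helper: dense copy where unstored entries become NaN.

    This mimics the "not present means missing" reading described above,
    as opposed to the "not present means 0" reading of toarray().
    """
    coo = matrix.tocoo()
    out = np.full(coo.shape, np.nan)
    out[coo.row, coo.col] = coo.data
    return out


as_zero = csr.toarray()               # unstored entries read as 0
as_missing = absent_as_missing(csr)   # unstored entries read as NaN

print(as_zero)
print(as_missing)
```

With 1/0 data the second view contains only 1 and NaN, which is a different problem from the one a dense array of 1s and 0s describes, so different scores are plausible. If I recall correctly, the xgboost sklearn wrapper also takes a `missing` argument that controls which value in dense input counts as missing, which may help when trying to make the two inputs comparable.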

@hlbkin

hlbkin commented Apr 24, 2016

Having the same issue. I have a sparse dataset of user clicks on websites, and when I fit XGBClassifier (the xgb sklearn wrapper) with a CSR matrix I get way worse results than with a standard numpy array.

My data is basically 1/0 (visited/nonvisited).

I still do not understand why the results differ so much.

@bryan-woods
Contributor

This was also noted in #1238. There is definitely a problem with sparse matrices in the latest sklearn API.

@tqchen
Member

tqchen commented Sep 17, 2016

#1583

@tqchen tqchen closed this as completed Sep 17, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Oct 26, 2018