Sparse matrices w/ sklearn api #843
Comments
xgboost is optimized for sparse matrices and treats entries that are not present as missing, while sklearn does not (it treats them as zeros). The result also depends on what preprocessing was done to the pandas data frame.
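To make the difference concrete, here is a minimal sketch using only numpy and scipy. It builds a small 1/0 matrix, converts it to CSR, and then reconstructs a dense view in which every entry that is not stored becomes NaN. The helper name `unstored_as_nan` and the toy data are made up for illustration, and the NaN view is only an approximation of how a tree booster would interpret the sparse input under the behavior described above, not xgboost's actual internals.

```python
import numpy as np
from scipy import sparse


def unstored_as_nan(X):
    """Return a dense copy of sparse X with NaN wherever no entry is stored.

    Hypothetical helper: it mimics "entries not present are missing".
    """
    coo = sparse.csr_matrix(X, dtype=float).tocoo()
    dense = np.full(coo.shape, np.nan)
    # Only explicitly stored entries get a value; everything else stays NaN.
    dense[coo.row, coo.col] = coo.data
    return dense


# Toy visited / not-visited data (illustrative only).
clicks = np.array([[1, 0, 1],
                   [0, 0, 1]], dtype=float)

X_sparse = sparse.csr_matrix(clicks)    # the three zeros are not stored
as_missing = unstored_as_nan(X_sparse)  # zeros become NaN in this view
as_zeros = X_sparse.toarray()           # zeros are real values in this view

print(as_missing)
print(as_zeros)
```

If the zeros in your data are meaningful (for example "not visited"), the two views carry different information, which is one plausible reason for scores to differ between sparse and dense input.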
Having the same issue. I have a sparse dataset of user clicks on websites, and when I fit XGBClassifier (the xgb sklearn wrapper) with a csr matrix I get much worse results than with a standard numpy array. My data is basically 1/0 (visited/not visited). I still do not understand why the results differ so much.
This was also noted in #1238. There is definitely a problem with sparse matrices in the latest sklearn API.
I'm getting very different scores when using a scipy.sparse.csr_matrix as input vs. a pandas DataFrame for XGBRegressor.fit(). I checked this behavior against scikit-learn using random forests and my scores did not deviate, so the issue is not with the sparse matrix itself. Any ideas?