speed for a single record #12

Open
geoHeil opened this issue Dec 13, 2016 · 3 comments

Comments

@geoHeil

geoHeil commented Dec 13, 2016

Did you know about dmlc/xgboost#1849 (comment)?

Apparently xgboost4j is faster than this library for batch predictions in the current version.
Do you have a test that compares predicting a single new record rather than 200k records? As described in the linked xgboost issue, xgboost4j's API only supports batch mode. What about your library?

> I have tested on a dataset (containing 200,000 records) on Spark. xgboost4j-spark took 1775736 milliseconds, including implicit data transformations. xgboost-predictor-java took 4620104 milliseconds including data transformations and 2907550 milliseconds without them. I think xgboost4j's prediction on a batch is faster, and I will keep using xgboost4j.
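
To put the quoted totals on a common footing, here is a minimal sketch that converts them to an average cost per record and to a slowdown ratio against xgboost4j-spark. It uses only the three figures from the quote; the class and method names are hypothetical. Since all three figures come from one batch run over 200,000 records, the averages say nothing about the latency of a true single-record call.

```java
public class TimingSummary {
    // Average milliseconds spent per record for a given total runtime.
    public static double perRecordMillis(long totalMillis, int records) {
        return (double) totalMillis / records;
    }

    // How many times longer `otherMillis` is than `baselineMillis`.
    public static double slowdown(long otherMillis, long baselineMillis) {
        return (double) otherMillis / baselineMillis;
    }

    public static void main(String[] args) {
        int records = 200_000;                     // dataset size from the quote
        long xgboost4jSpark = 1_775_736L;          // includes implicit transformations
        long predictorWithTransform = 4_620_104L;  // xgboost-predictor-java, with transformations
        long predictorNoTransform = 2_907_550L;    // xgboost-predictor-java, without transformations

        System.out.printf("xgboost4j-spark:          %.2f ms/record%n",
                perRecordMillis(xgboost4jSpark, records));
        System.out.printf("predictor (with transf.): %.2f ms/record, %.2fx slower%n",
                perRecordMillis(predictorWithTransform, records),
                slowdown(predictorWithTransform, xgboost4jSpark));
        System.out.printf("predictor (no transf.):   %.2f ms/record, %.2fx slower%n",
                perRecordMillis(predictorNoTransform, records),
                slowdown(predictorNoTransform, xgboost4jSpark));
    }
}
```

On these numbers xgboost4j-spark averages about 8.9 ms per record, against roughly 23.1 ms and 14.5 ms for xgboost-predictor-java with and without transformations, which is about 2.6x and 1.6x slower.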

@CasyWang

CasyWang commented May 3, 2017

Any conclusion here?

@slevental
Contributor

slevental commented Jun 26, 2017

The benchmark results posted in README.md look quite misleading. They claim that the current JVM version is a few orders of magnitude faster than xgboost4j, and if you run the benchmark you will get similar results. However, if you dig deeper, you will find that xgboost4j spends most of its time creating the DMatrix object, which is not in sparse format (by default) and is huge: 100x100000. I believe that using the sparse matrix format would boost performance. I ran the benchmark with a DMatrix of size 80x100, which is more suitable for my case, and xgboost4j performed better (30-40% faster).
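
To illustrate the sparse-format suggestion, here is a minimal sketch in plain Java that converts a dense `float[][]` into the three CSR arrays (row offsets, column indices, non-zero values). The class and field names are hypothetical and this is not the benchmark's code. As far as I know, xgboost4j's `DMatrix` has a constructor that accepts CSR arrays together with `DMatrix.SparseType.CSR`, but check the exact signature for your version. One assumption to be aware of: XGBoost treats entries absent from a sparse matrix as missing, which is not necessarily the same as an explicit 0, so the conversion is only appropriate when zeros can be treated as missing.

```java
public class DenseToCsr {
    /** CSR representation: row offsets, column indices, and non-zero values. */
    public static final class Csr {
        public final long[] rowHeaders;  // length = rows + 1; row r spans [rowHeaders[r], rowHeaders[r + 1])
        public final int[] colIndices;   // column index of each stored value
        public final float[] values;     // the stored (non-zero) values

        Csr(long[] rowHeaders, int[] colIndices, float[] values) {
            this.rowHeaders = rowHeaders;
            this.colIndices = colIndices;
            this.values = values;
        }
    }

    /** Converts a dense matrix to CSR, dropping entries equal to zero. */
    public static Csr fromDense(float[][] dense) {
        // First pass: count non-zero entries so the arrays can be sized exactly.
        int nnz = 0;
        for (float[] row : dense) {
            for (float v : row) {
                if (v != 0f) nnz++;
            }
        }

        long[] rowHeaders = new long[dense.length + 1];
        int[] colIndices = new int[nnz];
        float[] values = new float[nnz];

        // Second pass: copy non-zero entries row by row.
        int k = 0;
        for (int r = 0; r < dense.length; r++) {
            rowHeaders[r] = k;
            for (int c = 0; c < dense[r].length; c++) {
                if (dense[r][c] != 0f) {
                    colIndices[k] = c;
                    values[k] = dense[r][c];
                    k++;
                }
            }
        }
        rowHeaders[dense.length] = k;
        return new Csr(rowHeaders, colIndices, values);
    }

    public static void main(String[] args) {
        float[][] dense = {
            {0f, 1.5f, 0f},
            {0f, 0f, 0f},
            {2f, 0f, 3f},
        };
        Csr csr = fromDense(dense);
        System.out.println("dense cells:   " + dense.length * dense[0].length);
        System.out.println("stored values: " + csr.values.length);
    }
}
```

For scale, a dense 100x100000 `float` matrix holds 10,000,000 values, about 40 MB at 4 bytes each, regardless of how many are zero. The CSR form stores only the non-zero values plus their indices, so how much it saves depends on how sparse the benchmark data actually is.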

@edumucelli

I have benchmarked several of the available libraries, among them XGBoost4j and XGBoost-Predictor; you can take a look here if you are interested.
