XGBoost Performance Issues #631
Comments
Hi Anca! So, we've run some tests above and noticed that xgboost4j is much slower than xgboost-predictor-java :(
PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder if they considered that rather than xgboost4j, and why it didn't work out?
Hey @lucagiovagnoli, this seems fine to me. Do you know if xgboost-predictor-java is available in Maven already? Or is that ticket still pending?
OK, cool :) It would be nice to see the difference in performance between the two :)
Hi @ancasarb, it seems some others like @ytjia are suffering from performance degradation. We also fixed all known bugs in our fork https://github.com/Yelp/xgboost-predictor-java and we plan to contribute these back upstream.
I'd probably say that it was due to the Maven availability and being able to run on Travis etc., but it was Hollin who did the integration. What performance degradation are you referring to from @ytjia? Is there an issue for that?
My bad, I misused "degradation"; I meant that xgboost4j is so slow for us that it's impractical to use at scale for online inference. I found this comment from @ytjia on PR #401 mentioning they are having the same problem. I really liked @hollinwilkins's work integrating
That sounds like a good plan, @lucagiovagnoli.
I can add some color on the performance issues mentioned by @ytjia. I believe |
Very useful analysis by @voganrc. We are looking at a Spark xgboost4j model for online inference, but after reading this thread it appears Hopefully, this enhancement will be part of the next MLeap release.
To clarify my understanding a little, can a Spark model exported from
@indranilr @changhiskhan feel free to review #645, which should solve this. There are some instructions on how to use the Predictor in the PR.
Hello,
I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.
Here you can see throughput (ops / sec) as a function of library and batch size, where:
xgboost4j = https://github.com/dmlc/xgboost/tree/master/jvm-packages
xgboost-predictor-java = https://github.com/komiya-atsushi/xgboost-predictor-java
yelp-xgboost = https://github.com/Yelp/xgboost-predictor-java
mleap = https://github.com/combust/mleap
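For anyone who wants to reproduce the general shape of this measurement (throughput as a function of batch size) without setting up JMH, here is a rough plain-Java sketch. It is not the benchmark used for the numbers above: `dummyPredict` is a hypothetical stand-in for a real model's predict call, `makeBatch` and `opsPerSecond` are illustrative names, and one "op" is assumed to mean one predict call on a whole batch. A hand-rolled timing loop like this is much less reliable than JMH, so treat its output as indicative only.

```java
import java.util.function.Function;

public class ThroughputSketch {

    /** Hypothetical stand-in for a model: returns the sum of each row's features. */
    public static double[] dummyPredict(double[][] batch) {
        double[] out = new double[batch.length];
        for (int i = 0; i < batch.length; i++) {
            double sum = 0.0;
            for (double v : batch[i]) {
                sum += v;
            }
            out[i] = sum;
        }
        return out;
    }

    /** Builds a deterministic batch of the given shape. */
    public static double[][] makeBatch(int rows, int features) {
        double[][] batch = new double[rows][features];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < features; j++) {
                batch[i][j] = (i + j) * 0.01;
            }
        }
        return batch;
    }

    /** Returns predict calls per second, where each call scores the whole batch. */
    public static double opsPerSecond(Function<double[][], double[]> predictor,
                                      double[][] batch, int iterations) {
        double sink = 0.0;
        // Warm-up pass so the JIT has a chance to compile the hot path.
        for (int i = 0; i < iterations; i++) {
            double[] out = predictor.apply(batch);
            sink += out.length > 0 ? out[0] : 0.0;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            double[] out = predictor.apply(batch);
            sink += out.length > 0 ? out[0] : 0.0;
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        // Use the accumulated value so the loops are not optimized away.
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN");
        }
        return iterations / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 10, 100, 1000}) {
            double[][] batch = makeBatch(batchSize, 50);
            double ops = opsPerSecond(ThroughputSketch::dummyPredict, batch, 1000);
            System.out.printf("batch=%d ops/sec=%.0f%n", batchSize, ops);
        }
    }
}
```

To compare libraries, each one's predict call would be wrapped in a `Function<double[][], double[]>` and passed in place of `dummyPredict`; the wrappers themselves are not shown here because they depend on each library's API.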
Given that MLeap makes use of `xgboost4j-spark`, does anyone know why it would have half the throughput of `xgboost4j`? Also, is there a reason why `mleap` does not observe constant throughput scaling like `xgboost4j` does?

Thanks!
-Ryan