
XGBoost Performance Issues #631

Closed · voganrc opened this issue Jan 22, 2020 · 12 comments · Fixed by #645

voganrc (Contributor) commented Jan 22, 2020

Hello,

I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.

[chart: JMH benchmark results]

Here you can see throughput (ops / sec) as a function of library and batch size, where:

xgboost4j = https://github.com/dmlc/xgboost/tree/master/jvm-packages
xgboost-predictor-java = https://github.com/komiya-atsushi/xgboost-predictor-java
yelp-xgboost = https://github.com/Yelp/xgboost-predictor-java
mleap = https://github.com/combust/mleap
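
For reference, a minimal sketch of this style of JMH throughput benchmark (the class name, feature dimension, and batch sizes here are illustrative, not the exact harness used):

```scala
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class XgbThroughputBench {
  @Param(Array("1", "10", "100", "1000")) // swept batch sizes
  var batchSize: Int = _

  var batch: Array[Array[Double]] = _

  @Setup
  def setup(): Unit = {
    // Random dense rows; a real run would also load a trained model here.
    batch = Array.fill(batchSize)(Array.fill(50)(scala.util.Random.nextDouble()))
  }

  @Benchmark
  def predictBatch(bh: Blackhole): Unit = {
    // Swap in the library under test (mleap, xgboost4j, xgboost-predictor...).
    batch.foreach(row => bh.consume(row.sum)) // placeholder workload
  }
}
```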

Given that MLeap makes use of xgboost4j-spark, does anyone know why it would have half the throughput of xgboost4j? Also, is there a reason why MLeap does not show the same constant throughput scaling that xgboost4j does?

Thanks!
-Ryan

lucagiovagnoli (Member) commented Jan 23, 2020

Hi Anca! So, we've run some tests above and noticed that xgboost4j is much slower than xgboost-predictor-java :(
Historically we've used a fork of xgboost-predictor at Yelp (yelp-xgboost), so we're hitting a performance issue when running MLeap because xgboost4j seems thousands of times slower :/

  • I was thinking of making a PR to allow deserializing the model binary as either an xgboost-predictor OR an xgboost4j model (based on users' preference). Fortunately, it seems that xgboost-predictor supports loading from xgboost4j binaries; see the ModelReader. I just wanted to check what you think before we get to it. A sketch of that loading path is below.
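
For context, a minimal sketch of loading a booster binary with xgboost-predictor (the path and feature values here are illustrative, not from the PR):

```scala
import java.io.FileInputStream

import biz.k11i.xgboost.Predictor
import biz.k11i.xgboost.util.FVec

// Load a booster binary written by xgboost4j; xgboost-predictor's
// ModelReader parses the native model format.
val predictor = new Predictor(new FileInputStream("/tmp/xgb.model"))

// Score one dense row; `false` means zeros are treated as real values,
// not as missing.
val row = FVec.Transformer.fromArray(Array(0.1, 0.2, 0.3), false)
val prediction: Array[Double] = predictor.predict(row)
```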

PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder if they considered that rather than xgboost4j, and why it didn't work out?

@ancasarb

ancasarb (Member) commented

Hey @lucagiovagnoli, this seems fine to me. Do you know if xgboost-predictor-java is available in Maven already, or is that ticket still pending?

lucagiovagnoli (Member) commented

I've just noticed they are! It's mentioned in their README, and I found them on Bintray :)

ancasarb (Member) commented

OK, cool :) It would be nice to see the difference in performance between the two :)

lucagiovagnoli (Member) commented Jan 31, 2020

Hi @ancasarb,
I just found out that MLeap was actually using xgboost-predictor in 2017/2018 (#259)! That sounds great. I wonder if you know why it was removed in PR #401. Were there any lessons learned or big blockers in using it? (I'm wondering if it was just to switch back to the "official" implementation.)

It seems some others, like @ytjia, are suffering from performance degradation too. We've also fixed all known bugs in our fork https://github.com/Yelp/xgboost-predictor-java and plan to contribute these fixes back upstream.

ancasarb (Member) commented

I'd say it was probably due to Maven availability and being able to run on Travis etc., but it was Hollin who did the integration.

What performance degradation are you referring to from @ytjia? Is there an issue for that?

lucagiovagnoli (Member) commented

My bad, I misused "degradation". I meant that xgboost4j is so slow for us that it's impractical to use at scale for online inference. I found this comment from @ytjia on PR #401 mentioning they are having the same problem.

I really liked @hollinwilkins' work integrating xgboost-predictor in PR #259. If there's no real blocker, I have a PR (almost ready) to include both xgboost4j (status quo) and xgboost-predictor (high performance) in MLeap, so developers can deserialize xgboost models using the library they prefer :)
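
A hypothetical shape for that choice (names and API here are made up for illustration; see the eventual PR for the real mechanism):

```scala
import java.io.FileInputStream

// Hypothetical runtime toggle; neither these names nor this API are MLeap's.
sealed trait XgbRuntime
case object Xgboost4j extends XgbRuntime
case object XgboostPredictor extends XgbRuntime

def loadBooster(path: String, runtime: XgbRuntime): AnyRef = runtime match {
  case Xgboost4j        => ml.dmlc.xgboost4j.scala.XGBoost.loadModel(path)
  case XgboostPredictor => new biz.k11i.xgboost.Predictor(new FileInputStream(path))
}
```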

ancasarb (Member) commented Feb 4, 2020

That sounds like a good plan @lucagiovagnoli.

changhiskhan commented
I can add some color on the performance issues mentioned by @ytjia. I believe xgboost4j uses a static synchronized JNI entry point to communicate with the underlying xgboost code (which is not thread-safe). This isn't a problem in single-threaded applications, but in many real-time prediction setups the model sits behind a multi-threaded service handling concurrent requests. We ended up using xgboost-predictor-java as a workaround.
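
To illustrate the contention (a sketch, not xgboost4j's actual source):

```scala
// A static synchronized entry point means every request thread contends
// on one global lock for the full duration of the native call.
object GlobalNativeLock {
  // Stand-in for a `static synchronized native` method.
  def predict(features: Array[Float]): Array[Float] = this.synchronized {
    Thread.sleep(1) // simulate the (non-thread-safe) native work
    Array(0.5f)     // dummy score
  }
}
```

With N service threads calling predict() concurrently, the calls execute one at a time, so aggregate throughput stays pinned at single-thread levels no matter how many threads the service adds.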

indranilr commented Feb 5, 2020

Very useful analysis by @voganrc. We are looking at a Spark xgboost4j model for online inference, but after reading this thread it appears xgboost-predictor-java is a better choice for such scenarios. It would be quite useful if MLeap could be configured to use xgboost-predictor-java.

Hopefully, this enhancement will be part of the next MLeap release.

lucagiovagnoli self-assigned this Feb 5, 2020
indranilr commented
> I believe xgboost4j uses a static synchronized JNI entry point to communicate with the underlying xgboost code (which is not thread-safe). […] We ended up using xgboost-predictor-java as a workaround.

To clarify my understanding a little: can a Spark model exported from XGBoost4J-Spark (e.g. xgbClassificationModel.write.overwrite().save(xgbClassificationModelPath)) be used to predict with xgboost-predictor-java? Or would I need to export the XGBoost4J-Spark model to the native format first (e.g. xgbClassificationModel.nativeBooster.saveModel(nativeModelPath)) and then use xgboost-predictor-java to load and serve the model?
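
Based on the ModelReader note earlier in this thread, the native-booster export route is the one that should line up; a sketch (paths are illustrative):

```scala
import java.io.FileInputStream

import biz.k11i.xgboost.Predictor

// The Spark ML writer (model.write.save(...)) produces a pipeline directory
// with Spark metadata, which xgboost-predictor does not read. Export the
// underlying native booster instead (Spark side):
// xgbClassificationModel.nativeBooster.saveModel("/tmp/xgb-native.model")

// Serving side: xgboost-predictor loads that native binary directly.
val predictor = new Predictor(new FileInputStream("/tmp/xgb-native.model"))
```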

lucagiovagnoli (Member) commented

@indranilr @changhiskhan feel free to review #645, which should solve this. There are instructions on how to use the Predictor in the PR.
