New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xgboost Predictor performant Op #645
xgboost Predictor performant Op #645
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the tests in much detail, but the overall code looks good!
Will add more feedback after a closer look
...me/src/main/scala/ml/combust/mleap/xgboost/runtime/XGBoostPredictorClassificationModel.scala
Show resolved
Hide resolved
0b98581
to
7a942e2
Compare
I see in logs I think travis wants both language and scala version to be set at the same hierarchical level in the travis file:
I fixed it here and rebased already: 7dcd9bd EDIT: this is still failing even with scala 2.11.8 -- maybe because of |
...me/src/main/scala/ml/combust/mleap/xgboost/runtime/XGBoostPredictorClassificationModel.scala
Outdated
Show resolved
Hide resolved
7a942e2
to
dc4447c
Compare
Tests are fixed. As I suspected it had to do with scala 2.12 According to this line https://github.com/combust/mleap/blob/master/project/Dependencies.scala#L70 there's no dmlc xgboost for scala 2.12 |
@lucagiovagnoli could you please pull the latest master here, thank you! |
0cc18e2
to
a471a82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!!
...runtime/src/main/scala/ml/combust/mleap/xgboost/runtime/XGBoostPredictorClassification.scala
Show resolved
Hide resolved
...me/src/main/scala/ml/combust/mleap/xgboost/runtime/XGBoostPredictorClassificationModel.scala
Outdated
Show resolved
Hide resolved
mleap-xgboost-runtime/src/main/scala/ml/combust/mleap/xgboost/runtime/struct/FVecFactory.scala
Show resolved
Hide resolved
a471a82
to
8474e19
Compare
LGTM! |
Thanks for reviewing everyone! @ancasarb this has now been rebased on master and tests pass. let me know what you think |
Last diff:
|
9d0e1fb
to
7d9143e
Compare
@tinaxi: your comment disappeared after I amended the commit. RE: I streamlined those imports and converted to |
LGTM! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for updating the FVec code
Hi @ancasarb ! We've been using this internally successfully for a while now, do you think we could merge it into master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucagiovagnoli the code changes seem fine. I had minor comments, nothing major.
I want to check something, we use xgboost-predictor only for classification models, but still use xgboost-4j for regression models, is that right? Why is that?
Right now, to use the xgboost-predictor, we'd need to change reference.conf, as you mentioned. I am just thinking if we could instead have two sub-module "mleap-xgboost-4j-runtime" and "mleap-xgboost-predictor" and then depending on the desired implementation, the user would choose between the two? And we can even have a "mleap-xgboost-runtime" base module, that store any code that's common between the two implementations? What do you think?
Could I also please ask you to update the RELEASE_NOTES.md (a first attempt at keeping some release notes) with these changes too? And could you please add any other xgboost related changes that I might have missed? Thank you!
@@ -22,6 +22,7 @@ object Common { | |||
fork in Test := true, | |||
javaOptions in test += sys.env.getOrElse("JVM_OPTS", ""), | |||
resolvers += Resolver.mavenLocal, | |||
resolvers += Resolver.jcenterRepo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required by xgboost-predictor new dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required cause that's where the predictor lives, it's how they suggest to import it on their main page: https://github.com/komiya-atsushi/xgboost-predictor-java (note that the original version is https://github.com/komiya-atsushi/xgboost-predictor-java, not https://github.com/h2oai/xgboost-predictor ;)
...ain/scala/ml/combust/mleap/xgboost/runtime/bundle/ops/XGBoostPredictorClassificationOp.scala
Show resolved
Hide resolved
mleap-xgboost-runtime/src/main/scala/ml/combust/mleap/xgboost/runtime/struct/FVecFactory.scala
Show resolved
Hide resolved
Thanks for reviewing! You raised an excellent question today.
So, I didn't modify the Regressor in this PR for 2 reasons:
I think your suggestion makes a lot of sense and would be less confusing than playing with reference.conf - exactly the kind of input I was looking for when asking for your opinion!
Example: say that we train a model using xgboost-spark, the json dump will hold this uniqueName xgboost.classifier. When loading from disk, that will be re-mapped to the xgboost-runtime here. A Predictor Op needs a different uniqueName - say To maintain this spirit of runtime-flexibility, I'm leaning towards choosing via the reference.conf file. Notice that this PR is a no-op for anyone who doesn't care about speed. The change in reference.conf is only needed to switch on to the Predictor, so this weirder user-experience will not reach most MLeap users (xgboost4j is still the default for xgboost-runtime). Let me know what you think!
Will do! |
In the meantime, I added RELEASE_NOTES and a new README for XGBoost-runtime |
@lucagiovagnoli How about this approach?
XGBoostClassificationOp using the "xgboost.classifier" op name
XGBoostPredictorClassificationOp using the "xgboost.classifier" op name as well We will have the xgboost-spark serialize as before to "xgboost.classifier" and "xgboost.regression" and the choice will come at runtime: what do you think? |
Oh I see, so if someone does not import the xgboost4j, but they import the Predictor, then the big merged reference.conf will only contain the PredictorOp? I haven't tested this but I think that makes a lot of sense. I'll try have a look testing this and splitting the code next week |
…predictor for higher performance
…s and was broken - New FVec Tensor factories - also fix a copy-pasted test
502cbef
to
a5c6de1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
in the source code of XGBoostPredictorClassification can we use both XGBoostPredictorClassification and XGBoostClassification in the same project, because now we have multi bundles with different op dependency, some depend on XGBoostClassification to support leaf prediction while some not. |
solved it by register op through code. |
I added a new MLeapOp to load xgboost models as Predictor objects (more performant than xgboost4j). Honorable mentions to @hollinwilkins who introduced some of this code in 2017(see #259)
In order to use this, it's enough to use a reference.conf file like the following:
and add this to your project's pom file:
This fixes #631 (for review @talalryz @ancasarb )