
[Roadmap] XGBoost 1.0.0 Roadmap #4680

Closed
8 of 9 tasks
CodingCat opened this issue Jul 18, 2019 · 52 comments

@CodingCat
Member

CodingCat commented Jul 18, 2019

@dmlc/xgboost-committer please add your items here by editing this post. Let's ensure that

  • each item is associated with a ticket

  • major design/refactoring work is associated with an RFC before the code is committed

  • blocking issues are marked as Blocking

  • breaking changes are marked as Breaking

For other contributors who don't have permission to edit this post: please comment here about what you think should be in 1.0.0.

I have created three new labels: 1.0.0, Blocking, and Breaking.

@CodingCat CodingCat pinned this issue Jul 18, 2019
@thesuperzapper
Contributor

Not a committer, but can we please target PySpark API for 1.0?
Issue: #3370
Current PR: #4656

@CodingCat
Member Author

For other contributors who don't have permission to edit the post: please comment here about what you think should be in 1.0.0.

@thesuperzapper
Contributor

Also, should we target moving exclusively to the Scala-based Rabit tracker (for Spark) in 1.0?

@trams
Contributor

trams commented Jul 20, 2019

I am also not a committer, but I and the company I work for are very interested in fixing (or at least mitigating) the performance issue with checkpointing: #3946

@trivialfis
Member

@trams @thesuperzapper I think this is an overview for everyone to get a feeling for what's coming next. It would be difficult to list everything, since XGBoost is a community-driven project. Just open a PR when it's ready.

Not a committer, but can we please target PySpark API for 1.0?

@thesuperzapper Let's track the progress. I certainly hope that I can start testing it. :-)

@thesuperzapper
Contributor

There is also the secondary consideration that we might not be ready for 1.0 and the API guarantees that come with it; for example, we could instead do 0.10.0 next?

@trivialfis
Member

@thesuperzapper 1.0 is not going to be a final version. It's just that we are trying to do semantic versioning.

@RAMitchell
Member

Added some gpu related items.

@chenqin
Contributor

chenqin commented Aug 8, 2019

I would like to get the native xgb fix included.
#4753

@trivialfis
Member

JSON is removed from the list. See #4683 (comment)

@thesuperzapper
Contributor

I raised an issue for my above suggestion: #4781 (to remove the Python Rabit tracker)

@Daniel8hen
Contributor

Feature importance in the Spark version would be great as well (i.e. an easy way to get the feature importances).
#988

@trivialfis
Member

Added regression test.

@hcho3
Collaborator

hcho3 commented Aug 21, 2019

@chenqin I'd like to hear from you about regression tests, since you have experience with managing ML in production. Any suggestions?

@chenqin
Contributor

chenqin commented Aug 22, 2019

@chenqin I'd like to hear from you about regression tests, since you have experience with managing ML in production. Any suggestions?

I think we should cover regression tests on various workloads and benchmark prediction accuracy and stability against the previous version (equal or better) within approximately the same runtime. Two candidates off the top of my head are:

https://archive.ics.uci.edu/ml/datasets/HIGGS

sparse DMatrix:
https://www.kaggle.com/c/ClaimPredictionChallenge

We can try various tree methods and configurations to ensure good coverage:

tree_method, configurations / dataset / standalone or cluster

Disclaimer: I think it's worth clarifying a few things.

  • Release regression testing is not something we have already done at the company I worked for.

  • The datasets I proposed are arbitrary and should not be used as benchmarks to claim one framework is better than another. (This concerns me the most, as I see biased benchmarks from time to time.)

  • In fact, tuning and uncovering the proper features/settings has always been more important. Unfortunately, we may not cover this in regression tests.

Maybe a more organized plan is to build an automation tool that users can take and use to benchmark various settings against their private datasets and models in their own data centers.
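
To make the matrix idea concrete, here is a rough sketch of how such a check could be wired up (purely illustrative; train_and_eval and the baseline numbers are stand-ins I made up, not actual XGBoost APIs):

```python
# Hypothetical regression-test matrix: every (tree_method, dataset) pair is
# trained and its metric compared against a recorded baseline. train_and_eval
# is a stand-in for a function that would call xgboost.train and return a
# metric such as AUC; it is NOT a real XGBoost API.
from itertools import product

TREE_METHODS = ["exact", "approx", "hist"]
DATASETS = ["HIGGS", "ClaimPredictionChallenge"]  # dense vs. sparse DMatrix

def regression_check(train_and_eval, baselines, tol=1e-3):
    """Return the combinations whose metric fell below baseline - tol."""
    failures = []
    for method, dataset in product(TREE_METHODS, DATASETS):
        metric = train_and_eval(tree_method=method, dataset=dataset)
        if metric < baselines[(method, dataset)] - tol:
            failures.append((method, dataset, metric))
    return failures
```

A real harness would also record wall-clock time per combination, since the goal stated above is equal-or-better accuracy within approximately the same time.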

@thesuperzapper
Contributor

We should add fixing #4779 as a requirement to ship 1.0

@codingforfun

I added #4899 as a cleanup step.

@hcho3
Collaborator

hcho3 commented Oct 5, 2019

@dmlc/xgboost-committer Since we have quite a few tasks left for 1.0, maybe we should make an interim release 0.91?

@thesuperzapper
Contributor

thesuperzapper commented Oct 5, 2019

@hcho3 Or perhaps 0.10.0

@trivialfis
Member

@thesuperzapper That would confuse the versioning scheme. I don't mind a 0.91 release, but I still want to see proper procedures for regression tests.

@thesuperzapper
Contributor

@trivialfis If master has API changes, shouldn't we bump the major version? I guess that would look like 0.100.0.

@hcho3
Collaborator

hcho3 commented Oct 5, 2019

@thesuperzapper The 1.0.0 version is the first version in which we would adopt the semantic versioning scheme, so no, semantic versioning won't apply to the interim release. It's a bit tricky, since we have quite a lot to do before 1.0.0 is released.
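
As an aside, a sketch of why the numbering scheme matters (my own illustration, not project code): naive string comparison mis-orders versions once a component reaches two digits, which is exactly what numeric semantic-versioning comparison fixes.

```python
# Semantic versioning compares major.minor.patch numerically, left to right.
def semver_key(version):
    return tuple(int(part) for part in version.split("."))

# Lexicographic string comparison gets multi-digit components wrong:
assert "0.10.0" < "0.9.0"                          # wrong order as strings
assert semver_key("0.10.0") > semver_key("0.9.0")  # correct numerically
assert semver_key("0.90.0") < semver_key("1.0.0")
```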

@hcho3
Collaborator

hcho3 commented Oct 8, 2019

@CodingCat How about 0.100 or 0.95? "Preview" sounds like the 1.0.0 release is just around the corner, but we have quite a few major features (PySpark) in the pipeline.

@douglasren

Does it support weighted XGBoost?

@CodingCat
Member Author

CodingCat commented Oct 9, 2019 via email

@thesuperzapper
Contributor

@CodingCat at least from the point of view of xgboost4j-spark, that 1.0.0 preview won't be useful for most people, as almost no one is running Spark on Scala 2.12. Additionally, you can't easily get a compiled binary, as https://spark.apache.org/downloads.html doesn't distribute compiled versions of Spark for 2.12 with the Hadoop binaries included.

@CodingCat
Member Author

CodingCat commented Oct 11, 2019 via email

@hcho3
Collaborator

hcho3 commented Oct 11, 2019

@CodingCat @thesuperzapper I thought #4574 would allow for compiling XGBoost with both Scala 2.11 and 2.12? In that case, we should compile XGBoost with 2.11 and upload the JAR to Maven.

@trivialfis
Member

Removed:

I don't think we can get there right now.

@jkbradley
Contributor

@thesuperzapper It will become easier to develop against the Apache Spark master (3.0) branch and Scala 2.12 after Spark releases a 3.0 preview (targeted pretty soon this fall). I'd expect a much bigger shift to Scala 2.12 in the Spark community after the final 3.0 release (targeted early 2020), but you're right that there isn't a ton of 2.12 usage now. I created #4926 to solicit discussion around the upcoming Spark release.

@trams
Contributor

trams commented Oct 11, 2019

@CodingCat @thesuperzapper I thought #4574 would allow for compiling XGBoost with both Scala 2.11 and 2.12? In that case, we should compile XGBoost with 2.11 and upload JAR to Maven.

#4574 does not allow cross-compilation.
What it allows is for someone to check out the code, manually override the Scala version, and recompile.

So someone could compile a jar with 2.11 and upload it to Maven.
I had a pull request migrating to SBT, which would allow cross-compilation.
I also know a trick for supporting cross-compilation in Maven (we used it at our company). I can share it if you are interested.

@trivialfis
Member

trivialfis commented Oct 16, 2019

@hcho3 Is it possible to use CPack to ease the installation on OSX? Please ignore this comment if it's not possible.

@douglasren

Does it support Multi objective learning?

@trivialfis
Member

trivialfis commented Oct 22, 2019

@douglasren Sadly, no. Could you start a new issue so we can discuss it? The term "multi objective" can vary depending on context: one objective function for multiple outputs, multiple objectives with one output, or multiple objectives with multiple outputs.

@EricSpeidel

I would like to cast my vote towards an interim release as well.

@hcho3
Collaborator

hcho3 commented Dec 23, 2019

#5146 fixes #4477.

@TylerADavis

An interim release would be great as the macOS installation is still a pain right now

@dubeyrahul

Can we get documented support for learning to rank (pairwise) with XGBoost4J-Spark? Currently, there is no concrete explanation of how to specify the training data. There's some discussion around partitioning by group ID and the training data needing to follow the same partition strategy, but it's quite vague.
An example or clear documentation would be really helpful!
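
For what it's worth, the partitioning requirement as I understand it (a hedged pure-Python sketch of the idea, not the XGBoost4J-Spark API) is that every row of a query group must land in the same partition, so no group is ever split across workers:

```python
# Sketch of the partition-by-groupID idea for pairwise ranking: route all
# rows sharing a group (query) id to one partition. Pure Python stand-in,
# not Spark; a real job would use something like repartition on the id.
from collections import defaultdict

def partition_by_group(rows, num_partitions):
    """rows: iterable of (group_id, row) pairs; group_id is an int."""
    partitions = defaultdict(list)
    for group_id, row in rows:
        partitions[group_id % num_partitions].append((group_id, row))
    return partitions
```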

@lucagiovagnoli

I'd like to cast my vote to an interim release as well. We're looking forward to the next version mostly for the missing value fix by @cpfarrell (see #4805).

Is there a time estimate related to the next release (major or interim)?

PS: @thesuperzapper we're using 2.11 and 2.12 and an interim release would be extremely helpful for us

@trivialfis
Member

@hcho3 Can we create a release branch and have a week or so for testing?

@hcho3
Collaborator

hcho3 commented Jan 30, 2020

Yes!

@terrytangyuan
Member

@hcho3 In addition to a branch, we can also make an official release candidate on GitHub Releases so that the community can have more confidence to test it as well.

@lucagiovagnoli

This sounds awesome! Really looking forward to the next release. Let me know if we can help. We're definitely going to test it out at Yelp.

@hcho3
Collaborator

hcho3 commented Jan 31, 2020

I will cut a new branch release_1.0.0 after #5248 is merged. Thanks everyone for your patience.

@hcho3
Collaborator

hcho3 commented Jan 31, 2020

Release candidate is now available for Python: #5253. You can try it today by running

pip3 install xgboost==1.0.0rc1

@hcho3
Collaborator

hcho3 commented Feb 20, 2020

1.0.0 is now out:

pip3 install xgboost==1.0.0

@hcho3 hcho3 closed this as completed Feb 20, 2020
@hcho3 hcho3 unpinned this issue Feb 21, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 20, 2020