Releases · catboost/catboost
Release 0.7.2
Major Features And Improvements
- GPU: New `DocParallel` mode for tasks without categorical features and with `--max-ctr-complexity 1`. Provides the best performance for pools with a large number of documents.
- GPU: Distributed training on several GPU hosts via MPI. See the instructions on how to build the binary here.
- GPU: Up to 30% learning speed-up for Maxwell and later GPUs with binarization level > 32.
Bug Fixes and Other Changes
- Hotfixes for the GPU version of the Python wrapper.
Release 0.7.1
Major Features And Improvements
- Python wrapper: added methods to download the titanic and amazon datasets, to make it easier to try the library (`catboost.datasets`); see the sketch after this list.
- Python wrapper: added a method to write a column description file (`catboost.utils.create_cd`).
- Made improvements to visualization.
- Support for non-numeric values in the `GroupId` column.
- Tutorials section updated.
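A minimal sketch of how these helpers might be used. The exact return value of `catboost.datasets.titanic` and the keyword arguments of `catboost.utils.create_cd` shown here are assumptions based on the current Python API, not part of this release note; the column indices are hypothetical.

```python
# Hedged sketch: assumes titanic() returns a (train_df, test_df) pair of DataFrames
# and that create_cd accepts label / cat_features / output_path keywords.
from catboost.datasets import titanic
from catboost.utils import create_cd

train_df, test_df = titanic()   # downloads and caches the Titanic dataset
print(train_df.shape)

# Write a column-description file for the file-based / command-line interfaces.
create_cd(
    label=0,                 # column index of the target
    cat_features=[2, 3],     # hypothetical categorical column indices
    output_path='train.cd',
)
```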
Bug Fixes and Other Changes
- Fixed problems with eval_metrics (issue #285)
- Other fixes
Release 0.7
Breaking changes
- Changed the parameter order in the `train()` function to be consistent with other GBDT libraries.
- `use_best_model` is set to True by default if `eval_set` labels are present (see the sketch after this list).
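A hedged sketch of the new `use_best_model` default, using the current Python API names and synthetic data for illustration.

```python
# Hedged sketch: when eval_set with labels is passed, use_best_model now defaults
# to True, so the returned model is trimmed to the iteration that is best on eval_set.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)
X_train, y_train = X[:150], y[:150]
X_valid, y_valid = X[150:], y[150:]

model = CatBoostClassifier(iterations=200, verbose=50)
model.fit(X_train, y_train, eval_set=(X_valid, y_valid))  # use_best_model defaults to True here
print(model.tree_count_)  # may be fewer than 200 trees
```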
Major Features And Improvements
- New ranking mode `YetiRank` optimizes `NDCG` and `PFound`.
- New visualisation for `eval_metrics` and `cv` in Jupyter notebook.
- Improved per-document feature importance.
- Supported `verbose` = int: if `verbose` > 1, `metric_period` is set to this value (see the sketch after this list).
- Supported type(`eval_set`) = list in Python. Currently only a single `eval_set` is supported.
- Binary classification leaf estimation defaults are changed for weighted datasets so that training converges for any weights.
- Added the `model_size_reg` parameter to control model size. Fixed the `ctr_leaf_count_limit` parameter, also to control model size.
- Beta version of distributed CPU training with float features support only.
- Added `subgroupId` to the Python/R packages.
- Added groupwise metrics support in `eval_metrics`.
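A hedged sketch of the integer `verbose` behaviour, `eval_set` passed as a list, and `model_size_reg`; parameter names follow the current Python API, and the data and values are illustrative only.

```python
# Hedged sketch: verbose as an int (acts as metric_period), eval_set as a
# single-element list, and model_size_reg to trade accuracy for model size.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 4)
y = (X[:, 1] > 0.5).astype(int)

model = CatBoostClassifier(
    iterations=100,
    verbose=20,            # verbose > 1: metrics printed every 20 iterations
    model_size_reg=0.5,    # illustrative value
)
model.fit(X[:250], y[:250], eval_set=[(X[250:], y[250:])])  # list with a single eval_set
```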
Thanks to our Contributors
This release contains contributions from CatBoost team.
We are grateful to all who filed issues or helped resolve them, asked and answered questions.
Release 0.6.3
Breaking changes
- `boosting_type` parameter value `Dynamic` is renamed to `Ordered`.
- Data visualisation functionality in Jupyter Notebook now requires ipywidgets 7.x+.
- `query_id` parameter renamed to `group_id` in the Python and R wrappers.
- `cv` returns a pandas.DataFrame by default if pandas is installed. See the new parameter `as_pandas` and the sketch after this list.
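A hedged sketch of `cv` returning a pandas.DataFrame; the `fold_count` and `as_pandas` keywords follow the current Python API and should be checked against the version you are running.

```python
# Hedged sketch: cv now returns a pandas.DataFrame by default when pandas is
# installed; pass as_pandas=False to get the old dict-of-lists result instead.
import numpy as np
from catboost import Pool, cv

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

pool = Pool(X, y)
scores = cv(
    pool,
    params={'loss_function': 'Logloss', 'iterations': 50, 'verbose': False},
    fold_count=3,        # assumed keyword; check the cv() signature for your version
)
print(scores.head())     # per-iteration metrics as a pandas.DataFrame
```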
Major Features And Improvements
- CatBoost builds with a make file. It is now possible to build the command-line CPU version of CatBoost under Linux with a make file.
- In the column description, the column name `Target` is changed to `Label`. It will still work with the previous name, but it is recommended to use the new one.
- `eval-metrics` mode added to the cmdline version. Metrics can be calculated for a given dataset using a previously trained model.
- New classification metric `CtrFactor` is added.
- Load a CatBoost model from memory. You can load your CatBoost model from a file or initialize it from a buffer in memory.
- Now you can run the `fit` function using files with the dataset: `fit(train_path, eval_set=eval_path, column_description=cd_file)`. This reduces memory consumption by up to two times (see the sketch after this list).
- 12% speedup for training.
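A hedged sketch mirroring the `fit(train_path, eval_set=eval_path, column_description=cd_file)` call from the note above; the file names below are hypothetical placeholders.

```python
# Hedged sketch: training directly from dataset files instead of in-memory arrays.
# 'train.tsv', 'eval.tsv' and 'train.cd' are placeholder file names.
from catboost import CatBoostClassifier

model = CatBoostClassifier(iterations=100)
model.fit(
    'train.tsv',                      # path to the training dataset
    eval_set='eval.tsv',              # path to the evaluation dataset
    column_description='train.cd',    # column-description file (e.g. written by create_cd)
)
```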
Bug Fixes and Other Changes
- JSON output data format is changed.
- Python whl binaries with CUDA 9.1 support for Linux published in the release assets.
- Added the `bootstrap_type` parameter to `CatBoostClassifier` and `CatBoostRegressor` (issue #263).
Thanks to our Contributors
This release contains contributions from newbfg and CatBoost team.
We are grateful to all who filed issues or helped resolve them, asked and answered questions.
Release 0.6.2
Major Features And Improvements
- BETA version of distributed multi-host GPU training via MPI.
- Added the possibility to import a CoreML model with oblivious trees. This makes it possible to migrate pre-flatbuffers models (with float features only) to the current format (issue #235).
- Added the `QuerySoftMax` loss function (see the sketch after this list).
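A hedged sketch of training with the new `QuerySoftMax` loss. As a querywise loss it needs a query identifier per document; the `Pool`/`group_id` names here follow the later Python API (this release still used `query_id`), so treat them as assumptions.

```python
# Hedged sketch: QuerySoftMax is a querywise loss, so each document needs a
# query/group identifier; groups must be contiguous in the data.
import numpy as np
from catboost import CatBoost, Pool

rng = np.random.RandomState(0)
X = rng.rand(120, 4)
y = rng.rand(120)                        # non-negative relevance targets
groups = np.repeat(np.arange(12), 10)    # 12 queries, 10 documents each

pool = Pool(X, y, group_id=groups)
model = CatBoost({'loss_function': 'QuerySoftMax', 'iterations': 50, 'verbose': False})
model.fit(pool)
```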
Bug Fixes and Other Changes
- Fixed GPU models bug on pools with both categorical and float features (issue #241)
- Use all available cores by default
- Default float features binarization method set to `GreedyLogSum`.
- Fixed non-querywise losses for pools with `QueryId`.
Release 0.6.1.1
Release 0.6.1
Bug Fixes and Other Changes
- Fixed critical bugs in formula evaluation code (issue #236)
- Added the `scale_pos_weight` parameter (see the sketch below).
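A hedged sketch of the new `scale_pos_weight` parameter for imbalanced binary classification; the value shown is illustrative, not a recommendation from the release note.

```python
# Hedged sketch: scale_pos_weight multiplies the weight of the positive class,
# which can help with imbalanced binary classification. The value 5.0 is
# illustrative only (a common choice is n_negative / n_positive).
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 4)
y = (rng.rand(300) < 0.1).astype(int)   # roughly 10% positives

model = CatBoostClassifier(iterations=100, scale_pos_weight=5.0, verbose=False)
model.fit(X, y)
```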
Release 0.6
Speedups
- 25% speedup of the model applier
- 43% speedup for training on large datasets.
- 15% speedup for `QueryRMSE` and calculation of querywise metrics.
- Large speedups when using binary categorical features.
- Significant (x200 on 5k trees and 50k lines dataset) speedup for plot and stage predict calculations in cmdline.
- Compilation time speedup.
Major Features And Improvements
- Industry-fastest applier implementation.
- Introduced a new parameter `boosting-type` to switch between the standard boosting scheme and dynamic boosting, described in the paper "Dynamic boosting".
- Added new bootstrap types via `bootstrap_type` and `subsample`. Using the `Bernoulli` bootstrap type with `subsample < 1` might increase the training speed (see the sketch after this list).
- Better logging for cross-validation; added the parameters `logging_level` and `metric_period` (should be set in the training parameters) to cv.
- Added a separate `train` function that receives the parameters and returns a trained model.
- Ranking mode `QueryRMSE` now supports default settings for dynamic boosting.
- R-package pre-built binaries are included in the release.
- We added many synonyms to our parameter names, so it is now more convenient to try CatBoost if you are used to some other library.
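A hedged sketch of the Bernoulli bootstrap with `subsample < 1`; parameter names follow the current Python API, and the data and values are synthetic.

```python
# Hedged sketch: Bernoulli bootstrap with subsample < 1 samples a fraction of
# the objects at each iteration, which may speed up training.
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.RandomState(0)
X = rng.rand(500, 6)
y = X[:, 0] * 2.0 + rng.rand(500) * 0.1

model = CatBoostRegressor(
    iterations=200,
    bootstrap_type='Bernoulli',
    subsample=0.66,        # fraction of objects used at each iteration
    verbose=False,
)
model.fit(X, y)
```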
Bug Fixes and Other Changes
- Fix for CPU `QueryRMSE` with weights.
- Added several missing parameters to the wrappers.
- Fix for data split in querywise modes.
- Better logging.
- From this release we'll provide pre-built R binaries.
- More parallelisation.
- Memory usage improvements.
- And some other bug fixes.
Thanks to our Contributors
This release contains contributions from CatBoost team.
We are grateful to all who filed issues or helped resolve them, asked and answered questions.
Release 0.5.2.1
Hot fixes
Release 0.5.2
Major Features And Improvements
- We've made the single-document formula applier 4 times faster!
- `model.shrink` function added in the Python and R wrappers (see the sketch after this list).
- Added new training parameter `metric_period` that controls output frequency.
- Added new ranking metric `QueryAverage`.
- This version contains an easy way to implement new user metrics in C++. A how-to example is provided.
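A hedged sketch of `metric_period` and `model.shrink`; the `shrink()` signature shown follows the current Python API and is an assumption relative to this release, and the data is synthetic.

```python
# Hedged sketch: metric_period controls how often metrics are printed, and
# shrink() truncates an already trained model to its first N trees.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

model = CatBoostClassifier(iterations=300, metric_period=50)  # log every 50 iterations
model.fit(X, y)

model.shrink(100)            # keep only the first 100 trees (assumed signature)
print(model.tree_count_)     # 100
```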
Bug Fixes and Other Changes
- Stability improvements and bug fixes
As usual we are grateful to all who filed issues, asked and answered questions.