Releases: JuliaAI/MLJ.jl
v0.11.0
MLJ v0.11.0
Make compatibility updates to MLJBase and MLJModels to effect the following changes to MLJ (see the linked release notes for links to the issues/PRs):
- (new model) Add LightGBM models `LightGBMClassifier` and `LightGBMRegressor`
- (new model) Add new built-in model, `ContinuousEncoder`, for transforming all features of a table to `Continuous` scitype, dropping any features that cannot be so transformed (see the sketch after this list)
- (new model) Add ParallelKMeans model, `KMeans`, loaded with `@load KMeans pkg=ParallelKMeans`
- (mildly breaking enhancement) Arrange for the `CV` resampling strategy to spread fold "remainders" evenly among folds in `train_test_pairs(::CV, ...)` (a small change only noticeable in small datasets)
- (breaking) Restyle `report` and `fitted_params` for exported learning networks (e.g., pipelines) to include a dictionary of reports or fitted_params, keyed on the machines in the underlying learning network. New doc-strings detail the new behaviour.
- (enhancement) Allow calling of `transform` on machines with `Static` models without first calling `fit!`
- Allow the `machine` constructor to work on supervised models that take `nothing` for the input features `X` (for models that simply fit a sampler/distribution to the target data `y`) (#51)
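Here is a minimal sketch of the new `ContinuousEncoder` in action (the feature names and values are invented for illustration):

```julia
using MLJ

# a table mixing Continuous, Count and Multiclass features:
X = (height    = [1.85, 1.67, 1.50],
     n_devices = [3, 2, 4],
     gender    = coerce(["male", "female", "female"], Multiclass))

# fit the encoder and transform; every output feature is Continuous,
# with any feature that cannot be so converted dropped:
mach = machine(ContinuousEncoder(), X)
fit!(mach)
Xnew = transform(mach, X)
```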
Also:
- (documentation) In the "Adding New Models for General Use" section of the manual, add detail on how to wrap unsupervised models, as well as models that fit a sampler/distribution to data
- (documentation) Expand the "Transformers" sections of the manual, including more material on static transformers and transformers that implement `predict` (#393)
Closed issues:
- Add tuning by stochastic search (#37)
- Improve documentation around static transformers (#393)
- Error in docs for model search (#478)
- Update [compat] StatsBase="^0.32,^0.33" (#481)
- For a 0.10.3 release (#483)
- Help with coercing strings for binary data into Continuous variables (#489)
- EvoTree Error (#490)
- Add info with workaround to avoid MKL error (#491)
- LogisticClassifier pkg = MLJLinearModels computes a number of coefficients but not the same number of mean_and_std_given_feature (#492)
- MethodError: no method matching... (#493)
- For a 0.10.4 release (#495)
- Error: fitted_params(LogisticModel) (#498)
v0.10.3
MLJ v0.10.3
v0.10.2
MLJ v0.10.2
- Extend [compat] Distributions = "^0.21,^0.22,^0.23"
- Minor doc fixes
Closed issues:
- Task design discussion (#166)
- Non-normalized versions of measures (#445)
- Overload model traits to work on the named-tuple "proxies" for models listed by models() (#464)
- Multiprocess issue (#468)
- Julia v1.4.0 is downloading MLJ v0.2.3 instead of MLJ v0.10.1 (#476)
Merged pull requests:
- Added IRIS example in docs (#475) (@ashryaagr)
- For a 0.10.2 release (#479) (@ablaom)
v0.10.1
MLJ v0.10.1
- (enhancement) Add serialization for machines. Serialization is model-specific, with a fallback implementation using JLSO. The user serializes with `MLJBase.save(path, mach)` and de-serializes with `machine(path)` (#138, #292) (see the sketch below)
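A minimal sketch of the new serialization API, using the built-in `Standardizer` transformer (the file name is arbitrary):

```julia
using MLJ

X = (x1 = rand(10), x2 = rand(10))
mach = machine(Standardizer(), X)
fit!(mach)

# serialize the trained machine (JLSO fallback) ...
MLJBase.save("standardizer.jlso", mach)

# ... and restore it later for transforming new data:
mach2 = machine("standardizer.jlso")
transform(mach2, X)
```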
Merged pull requests:
- updated list of ScikitLearn models in Readme (#472) (@OkonSamuel)
- For a 0.10.1 release (#473) (@ablaom)
v0.10.0
MLJ v0.10.0
Upgrade to MLJBase 0.12.0 and MLJModels 0.9.0 to effect the following changes:
- (breaking) Suppress normalisation of measure weights (MLJBase PR #208)
- (breaking) Shift the optional `rng` argument of `iterator` to first position (MLJBase #215)
- (mildly breaking) Let all models (supervised and unsupervised) share a common set of traits. So, for example, unsupervised models now have the `target_scitype` trait (usually taking the value `Unknown`). For a list of the common traits, do `models()[1] |> keys |> collect` (JuliaAI/MLJBase.jl#163).
- (enhancement) Add `sampler` wrapper for one-dimensional ranges, for random sampling from ranges using `rand` (MLJBase #213) (see the sketch after this list)
- Change default value of `num_round` in XGBoost models from 1 to 100 (MLJModels PR #201)
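A minimal sketch of the new `sampler` wrapper (the hyperparameter name `:K` and the choice of distribution are illustrative):

```julia
using MLJ, Distributions
import Random

# a one-dimensional numeric range, e.g. for a hyperparameter `K`:
r = range(Int, :K; lower=1, upper=100)

# wrap the range as a sampler and draw random values with `rand`:
s = MLJ.sampler(r, Distributions.Uniform)
rand(Random.MersenneTwister(123), s, 5)
```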
Closed issues:
- Help with loading code on multiple processes for paralleled tuning of a pipeline (#440)
- Re-export CPU1, CPUProcesses, CPUThreads (#447)
- Taking loss functions seriously (#450)
- @pipeline to accept multiple Supervised models (#455)
- What parts of MLJBase should be reexported in MLJ (#462)
- `unpack` not working (#465)
- Automatic Ensembling options (#466)
v0.9.3
v0.9.2
MLJ v0.9.2
- (enhancement) Update Tables requirement to "^1.0" (#444)
- (new models) Add the pure-julia gradient boosted tree models from EvoTrees: `EvoTreeRegressor`, `EvoTreeClassifier`, `EvoTreeCount`, `EvoTreeGaussian` (#122) (see the sketch after this list)
- (documentation) Update README.md and fix some documentation errors
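A minimal sketch of loading and fitting one of the new EvoTrees models (in MLJ versions of this vintage, `@load` returns a model instance):

```julia
using MLJ

booster = @load EvoTreeRegressor pkg=EvoTrees

X, y = @load_boston
mach = machine(booster, X, y)
fit!(mach)
predict(mach, X)
```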
Closed issues:
- Implementing MLJ model interface for EvoTrees.jl (#122)
- Improve the tuning strategy interface (#315)
- Re-organizing the MLJ stack (#317)
- Add Tables 1.0 (#444)
v0.9.1
MLJ v0.9.1
- (enhancement) Enable dataset loading from OpenML using `OpenML.load(id)` (see the sketch after this list)
- (documentation) Update the MLJ manual with missing measure docstrings, and to reflect use of MLJScientificTypes in place of ScientificTypes
- (documentation - developers) Update manual to reflect split of MLJBase into MLJBase and MLJModelInterface
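A minimal sketch of the new OpenML loading (dataset id 61 is the classic "iris" set; the DataFrame conversion is optional and assumes DataFrames is installed):

```julia
using MLJ
using DataFrames

# fetch dataset 61 ("iris") from https://www.openml.org as a
# Tables.jl-compatible table:
table = OpenML.load(61)
df = DataFrame(table)
```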
Closed issues:
- Evaluation error logs on loading model (#433)
v0.9.0
v0.8.0
- (enhancement) MLJ now uses MLJTuning v0.1.1 to implement tuning. For the moment the only tuning strategy remains grid search, but expect this to improve soon with a new and improved tuning strategy interface for developers.
- (breaking) The `Grid` tuning strategy no longer has the `acceleration` hyperparameter, as computational resources (distributed computing / multithreading) are now declared in the `TunedModel` constructor (see below). A `Grid` search now generates models for evaluation in a random order unless `shuffle=false` is specified. (To simulate a random search, use a high resolution but a reduced value of `n`.) One can no longer specify a dictionary of resolutions keyed on model hyperparameter name. The specified global `resolution` is now overridden by specifying hyperparameter-specific resolutions in a `TunedModel`'s `range` object. For details query `?Grid`.
- (enhancement) One can now specify a goal for the total number of grid points with `Grid(goal=...)`, in which case global and hyperparameter-specific resolutions are ignored (see the first sketch after this list).
- (breaking) The form of reports generated by fitting a machine bound to a `TunedModel` has changed. Query `?TunedModel` for details. What was previously obtained with `report(mach)` is now obtained using `report(mach).plotting`. In the case of a grid search, there is also `report(mach).history`.
- (enhancement) Tuning is now conceptualised as an iterative procedure in all cases. In the `TunedModel` constructor, one may now optionally specify the number of models to be searched with `n=...`. This overrides a default number determined by the particular tuning strategy. Increasing this parameter and refitting a machine bound to a `TunedModel` does not trigger a new search for the optimal hyperparameters, but restarts the search from where it left off. (In the future this will allow for external control of tuning, including the saving of intermediate results.)
- (breaking) `learning_curve!` has been renamed `learning_curve`, as it is non-mutating. The old name is retained for backwards compatibility. One may no longer specify `n=...` to generate multiple learning curves. Rather, for reproducibility and parallelizability, one must instead: (i) pass the name of the model's RNG hyperparameter field with `rng_name=...`; and (ii) pass a list of unique RNGs, one for each curve, as in `rngs=[MersenneTwister(1), MersenneTwister(42)]` (for two curves). Alternatively, RNGs can be automatically generated by specifying an integer, as in `rngs=2`. Query `?learning_curve` for details (and see the second sketch after this list).
- (enhancement) `learning_curve` now has an `acceleration` keyword argument for distributing the generation of multiple learning curves, and sample weights can be passed to `learning_curve`.
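A minimal sketch of the new `TunedModel`-centred tuning workflow (the atomic model, range, and numbers are illustrative; `@load` here returns a model instance):

```julia
using MLJ

tree = @load DecisionTreeClassifier pkg=DecisionTree
r = range(tree, :max_depth; lower=1, upper=10)

# resources are now declared on the TunedModel, not on Grid; `goal`
# bounds the total number of grid points, and `n` the number of
# models actually evaluated (in this MLJ version, CPUProcesses may
# require `using ComputationalResources`):
tuned = TunedModel(model=tree,
                   tuning=Grid(goal=30),
                   range=r,
                   measure=cross_entropy,
                   n=25,
                   acceleration=CPUProcesses())

X, y = @load_iris
mach = machine(tuned, X, y)
fit!(mach)
report(mach).plotting   # formerly obtained with plain `report(mach)`
```

And, continuing from the previous sketch, a sketch of the renamed `learning_curve`, generating multiple curves with automatically generated RNGs (this assumes the model being varied has an `rng` hyperparameter field, as `EnsembleModel` does):

```julia
# vary the ensemble size and draw two curves, one per RNG:
forest = EnsembleModel(atom=tree, n=50)
r_n = range(forest, :n; lower=10, upper=50)
curves = learning_curve(machine(forest, X, y);
                        range=r_n,
                        measure=cross_entropy,
                        rng_name=:rng,
                        rngs=2)
```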