Releases: JuliaAI/MLJ.jl

v0.11.0

24 Apr 20:05
3936fd2

MLJ v0.11.0

Diff since v0.10.3

Make compatibility updates to MLJBase and MLJModels to effect the following changes to MLJ (see the linked release notes for links to the relevant issues/PRs):

  • (new model) Add LightGBM models LightGBMClassifier and
    LightGBMRegressor

  • (new model) Add new built-in model, ContinuousEncoder, for
    transforming all features of a table to Continuous scitype,
    dropping any features that cannot be so transformed (see the
    first sketch after this list)

  • (new model) Add ParallelKMeans model, KMeans, loaded with
    @load KMeans pkg=ParallelKMeans

  • (mildly breaking enhancement) Arrange for the CV
    resampling strategy to spread fold "remainders" evenly among folds in
    train_test_pairs(::CV, ...) (a small change only noticeable in
    small datasets)

  • (breaking) Restyle report and fitted_params for exported
    learning networks (e.g., pipelines) to include a dictionary of reports or
    fitted_params, keyed on the machines in the underlying learning
    network. New doc-strings detail the new behaviour.

  • (enhancement) Allow calling of transform on machines with Static models without
    first calling fit! (see the second sketch after this list)

  • (enhancement) Allow the machine constructor to work on supervised models that
    take nothing for the input features X (models that simply fit a
    sampler/distribution to the target data y) (#51)
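
A minimal sketch of the new ContinuousEncoder (the toy table and its scitype coercions are purely illustrative):

```julia
using MLJ

# a toy table mixing scientific types; :comments is Textual and cannot
# be encoded, so ContinuousEncoder will drop it:
X = (height   = [1.85, 1.67, 1.5],
     grade    = coerce(["A", "B", "A"], OrderedFactor),
     gender   = coerce(["m", "f", "f"], Multiclass),
     comments = ["the best", "ok", "meh"])

mach = fit!(machine(ContinuousEncoder(), X))
transform(mach, X)  # all surviving features have Continuous scitype
```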
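
And a sketch of calling transform without fit! on a machine wrapping a Static model (the Averager transformer below is a made-up example following the manual's static-transformer pattern):

```julia
using MLJ
import MLJBase

# a static transformer: no training data and nothing to learn
mutable struct Averager <: Static
    mix::Float64
end

# the second argument is the (vacuous) fitresult:
MLJBase.transform(a::Averager, _, y1, y2) = (1 - a.mix)*y1 + a.mix*y2

mach = machine(Averager(0.5))            # constructed without data
transform(mach, [1.0, 2.0], [3.0, 4.0])  # no fit! required
```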

Also:

  • (documentation) In the "Adding New Models for General Use"
    section of the manual, add detail on how to wrap unsupervised
    models, as well as models that fit a sampler/distribution to data

  • (documentation) Expand the "Transformers" sections of the
    manual, including more material on static transformers and
    transformers that implement predict (#393)

Closed issues:

  • Add tuning by stochastic search (#37)
  • Improve documentation around static transformers (#393)
  • Error in docs for model search (#478)
  • Update [compat] StatsBase="^0.32,^0.33" (#481)
  • For a 0.10.3 release (#483)
  • Help with coercing strings for binary data into Continuous variables (#489)
  • EvoTree Error (#490)
  • Add info with workaround to avoid MKL error (#491)
  • LogisticClassifier pkg = MLJLinearModels computes a number of coefficients but not the same number of mean_and_std_given_feature (#492)
  • MethodError: no method matching... (#493)
  • For a 0.10.4 release (#495)
  • Error: fitted_params(LogisticModel) (#498)

v0.10.3

03 Apr 18:05
7ff111c

MLJ v0.10.3

Diff since v0.10.2

  • Allow MLJ to use StatsBase v0.33 (PR #484 , #481)

  • Enable use of RandomSearch tuning strategy (PR #482, #37)
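
A minimal sketch of random search (assuming DecisionTree.jl is installed; the hyperparameter and budget are illustrative):

```julia
using MLJ

X, y = @load_iris
tree = @load DecisionTreeClassifier
r = range(tree, :max_depth, lower=1, upper=10)

tuned = TunedModel(model=tree, tuning=RandomSearch(), range=r, n=25,
                   resampling=CV(nfolds=3), measure=cross_entropy)
mach = fit!(machine(tuned, X, y))
fitted_params(mach).best_model
```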

Merged pull requests:

  • Enable hyper-parameter tuning using random search (#482) (@ablaom)
  • Extend [compat] StatsBase = "^0.32,^0.33" (#484) (@ablaom)
  • For a 0.10.3 release (#485) (@ablaom)

v0.10.2

25 Mar 09:05
5c2bed1

MLJ v0.10.2

Diff since v0.10.1

  • Extend [compat] Distributions = "^0.21,^0.22,^0.23"

  • Minor doc fixes

Closed issues:

  • Task design discussion (#166)
  • Non-normalized versions of measures (#445)
  • Overload model traits to work on the named-tuple "proxies" for models listed by models() (#464)
  • Multiprocess issue (#468)
  • Julia v1.4.0 is downloading MLJ v0.2.3 instead of MLJ v0.10.1 (#476)

v0.10.1

14 Mar 21:07
4f229c1

MLJ v0.10.1

Diff since v0.10.0

(enhancement) Add serialization for machines. Serialization is model-specific, with a fallback implementation using JLSO. The user serializes with MLJBase.save(path, mach) and de-serializes with machine(path) (#138, #292)
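
For example (a sketch, assuming DecisionTree.jl is installed):

```julia
using MLJ
import MLJBase

X, y = @load_iris
tree = @load DecisionTreeClassifier
mach = fit!(machine(tree, X, y))

MLJBase.save("tree.jlso", mach)  # serialize (generic JLSO fallback)
mach2 = machine("tree.jlso")     # de-serialize
predict(mach2, X)
```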

Closed issues:

  • Adhere by Invenia's bluestyle (#434)
  • Update list of scikitlearn models in readme table. (#469)

v0.10.0

11 Mar 11:07
c369a90

MLJ v0.10.0

Diff since v0.9.3

Upgrade to MLJBase 0.12.0 and MLJModels 0.9.0 to effect the following changes:

  • (breaking) Suppress normalisation of measure weights (MLJBase PR #208)

  • (breaking) Shift the optional rng argument of iterator to first position (MLJBase #215)

  • (mildly breaking) Let all models (supervised and unsupervised) share a common set of traits. So, for example, unsupervised models now have the target_scitype trait (usually taking the value Unknown). For a list of the common traits, do models()[1] |> keys |> collect (JuliaAI/MLJBase.jl#163).

  • (enhancement) Add sampler wrapper for one-dimensional ranges, for random sampling from ranges using rand (MLJBase #213) (see the sketch after this list)

  • Change default value of num_round in XGBoost models from 1 to 100 (MLJModels PR #201)
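
A sketch of the new sampler wrapper, assuming DecisionTree.jl is installed and that sampler (provided by MLJBase) is in scope; the hyperparameter and distribution are illustrative:

```julia
using MLJ
import Distributions

tree = @load DecisionTreeClassifier
r = range(tree, :merge_purity_threshold, lower=0.1, upper=0.9)

# wrap the one-dimensional range in a sampler and draw values with rand:
s = sampler(r, Distributions.Uniform)
rand(s, 3)  # three random values lying in the range
```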

Closed issues:

  • Help with loading code on multiple processes for paralleled tuning of a pipeline (#440)
  • Re-export CPU1, CPUProcesses, CPUThreads (#447)
  • Taking loss functions seriously (#450)
  • @pipeline to accept multiple Supervised models (#455)
  • What parts of MLJBase should be reexported in MLJ (#462)
  • unpack not working (#465)
  • Automatic Ensembling options (#466)

v0.9.3

29 Feb 21:06
ed04561

MLJ v0.9.3

Diff since v0.9.2

v0.9.2

26 Feb 03:06
6c6d53f

MLJ v0.9.2

Diff since v0.9.1

  • (enhancement) Update Tables requirement to "^1.0" (#444)

  • (new models) Add the pure-Julia gradient boosted tree models from EvoTrees: EvoTreeRegressor, EvoTreeClassifier, EvoTreeCount, EvoTreeGaussian (#122)

  • (documentation) Update README.md and fix some documentation errors

Closed issues:

  • Implementing MLJ model interface for EvoTrees.jl (#122)
  • Improve the tuning strategy interface (#315)
  • Re-organizing the MLJ stack (#317)
  • Add Tables 1.0 (#444)

Merged pull requests:

v0.9.1

14 Feb 06:07
d7d189e

MLJ v0.9.1

Diff since v0.9.0

  • (enhancement) Enable dataset loading from OpenML using OpenML.load(id) (see the sketch after this list).

  • (documentation) Update the MLJ manual to add missing measure docstrings and to reflect the use of MLJScientificTypes in place of ScientificTypes

  • (documentation - developers) Update manual to reflect split of MLJBase into MLJBase and MLJModelInterface
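
A sketch of the new loader (61 is the OpenML ID of the classic iris dataset; the return value is assumed to be a Tables.jl-compatible table):

```julia
using MLJ

table = OpenML.load(61)  # download OpenML dataset with ID 61 (iris)
schema(table)            # inspect the column scientific types
```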

Closed issues:

  • Evaluation error logs on loading model (#433)

v0.9.0

12 Feb 15:00
3573d8a

MLJ v0.9.0

Minor release introducing the lightweight model interface (#439)

v0.8.0

04 Feb 03:57
6cb73fa

MLJ v0.8.0

  • (enhancement) MLJ now uses MLJTuning v0.1.1 to implement tuning. For the moment the only tuning strategy remains grid search, but expect this to improve soon with a new and improved tuning strategy interface for developers.

  • (breaking) The Grid tuning strategy no longer has the acceleration hyperparameter, as computational resources (distributed computing / multithreading) are now declared in the TunedModel constructor (see below). A Grid search now generates models for evaluation in random order unless shuffle=false is specified. (To simulate a random search, use a high resolution but a reduced value of n.) One can no longer specify a dictionary of resolutions keyed on model hyperparameter name; instead, the specified global resolution is overridden by specifying hyperparameter-specific resolutions in a TunedModel's range object. For details, query ?Grid. (See the first sketch at the end of these notes.)

  • (enhancement) One can now specify a goal for the total number of grid points with Grid(goal=...), in which case global and hyperparameter-specific resolutions are ignored.

  • (breaking) The form of the report generated by fitting a machine bound to a TunedModel has changed. Query ?TunedModel for details. What was previously obtained with report(mach) is now obtained using report(mach).plotting. In the case of a grid search, there is also report(mach).history.

  • (enhancement) Tuning is now conceptualised as an iterative procedure in all cases. In the TunedModel constructor, one may now optionally specify the number of models to be searched with n=.... This overrides a default number determined by the particular tuning strategy. Increasing this parameter and refitting a machine bound to a TunedModel does not trigger a new search for the optimal hyperparameters, but restarts the search from where it left off. (In the future this will allow for external control of tuning, including the saving of intermediate results.)

  • (breaking) learning_curve! has been renamed learning_curve, as it is non-mutating. The old name is retained for backwards compatibility. One may no longer specify n=... to generate multiple learning curves. Rather, for reproducibility and parallelizability, one must instead: (i) pass the name of the model's RNG hyperparameter field with rng_name=...; and (ii) pass a list of unique RNGs, one for each curve, as in rngs=[MersenneTwister(1), MersenneTwister(42)] (for two curves). Alternatively, the RNGs can be automatically generated by specifying an integer, as in rngs=2. Query ?learning_curve for details. (See the second sketch at the end of these notes.)

  • (enhancement) learning_curve now has an acceleration keyword argument for distributing the generation of multiple learning curves, and sample weights can be passed to learning_curve.
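
The first sketch below illustrates the new Grid and TunedModel behaviour (assuming DecisionTree.jl is installed; the ranges are illustrative):

```julia
using MLJ
import ComputationalResources: CPUProcesses  # may need explicit import in this release

X, y = @load_iris
tree = @load DecisionTreeClassifier
r1 = range(tree, :max_depth, lower=1, upper=10)
r2 = range(tree, :min_samples_split, lower=2, upper=20)

# resources are now declared on TunedModel, not on Grid, and
# Grid(goal=...) caps the total number of grid points:
tuned = TunedModel(model=tree, tuning=Grid(goal=30), range=[r1, r2],
                   resampling=CV(nfolds=3), measure=cross_entropy,
                   acceleration=CPUProcesses())
mach = fit!(machine(tuned, X, y))
report(mach).plotting  # what report(mach) returned previously
report(mach).history   # grid searches also expose a history
```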
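
And a second sketch, of the reworked learning_curve (EnsembleModel is used because it carries an rng hyperparameter; DecisionTree.jl is again assumed installed):

```julia
using MLJ
using Random: MersenneTwister

X, y = @load_iris
forest = EnsembleModel(atom=(@load DecisionTreeClassifier), n=50)
r = range(forest, :bagging_fraction, lower=0.4, upper=1.0)

curves = learning_curve(machine(forest, X, y),
                        range=r, resampling=Holdout(), measure=cross_entropy,
                        rng_name=:rng,
                        rngs=[MersenneTwister(1), MersenneTwister(42)])  # two curves
curves.parameter_values, curves.measurements
```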