Merge pull request #925 from alan-turing-institute/dev
For a 0.18 release
ablaom committed Apr 7, 2022
2 parents 73c474f + 0e4c051 commit 14fae0b
Showing 17 changed files with 127 additions and 170 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -4,6 +4,7 @@ on:
branches:
- master
- dev
- for-a-0-point-18-release
push:
branches:
- master
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -8,6 +8,8 @@ channel](https://julialang.org/slack/), #MLJ.

- [Code organization](ORGANIZATION.md)

- Issues: Currently issues are split between [MLJ issues](https://github.com/alan-turing-institute/MLJ.jl/issues) and issues in all other repositories, collected in [this GitHub Project](https://github.com/orgs/JuliaAI/projects/1).


### Conventions

12 changes: 5 additions & 7 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
version = "0.17.3"
version = "0.18.0"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -13,7 +13,6 @@ MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJEnsembles = "50ed68f4-41fd-4504-931a-ed422449fee0"
MLJIteration = "614be32b-d00c-4edb-bd02-1eb411ab5e55"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
OpenML = "8b6db2d4-7670-4922-a472-f9537c81ab66"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
@@ -28,12 +27,11 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
CategoricalArrays = "0.8,0.9, 0.10"
ComputationalResources = "0.3"
Distributions = "0.21,0.22,0.23, 0.24, 0.25"
MLJBase = "0.19.4"
MLJEnsembles = "0.2"
MLJIteration = "0.4"
MLJBase = "0.20"
MLJEnsembles = "0.3"
MLJIteration = "0.5"
MLJModels = "0.15"
MLJSerialization = "1.1"
MLJTuning = "0.6"
MLJTuning = "0.7"
OpenML = "0.2"
ProgressMeter = "1.1"
ScientificTypes = "3"
4 changes: 4 additions & 0 deletions README.md
@@ -36,6 +36,10 @@ written in Julia and other languages.
**Integrating an existing machine learning model into the MLJ
framework?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/).

**Wanting to contribute?** Start [here](CONTRIBUTING.md).

**PhD and Postdoc opportunities.** See [here](https://sebastian.vollmer.ms/jobs/).

MLJ was initially created as a Tools, Practices and Systems project at
the [Alan Turing Institute](https://www.turing.ac.uk/)
in 2019. Current funding is provided by a [New Zealand Strategic
9 changes: 4 additions & 5 deletions docs/Project.toml
@@ -19,7 +19,6 @@ MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJMultivariateStatsInterface = "1b6a4a23-ba22-4f51-9698-8599985d3728"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
@@ -32,11 +31,11 @@ TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9"
[compat]
CategoricalDistributions = "0.1"
Documenter = "0.26"
MLJBase = "0.19"
MLJEnsembles = "0.2"
MLJIteration = "0.4"
MLJBase = "0.20"
MLJEnsembles = "0.3"
MLJIteration = "0.5"
MLJModels = "0.15"
MLJTuning = "0.6.5"
MLJTuning = "0.7"
ScientificTypes = "3"
ScientificTypesBase = "3"
julia = "1.6"
2 changes: 0 additions & 2 deletions docs/make.jl
@@ -8,7 +8,6 @@ using MLJ
import MLJIteration
import IterationControl
import EarlyStopping
import MLJSerialization
import MLJBase
import MLJTuning
import MLJModels
@@ -86,7 +85,6 @@ makedocs(
MLJModelInterface,
ScientificTypesBase,
MLJIteration,
MLJSerialization,
EarlyStopping,
IterationControl,
CategoricalDistributions],
125 changes: 39 additions & 86 deletions docs/src/adding_models_for_general_use.md
@@ -830,16 +830,20 @@ MMI.is_pure_julia(::Type{<:DecisionTreeClassifier}) = true
Alternatively these traits can also be declared using `MMI.metadata_pkg` and `MMI.metadata_model` helper functions as:

```julia
MMI.metadata_pkg(DecisionTreeClassifier,
name="DecisionTree",
packge_uuid="7806a523-6efd-50cb-b5f6-3fa6f1930dbb",
package_url="https://github.com/bensadeghi/DecisionTree.jl",
is_pure_julia=true)

MMI.metadata_model(DecisionTreeClassifier,
input_scitype=MMI.Table(MMI.Continuous),
target_scitype=AbstractVector{<:MMI.Finite},
load_path="MLJDecisionTreeInterface.DecisionTreeClassifier")
MMI.metadata_pkg(
DecisionTreeClassifier,
name="DecisionTree",
package_uuid="7806a523-6efd-50cb-b5f6-3fa6f1930dbb",
package_url="https://github.com/bensadeghi/DecisionTree.jl",
is_pure_julia=true
)

MMI.metadata_model(
DecisionTreeClassifier,
input_scitype=MMI.Table(MMI.Continuous),
target_scitype=AbstractVector{<:MMI.Finite},
load_path="MLJDecisionTreeInterface.DecisionTreeClassifier"
)
```

*Important.* Do not omit the `load_path` specification. If unsure what
@@ -1057,108 +1061,57 @@ controlled by a hyper-parameter `alpha` is given

### Serialization

!!! warning "Experimental"

The following API is experimental. It is subject to breaking changes during minor or major releases without warning.

The MLJ user can serialize and deserialize a *machine*, which means
serializing/deserializing:
!!! warning "New in MLJBase 0.20"

- the associated `Model` object (storing hyperparameters)
- the `fitresult` (learned parameters)
- the `report` generating during training
The following API is incompatible with versions of MLJBase < 0.20, even for model implementations compatible with MLJModelInterface 1.x.

These are bundled into a single file or `IO` stream specified by the
user using the package `JLSO`. There are two scenarios in which a new
MLJ model API implementation will want to overload two additional
methods `save` and `restore` to support serialization:

1. The algorithm-providing package already has it's own serialization format for learned parameters and/or hyper-parameters, which users may want to access. In that case *the implementation overloads* `save`.
This section may be occasionally relevant when wrapping models
implemented in languages other than Julia.

2. The `fitresult` is not a sufficiently persistent object; for example, it is a pointer passed from wrapped C code. In that case *the implementation overloads* `save` *and* `restore`.
The MLJ user can serialize and deserialize machines, as she would any
other Julia object. (The user has the option of first removing data
from the machine. See [Saving machines](@ref) for details.) However, a
problem can occur if a model's `fitresult` (see [The fit
method](@ref)) is not a persistent object. For example, it might be a
C pointer that would have no meaning in a new Julia session.
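For orientation, here is a minimal sketch of that user-facing workflow, using the method names given in [Saving machines](@ref) for MLJBase 0.20; the model `tree`, the data, and the file name are assumptions for illustration:

```julia
using MLJ

# Train a machine (the model `tree` and data `X`, `y` are assumed to exist):
mach = machine(tree, X, y) |> fit!

# Serialize the machine to disk:
MLJ.save("my_machine.jls", mach)

# In a later Julia session, restore it and use it as usual:
mach2 = machine("my_machine.jls")
predict(mach2, Xnew)
```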

In case 2, 1 presumably applies also, for otherwise MLJ serialization
is probably not going to be possible without changes to the
algorithm-providing package. An example is given below.

Note that in case 1, MLJ will continue to create it's own
self-contained serialization of the machine. Below `filename` refers
to the corresponding serialization file name, as specified by the
user, but with any final extension (e.g., ".jlso", ".gz") removed. If
the user has alternatively specified an `IO` object for serialization,
then `filename` is a randomly generated numeric string.
If that is the case a model implementation needs to implement a `save`
and `restore` method for switching between a `fitresult` and some
persistent, serializable representation of that result.


#### The save method

```julia
MMI.save(filename, model::SomeModel, fitresult; kwargs...) -> serializable_fitresult
MMI.save(model::SomeModel, fitresult; kwargs...) -> serializable_fitresult
```

Implement this method to serialize using a format specific to models
of type `SomeModel`. The `fitresult` is the first return value of
`MMI.fit` for such model types; `kwargs` is a list of keyword
arguments specified by the user and understood to relate to a some
model-specific serialization (cannot be `format=...` or
`compression=...`). The value of `serializable_fitresult` should be a
persistent representation of `fitresult`, from which a correct and
valid `fitresult` can be reconstructed using `restore` (see
below).
Implement this method to return a persistent serializable
representation of the `fitresult` component of the `MMI.fit` return
value.

The fallback of `save` performs no action and returns `fitresult`.


#### The restore method

```julia
MMI.restore(filename, model::SomeModel, serializable_fitresult) -> fitresult
MMI.restore(model::SomeModel, serializable_fitresult) -> fitresult
```

Implement this method to reconstruct a `fitresult` (as returned by
Implement this method to reconstruct a valid `fitresult` (as would be returned by
`MMI.fit`) from a persistent representation constructed using
`MMI.save` as described above.

The fallback of `restore` returns `serializable_fitresult`.

#### Example
The fallback of `restore` performs no action and returns `serializable_fitresult`.
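
As a hedged illustration only (not drawn from any particular package), a model whose `fitresult` wraps a non-persistent handle might implement the pair roughly as follows; `SomeModel`, `handle_to_bytes` and `bytes_to_handle` are hypothetical stand-ins for whatever the wrapped library actually provides:

```julia
import MLJModelInterface as MMI

# Hypothetical sketch: `SomeModel`'s fitresult holds a live handle (say, a
# pointer into wrapped C code) that cannot survive serialization, so it is
# converted to raw bytes and back using placeholder helper functions.

function MMI.save(::SomeModel, fitresult; kwargs...)
    # Return a persistent representation of the live fitresult:
    return handle_to_bytes(fitresult)            # e.g. a Vector{UInt8}
end

function MMI.restore(::SomeModel, serializable_fitresult)
    # Rebuild a usable fitresult from the persistent representation:
    return bytes_to_handle(serializable_fitresult)
end
```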

Below is an example drawn from MLJ's XGBoost wrapper. In this example
the `fitresult` returned by `MMI.fit` is a tuple `(booster,
a_target_element)` where `booster` is the `XGBoost.jl` object storing
the learned parameters (essentially a pointer to some object created
by C code) and `a_target_element` is an ordinary `CategoricalValue`
used to track the target classes (a persistent object, requiring no
special treatment).

```julia
function MLJModelInterface.save(filename,
::XGBoostClassifier,
fitresult;
kwargs...)
booster, a_target_element = fitresult

xgb_filename = string(filename, ".xgboost.model")
XGBoost.save(booster, xgb_filename)
persistent_booster = read(xgb_filename)
@info "Additional XGBoost serialization file \"$xgb_filename\" generated. "
return (persistent_booster, a_target_element)
end
#### Example

function MLJModelInterface.restore(filename,
::XGBoostClassifier,
serializable_fitresult)
persistent_booster, a_target_element = serializable_fitresult
For an example, refer to the model implementations at
[MLJXGBoostInterface.jl](https://github.com/JuliaAI/MLJXGBoostInterface.jl/blob/master/src/MLJXGBoostInterface.jl)

xgb_filename = string(filename, ".tmp")
open(xgb_filename, "w") do file
write(file, persistent_booster)
end
booster = XGBoost.Booster(model_file=xgb_filename)
rm(xgb_filename)
fitresult = (booster, a_target_element)
return fitresult
end
```

### Document strings

@@ -1170,8 +1123,8 @@ as a checklist. Here are examples of compliant doc-strings (go to the
end of the linked files):

- Regular supervised models (classifiers and regressors): [MLJDecisionTreeInterface.jl](https://github.com/JuliaAI/MLJDecisionTreeInterface.jl/blob/master/src/MLJDecisionTreeInterface.jl) (see the end of the file)
- Transformers: [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/dev/src/builtins/Transformers.jl)

- Transformers: [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/dev/src/builtins/Transformers.jl)

A utility function is available for generating a standardized header
for your doc-strings (but you provide most detail by hand):
6 changes: 6 additions & 0 deletions docs/src/common_mlj_workflows.md
@@ -141,6 +141,12 @@ info("PCA")
info("RidgeRegressor", pkg="MultivariateStats") # a model type in multiple packages
```

Extracting the model document string:

```@example
doc("DecisionTreeClassifier", pkg="DecisionTree")
```
## Instantiating a model
*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)
7 changes: 2 additions & 5 deletions docs/src/controlling_iterative_models.md
@@ -116,7 +116,7 @@ control | description
[`WithReportDo`](@ref MLJIteration.WithReportDo)`(f->e->@info("report: $e"))`| Call `f(r)` where `r` is the training machine report | yes
[`WithModelDo`](@ref MLJIteration.WithModelDo)`(f->m->@info("model: $m"))`| Call `f(m)` where `m` is the model, which may be mutated by `f` | yes
[`WithMachineDo`](@ref MLJIteration.WithMachineDo)`(f->mach->@info("machine: $mach"))`| Call `f(mach)` where `mach` is the training machine in its current state | yes
[`Save`](@ref MLJSerialization.Save)`(filename="machine.jlso")`|Save current training machine to `machine1.jlso`, `machine2.jslo`, etc | yes
[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")`|Save current training machine to `machine1.jlso`, `machine2.jlso`, etc. | yes

> Table 1. Atomic controls. Some advanced options omitted.
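
For context, here is a hedged sketch of how a few of these controls might be combined in an `IteratedModel`; the model choice, data, and control settings are assumptions for illustration, and the `Save` signature follows the table above:

```julia
using MLJ

# Assumes EvoTrees.jl is installed; any iterative MLJ model would do.
EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees
X, y = @load_iris

iterated_model = IteratedModel(
    model=EvoTreeClassifier(),
    resampling=Holdout(fraction_train=0.8),
    measure=log_loss,
    controls=[Step(5),                        # train 5 iterations per control cycle
              Patience(3),                    # stop after 3 consecutive loss increases
              NumberLimit(100),               # hard cap on the number of cycles
              Save(filename="machine.jlso")]  # periodically snapshot the training machine
)

mach = machine(iterated_model, X, y) |> fit!
```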
@@ -125,9 +125,6 @@ control | description
"Early Stopping - But When?", in *Neural Networks: Tricks of the
Trade*, ed. G. Orr, Springer.

⋆ If using `MLJIteration` without `MLJ`, then `Save` is not available
unless one is also using `MLJSerialization`.

**Stopping option.** All the following controls trigger a stop if the
provided function `f` returns `true` and `stop_if_true=true` is
specified in the constructor: `Callback`, `WithNumberDo`,
@@ -438,7 +435,7 @@ MLJIteration.WithFittedParamsDo
MLJIteration.WithReportDo
MLJIteration.WithModelDo
MLJIteration.WithMachineDo
MLJSerialization.Save
MLJIteration.Save
```

### Control wrappers
2 changes: 1 addition & 1 deletion docs/src/julia_blogpost.md
@@ -40,7 +40,7 @@ composition.

- Video from [London Julia User Group meetup in March 2019](https://www.youtube.com/watch?v=CfHkjNmj1eE) (skip to [demo at 21'39](https://youtu.be/CfHkjNmj1eE?t=21m39s)) &nbsp;

- [MLJ Tutorials](https://JuliaAI.github.io/MLJTutorials/)
- [Learning MLJ](@ref)

- Implementing the MLJ interface for a [new model](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/)

