Merge pull request #925 from alan-turing-institute/dev
For a 0.18 release
ablaom committed Apr 7, 2022
2 parents 73c474f + 0e4c051 commit 14fae0b
Showing 17 changed files with 127 additions and 170 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -4,6 +4,7 @@ on:
branches:
- master
- dev
- for-a-0-point-18-release
push:
branches:
- master
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -8,6 +8,8 @@ channel](https://julialang.org/slack/), #MLJ.

- [Code organization](ORGANIZATION.md)

- Issues: Currently issues are split between [MLJ issues](https://github.com/alan-turing-institute/MLJ.jl/issues) and issues in all other repositories, collected in [this GitHub Project](https://github.com/orgs/JuliaAI/projects/1).


### Conventions

12 changes: 5 additions & 7 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
version = "0.17.3"
version = "0.18.0"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -13,7 +13,6 @@ MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJEnsembles = "50ed68f4-41fd-4504-931a-ed422449fee0"
MLJIteration = "614be32b-d00c-4edb-bd02-1eb411ab5e55"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
OpenML = "8b6db2d4-7670-4922-a472-f9537c81ab66"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
@@ -28,12 +27,11 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
CategoricalArrays = "0.8,0.9, 0.10"
ComputationalResources = "0.3"
Distributions = "0.21,0.22,0.23, 0.24, 0.25"
MLJBase = "0.19.4"
MLJEnsembles = "0.2"
MLJIteration = "0.4"
MLJBase = "0.20"
MLJEnsembles = "0.3"
MLJIteration = "0.5"
MLJModels = "0.15"
MLJSerialization = "1.1"
MLJTuning = "0.6"
MLJTuning = "0.7"
OpenML = "0.2"
ProgressMeter = "1.1"
ScientificTypes = "3"
4 changes: 4 additions & 0 deletions README.md
@@ -36,6 +36,10 @@ written in Julia and other languages.
**Integrating an existing machine learning model into the MLJ
framework?** Start [here](https://alan-turing-institute.github.io/MLJ.jl/dev/quick_start_guide_to_adding_models/).

**Wanting to contribute?** Start [here](CONTRIBUTING.md).

**PhD and Postdoc opportunities.** See [here](https://sebastian.vollmer.ms/jobs/).

MLJ was initially created as a Tools, Practices and Systems project at
the [Alan Turing Institute](https://www.turing.ac.uk/)
in 2019. Current funding is provided by a [New Zealand Strategic
9 changes: 4 additions & 5 deletions docs/Project.toml
@@ -19,7 +19,6 @@ MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJMultivariateStatsInterface = "1b6a4a23-ba22-4f51-9698-8599985d3728"
MLJSerialization = "17bed46d-0ab5-4cd4-b792-a5c4b8547c6d"
MLJTuning = "03970b2e-30c4-11ea-3135-d1576263f10f"
Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
@@ -32,11 +31,11 @@ TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9"
[compat]
CategoricalDistributions = "0.1"
Documenter = "0.26"
MLJBase = "0.19"
MLJEnsembles = "0.2"
MLJIteration = "0.4"
MLJBase = "0.20"
MLJEnsembles = "0.3"
MLJIteration = "0.5"
MLJModels = "0.15"
MLJTuning = "0.6.5"
MLJTuning = "0.7"
ScientificTypes = "3"
ScientificTypesBase = "3"
julia = "1.6"
2 changes: 0 additions & 2 deletions docs/make.jl
@@ -8,7 +8,6 @@ using MLJ
import MLJIteration
import IterationControl
import EarlyStopping
import MLJSerialization
import MLJBase
import MLJTuning
import MLJModels
@@ -86,7 +85,6 @@ makedocs(
MLJModelInterface,
ScientificTypesBase,
MLJIteration,
MLJSerialization,
EarlyStopping,
IterationControl,
CategoricalDistributions],
125 changes: 39 additions & 86 deletions docs/src/adding_models_for_general_use.md
@@ -830,16 +830,20 @@ MMI.is_pure_julia(::Type{<:DecisionTreeClassifier}) = true
Alternatively these traits can also be declared using `MMI.metadata_pkg` and `MMI.metadata_model` helper functions as:

```julia
MMI.metadata_pkg(DecisionTreeClassifier,
name="DecisionTree",
packge_uuid="7806a523-6efd-50cb-b5f6-3fa6f1930dbb",
package_url="https://github.com/bensadeghi/DecisionTree.jl",
is_pure_julia=true)

MMI.metadata_model(DecisionTreeClassifier,
input_scitype=MMI.Table(MMI.Continuous),
target_scitype=AbstractVector{<:MMI.Finite},
load_path="MLJDecisionTreeInterface.DecisionTreeClassifier")
MMI.metadata_pkg(
DecisionTreeClassifier,
name="DecisionTree",
package_uuid="7806a523-6efd-50cb-b5f6-3fa6f1930dbb",
package_url="https://github.com/bensadeghi/DecisionTree.jl",
is_pure_julia=true
)

MMI.metadata_model(
DecisionTreeClassifier,
input_scitype=MMI.Table(MMI.Continuous),
target_scitype=AbstractVector{<:MMI.Finite},
load_path="MLJDecisionTreeInterface.DecisionTreeClassifier"
)
```

*Important.* Do not omit the `load_path` specification. If unsure what
@@ -1057,108 +1061,57 @@ controlled by a hyper-parameter `alpha` is given

### Serialization

!!! warning "Experimental"

The following API is experimental. It is subject to breaking changes during minor or major releases without warning.

The MLJ user can serialize and deserialize a *machine*, which means
serializing/deserializing:
!!! warning "New in MLJBase 0.20"

- the associated `Model` object (storing hyperparameters)
- the `fitresult` (learned parameters)
- the `report` generating during training
The following API is incompatible with versions of MLJBase < 0.20, even for model implementations compatible with MLJModelInterface 1.x.

These are bundled into a single file or `IO` stream specified by the
user using the package `JLSO`. There are two scenarios in which a new
MLJ model API implementation will want to overload two additional
methods `save` and `restore` to support serialization:

1. The algorithm-providing package already has it's own serialization format for learned parameters and/or hyper-parameters, which users may want to access. In that case *the implementation overloads* `save`.
This section may be occasionally relevant when wrapping models
implemented in languages other than Julia.

2. The `fitresult` is not a sufficiently persistent object; for example, it is a pointer passed from wrapped C code. In that case *the implementation overloads* `save` *and* `restore`.
The MLJ user can serialize and deserialize machines, as she would any
other Julia object. (The user has the option of first removing data
from the machine. See [Saving machines](@ref) for details.) However, a
problem can occur if a model's `fitresult` (see [The fit
method](@ref)) is not a persistent object. For example, it might be a
C pointer that would have no meaning in a new Julia session.
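For orientation, here is a minimal sketch of that user-facing workflow, using the method names given in [Saving machines](@ref) for MLJBase 0.20; the model `tree`, the data, and the file name are assumptions for illustration:

```julia
using MLJ

# Train a machine (the model `tree` and data `X`, `y` are assumed to exist):
mach = machine(tree, X, y) |> fit!

# Serialize the machine to disk:
MLJ.save("my_machine.jls", mach)

# In a later Julia session, restore it and use it as usual:
mach2 = machine("my_machine.jls")
predict(mach2, Xnew)
```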

In case 2, 1 presumably applies also, for otherwise MLJ serialization
is probably not going to be possible without changes to the
algorithm-providing package. An example is given below.

Note that in case 1, MLJ will continue to create it's own
self-contained serialization of the machine. Below `filename` refers
to the corresponding serialization file name, as specified by the
user, but with any final extension (e.g., ".jlso", ".gz") removed. If
the user has alternatively specified an `IO` object for serialization,
then `filename` is a randomly generated numeric string.
If that is the case a model implementation needs to implement a `save`
and `restore` method for switching between a `fitresult` and some
persistent, serializable representation of that result.


#### The save method

```julia
MMI.save(filename, model::SomeModel, fitresult; kwargs...) -> serializable_fitresult
MMI.save(model::SomeModel, fitresult; kwargs...) -> serializable_fitresult
```

Implement this method to serialize using a format specific to models
of type `SomeModel`. The `fitresult` is the first return value of
`MMI.fit` for such model types; `kwargs` is a list of keyword
arguments specified by the user and understood to relate to a some
model-specific serialization (cannot be `format=...` or
`compression=...`). The value of `serializable_fitresult` should be a
persistent representation of `fitresult`, from which a correct and
valid `fitresult` can be reconstructed using `restore` (see
below).
Implement this method to return a persistent serializable
representation of the `fitresult` component of the `MMI.fit` return
value.

The fallback of `save` performs no action and returns `fitresult`.


#### The restore method

```julia
MMI.restore(filename, model::SomeModel, serializable_fitresult) -> fitresult
MMI.restore(model::SomeModel, serializable_fitresult) -> fitresult
```

Implement this method to reconstruct a `fitresult` (as returned by
Implement this method to reconstruct a valid `fitresult` (as would be returned by
`MMI.fit`) from a persistent representation constructed using
`MMI.save` as described above.

The fallback of `restore` returns `serializable_fitresult`.

#### Example
The fallback of `restore` performs no action and returns `serializable_fitresult`.
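
As a hedged illustration only (not drawn from any particular package), a model whose `fitresult` wraps a non-persistent handle might implement the pair roughly as follows; `SomeModel`, `handle_to_bytes` and `bytes_to_handle` are hypothetical stand-ins for whatever the wrapped library actually provides:

```julia
import MLJModelInterface as MMI

# Hypothetical sketch: `SomeModel`'s fitresult holds a live handle (say, a
# pointer into wrapped C code) that cannot survive serialization, so it is
# converted to raw bytes and back using placeholder helper functions.

function MMI.save(::SomeModel, fitresult; kwargs...)
    # Return a persistent representation of the live fitresult:
    return handle_to_bytes(fitresult)            # e.g. a Vector{UInt8}
end

function MMI.restore(::SomeModel, serializable_fitresult)
    # Rebuild a usable fitresult from the persistent representation:
    return bytes_to_handle(serializable_fitresult)
end
```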

Below is an example drawn from MLJ's XGBoost wrapper. In this example
the `fitresult` returned by `MMI.fit` is a tuple `(booster,
a_target_element)` where `booster` is the `XGBoost.jl` object storing
the learned parameters (essentially a pointer to some object created
by C code) and `a_target_element` is an ordinary `CategoricalValue`
used to track the target classes (a persistent object, requiring no
special treatment).

```julia
function MLJModelInterface.save(filename,
::XGBoostClassifier,
fitresult;
kwargs...)
booster, a_target_element = fitresult

xgb_filename = string(filename, ".xgboost.model")
XGBoost.save(booster, xgb_filename)
persistent_booster = read(xgb_filename)
@info "Additional XGBoost serialization file \"$xgb_filename\" generated. "
return (persistent_booster, a_target_element)
end
#### Example

function MLJModelInterface.restore(filename,
::XGBoostClassifier,
serializable_fitresult)
persistent_booster, a_target_element = serializable_fitresult
For an example, refer to the model implementations at
[MLJXGBoostInterface.jl](https://github.com/JuliaAI/MLJXGBoostInterface.jl/blob/master/src/MLJXGBoostInterface.jl)

xgb_filename = string(filename, ".tmp")
open(xgb_filename, "w") do file
write(file, persistent_booster)
end
booster = XGBoost.Booster(model_file=xgb_filename)
rm(xgb_filename)
fitresult = (booster, a_target_element)
return fitresult
end
```

### Document strings

@@ -1170,8 +1123,8 @@ as a checklist. Here are examples of compliant doc-strings (go to the
end of the linked files):

- Regular supervised models (classifiers and regressors): [MLJDecisionTreeInterface.jl](https://github.com/JuliaAI/MLJDecisionTreeInterface.jl/blob/master/src/MLJDecisionTreeInterface.jl) (see the end of the file)
- Transformers: [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/dev/src/builtins/Transformers.jl)

- Transformers: [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/dev/src/builtins/Transformers.jl)

A utility function is available for generating a standardized header
for your doc-strings (but you provide most detail by hand):
6 changes: 6 additions & 0 deletions docs/src/common_mlj_workflows.md
@@ -141,6 +141,12 @@ info("PCA")
info("RidgeRegressor", pkg="MultivariateStats") # a model type in multiple packages
```

Extracting the model document string:

```@example
doc("DecisionTreeClassifier", pkg="DecisionTree")
```
## Instantiating a model
*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)
7 changes: 2 additions & 5 deletions docs/src/controlling_iterative_models.md
@@ -116,7 +116,7 @@ control | description
[`WithReportDo`](@ref MLJIteration.WithReportDo)`(f->e->@info("report: $e"))`| Call `f(r)` where `r` is the training machine report | yes
[`WithModelDo`](@ref MLJIteration.WithModelDo)`(f->m->@info("model: $m"))`| Call `f(m)` where `m` is the model, which may be mutated by `f` | yes
[`WithMachineDo`](@ref MLJIteration.WithMachineDo)`(f->mach->@info("machine: $mach"))`| Call `f(mach)` where `mach` is the training machine in its current state | yes
[`Save`](@ref MLJSerialization.Save)`(filename="machine.jlso")`|Save current training machine to `machine1.jlso`, `machine2.jslo`, etc | yes
[`Save`](@ref MLJIteration.Save)`(filename="machine.jlso")`|Save current training machine to `machine1.jlso`, `machine2.jlso`, etc. | yes

> Table 1. Atomic controls. Some advanced options omitted.
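
For context, here is a hedged sketch of how a few of these controls might be combined in an `IteratedModel`; the model choice, data, and control settings are assumptions for illustration, and the `Save` signature follows the table above:

```julia
using MLJ

# Assumes EvoTrees.jl is installed; any iterative MLJ model would do.
EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees
X, y = @load_iris

iterated_model = IteratedModel(
    model=EvoTreeClassifier(),
    resampling=Holdout(fraction_train=0.8),
    measure=log_loss,
    controls=[Step(5),                        # train 5 iterations per control cycle
              Patience(3),                    # stop after 3 consecutive loss increases
              NumberLimit(100),               # hard cap on the number of cycles
              Save(filename="machine.jlso")]  # periodically snapshot the training machine
)

mach = machine(iterated_model, X, y) |> fit!
```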
@@ -125,9 +125,6 @@ control | description
"Early Stopping - But When?", in *Neural Networks: Tricks of the
Trade*, ed. G. Orr, Springer.

⋆ If using `MLJIteration` without `MLJ`, then `Save` is not available
unless one is also using `MLJSerialization`.

**Stopping option.** All the following controls trigger a stop if the
provided function `f` returns `true` and `stop_if_true=true` is
specified in the constructor: `Callback`, `WithNumberDo`,
@@ -438,7 +435,7 @@ MLJIteration.WithFittedParamsDo
MLJIteration.WithReportDo
MLJIteration.WithModelDo
MLJIteration.WithMachineDo
MLJSerialization.Save
MLJIteration.Save
```

### Control wrappers
2 changes: 1 addition & 1 deletion docs/src/julia_blogpost.md
@@ -40,7 +40,7 @@ composition.

- Video from [London Julia User Group meetup in March 2019](https://www.youtube.com/watch?v=CfHkjNmj1eE) (skip to [demo at 21'39](https://youtu.be/CfHkjNmj1eE?t=21m39s)) &nbsp;

- [MLJ Tutorials](https://JuliaAI.github.io/MLJTutorials/)
- [Learning MLJ](@ref)

- Implementing the MLJ interface for a [new model](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/)

