Merge pull request #229 from alan-turing-institute/dev
Migration of code out to MLJBase and MLJModels; update to new versions of these
ablaom committed Sep 11, 2019
2 parents c755afa + 2acd52f commit 99ef588
Showing 58 changed files with 780 additions and 3,231 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -34,7 +34,7 @@ jobs:
- |
julia --color=yes --project=docs/ -e'
using Pkg
Pkg.add(PackageSpec(name="MLJBase", rev="master"));Pkg.add(PackageSpec(name="MLJModels", rev="master"));Pkg.develop(PackageSpec(path=pwd()))
Pkg.pin(PackageSpec(name="Missings", version="0.4.1"));Pkg.add(PackageSpec(name="MLJBase", rev="master"));Pkg.add(PackageSpec(name="MLJModels", rev="master"));Pkg.develop(PackageSpec(path=pwd()))
Pkg.instantiate()
include("docs/make.jl")
'
5 changes: 2 additions & 3 deletions CONTRIBUTE.md
@@ -6,8 +6,7 @@ channel](https://slackinvite.julialang.org), #MLJ.


- [List of presently implemented
models](src/Registry/Models.toml)
excluding built-in models. For full list, do `models()` in MLJ.
models](https://github.com/alan-turing-institute/MLJModels.jl/tree/master/src/registry/Models.toml). Or, do `using MLJ; models()`.

- [Enhancement requests](https://github.com/alan-turing-institute/MLJ.jl/issues?utf8=✓&q=is%3Aissue+is%3Aopen+label%3A%22enhancement%22)

@@ -17,7 +16,7 @@ While new model implementations are a priority at present, help adding
core functionality to MLJ is also welcome. If you are interested in
contributing, please read the rest of this document. A guide to
implementing the MLJ interface for new models is
[here](docs/src/adding_models_for_general_use.md).
[here](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/).


### Brief design overview
6 changes: 3 additions & 3 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLJ"
uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
version = "0.3.0"
version = "0.4.0"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -26,8 +26,8 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
MLJBase = "0.4"
MLJModels = "0.3"
MLJBase = "0.5"
MLJModels = "0.4"
julia = "1"

[extras]
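For readers unfamiliar with Julia's `[compat]` notation: a bare entry such as `"0.5"` is a caret specifier, which for a 0.x release admits only versions with the same minor number. A minimal sketch of that rule follows; `in_compat` is a hypothetical helper (not part of Pkg) and the version numbers are made up for illustration.

```julia
# Hypothetical helper, not part of Pkg: mimics the caret rule for the
# entry `MLJBase = "0.5"`, which admits versions in [0.5.0, 0.6.0).
in_compat(v::VersionNumber) = v.major == 0 && v.minor == 5

in_compat(v"0.5.0")   # true
in_compat(v"0.5.9")   # true
in_compat(v"0.4.2")   # false: only the old bound "0.4" admitted this
in_compat(v"0.6.0")   # false: a minor bump in 0.x counts as breaking
```

So moving the bounds from `"0.4"`/`"0.3"` to `"0.5"`/`"0.4"` means this release resolves only against the newer MLJBase and MLJModels series.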
48 changes: 21 additions & 27 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

A pure Julia machine learning framework.

[MLJ News](https://github.com/alan-turing-institute/MLJ.jl/blob/master/docs/src/NEWS.md)
[MLJ News](https://github.com/alan-turing-institute/MLJ.jl/blob/master/docs/src/NEWS.md) | [MLJ Cheatsheet](docs/src/mlj_cheatsheet.md)


## `join!(MLJ, YourModel)`
@@ -20,7 +20,7 @@ and the core team is happy to respond to [issue requests](https://github.com/ala
assistance. Please click [here](CONTRIBUTE.md) for more details on
contributing.

MLJ is presently supported by a small Alan Turing Institute grant and is looking for new funding sources to grow the project.
MLJ is presently supported by a small Alan Turing Institute grant and is looking for new funding sources to grow and maintain the project.

[![Build Status](https://travis-ci.com/alan-turing-institute/MLJ.jl.svg?branch=master)](https://travis-ci.com/alan-turing-institute/MLJ.jl)
[![Slack Channel mlj](https://img.shields.io/badge/chat-on%20slack-yellow.svg)](https://slackinvite.julialang.org/)
@@ -36,12 +36,8 @@ scientific programming language, [Julia](https://julialang.org).

The MLJ project is partly inspired by [MLR](https://mlr.mlr-org.com/index.html).

A list of models in external packages that can be used with MLJ:
[Models.toml](src/registry/Models.toml)

[MLJ Cheatsheet](docs/src/mlj_cheatsheet.md)


[List of presently implemented models](https://github.com/alan-turing-institute/MLJModels.jl/tree/master/src/registry/Models.toml)


### Installation

@@ -53,7 +49,7 @@ Pkg.add("MLJ")
Pkg.add("MLJModels")
```

To obtain a list of all registered models, keyed on package name:
To obtain a list of all registered models:

```julia
using MLJ
@@ -106,6 +102,8 @@ available.

- Option to tune hyperparameters using gradient descent and **automatic
differentiation** (for learning algorithms written in Julia).

- Option to tune hyperparameters using **Bayesian optimisation**

- **Data agnostic**: Train models on any data supported by the Tables.jl
[interface](https://github.com/JuliaData/Tables.jl). &#10004;
@@ -114,52 +112,48 @@ available.
**learning networks** .&#10004;

- Learning networks can be exported as self-contained **composite models** &#10004;, but
common networks (e.g., linear pipelines, stacks) come ready to plug-and-play.
common networks (e.g., linear **pipelines** &#10004;, **stacks**) come ready to plug-and-play.

- Performant parallel implementation of large homogeneous **ensembles**
of arbitrary models (e.g., random forests). &#10004;

- **Task** interface matches machine learning problem to available models. &#10004;
- Model **registry** and facility to **match models** to machine learning
tasks. &#10004;

- **Benchmarking** a battery of assorted models for a given task.

- Automated estimates of cpu and memory requirements for given task/model.

- Friendly interface for handling **probabilistic** prediction. &#10004;


### Frequently Asked Questions

See [here](docs/src/frequently_asked_questions.md).


### Known issues

- The ScikitLearn SVM models will not work under Julia 1.0.3 but do work under Julia 1.1 due to [Issue #29208](https://github.com/JuliaLang/julia/issues/29208)

- When MLJRegistry is updated with new models you may need to force a new
precompilation of MLJ to make new models available.


### Getting started

Get started
[here](https://alan-turing-institute.github.io/MLJ.jl/dev/),
[here](https://alan-turing-institute.github.io/MLJ.jl/stable/),
or take the MLJ [tour](/examples/tour/tour.ipynb).


### History

Predecessors of the current package are
[AnalyticalEngine.jl](https://github.com/tlienart/AnalyticalEngine.jl)
and [Orchestra.jl](https://github.com/svs14/Orchestra.jl), and
[Koala.jl](https://github.com/ablaom/Koala.jl). Work
continued as a research study group at the University of Warwick,
Antecedents for the current package are
[AnalyticalEngine.jl](https://github.com/tlienart/AnalyticalEngine.jl),
[Orchestra.jl](https://github.com/svs14/Orchestra.jl), and
[Koala.jl](https://github.com/ablaom/Koala.jl). Development was also
guided by a research study group at the University of Warwick,
beginning with a review of existing ML Modules that were available in
Julia at the time ([in-depth](https://github.com/dominusmi/Julia-Machine-Learning-Review/tree/master/Educational),
Julia at the time
([in-depth](https://github.com/dominusmi/Julia-Machine-Learning-Review/tree/master/Educational),
[overview](https://github.com/dominusmi/Julia-Machine-Learning-Review/tree/master/Package%20Review)).

![alt text](material/packages.jpg)

Further work culminated in the first MLJ
[proof-of-concept](https://github.com/alan-turing-institute/MLJ.jl/tree/poc)

For administrators: [Applying requests to register new models](REGISTRY.md).
For administrators: [Implementing requests to register new models](REGISTRY.md).
26 changes: 14 additions & 12 deletions REGISTRY.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,23 @@
# Instructions for updating the MLJ Model Registry

To register all the models in GreatNewPackage with MLJ:

- In a clone of the master branch of MLJ, change to the
`/src/registry/` directory and, in Julia, activate the environment
specified by the Project.toml there, after checking the [compat]
conditions thre are up to date.
- In a clone of the master branch of
[MLJModels](https://github.com/alan-turing-institute/MLJModels.jl),
change to the `/src/registry/` directory and, in Julia, activate the
environment specified by the Project.toml there, after checking the
[compat] conditions there are up to date.

- Add `GreatNewPackage` to the environment.

- Activate a new environment in which your MLJ clone has been
`dev`ed. Execute `using MLJ; MLJ.Registry.@update`. This updates
`/Metadata.toml` and `/Models.toml` (the latter is generated for
convenience and not used by MLJ).
- In some environment in which your MLJModels clone has been added
using `Pkg.develop`, execute `using MLJModels; @update`. This updates
`src/registry/Metadata.toml` and `src/registry/Models.toml` (the
latter is generated for convenience and not used by MLJ).

- Quit your REPL session, whose namespace is now polluted.

- Commit and make a PR request to merge your clone with master. Once
merged, the new metadata is available to users of MLJ#master.
- Push your changes to an appropriate branch of MLJModels to make
the updated metadata available to users of the next MLJModels tagged
release.

- Consider registering a new tagged version of MLJ.
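
The revised steps above might be scripted roughly as follows. This is only a sketch under stated assumptions: the clone path, the scratch environment name, and the helper `prepare_registry_env` are placeholders invented for illustration, `GreatNewPackage` is the stand-in name used in the instructions, and `@update` is assumed to be exported by MLJModels as the instructions imply.

```julia
using Pkg

# Placeholder: path to your local clone of MLJModels.jl (an assumption).
clone = joinpath(homedir(), "MLJModels.jl")

# Hypothetical helper: activate the registry environment inside the clone
# and add the package whose models are to be registered.
function prepare_registry_env(clone::AbstractString, pkg::AbstractString)
    Pkg.activate(joinpath(clone, "src", "registry"))
    Pkg.add(pkg)
end

# Typical session (left commented out because it needs a real clone):
# prepare_registry_env(clone, "GreatNewPackage")
# Pkg.activate("scratch_env")             # any environment of your choosing
# Pkg.develop(PackageSpec(path=clone))    # use the clone as MLJModels
# using MLJModels; @update                # regenerate Metadata.toml and Models.toml
```

Running the update in a throwaway REPL session fits the instruction to quit afterwards, since the session's namespace is polluted by the update.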

2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -9,6 +9,7 @@ LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
@@ -17,4 +18,3 @@ TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9"

[compat]
Documenter = "~0.22"
Missings = "<0.4.2"
61 changes: 30 additions & 31 deletions docs/make.jl
@@ -2,48 +2,47 @@ if Base.HOME_PROJECT[] !== nothing
Base.HOME_PROJECT[] = abspath(Base.HOME_PROJECT[])
end
using Pkg
#Pkg.add("Documenter")
#Pkg.clone("https://github.com/alan-turing-institute/MLJBase.jl")
#Pkg.clone("https://github.com/alan-turing-institute/MLJModels.jl")
#Pkg.clone("https://github.com/alan-turing-institute/MLJ.jl")
using Documenter
using MLJ
using MLJBase
using MLJ.Transformers
using MLJ.Constant
using MLJModels.Transformers
using MLJModels.Constant
using MLJModels.KNN
using MLJModels
using ScientificTypes

#prettyurls to be changed
pages = Any["Getting Started"=>"index.md",
"Evaluating model performance"=>"evaluating_model_performance.md",
"Performance Measures"=> "performance_measures.md",
"Tuning models"=>"tuning_models.md",
"Built-in Transformers" => "built_in_transformers.md",
"Composing Models" => "composing_models.md",
"Homogeneous Ensembles" => "homogeneous_ensembles.md",
"Simple User Defined Models" => "simple_user_defined_models.md",
"Adding Models for General Use" => "adding_models_for_general_use.md",
"Benchmarking" => "benchmarking.md",
"Working with tasks" => "working_with_tasks.md",
"Internals"=>"internals.md",
"Glossary"=>"glossary.md",
"API"=>"api.md",
"MLJ Cheatsheet" => "mlj_cheatsheet.md",
"MLJ News"=>"NEWS.md",
"FAQ" => "frequently_asked_questions.md",
"Julia BlogPost"=>"julia_blogpost.md"]

for p in pages
println(first(p))
end

makedocs(
sitename = "MLJ",
format = Documenter.HTML(),
modules = [MLJ, MLJBase, MLJModels, MLJ.Transformers, ScientificTypes],
pages = Any["Getting Started"=>"index.md",
"Evaluating model performance"=>"evaluating_model_performance.md",
"Performance Measures"=> "performance_measures.md",
"Tuning models"=>"tuning_models.md",
"Built-in Transformers" => "built_in_transformers.md",
"Composing Models" => "composing_models.md",
"Homogeneous Ensembles" => "homogeneous_ensembles.md",
"Simple User Defined Models" => "simple_user_defined_models.md",
"Adding Models for General Use" => "adding_models_for_general_use.md",
"Working with Tasks" => "working_with_tasks.md",
"Benchmarking" => "benchmarking.md",
"Internals"=>"internals.md",
"Glossary"=>"glossary.md",
"API"=>"api.md",
"MLJ Cheatsheet" => "mlj_cheatsheet.md",
"MLJ News"=>"NEWS.md",
"FAQ" => "frequently_asked_questions.md",
"Julia BlogPost"=>"julia_blogpost.md"]
)
modules = [MLJ, MLJBase, MLJModels, MLJModels.Transformers, MLJModels.Constant,
MLJModels.KNN, ScientificTypes],
pages=pages)

deploydocs(
repo = "github.com/alan-turing-institute/MLJ.jl.git"
)

# modules = [MLJ]
# Documenter can also automatically deploy documentation to gh-pages.
# See "Hosting Documentation" and deploydocs() in the Documenter manual
# for more information.
