
AutoMLpy is an automatic machine learning (AutoML) package whose purpose is to optimize the hyper-parameters of a machine learning model.

The package provides three search strategies: a grid search, a random search, and a Gaussian process search. Everything is implemented to be compatible with the TensorFlow, PyTorch, and scikit-learn libraries.

Installation

Latest stable version:

pip install AutoMLpy

Latest unstable version:

  1. Download the .whl file here;
  2. Copy the path of this file on your computer;
  3. pip install it with pip install [path].whl

With pip+git:

pip install git+https://github.com/JeremieGince/AutoMLpy

Example - MNIST optimization with Tensorflow & Keras

Here is an example of how to optimize a model built with TensorFlow and Keras on the popular MNIST dataset.

Imports

We start by importing some useful stuff.

# Some useful packages
from typing import Union, Tuple
import time
import numpy as np
import pandas as pd
import pprint

# Tensorflow
import tensorflow as tf
import tensorflow_datasets as tfds

# Importing the HPOptimizer and the RandomHpSearch from the AutoMLpy package.
from AutoMLpy import HpOptimizer, RandomHpSearch

Dataset

Now we load the MNIST dataset the TensorFlow way, using tensorflow_datasets.

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

def get_tf_mnist_dataset(**kwargs):
    # https://www.tensorflow.org/datasets/keras_example
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )

    # Build training pipeline
    ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_train = ds_train.cache()
    ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
    ds_train = ds_train.batch(128)
    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

    # Build evaluation pipeline
    ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.batch(128)
    ds_test = ds_test.cache()
    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

    return ds_train, ds_test
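
If you want a quick sanity check of this pipeline, each element of the training set is a batch of 128 normalized 28x28x1 images. The snippet below is just an illustration and is not part of the original example:

# Sanity check (optional): each batch holds 128 normalized 28x28x1 images.
ds_train, ds_test = get_tf_mnist_dataset()
images, labels = next(iter(ds_train))
print(images.shape, images.dtype, labels.shape)  # (128, 28, 28, 1) float32 (128,)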

Keras Model

Now we make a function that returns a Keras model given a set of hyper-parameters (hp).

def get_tf_mnist_model(**hp):

    if hp.get("use_conv", False):
        model = tf.keras.models.Sequential([
            # Convolution layers
            tf.keras.layers.Conv2D(10, 3, padding="same", input_shape=(28, 28, 1)),
            tf.keras.layers.MaxPool2D((2, 2)),
            tf.keras.layers.Conv2D(50, 3, padding="same"),
            tf.keras.layers.MaxPool2D((2, 2)),

            # Dense layers
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
    else:
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(120, activation='relu'),
            tf.keras.layers.Dense(84, activation='relu'),
            tf.keras.layers.Dense(10)
        ])

    return model
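
As a quick check of the function above, use_conv is the only hyper-parameter consumed here; it simply toggles between the convolutional and the dense architecture. This snippet is an illustration, not part of the original example:

# Illustration: `use_conv` toggles between the two architectures defined above.
dense_model = get_tf_mnist_model(use_conv=False)
conv_model = get_tf_mnist_model(use_conv=True)
print(dense_model.count_params(), conv_model.count_params())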

The Optimizer Model

It's time to implement the optimizer model. You only have to implement the following methods: "build_model", "fit_dataset_model_" and "score_on_dataset". These methods must respect their signatures and output types. The objective is to make the building, training and scoring phases depend on the hyper-parameters, so the optimizer can use them to find the best set of hp.

class KerasMNISTHpOptimizer(HpOptimizer):
    def build_model(self, **hp) -> tf.keras.Model:
        model = get_tf_mnist_model(**hp)

        model.compile(
            optimizer=tf.keras.optimizers.SGD(
                learning_rate=hp.get("learning_rate", 1e-3),
                nesterov=hp.get("nesterov", True),
                momentum=hp.get("momentum", 0.99),
            ),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
        )
        return model

    def fit_dataset_model_(
            self,
            model: tf.keras.Model,
            dataset,
            **hp
    ) -> tf.keras.Model:
        history = model.fit(
            dataset,
            epochs=hp.get("epochs", 1),
            verbose=False,
        )
        return model

    def score_on_dataset(
            self,
            model: tf.keras.Model,
            dataset,
            **hp
    ) -> float:
        test_loss, test_acc = model.evaluate(dataset, verbose=0)
        return test_acc

Execution & Optimization

The first thing to do after creating our classes is to load the dataset into memory.

mnist_train, mnist_test = get_tf_mnist_dataset()
mnist_hp_optimizer = KerasMNISTHpOptimizer()

Next, you define your hyper-parameter space with a dictionary like this.

hp_space = dict(
    epochs=list(range(1, 16)),
    learning_rate=np.linspace(1e-4, 1e-1, 50),
    nesterov=[True, False],
    momentum=np.linspace(0.01, 0.99, 50),
    use_conv=[True, False],
)

It's time to define your hp search algorithm and give it a budget in time and iterations. Here the budget is 10 minutes and at most 100 iterations.

param_gen = RandomHpSearch(hp_space, max_seconds=60*10, max_itr=100)
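
To make the random search concrete, each iteration conceptually amounts to drawing one value per hyper-parameter from hp_space. The snippet below is a plain-Python illustration of that idea, not the AutoMLpy API:

# Plain-Python illustration (not the AutoMLpy API): one random-search trial
# draws a single value for each hyper-parameter in hp_space.
import random
trial_hp = {name: random.choice(list(values)) for name, values in hp_space.items()}
print(trial_hp)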

Finally, you start the optimization by giving your parameter generator to the optimize_on_dataset method. Note that the "stop_criterion" argument stops the optimization when the given score is reached, which is useful to save time.

save_kwargs = dict(
    save_name=f"tf_mnist_hp_opt",
    title="Random search: MNIST",
)

param_gen = mnist_hp_optimizer.optimize_on_dataset(
    param_gen, mnist_train, save_kwargs=save_kwargs,
    stop_criterion=1.0,
)

Testing

Now you can test the optimized hyper-parameters by fitting again on the full training dataset. Yes, the full dataset, because during the optimization phase a cross-validation is performed that splits your training dataset in half. Then, evaluate the fitted model on the test dataset.

opt_hp = param_gen.get_best_param()

model = mnist_hp_optimizer.build_model(**opt_hp)
mnist_hp_optimizer.fit_dataset_model_(
    model, mnist_train, **opt_hp
)

test_acc = mnist_hp_optimizer.score_on_dataset(
    model, mnist_test, **opt_hp
)

print(f"test_acc: {test_acc*100:.3f}%")

The optimized hyper-parameters:

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(opt_hp)

Visualization

You can visualize the optimization with an interactive HTML file.

fig = param_gen.write_optimization_to_html(show=True, dark_mode=True, **save_kwargs)

Optimization table

opt_table = param_gen.get_optimization_table()

Saving ParameterGenerator

param_gen.save_history(**save_kwargs)
save_path = param_gen.save_obj(**save_kwargs)

Loading ParameterGenerator

param_gen = RandomHpSearch.load_obj(save_path)

Re-launch optimization with the loaded ParameterGenerator

# Change the budget to be able to optimize again
param_gen.max_itr = param_gen.max_itr + 100
param_gen.max_seconds = param_gen.max_seconds + 60

param_gen = mnist_hp_optimizer.optimize_on_dataset(
    param_gen, mnist_train, save_kwargs=save_kwargs,
    stop_criterion=1.0, reset_gen=False,
)

opt_hp = param_gen.get_best_param()

print(param_gen.get_optimization_table())
pp.pprint(param_gen.history)
pp.pprint(opt_hp)

Other examples

Examples of how to use this package are in the ./examples folder. There you can find the previous example with TensorFlow and an example with PyTorch.

License

Apache License 2.0

Citation

@article{Gince,
  title={Implémentation du module AutoMLpy, un outil d’apprentissage machine automatique},
  author={Jérémie Gince},
  year={2021},
  publisher={ULaval},
  url={https://github.com/JeremieGince/AutoMLpy},
}
