Multithreaded hyperparameter tuning

In this project, we show how to quickly run parallel hyperparameter tuning on the Neu.ro platform. As a benchmark, we solve a simple task: classifying images from the CIFAR10 dataset.

Technologies used

  • Catalyst as a pipeline runner for deep learning tasks. This new and rapidly developing library significantly reduces the amount of boilerplate code. If you are familiar with the TensorFlow ecosystem, you can think of Catalyst as Keras for PyTorch (see the sketch after this list).
  • Weights & Biases as the hyperparameter tuning backend and logging system.
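
To show what this buys in practice, below is a minimal sketch of a CIFAR10 training run with Catalyst's SupervisedRunner. It is an illustration only: the model choice (a torchvision resnet18), the hyperparameters, and the results/logs log directory are assumptions rather than the exact code used in this recipe, and argument names may vary between Catalyst versions.

```python
# Minimal sketch: CIFAR10 classification with Catalyst (illustrative, not this repo's exact code).
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models
from catalyst import dl

transform = transforms.Compose([transforms.ToTensor()])
loaders = {
    "train": DataLoader(datasets.CIFAR10("data", train=True, download=True, transform=transform),
                        batch_size=64, shuffle=True),
    "valid": DataLoader(datasets.CIFAR10("data", train=False, download=True, transform=transform),
                        batch_size=64),
}

model = models.resnet18(num_classes=10)          # small classifier for the benchmark task (assumed choice)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# SupervisedRunner wraps the usual train/validation loop boilerplate.
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=5,
    logdir="results/logs",   # logs and checkpoints land here (assumed path)
    verbose=True,
)
```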

Steps to run

  • neuro-flow build myimage - Before anything else, run this command to build a Docker image with all the necessary dependencies. Please note: the image definition is taken from .neuro/live.yml.
  • Install W&B and configure its credentials (see our guide).
  • neuro-flow upload ALL - Upload your configuration files and training code to platform storage (here the live.yaml config file is used to bind local folders to storage).
  • neuro-flow bake hypertrain --param token_secret_name wandb-token - Run distributed hyperparameter tuning with 2 parallel training jobs on the platform (the number of jobs can be changed in .neuro/hypertrain.yml). Additional tuning parameters can be set in the config/wandb-sweep.yaml file; see the W&B documentation on sweeps for details, and the sketch after this list. If you created the secret under a different name, use it instead of wandb-token.
  • neuro-flow run train - Run a single training process with default hyperparameters.
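
Under the hood, each parallel training job pulls a hyperparameter combination from a W&B sweep. The sketch below illustrates that mechanism with the public wandb API (wandb.sweep and wandb.agent); the search space, metric name, and project name are hypothetical and do not necessarily match config/wandb-sweep.yaml, which remains the source of truth for this recipe.

```python
# Illustrative sketch of the W&B sweep mechanism (hypothetical search space, not this repo's config).
import wandb

sweep_config = {
    "method": "random",                     # search strategy: grid, random, or bayes
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-2, 1e-3, 1e-4]},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    # Each agent call starts one run; wandb.config holds the sampled hyperparameters.
    with wandb.init():
        lr = wandb.config.lr
        batch_size = wandb.config.batch_size
        # ... build the model and data loaders, train, and log metrics:
        wandb.log({"valid_loss": 0.0})      # placeholder; real code logs the actual metric

sweep_id = wandb.sweep(sweep_config, project="ml-recipe-hyperparam-wandb")
wandb.agent(sweep_id, function=train, count=10)   # each parallel job runs its own agent
```

On the platform, the bake command plays roughly this role: it starts several agent processes as separate jobs, and their runs fill the same sweep in parallel.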

Outcomes

  • Charts and a table comparing runs with different hyperparameters in the W&B Web UI (see the Sweep section in the left bar). There you can also find a button to stop the search early (or you can use neuro-flow kill ALL for the same purpose).
  • Training logs and checkpoints can be found in the results directory (a checkpoint-loading sketch follows below).
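
If you want to reuse a trained model outside the platform, a checkpoint from the results directory can be restored roughly as follows. The exact path, file name, and checkpoint keys depend on the Catalyst logger configuration, so treat them as assumptions.

```python
# Sketch: restore a model from a saved checkpoint (path and key names are assumed examples).
import torch
from torchvision import models

checkpoint = torch.load("results/logs/checkpoints/best.pth", map_location="cpu")
model = models.resnet18(num_classes=10)                 # must match the architecture that was trained
model.load_state_dict(checkpoint["model_state_dict"])   # key name assumed from Catalyst's checkpoint format
model.eval()
```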