MLP Cluster Tutorial Branch

A short code repo that showcases a potential framework for carrying out experiments on the MLP Cluster.

Introduction

Welcome to MLPractical's Introduction to the MLP GPU Cluster branch. This branch provides tutorial material for the MLP Cluster: tutorial documents and code, as well as tooling with more advanced features to aid you in your quests to train lots of learnable differentiable computational graphs.

Getting Started

Before proceeding to the next section of the README, please read the getting started guide.

Installation

The code uses PyTorch, along with a number of smaller packages. To take care of everything at once, we recommend using the conda package manager, and more specifically miniconda3, as it is lightweight and fast to install. If you have an existing miniconda3 installation, please start at step 3. If you want to install both conda and the required packages, please run:

1. wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
2. Go through the installation.
3. Activate conda
4. conda create -n mlp python=3.6
5. conda activate mlp
6. At this stage you need to choose which version of PyTorch you need by visiting the PyTorch website.
7. Choose and install the PyTorch variant of your choice using the conda commands.
8. Then run bash install.sh

To execute an installation script simply run: bash <installation_file_name>

To activate your conda installations simply run: conda activate
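
Once the environment is set up, a quick check along the following lines can confirm that the key package is importable. This is only a sketch: the function name check_environment is hypothetical, and it assumes torch is the one package worth verifying (the packages installed by install.sh are not listed here).

```python
import importlib.util
import sys


def check_environment(required=("torch",)):
    """Map each package name to True if it can be found for import."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in required
    }


if __name__ == "__main__":
    # Print the interpreter version so it can be compared with the
    # python=3.6 requested when the mlp environment was created.
    print("Python version:", sys.version.split()[0])
    for name, found in check_environment().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

Run it with the mlp environment activated; a MISSING entry suggests the PyTorch install step was skipped or was done in a different environment.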

Overview of code:

arg_extractor.py: Contains an array of utility methods that can parse python arguments or convert a json config file into an argument NamedTuple.
data_providers.py: A sample data provider, of the same type used in the MLPractical course.
experiment_builder.py: Builds and executes a simple image classification experiment, keeping track of relevant statistics, taking care of storing and re-loading pytorch models, as well as choosing the best validation-performing model to evaluate the test set on.
model_architectures.py: Provides sample fully connected and convolutional neural network models, each with a number of moving parts exposed as hyperparameters.
storage_utils.py: Provides a number of storage/loading methods for the experiment statistics.
train_evaluated_emnist_classification_system.py: Runs an experiment given a data provider, an experiment builder instance and a model architecture.
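
To make the JSON-to-NamedTuple idea mentioned for arg_extractor.py more concrete, here is a minimal sketch of how such a conversion could look. It is an illustration rather than the repository's actual code: the function names config_to_namedtuple and load_json_config, and the example keys batch_size and seed, are assumptions.

```python
import json
from collections import namedtuple


def config_to_namedtuple(config_dict, type_name="Args"):
    """Convert a flat dict of settings into an immutable namedtuple.

    Keys must be valid Python identifiers, since they become attribute names.
    """
    fields = sorted(config_dict)  # fixed field order, independent of the file
    args_type = namedtuple(type_name, fields)
    return args_type(**config_dict)


def load_json_config(path):
    """Read a JSON file of settings and return it as a namedtuple."""
    with open(path) as f:
        return config_to_namedtuple(json.load(f))


if __name__ == "__main__":
    # Hypothetical config contents, parsed from a string instead of a file.
    example = json.loads('{"batch_size": 100, "num_epochs": 30, "seed": 0}')
    args = config_to_namedtuple(example)
    print(args.batch_size, args.num_epochs, args.seed)
```

The appeal of a namedtuple here is that settings are read as attributes (args.batch_size), the same way parsed command-line arguments are, so the rest of the code does not need to care which source they came from.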

Running an experiment

To run a default image classification experiment using the template models I provided:

Sign into the cluster using ssh sxxxxxxx@mlp1.inf.ed.ac.uk
Activate your conda environment using: source miniconda3/bin/activate ; conda activate mlp
cd mlpractical
cd cluster_experiment_scripts
Find which experiment(s) you want to run (make sure the experiment ends in 'gpu_cluster.sh'). Decide if you want to run a single experiment or multiple experiments in parallel.
1. For a single experiment: sbatch experiment_script.sh
2. To run multiple experiments using the "hurdle-reducing" script, which automatically submits jobs and makes sure they are always queued or running:
  1. Make sure the cluster_experiment_scripts folder contains only the jobs you want to run.
  2. Run the command:
```
python run_jobs.py --num_parallel_jobs <number of jobs to keep in the slurm queue at all times> --num_epochs <number of epochs to run each job>
```
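
The idea of keeping a fixed number of jobs in the queue can be sketched as below. This is not the code of run_jobs.py; it is a minimal illustration under stated assumptions. The function names are hypothetical, and the Slurm calls assume that squeue and sbatch are on the PATH of the machine you are logged into.

```python
import subprocess


def jobs_to_submit(num_in_queue, num_parallel_jobs, pending_scripts):
    """Return the scripts to submit now so that the queue holds at most
    num_parallel_jobs jobs in total."""
    free_slots = max(0, num_parallel_jobs - num_in_queue)
    return pending_scripts[:free_slots]


def count_queued_jobs(user):
    """Count the user's queued or running Slurm jobs.

    -h suppresses the header line, so each non-empty line is one job.
    """
    result = subprocess.run(
        ["squeue", "-h", "-u", user],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return len([line for line in result.stdout.splitlines() if line.strip()])


def submit(script_path):
    """Submit one experiment script to Slurm."""
    subprocess.run(["sbatch", script_path], check=True)
```

A driver loop would then repeatedly call count_queued_jobs, pass the result to jobs_to_submit, submit whatever it returns, and sleep for a while before checking again. Keeping the slot arithmetic in its own function means it can be tested without access to the cluster.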
