
# Train

At its simplest, training a model can be invoked with:

```
<train_cpp_binary> train --flagsfile=<path_to_flags>
```

The flags to the train binary can be passed in a flags file (see this example flags file) or as flags on the command line:

```
<train_cpp_binary> [train|continue|fork] \
  --datadir <path/to/data/> \
  --tokensdir <path/to/tokens/file/> \
  --archdir <path/to/architecture/files/> \
  --rundir <path/to/save/models/> \
  --arch <name_of_architecture.arch> \
  --train <train/datasets/> \
  --valid <validation/datasets> \
  --lexicon <path/to/lexicon.txt> \
  --lr=0.0001 \
  --lrcrit=0.0001
```
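
Equivalently, these flags can be collected in a flags file and passed with `--flagsfile`. A flags file is simply one flag per line in `--flag=value` form; the sketch below mirrors the command above (all paths are placeholders):

```
--datadir=<path/to/data/>
--tokensdir=<path/to/tokens/file/>
--archdir=<path/to/architecture/files/>
--rundir=<path/to/save/models/>
--arch=<name_of_architecture.arch>
--train=<train/datasets/>
--valid=<validation/datasets>
--lexicon=<path/to/lexicon.txt>
--lr=0.0001
--lrcrit=0.0001
```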

## Modes

Training supports three modes:

- `train` : Train a model from scratch on the given training data.
- `continue` : Continue training a saved model. This can be used, for example, to fine-tune with a smaller learning rate. The continue option makes a best effort to resume training from the most recent checkpoint of a given model as if there were no interruptions.
- `fork` : Create and train a new model from a saved model. This can be used, for example, to adapt a saved model to a new dataset (see the sketch after this list).
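
For concreteness, the commands below sketch how each mode might be invoked. The positional arguments shown for `continue` (a run directory) and `fork` (a path to a saved model) are assumptions rather than guaranteed syntax; check the flag definitions for your build.

```
# Train from scratch
<train_cpp_binary> train --flagsfile=<path_to_flags>

# Resume the most recent checkpoint of an existing run
# (assumes continue takes the run directory as a positional argument)
<train_cpp_binary> continue <path/to/save/models/>

# Adapt a saved model to a new dataset with a smaller learning rate
# (assumes fork takes a path to the saved model as a positional argument)
<train_cpp_binary> fork <path/to/saved/model.bin> \
  --train <new/train/datasets/> \
  --lr=0.00001 \
  --lrcrit=0.00001
```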

## Flags

We give a short description of some of the more important flags here. A complete list of the flag definitions, with short descriptions of their meaning, can be found here.

The `datadir` flag is the base path where all the `train` and `valid` dataset directories live; every dataset path passed to `train` and `valid` will be prefixed with `datadir`. Multiple datasets can be passed to `train` and `valid` as a comma-separated list. More details on dataset preparation can be found here.
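
For example, with the (hypothetical) dataset lists below, the comma-separated entries combine with `datadir` as shown in the comments:

```
--datadir=/data/librispeech
--train=lists/train-clean-100.lst,lists/train-other-500.lst
--valid=lists/dev-clean.lst

# The training sets resolve to:
#   /data/librispeech/lists/train-clean-100.lst
#   /data/librispeech/lists/train-other-500.lst
```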

Similarly, `archdir` and `tokensdir` are (optional) base paths where the arch and tokens files live. For example, the complete architecture file path will be `<archdir>/<arch>`. More details on specifying architecture files can be found here. The `lexicon` flag specifies the lexicon file, which gives the token sequence for a given word.
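
As a rough sketch, a letter-based lexicon lists each word followed by its whitespace-separated token sequence; the exact layout and the word-boundary token (`|` below) depend on your token set, so treat this as illustrative only:

```
hello   h e l l o |
world   w o r l d |
```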

The `rundir` flag is the base directory where the model will be saved, and `runname` is the subdirectory that will be created to hold the model and training logs. If `runname` is unspecified, a directory name based on the date, time, and user will be created.
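
For instance, with the (hypothetical) values below, the model and logs would land in `<rundir>/<runname>`:

```
--rundir=/checkpoints
--runname=librispeech_ctc_01

# Model and logs are saved under:
#   /checkpoints/librispeech_ctc_01/
```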

Most of the training hyperparameter flags have default values, many of which you will not need to change. Some of the more important ones include:

- `lr` : The learning rate for the model parameters.
- `lrcrit` : The learning rate for the criterion parameters.
- `criterion` : Which criterion (i.e. loss function) to use. Options include `ctc`,
  `asg` or `seq2seq`.
- `batchsize` : The size of the minibatch to use per GPU.
- `maxgradnorm` : Clip the norm of the gradient of the model and criterion parameters
  to this value. Note that the norm is computed and clipped over the aggregated model
  and criterion parameters.
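
Pulling these together, a command that overrides a few of the defaults might look like the following (the particular values are illustrative, not recommendations):

```
<train_cpp_binary> train \
  --flagsfile=<path_to_flags> \
  --criterion=ctc \
  --batchsize=4 \
  --lr=0.1 \
  --lrcrit=0.01 \
  --maxgradnorm=1.0
```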

## Distributed

wav2letter++ supports distributed training on multiple GPUs out of the box. To run on multiple GPUs, pass the flag `-enable_distributed true` and run with MPI:

```
mpirun -n 8 <train_cpp_binary> [train|continue|fork] \
  -enable_distributed true \
  <... other flags ..>
```

The above command will run data-parallel training with 8 processes (e.g. on 8 GPUs).
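
To spread those processes across machines, the usual MPI mechanisms apply. For example, with Open MPI a hostfile can name the machines and their slot counts; the hostfile contents and the one-process-per-GPU assumption below are illustrative:

```
# hosts.txt (hypothetical):
#   node01 slots=4
#   node02 slots=4

mpirun -n 8 --hostfile hosts.txt <train_cpp_binary> train \
  -enable_distributed true \
  <... other flags ..>
```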