xuankai@cmu edited this page Apr 2, 2022 · 7 revisions

ASR

How to check the learning curve?

  • cd to exp/<your experiment directory>/images, where you can see figures for several metrics (e.g., acc, loss, time) recorded at each training epoch.
  • You can also use TensorBoard to view the training and validation curves. To start it, first activate your ESPnet Python environment, then run
tensorboard --logdir path/to/log

How to tune the hyper-parameters when training fails?

  • Check whether there are any errors in your log files. If so, fix them before modifying other configs.
  • lr and warmup_steps are important hyper-parameters that almost always need tuning. Try different combinations and compare their performance.
  • Sometimes the (effective) batch size can be an issue. It should not be too small. You can modify batch_bins and accum_grad based on your observation.
  • If the dataset is small, try reducing the model size (e.g., output_size, attention_heads, linear_units, num_blocks). You can start from a medium-sized model with around 40M parameters; most ASR recipes use this size. It may not be optimal, but it is good for debugging. If your dataset is very large (e.g., LibriSpeech 960h), you can consider larger models to improve performance.
  • In some cases, the Conformer is more difficult to train, so it is a good idea to start from a Transformer encoder.
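As a starting point, the fields mentioned above might look like the following in a training config. All values here are illustrative, not tuned recommendations:

```yaml
# Hypothetical excerpt from a training config (e.g., conf/train_asr.yaml).
optim: adam
optim_conf:
    lr: 0.002            # usually tuned together with warmup_steps
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000  # consider reducing for small datasets
encoder: transformer     # start simple; switch to conformer once stable
encoder_conf:
    output_size: 256     # roughly the ~40M-parameter range with the
    attention_heads: 4   # settings below
    linear_units: 2048
    num_blocks: 12
```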

GPU memory error?

Reduce batch_bins or batch_size in your training configuration file, depending on which batching method (batch_type) you use.
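If you want to keep the effective batch size while lowering per-step GPU memory, you can reduce batch_bins and raise accum_grad together. A hypothetical config excerpt (illustrative values):

```yaml
# Halving batch_bins and doubling accum_grad keeps the effective batch
# size roughly constant while lowering peak GPU memory per step.
batch_type: numel
batch_bins: 2000000   # was 4000000
accum_grad: 8         # was 4
```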

CTC loss becomes NaN

  • Check whether any utterances or transcripts are too long. If so, see stage 4 of the ASR recipe and remove such files by changing the options (e.g., max_wav_duration).
  • Check that the length of the subsampled input sequence is longer than the length of the output token sequence. CTC has no valid alignment when the input is shorter than the target, which yields an infinite loss and NaN gradients.
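The second check above can be sketched as a quick feasibility test. This is a minimal sketch (the helper name and values are hypothetical; the subsampling factors assume conv2d subsamples by 4 and conv2d2 by 2):

```python
# Minimal sketch of the CTC length check for one utterance.
def ctc_feasible(num_frames: int, num_tokens: int, subsampling: int = 4) -> bool:
    """Return True if the subsampled input is long enough for CTC.

    CTC needs at least one input frame per output token (strictly more
    when consecutive tokens repeat, since a blank must separate them).
    """
    return num_frames // subsampling >= num_tokens

# 1000 frames subsampled by 4 -> 250 frames: enough for 200 tokens,
# not enough for 300 tokens (the CTC loss would be infinite).
print(ctc_feasible(1000, 200))  # True
print(ctc_feasible(1000, 300))  # False
```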

Using Self-Supervised Learning Representation (SSLR) Frontend

  • Out of memory (exit code 137) in stage 10: some SSLR models are large and can exceed the allocated RAM.
    • If so, use a machine with more memory or request more memory from your job scheduler.
  • When using the same encoder, transferring to SSLR features produces NaN more often.
    • If so, check the input_layer of the encoder; the cause is probably the subsampling ratio. The solution is to use conv2d2, which has a subsampling ratio of 2.
  • Error in stage 11 related to normalization.
    • Check that both extract_feats_in_collect_stats: false in the config and --feats_normalize global_mvn in run.sh are used.
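Put together, the SSLR-related settings above might look like the following config sketch. The upstream name, feature dimension, and download path are illustrative assumptions, not prescribed values:

```yaml
# Hypothetical excerpt from an SSLR training config.
freeze_param: ["frontend.upstream"]
frontend: s3prl
frontend_conf:
    frontend_conf:
        upstream: wav2vec2_large_ll60k  # illustrative upstream choice
    download_dir: ./hub
    multilayer_feature: true
preencoder: linear
preencoder_conf:
    input_size: 1024     # depends on the chosen upstream model
    output_size: 80
encoder_conf:
    input_layer: conv2d2 # subsampling ratio of 2, as recommended above
extract_feats_in_collect_stats: false
```

In run.sh, pass --feats_normalize global_mvn to match the config setting above.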