xuankai@cmu edited this page Apr 2, 2022
- `cd` to `exp/<directory of your specific experiment>/images`, where you can see the figures for several metrics (e.g., acc, loss, time) during the training of each epoch.
- You can also use `tensorboard` to view the training and validation curves. To start it, first activate your ESPnet python environment, then run `tensorboard --logdir path/to/log`.
- Check if there is any error in your log files. If so, fix it before modifying other configs.
- `lr` and `warmup_steps` are very important hyper-parameters that always need to be tuned. You can try different combinations of them and compare their performance.
- Sometimes the (effective) batch size can be an issue; it should not be too small. You can modify `batch_bins` and `accum_grad` based on your observations.
- If the dataset is small, try to reduce the model size (e.g.,
`output_size`, `attention_heads`, `linear_units`, `num_blocks`). You can start from a medium-size model with around 40M parameters; most ASR recipes use this size. It may not be optimal, but it is good for debugging. If your dataset is really large (e.g., `librispeech` 960h), you can further consider larger models to improve performance.
- In some cases, `conformer` is more difficult to train, so it is a good idea to start from a `transformer` encoder.
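As a sketch, the hyper-parameters mentioned above live in the ESPnet2 training config. The fragment below is illustrative only — every value is a starting point to tune, not a recommendation:

```yaml
# Illustrative training-config fragment; tune all values per dataset.
optim: adam
optim_conf:
    lr: 0.002              # tune together with warmup_steps
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000    # longer warmup often helps smaller datasets
batch_type: numel
batch_bins: 4000000        # effective batch size also scales with accum_grad
accum_grad: 4
encoder: transformer
encoder_conf:
    output_size: 256       # medium-size settings in the ballpark described above
    attention_heads: 4
    linear_units: 2048
    num_blocks: 12
```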
Reduce the `batch_bins` or `batch_size` in your model configuration file, depending on your batching method.
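Which knob applies depends on the batching method selected by `batch_type`; a sketch with illustrative numbers:

```yaml
# With numel batching, shrink batch_bins; with folded or unsorted batching,
# shrink batch_size instead. Values below are illustrative only.
batch_type: numel
batch_bins: 2000000   # e.g., halve this on CUDA out-of-memory until training fits
```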
- Check whether the sentence/audio lengths are too long. If so, please check stage 4 in the ASR recipe and remove such files by changing the options.
- Check that the length of the downsampled audio is longer than the length of the output token sequence.
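A minimal sketch of the second check, assuming a subsampling factor of 4 (the usual `conv2d` input layer); `is_valid_pair` is a hypothetical helper, not an ESPnet function:

```python
# Hypothetical helper: a pair is trainable only if the encoder output after
# subsampling is longer than the target token sequence.
def is_valid_pair(num_feature_frames: int, num_tokens: int, subsampling: int = 4) -> bool:
    return num_feature_frames // subsampling > num_tokens

# 1000 feature frames vs. 200 tokens is fine; 100 frames vs. 30 tokens is too short.
```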
- Out of memory (exit code 137) in stage 10: some SSLR models are large, exceeding the allocated RAM.
- If so, please use a machine with more memory or request more memory.
- When using the same encoder, transferring to SSLR outputs NaN more often.
- If so, please check the `input_layer` of the encoder. It is probably because of the subsampling ratio; the solution is to use `conv2d2` with a subsampling ratio of 2.
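A sketch of that fix in the encoder config (the surrounding keys are illustrative; `conv2d2` is the ESPnet input-layer option with subsampling ratio 2):

```yaml
encoder: transformer
encoder_conf:
    input_layer: conv2d2   # subsampling ratio 2; the default conv2d subsamples
                           # by 4, which can be too aggressive on SSLR features
```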
- Error in stage 11 related to normalization.
- Check whether both `extract_feats_in_collect_stats: false` in the config and `--feats_normalize global_mvn` in run.sh are used.
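Sketched together, the two settings that must be used in combination (per the tip above):

```yaml
# In the training config:
extract_feats_in_collect_stats: false
```

and in `run.sh`, pass `--feats_normalize global_mvn` to the recipe script.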