Exception: Command exited with status 1: steps/nnet3/get_egs.sh #56

Open
Asma-droid opened this issue May 30, 2021 · 0 comments
Hello,

I am a beginner with Kaldi, and I am trying to fine-tune the daanzu model on the mini-librispeech data (just a simple trial) to understand the process.

I first prepared the data and computed MFCC features, then used this script for fine-tuning: https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh

I used https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/tree_sp.zip, since it is the tree directory for the most recent models (I mainly used the ali.*.gz files).
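Since the ali.*.gz files come from a separately built model, one quick sanity check is whether those archives are in Kaldi's binary or text format before feeding them to copy-int-vector. Below is a small diagnostic sketch (not a Kaldi tool; the path you pass is whichever ali.*.gz you want to inspect) that relies only on the documented Kaldi convention that a binary-mode record starts with the two-byte marker "\0B" right after the utterance key:

```python
import gzip

def kaldi_archive_mode(path):
    """Report whether the first record of a gzipped Kaldi archive is
    binary or text.

    In a Kaldi archive each record is "<key> <object>"; in binary mode
    the object begins with the marker b"\\x00B". This is a diagnostic
    sketch, not part of Kaldi itself.
    """
    with gzip.open(path, "rb") as f:
        head = f.read(4096)
    space = head.find(b" ")  # the key is terminated by a single space
    if space == -1:
        return "empty-or-not-an-archive"
    marker = head[space + 1:space + 3]
    return "binary" if marker == b"\x00B" else "text"
```

If one set of archives reports "binary" and another "text", piping them through the same copy-int-vector invocation would plausibly produce exactly this kind of mid-stream read failure.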

I ran into the issue below:


2021-05-30 14:27:20,165 [steps/nnet3/train_dnn.py:36 - <module> - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd=run.pl --mem 4G --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.input-model exp/nnet3/tdnn_sp_train/input.raw --trainer.num-epochs 5 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00002 --trainer.optimization.minibatch-size 1024 --feat-dir data/train_hires --lang data/lang --ali-dir exp/train_ali --dir exp/nnet3/tdnn_sp_train
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=run.pl --mem 4G', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.input-model', 'exp/nnet3/tdnn_sp_train/input.raw', '--trainer.num-epochs', '5', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00002', '--trainer.optimization.minibatch-size', '1024', '--feat-dir', 'data/train_hires', '--lang', 'data/lang', '--ali-dir', 'exp/train_ali', '--dir', 'exp/nnet3/tdnn_sp_train']
2021-05-30 14:27:20,172 [steps/nnet3/train_dnn.py:178 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/train_ali',
'backstitch_training_interval': 1,
'backstitch_training_scale': 0.0,
'cleanup': True,
'cmvn_opts': '--norm-means=false --norm-vars=false',
'combine_sum_to_one_penalty': 0.0,
'command': 'run.pl --mem 4G',
'compute_per_dim_accuracy': False,
'dir': 'exp/nnet3/tdnn_sp_train',
'do_final_combination': True,
'dropout_schedule': None,
'egs_command': None,
'egs_dir': None,
'egs_opts': None,
'egs_stage': 0,
'email': None,
'exit_stage': None,
'feat_dir': 'data/train_hires',
'final_effective_lrate': 2e-05,
'frames_per_eg': 8,
'initial_effective_lrate': 0.0005,
'input_model': 'exp/nnet3/tdnn_sp_train/input.raw',
'lang': 'data/lang',
'max_lda_jobs': 10,
'max_models_combine': 20,
'max_objective_evaluations': 30,
'max_param_change': 2.0,
'minibatch_size': '1024',
'momentum': 0.0,
'num_epochs': 5.0,
'num_jobs_compute_prior': 10,
'num_jobs_final': 1,
'num_jobs_initial': 1,
'num_jobs_step': 1,
'online_ivector_dir': None,
'preserve_model_interval': 100,
'presoftmax_prior_scale_power': -0.25,
'prior_subset_size': 20000,
'proportional_shrink': 0.0,
'rand_prune': 4.0,
'remove_egs': True,
'reporting_interval': 0.1,
'samples_per_iter': 400000,
'shuffle_buffer_size': 5000,
'srand': 0,
'stage': -10,
'train_opts': [],
'use_gpu': 'yes'}
nnet3-info exp/nnet3/tdnn_sp_train/input.raw
2021-05-30 14:27:20,373 [steps/nnet3/train_dnn.py:238 - train - INFO ] Generating egs
steps/nnet3/get_egs.sh --cmd run.pl --mem 4G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
steps/nnet3/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/nnet3/tdnn_sp_train/egs/.nodelete
steps/nnet3/get_egs.sh: feature type is raw, with 'apply-cmvn'
steps/nnet3/get_egs.sh: working out number of frames of training data
steps/nnet3/get_egs.sh: working out feature dim
*** steps/nnet3/get_egs.sh: warning: the --frames-per-eg is too large to generate one archive with
*** as many as --samples-per-iter egs in it. Consider reducing --frames-per-eg.
steps/nnet3/get_egs.sh: creating 1 archives, each with 238983 egs, with
steps/nnet3/get_egs.sh: 8 labels per example, and (left,right) context = (34,34)
steps/nnet3/get_egs.sh: copying data alignments
copy-int-vector ark:- ark,scp:exp/nnet3/tdnn_sp_train/egs/ali.ark,exp/nnet3/tdnn_sp_train/egs/ali.scp
ERROR (copy-int-vector[5.5.929~1539-9bca2]:ReadBasicType():base/io-funcs-inl.h:68) ReadBasicType: did not get expected integer type, 0 vs. 4. You can change this code to successfully read it later, if needed.

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::BasicVectorHolder<int32>::Read(std::istream&)+0xba9) [0x5650dbd3adcb]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::Next()+0xf3) [0x5650dbd3b159]
copy-int-vector(main+0x484) [0x5650dbd3320d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

WARNING (copy-int-vector[5.5.929~1539-9bca2]:Read():util/kaldi-holder-inl.h:308) BasicVectorHolder::Read, read error or unexpected data at archive entry beginning at file position 18446744073709551615
WARNING (copy-int-vector[5.5.929~1539-9bca2]:Next():util/kaldi-table-inl.h:574) Object read failed, reading archive standard input
LOG (copy-int-vector[5.5.929~1539-9bca2]:main():copy-int-vector.cc:83) Copied 2697018 vectors of int32.
ERROR (copy-int-vector[5.5.929~1539-9bca2]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive standard input

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::~SequentialTableReaderArchiveImpl()+0x121) [0x5650dbd3734d]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::~SequentialTableReaderArchiveImpl()+0xd) [0x5650dbd3764d]
copy-int-vector(kaldi::SequentialTableReader<kaldi::BasicVectorHolder<int32> >::~SequentialTableReader()+0x16) [0x5650dbd37816]
copy-int-vector(main+0x520) [0x5650dbd332a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError

gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
steps/nnet3/get_egs.sh: line 272: 1101381 Exit 1 for id in $(seq $num_ali_jobs);
do
gunzip -c $alidir/ali.$id.gz;
done
1101382 Aborted (core dumped) | copy-int-vector ark:- ark,scp:$dir/ali.ark,$dir/ali.scp
Traceback (most recent call last):
File "steps/nnet3/train_dnn.py", line 459, in main
train(args, run_opts)
File "steps/nnet3/train_dnn.py", line 253, in train
stage=args.egs_stage)
File "steps/libs/nnet3/train/frame_level_objf/acoustic_model.py", line 61, in generate_egs
egs_opts=egs_opts if egs_opts is not None else ''))
File "steps/libs/common.py", line 129, in execute_command
p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/get_egs.sh --cmd "run.pl --mem 4G" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "" --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
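For context on the "did not get expected integer type, 0 vs. 4" message: Kaldi's binary I/O (base/io-funcs-inl.h) writes each basic integer as a one-byte size tag (4 for int32) followed by the value, and the reader checks that tag. Reading 0 where 4 is expected means the stream is not positioned at an int32 written that way, which is consistent with alignment archives in an incompatible or mixed format. The following is a minimal sketch of that convention, not the actual Kaldi code:

```python
import io
import struct

def write_basic_int32(stream, value):
    # Sketch of Kaldi's binary WriteBasicType: a one-byte size tag (4 for
    # int32) followed by the little-endian value.
    stream.write(bytes([4]))
    stream.write(struct.pack("<i", value))

def read_basic_int32(stream):
    size = stream.read(1)[0]
    if size != 4:
        # This is the check behind "did not get expected integer type,
        # 0 vs. 4": the size tag read from the stream was 0, not 4, so the
        # reader is not looking at an int32 written by write_basic_int32.
        raise ValueError(f"did not get expected integer type, {size} vs. 4")
    return struct.unpack("<i", stream.read(4))[0]
```

For example, a round trip through these two functions succeeds, while handing the reader a stream that starts with a zero byte reproduces the "0 vs. 4" complaint.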

(GPU) /scratch/asma-kaldi/egs/aishell2/s5$ bash local/nnet3/tuning/finetune_tdnn_1a.sh

[A second run of the same script produced an identical log, ending in the same copy-int-vector error and the same get_egs.sh failure.]



Any ideas, please? (For information, I am using two GPUs with 25 GB of memory each.)
