Exception: Command exited with status 1: steps/nnet3/get_egs.sh #56

Open
Asma-droid opened this issue May 30, 2021 · 0 comments
Hello,

I am a beginner with Kaldi, and I am trying to fine-tune the daanzu model on the mini-librispeech data (just a simple trial) to understand the process.

I first prepared the data and computed MFCC features, then used this script for fine-tuning: https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh

I used https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/tree_sp.zip, since it is the tree directory for the most recent models (I mainly used the ali.*.gz files).
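Since the ali.*.gz files come from a separately built model, one quick sanity check is whether those archives are in Kaldi's binary or text format before feeding them to copy-int-vector. Below is a small diagnostic sketch (not a Kaldi tool; the path you pass is whichever ali.*.gz you want to inspect) that relies only on the documented Kaldi convention that a binary-mode record starts with the two-byte marker "\0B" right after the utterance key:

```python
import gzip

def kaldi_archive_mode(path):
    """Report whether the first record of a gzipped Kaldi archive is
    binary or text.

    In a Kaldi archive each record is "<key> <object>"; in binary mode
    the object begins with the marker b"\\x00B". This is a diagnostic
    sketch, not part of Kaldi itself.
    """
    with gzip.open(path, "rb") as f:
        head = f.read(4096)
    space = head.find(b" ")  # the key is terminated by a single space
    if space == -1:
        return "empty-or-not-an-archive"
    marker = head[space + 1:space + 3]
    return "binary" if marker == b"\x00B" else "text"
```

If one set of archives reports "binary" and another "text", piping them through the same copy-int-vector invocation would plausibly produce exactly this kind of mid-stream read failure.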

I ran into the issue below:


2021-05-30 14:27:20,165 [steps/nnet3/train_dnn.py:36 - <module> - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd=run.pl --mem 4G --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.input-model exp/nnet3/tdnn_sp_train/input.raw --trainer.num-epochs 5 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00002 --trainer.optimization.minibatch-size 1024 --feat-dir data/train_hires --lang data/lang --ali-dir exp/train_ali --dir exp/nnet3/tdnn_sp_train
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=run.pl --mem 4G', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.input-model', 'exp/nnet3/tdnn_sp_train/input.raw', '--trainer.num-epochs', '5', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00002', '--trainer.optimization.minibatch-size', '1024', '--feat-dir', 'data/train_hires', '--lang', 'data/lang', '--ali-dir', 'exp/train_ali', '--dir', 'exp/nnet3/tdnn_sp_train']
2021-05-30 14:27:20,172 [steps/nnet3/train_dnn.py:178 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/train_ali',
'backstitch_training_interval': 1,
'backstitch_training_scale': 0.0,
'cleanup': True,
'cmvn_opts': '--norm-means=false --norm-vars=false',
'combine_sum_to_one_penalty': 0.0,
'command': 'run.pl --mem 4G',
'compute_per_dim_accuracy': False,
'dir': 'exp/nnet3/tdnn_sp_train',
'do_final_combination': True,
'dropout_schedule': None,
'egs_command': None,
'egs_dir': None,
'egs_opts': None,
'egs_stage': 0,
'email': None,
'exit_stage': None,
'feat_dir': 'data/train_hires',
'final_effective_lrate': 2e-05,
'frames_per_eg': 8,
'initial_effective_lrate': 0.0005,
'input_model': 'exp/nnet3/tdnn_sp_train/input.raw',
'lang': 'data/lang',
'max_lda_jobs': 10,
'max_models_combine': 20,
'max_objective_evaluations': 30,
'max_param_change': 2.0,
'minibatch_size': '1024',
'momentum': 0.0,
'num_epochs': 5.0,
'num_jobs_compute_prior': 10,
'num_jobs_final': 1,
'num_jobs_initial': 1,
'num_jobs_step': 1,
'online_ivector_dir': None,
'preserve_model_interval': 100,
'presoftmax_prior_scale_power': -0.25,
'prior_subset_size': 20000,
'proportional_shrink': 0.0,
'rand_prune': 4.0,
'remove_egs': True,
'reporting_interval': 0.1,
'samples_per_iter': 400000,
'shuffle_buffer_size': 5000,
'srand': 0,
'stage': -10,
'train_opts': [],
'use_gpu': 'yes'}
nnet3-info exp/nnet3/tdnn_sp_train/input.raw
2021-05-30 14:27:20,373 [steps/nnet3/train_dnn.py:238 - train - INFO ] Generating egs
steps/nnet3/get_egs.sh --cmd run.pl --mem 4G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
steps/nnet3/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/nnet3/tdnn_sp_train/egs/.nodelete
steps/nnet3/get_egs.sh: feature type is raw, with 'apply-cmvn'
steps/nnet3/get_egs.sh: working out number of frames of training data
steps/nnet3/get_egs.sh: working out feature dim
*** steps/nnet3/get_egs.sh: warning: the --frames-per-eg is too large to generate one archive with
*** as many as --samples-per-iter egs in it. Consider reducing --frames-per-eg.
steps/nnet3/get_egs.sh: creating 1 archives, each with 238983 egs, with
steps/nnet3/get_egs.sh: 8 labels per example, and (left,right) context = (34,34)
steps/nnet3/get_egs.sh: copying data alignments
copy-int-vector ark:- ark,scp:exp/nnet3/tdnn_sp_train/egs/ali.ark,exp/nnet3/tdnn_sp_train/egs/ali.scp
ERROR (copy-int-vector[5.5.929~1539-9bca2]:ReadBasicType():base/io-funcs-inl.h:68) ReadBasicType: did not get expected integer type, 0 vs. 4. You can change this code to successfully read it later, if needed.

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::BasicVectorHolder<int32>::Read(std::istream&)+0xba9) [0x5650dbd3adcb]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::Next()+0xf3) [0x5650dbd3b159]
copy-int-vector(main+0x484) [0x5650dbd3320d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

WARNING (copy-int-vector[5.5.929~1539-9bca2]:Read():util/kaldi-holder-inl.h:308) BasicVectorHolder::Read, read error or unexpected data at archive entry beginning at file position 18446744073709551615
WARNING (copy-int-vector[5.5.929~1539-9bca2]:Next():util/kaldi-table-inl.h:574) Object read failed, reading archive standard input
LOG (copy-int-vector[5.5.929~1539-9bca2]:main():copy-int-vector.cc:83) Copied 2697018 vectors of int32.
ERROR (copy-int-vector[5.5.929~1539-9bca2]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive standard input

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::~SequentialTableReaderArchiveImpl()+0x121) [0x5650dbd3734d]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder<int32> >::~SequentialTableReaderArchiveImpl()+0xd) [0x5650dbd3764d]
copy-int-vector(kaldi::SequentialTableReader<kaldi::BasicVectorHolder<int32> >::~SequentialTableReader()+0x16) [0x5650dbd37816]
copy-int-vector(main+0x520) [0x5650dbd332a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError

gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
steps/nnet3/get_egs.sh: line 272: 1101381 Exit 1 for id in $(seq $num_ali_jobs);
do
gunzip -c $alidir/ali.$id.gz;
done
1101382 Aborted (core dumped) | copy-int-vector ark:- ark,scp:$dir/ali.ark,$dir/ali.scp
Traceback (most recent call last):
File "steps/nnet3/train_dnn.py", line 459, in main
train(args, run_opts)
File "steps/nnet3/train_dnn.py", line 253, in train
stage=args.egs_stage)
File "steps/libs/nnet3/train/frame_level_objf/acoustic_model.py", line 61, in generate_egs
egs_opts=egs_opts if egs_opts is not None else ''))
File "steps/libs/common.py", line 129, in execute_command
p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/get_egs.sh --cmd "run.pl --mem 4G" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "" --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
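For context on the "did not get expected integer type, 0 vs. 4" message: Kaldi's binary I/O (base/io-funcs-inl.h) writes each basic integer as a one-byte size tag (4 for int32) followed by the value, and the reader checks that tag. Reading 0 where 4 is expected means the stream is not positioned at an int32 written that way, which is consistent with alignment archives in an incompatible or mixed format. The following is a minimal sketch of that convention, not the actual Kaldi code:

```python
import io
import struct

def write_basic_int32(stream, value):
    # Sketch of Kaldi's binary WriteBasicType: a one-byte size tag (4 for
    # int32) followed by the little-endian value.
    stream.write(bytes([4]))
    stream.write(struct.pack("<i", value))

def read_basic_int32(stream):
    size = stream.read(1)[0]
    if size != 4:
        # This is the check behind "did not get expected integer type,
        # 0 vs. 4": the size tag read from the stream was 0, not 4, so the
        # reader is not looking at an int32 written by write_basic_int32.
        raise ValueError(f"did not get expected integer type, {size} vs. 4")
    return struct.unpack("<i", stream.read(4))[0]
```

For example, a round trip through these two functions succeeds, while handing the reader a stream that starts with a zero byte reproduces the "0 vs. 4" complaint.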

(GPU) /scratch/asma-kaldi/egs/aishell2/s5$ bash local/nnet3/tuning/finetune_tdnn_1a.sh

[A second run of the same script produced an identical log, ending in the same copy-int-vector error and the same get_egs.sh failure.]



Any ideas, please? (For information, I am using two GPUs with 25 GB of memory each.)
