examples/ami/s5b recipe failing #263

Open
agarwalchaitanya opened this issue Feb 3, 2021 · 2 comments

@agarwalchaitanya

Hi, I'm trying to run the AMI recipe (examples/ami/s5b), but it fails with the following trace. Are there any leads on this?

============================================================================
                                  AMI                                     
============================================================================
============================================================================
                       Data Preparation (stage:0)                          
============================================================================
+ dir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ echo 'Downloading annotations...'
Downloading annotations...
+ amiurl=http://groups.inf.ed.ac.uk/ami
+ annotver=ami_public_manual_1.6.1
+ annot=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1
+ logdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/log
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip ']'
+ wget -nv -O /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_public_manual_1.6.1.zip
+ '[' '!' -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations ']'
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations
+ unzip -o -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations/AMI-metadata.xml ']'
+ local/ami_xml2text.sh /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
local/ami_xml2text.sh: line 19: [: openjdk version "11.0.9.1" 2020-11-04: integer expression expected
local/ami_xml2text.sh. Java not found. Will download exported version of transcripts.
--2021-02-03 17:12:13--  http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_manual_annotations_v1.6.1_export.gzip
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3725858 (3.6M) [application/x-troff-man]
Saving to: '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip'

/home/asr/neural_sp_asset 100%[==================================>]   3.55M  1.37MB/s    in 2.6s    

2021-02-03 17:12:16 (1.37 MB/s) - '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip' saved [3725858/3725858]

+ wdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 ']'
+ echo 'Preprocessing transcripts...'
Preprocessing transcripts...
+ local/ami_split_segments.pl /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_train.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_eval.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_dev.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
Getting CMU dictionary
cat: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/cmudict/cmudict.0.7a.symbols: No such file or directory
grep: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/cmudict/cmudict.0.7a: No such file or directory
2021-02-03 17:12:21 URL:http://www.openslr.org/resources/9/wordlist.50k.gz [139334/139334] -> "/home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/wordlist.50k.gz" [1]
cat: /home/asr/neural_sp_assets/preprocessed_data/ami/ihm/train/text: No such file or directory
*Highest-count OOVs are:
Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt ...
--> reading /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt is OK

Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt ...
--> reading /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt is OK

Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/nonsilence_phones.txt ...
--> ERROR: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/nonsilence_phones.txt is empty or not exists
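
A note on the `integer expression expected` message near the top of the trace: it suggests that line 19 of local/ami_xml2text.sh feeds the raw `java -version` banner into an integer test. OpenJDK prints `openjdk version "11.0.9.1" 2020-11-04`, which is not a number, so the test errors out and the script reports "Java not found". The sketch below shows one way to pull a major version out of such a banner. `java_major_version` is a hypothetical helper and not part of the recipe, and how the real script parses the version is an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): print the major Java version
# given the first line of `java -version` output.
java_major_version() {
  local line=$1 ver
  # Extract the quoted version string, e.g. 11.0.9.1 or 1.8.0_281.
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 1
  case $ver in
    1.*) ver=${ver#1.} ;;  # legacy numbering: 1.8.0_281 -> 8.0_281
  esac
  # Keep only the leading run of digits.
  ver=${ver%%[!0-9]*}
  [ -n "$ver" ] || return 1
  printf '%s\n' "$ver"
}
```

Since `java -version` writes to stderr, a caller would redirect it, for example `major=$(java_major_version "$(java -version 2>&1 | head -n 1)")`, and only then compare `"$major"` with `-ge`.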
@hirofumi0810
Owner

@agarwalchaitanya Could you try to comment out local/ami_prepare_dict.sh (line: 120) in run.sh?
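
Before commenting the step out, it can help to see which dictionary inputs are actually missing, since the trace reports both the cmudict files and nonsilence_phones.txt as absent or empty. A minimal sketch, assuming only that the files live where the trace says they do; `report_missing` is a hypothetical helper, not something shipped with neural_sp or Kaldi.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the recipe): report every path
# that does not exist or has zero size. Returns non-zero if any path fails.
report_missing() {
  local f status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

It could be called with the paths from the trace, e.g. `report_missing "$dict/cmudict/cmudict.0.7a" "$dict/cmudict/cmudict.0.7a.symbols" "$dict/nonsilence_phones.txt"`, where `$dict` stands for `/home/asr/neural_sp_assets/preprocessed_data/ami/local/dict`.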

@agarwalchaitanya
Author

agarwalchaitanya commented Feb 11, 2021

> @agarwalchaitanya Could you try to comment out local/ami_prepare_dict.sh (line: 120) in run.sh?

That removes the error, but the recipe still fails later in stage 0:

============================================================================
                                  AMI
============================================================================
============================================================================
                       Data Preparation (stage:0)
============================================================================
+ dir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ echo 'Downloading annotations...'
Downloading annotations...
+ amiurl=http://groups.inf.ed.ac.uk/ami
+ annotver=ami_public_manual_1.6.1
+ annot=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1
+ logdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/log
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip ']'
+ '[' '!' -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations ']'
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations/AMI-metadata.xml ']'
+ local/ami_xml2text.sh /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
local/ami_xml2text.sh: line 19: [: openjdk version "11.0.10" 2021-01-19: integer expression expected
local/ami_xml2text.sh. Java not found. Will download exported version of transcripts.
--2021-02-11 17:16:05--  http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_manual_annotations_v1.6.1_export.gzip
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3725858 (3.6M) [application/x-troff-man]
Saving to: '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip'

/home/asr/neural_sp_assets/pr 100%[=================================================>]   3.55M  2.49MB/s    in 1.4s

2021-02-11 17:16:07 (2.49 MB/s) - '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip' saved [3725858/3725858]

+ wdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 ']'
+ echo 'Preprocessing transcripts...'
Preprocessing transcripts...
+ local/ami_split_segments.pl /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_train.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_eval.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_dev.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
sdm
In total, 0 files were found.
Warning: expected 169 data data files, found 0
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 data preparation succeeded.
In total, 0 files were found.
local/ami_sdm_scoring_data_prep.sh. Applying following fixes to segments
s/^AMI_IB4004_SDM_MIO039_0036179_0036400 AMI_IB4004_SDM 361.79 364$/AMI_IB4004_SDM_MIO039_0036179_0036400 AMI_IB4004_SDM 362.28 364/;
convert2stm: Recording-id AMI_ES2011a_SDM not defined in reco2file_and_channel file /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/dev_orig/reco2file_and_channel at local/convert2stm.pl line 70.
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 scenario and dev set data preparation succeeded.
In total, 0 files were found.
convert2stm: Recording-id AMI_EN2002a_SDM not defined in reco2file_and_channel file /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/eval_orig/reco2file_and_channel at local/convert2stm.pl line 70.
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 scenario and eval set data preparation succeeded.
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: successfully obtained utterance lengths from sphere-file headers
utils/data/get_utt2dur.sh: computed /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/train_orig/utt2dur
utils/data/modify_speaker_info.sh: copied data from /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/train_orig to /home/asr/neural_sp_assets/preprocessed_data/ami/train_sdm1, number of speakers changed from 0 to 0
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
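
The lines `In total, 0 files were found.` and `expected 169 data data files, found 0` suggest that the sdm data preparation found no audio under the corpus directory it was given, which would also explain the empty reco2file_and_channel and the repeated validate_data_dir.sh usage messages. A quick way to check is to count the audio files directly. This is only a sketch: `count_wavs` is a hypothetical helper, and it assumes the AMI audio is stored as `.wav` files somewhere below the corpus directory; the exact file-name pattern the recipe searches for is not shown in the trace.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of the recipe): count .wav files below a
# directory. Prints 0 and returns non-zero if the directory does not exist.
count_wavs() {
  local root=$1
  [ -d "$root" ] || { echo 0; return 1; }
  # -iname matches case-insensitively; tr strips the padding some wc builds add.
  find "$root" -type f -iname '*.wav' | wc -l | tr -d ' '
}
```

Running `count_wavs /path/to/amicorpus` (placeholder path) against the directory configured in run.sh should print a non-zero number; if it prints 0, the corpus path is probably wrong or the audio has not been downloaded.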
