Does Kaldi fail to extract features from the non-standard pronunciation of ethnic minorities? #4871

Open
LijingDK opened this issue Sep 11, 2023 · 3 comments

Comments

@LijingDK

steps/make_mfcc_pitch.sh --nj 64 --mfcc-config conf/mfcc_hires.conf --cmd run.pl data/test
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc_pitch.sh: It seems not all of the feature files were successfully procesed (4379 != 4596); consider using utils/fix_data_dir.sh data/test
steps/make_mfcc_pitch.sh: Succeeded creating MFCC and pitch features for test
steps/compute_cmvn_stats.sh data/test
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept 4379 utterances out of 4596
fix_data_dir.sh: old files are kept in data/test/.backup

@LijingDK
Author

Every time I change the value of --nj, the number of utterances for which features are generated also changes: different nj values give different feature counts. Why is this? The audio files are not corrupted.
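One way to narrow this down is to list exactly which utterance IDs are dropped in each run and check whether the same ones fail for different nj values. The following is a minimal sketch, not part of Kaldi: `missing_utts` is a hypothetical helper, and it assumes both files carry the utterance ID as their first whitespace-separated field, as wav.scp and feats.scp do when there is no segments file.

```shell
# Hypothetical helper (not a Kaldi script): print the IDs that appear in the
# first table but not in the second. The ID is taken to be the first field.
missing_utts() {
  # First pass reads "$2" and remembers its IDs; second pass reads "$1"
  # and prints every ID that was not seen.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       !($1 in seen)       { print $1 }' "$2" "$1"
}

# Example call; the paths are assumptions taken from the log above, where
# fix_data_dir.sh reports that the old files are kept in data/test/.backup:
#   missing_utts data/test/.backup/wav.scp data/test/feats.scp > missing.txt
```

Saving the output for two different nj values and comparing the two lists would show whether the failures follow particular recordings or move around between runs.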

@danpovey
Contributor

That command would have created log files containing warnings about any problems. Use

 find . -name '*.log' -mtime -2 -print

as an example command to find such files.
I don't know how you think it's possible that Kaldi would treat recordings of ethnic minorities differently from those of the ethnic majority. How would it know? These features just relate to the frequency spectrum.
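Once the log files are located, the relevant lines can be pulled out directly. A minimal sketch, assuming the Kaldi tools prefix their messages with `WARNING` or `ERROR` and that the logs for this run ended up under a directory such as data/test/log (an assumption; use whatever directory the find command above reports). `scan_logs` is a hypothetical helper name.

```shell
# Hypothetical helper: print every WARNING or ERROR line from the *.log files
# under the given directory, prefixed with file name and line number.
scan_logs() {
  find "$1" -name '*.log' -exec grep -nHE 'WARNING|ERROR' {} +
}

# Example call; the directory is an assumption, adjust it to your setup:
#   scan_logs data/test/log
```

Each matching line names the log file it came from, so the failing job and the utterance or wav entry it complained about can be read off directly.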

@LijingDK
Author

LijingDK commented Nov 10, 2023 via email
