Missing documentation: Import of a custom kaldi model #39

Open
JohnDoe02 opened this issue Oct 21, 2020 · 9 comments

Comments

@JohnDoe02

What steps are necessary to import a custom kaldi model (trained from scratch, not transfer-learned as in #33) into KAG?

In the readme it is currently stated that:

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

What steps are necessary to kick off the mentioned partial implementation for automatic conversion?
What steps remain to be carried out by the user?

@daanzu
Owner

daanzu commented Oct 23, 2020

How much work it takes depends on the model's configuration (perhaps unsurprising, since Kaldi is so configurable). If you are training with the intent to use the model with KaldiAG, it can be made quite easy. It's been a while since I converted the Zamia model, so I may be forgetting something, but as I recall...

  • If you do the training starting with the lexicon/phones from one of my models, I think you should be able to just copy in the relevant model files from your trained model, overwriting the ones in my published model, just as you have for the fine tuning.
  • If you are using the same phones, but a different lexicon, it's a bit more complicated but you should be able to massage the words file to fit. However, there are currently a few hard coded constants in KaldiAG for words and phones.
  • For different phones, it is similar but more work. I haven't attempted to do this conversion yet.

FWIW, the unfinished and untested converter is in model.py: see convert_generic_model_to_agf().

@JohnDoe02
Author

JohnDoe02 commented Oct 23, 2020

Got it to work! It turns out I had only forgotten to rename splice_opts to splice.conf. One also has to add a line break in that file, as otherwise the parsing makes KAG crash. But that was it.
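
A minimal sketch of that fix, assuming the line break in question is the one a Kaldi config file needs between options (one `--option=value` per line, each terminated by a newline). The function name `splice_opts_to_conf` is made up for illustration and is not part of Kaldi or KAG; the paths in the usage comment are the ones used in this thread.

```shell
# Hypothetical helper (not part of Kaldi or KAG): rewrite a splice_opts file,
# which holds all options on a single line, as a config file with one option
# per line and a trailing newline.
splice_opts_to_conf() {
    # $1: path to splice_opts, $2: path of the splice.conf to write
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" > "$2"
}

# Usage, with the paths from this thread:
#   splice_opts_to_conf exp/nnet3_cleaned/extractor/splice_opts final_model/conf/splice.conf
```

awk is used here because it still reads the last line when the input has no trailing newline, which is exactly the case that needs fixing.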

Just for the record: I used the phone set of your daanzu_20200905 model for my training, but added a number of words to the lexicon. However, I believe this alone has no impact on integration with KAG as long as I don't need those extra words for dictation. They simply live in my user_lexicon.txt

@JohnDoe02
Author

JohnDoe02 commented Oct 23, 2020

So I spoke too soon. While everything that uses non-dictation commands works like a charm, dictation is broken. I only get garbage, nothing that is in any way related to what I said. It does indeed look as if some IDs don't match.

However, I don't really have an idea what the root cause is. This is what I am using to create the model dir:

cp -r kaldi_model final_model
cp conf/mfcc.conf final_model/conf
cp conf/mfcc_hires.conf final_model/conf
cp conf/online_cmvn.conf final_model/conf
cp exp/nnet3_cleaned/extractor/splice_opts final_model/conf/splice.conf
cp exp/nnet3_cleaned/ivectors_jd_ls_100_clean_sp_hires/conf/ivector_extractor.conf final_model/conf

cp exp/nnet3_cleaned/extractor/final.* final_model/ivector_extractor
cp exp/nnet3_cleaned/extractor/global_cmvn.stats final_model/ivector_extractor

cp exp/chain_cleaned/tdnn_1d_sp/final.mdl final_model/
cp exp/chain_cleaned/tdnn_1d_sp/tree final_model/
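
A quick presence check for a directory assembled this way, as a sketch only. The helper name `check_model_dir` is hypothetical, and the file list simply mirrors the cp commands above; it is not an official list of what KAG requires. The `ivector_extractor/final.*` files are left out because their exact names depend on the training setup.

```shell
# Hypothetical helper: report which of the files copied above are missing
# from a model directory. Returns 0 if all are present, 1 otherwise.
check_model_dir() {
    # $1: model directory, e.g. final_model
    missing=0
    for f in conf/mfcc.conf conf/mfcc_hires.conf conf/online_cmvn.conf \
             conf/splice.conf conf/ivector_extractor.conf \
             ivector_extractor/global_cmvn.stats final.mdl tree; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_model_dir final_model
```

It only confirms that the files are present, not that they are consistent with each other.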

@daanzu
Owner

daanzu commented Oct 24, 2020

@JohnDoe02 Ah, I forgot about the dictation FST! You will need to re-compile it using your new .mdl file. Try:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kaldi_model_dir/G.fst -v

@JohnDoe02
Author

Just for reference:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m final_model/ -v

did the trick.

@widdiot

widdiot commented Mar 8, 2021

Could you please provide the general steps to adapt Kaldi models trained for a language other than English?

@SwimmingTiger

SwimmingTiger commented Apr 17, 2021

I want to convert any of the following Chinese Mandarin models to be compatible with KAG. Thanks for any help or documentation.

I have no experience with Kaldi. Currently the only environment I can run is the one from kaldi-dragonfly-winpython37.zip. Once I have a usable model, I will develop my application with dragonfly.

And I know something about CMUSphinx. I tried Sphinx4 and found that it lacked some features I needed. So I switched to dragonfly/KAG. The English model in kaldi-dragonfly-winpython37.zip perfectly meets my needs, but my program needs to support more languages, especially Chinese.

@daanzu
Owner

daanzu commented Apr 19, 2021

Similar discussion in #21.

@lormaechea

In case it is still of help or interest to anyone: I have recently been working on importing my own custom French models into KAG. After testing them, I have found them to be functional and to perform well, although I still need to check some configurations to improve the WER.

To do this, I first trained an acoustic model (HMM-DNN, nnet3 chain) with Kaldi on 1000 h of French speech. Once that was done, I created a folder to hold my custom KAG model:

KAG_DIR="kag_model"
mkdir -p ${KAG_DIR}/conf ${KAG_DIR}/ivector_extractor # The cp commands below need these subdirectories

And I subsequently copied the files coming from my training (as pointed out by @JohnDoe02). In my case:

cp conf/mfcc.conf ${KAG_DIR}/conf
cp conf/mfcc_hires.conf ${KAG_DIR}/conf
cp conf/online_cmvn.conf ${KAG_DIR}/conf

cp exp/nnet3/extractor/splice_opts ${KAG_DIR}/conf/splice.conf
cp exp/nnet3/ivectors_train_nodup_sp/conf/ivector_extractor.conf ${KAG_DIR}/conf

cp -r exp/nnet3/extractor/final.* ${KAG_DIR}/ivector_extractor/
cp exp/nnet3/extractor/global_cmvn.stats ${KAG_DIR}/ivector_extractor/

cp exp/chain/tdnn_ceos_sp_online/final.mdl ${KAG_DIR}/
cp exp/chain/tdnn_ceos_sp_online/tree ${KAG_DIR}/

Once this was done, I compiled my language model. To make it work with KAG, I had to deal with KAG's hard-coded constants for words and phones. To resolve this, the nonterminals.txt file used by KAG (it can be found in any of the available models) has to be added to the folder where my pronunciation models are located:

cp nonterminals.txt ${LEXICON_DIR}/dict

I then ran the data preparation with Kaldi:

./utils/prepare_lang.sh <dict-src-dir> <oov-dict-entry> <tmp-dir> <lang-dir>
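
As a sanity check after this step, one can confirm that every symbol listed in nonterminals.txt made it into a generated symbol table such as phones.txt. This is only a sketch: `check_nonterminals` is a made-up name, and it assumes that both files carry the symbol in the first whitespace-separated column.

```shell
# Hypothetical helper: verify that each symbol in nonterminals.txt appears in
# the first column of a symbol table. Prints the missing ones and returns
# non-zero if any are absent.
check_nonterminals() {
    # $1: nonterminals.txt, $2: symbol table such as ${LANG_DIR}/phones.txt
    [ -s "$2" ] || { echo "empty or missing symbol table: $2" >&2; return 1; }
    awk 'NR == FNR { seen[$1] = 1; next }                     # first file: symbol table
         NF && !($1 in seen) { print "missing: " $1; bad = 1 } # second file: nonterminals
         END { exit bad }' "$2" "$1"
}

# Usage:
#   check_nonterminals ${LEXICON_DIR}/dict/nonterminals.txt ${LANG_DIR}/phones.txt
```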

Once this process is finished, we can copy the following files to the folder that will contain our KAG model:

cp ${LANG_DIR}/G.fst ${KAG_DIR}

cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.txt
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.base.txt # Same as previous file
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.nonterm.txt # Just including nonterminals

cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.base.int # Same as previous file
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.nonterm.int # Just including nonterminals

cp ${LANG_DIR}/phones/disambig.int ${KAG_DIR}
cp ${LANG_DIR}/phones/left_context_phones.txt ${KAG_DIR}
cp -r ${LANG_DIR}/phones/wdisambig_* ${KAG_DIR}

cp ${LANG_DIR}/phones.txt ${KAG_DIR}
cp ${LANG_DIR}/phones.txt ${KAG_DIR}/phones.nonterm.txt # Just including nonterminals

cp ${LEXICON_DIR}/L_disambig.fst ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexicon.txt ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexiconp.txt ${KAG_DIR}

cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}
cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}/lexiconp_disambig.base.txt # Same as previous file

touch ${KAG_DIR}/user_lexicon.txt # Initially empty
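
Several of the copies above come in pairs: a file plus an identical `.base` twin (words.txt, align_lexicon.int, lexiconp_disambig.txt). A small helper can do both at once. `copy_with_base` is a hypothetical name, and the sketch assumes the file name has a single extension to insert `.base` in front of.

```shell
# Hypothetical helper: copy a file into a directory twice, once under its own
# name and once with ".base" inserted before the extension
# (e.g. words.txt -> words.txt and words.base.txt).
copy_with_base() {
    # $1: source file, $2: destination directory
    name=$(basename "$1")
    cp "$1" "$2/$name" &&
    cp "$1" "$2/${name%.*}.base.${name##*.}"
}

# Usage:
#   copy_with_base ${LANG_DIR}/words.txt ${KAG_DIR}
#   copy_with_base ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}
```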

Finally, the dictation graph is compiled with the following command:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kag_model/ -v

In this way, I managed to create a custom KAG model for French. I hope this helps.

In any case, once convert_generic_model_to_agf() is finished, I am sure the procedure will be much easier.

Lucía
