This repository contains Kaldi recipes for training models using the Finnish Parliament ASR corpus.
In addition to Kaldi, the recipes rely on three external tools for subword tokenization and language modeling: VariKN, SentencePiece, and Subword-kaldi.
VariKN is used for n-gram language modeling. For download and installation instructions, see the VariKN GitHub repository.
SentencePiece is used for subword tokenization. For download and installation instructions, see the SentencePiece GitHub repository.
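Before running a recipe, it can help to confirm that both tools are installed and on your PATH. The following is a minimal sketch, not part of the recipes. The binary names are assumptions: VariKN commonly installs `varigram_kn`, and SentencePiece installs `spm_train` and `spm_encode`, but the exact names and locations depend on how you built each tool.

```shell
#!/bin/sh
# Sketch: report which required command-line tools are not on PATH.
# ASSUMPTION: the binary names you pass in (e.g. varigram_kn for VariKN,
# spm_train / spm_encode for SentencePiece) depend on your installation;
# adjust them to match your setup.
check_tools() {
  missing=0
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command can be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Return status is 0 only when every tool was found
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools varigram_kn spm_train spm_encode || echo "install the missing tools first"` prints one `missing:` line per tool it cannot find.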
Subword-kaldi is included in this repository as a git submodule. To fetch it, run:
git submodule init
git submodule update
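The two commands can also be combined as `git submodule update --init`, and a fresh clone can fetch submodules directly with `git clone --recurse-submodules`. To check the result, `git submodule status` prints one line per submodule and prefixes it with `-` if that submodule has not been initialized. The helper below is a hypothetical sketch (not part of this repository) that reads that output and fails if any submodule is still missing.

```shell
#!/bin/sh
# Sketch: read `git submodule status` output on stdin and fail if any
# submodule is still uninitialized. git prefixes such lines with "-".
# A "+" prefix (checked-out commit differs from the recorded one) is
# treated here as initialized.
submodules_ready() {
  ready=0
  while IFS= read -r line; do
    case "$line" in
      -*)
        # Strip the leading "-" and report the uninitialized entry
        echo "not initialized: ${line#-}"
        ready=1
        ;;
    esac
  done
  return "$ready"
}
```

Typical use is `git submodule status | submodules_ready`, run from the repository root.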
See this separate repository for the SpeechBrain models.