Skip to content

ronw/matlab_htk

Repository files navigation

This package contains a set of functions for calling interfacing with
HTK from Matlab.  Right now its mostly limited to training GMMs and
HMMs.  It converts your Matlab data into a format that HTK understands
and calls HTK command line programs.  The path to the HTK binaries is
hardcoded in get_htk_path.m

The portions written by me are distributed under the terms of the GNU
General Public License.  See the file COPYING for details.

Unforuntately this has not been very thoroughly tested, but it has
been working for me.  If anyone wants to write some unit tests using
http://mlunit.sf.net or something similar I won't say no.


* Functions

They are all reasonably commented...

 - The important ones:
   - train_hmm_htk        - use HTK to train an HMM
   - train_gmm_htk        - use HTK to train a GMM
   - train_htk_recognizer - train a simple HTK recognizer
   - eval_htk_recognizer  - recognize signal using HTK recognizer

 - Utility functions:
   - read_htk_hmm     - read in an HTK HMM definition (text format only)
   - write_htk_hmm    - write an HMM structure as an HTK HMM definition
   - htkread/htkwrite - read/write a Matlab matrix in HTK feature file format
                        (written by Mark Hasegawa-Johnson)
   - compose_hmms     - does FSM composition to form a big HMM from many
                        small HMMs (i.e. get an HMM to recognize a
                        sentence from a bunch of phone HMMs).
   - kmeans           - uses k-means to learn clusters from data.  Used to
                        initialize GMMs and HMMs.
   - logsum           - takes the sum of a matrix of log likelihoods
   - get_htk_path     - centralized location to set the path to the HTK binaries
 
* Data Structures

The functions in this toolbox pass around the following structures:
Note: all probabilities are stored as log probabilities

** GMM
  - gmm.nmix   - number of components in the mixture
  - gmm.priors - array of prior log probabilities over each state
  - gmm.means  - matrix of means (column x is mean of component x)
  - gmm.covars - matrix of covariance (column x is the diagonal of the
                 covariance matrix of component x)

** HMM with GMM observations
  - hmm.name          - 
  - hmm.nstates       - number of states in the HMM
  - hmm.emission_type - 'GMM'
  - hmm.start_prob    - array of log probs P(first observation is state x)
  - hmm.end_prob      - array of log probs P(last observation is state x)
  - hmm.transmat      - matrix of transition log probs (transmat(x,y) 
                        = log(P(transition from state x to state y)))
  - hmm.labels        - optional cell array of labels for each state in the HMM
                        (for use in composing HMMs)
  - hmm.gmms          - array of GMM structures 

** HMM with Gaussian observations
  - hmm.nstates       - number of states in the HMM
  - hmm.emission_type - 'gaussian'
  - hmm.start_prob    - array of log probs P(first observation is state x)
  - hmm.end_prob      - array of log probs P(last observation is state x)
  - hmm.transmat      - matrix of transition log probs (transmat(x,y) 
                        = log(P(transition from state x to state y)))
  - hmm.labels        - optional cell array of labels for each state in the HMM
                        (for use in composing HMMs)
  - hmm.means         - matrix of means (column x is mean of state x)
  - hmm.covars        - matrix of means (column x is the diagonal of the
                        covariance matrix of component x)

Note that each row of exp(hmm.transmat) does not necessarily sum to 1
because for each state x there is some probability
(exp(hmm.exit_prob(x))) that the next transition will be to a
non-emitting exit state (i.e. the current observation is the last
observation in the sequence).  The correct invariant is:
sum(exp(hmm.transmat, 2)) + exp(hmm.exit_prob) == ones(hmm.nstates, 1)


2007-11-06 Ron Weiss <ronw@ee.columbia.edu>

About

MATLAB functions that interface with the HTK Speech Recognition Toolkit (http://htk.eng.cam.ac.uk/) for training HMMs, GMMs and simple speech recognizers.

Resources

License

Stars

Watchers

Forks

Packages

No packages published