
YAHMM is a hidden Markov model package for Python, written in Cython for speed and designed to be easy to use while remaining comprehensive in scope. Its major features include a simple object model, easy-to-use silent states, and the ability to specify a distribution type for each individual state, rather than being forced to use the same distribution type for every state.

YAHMM is on pip, which makes installing it a breeze. Simply type in pip install yahmm and you're good to go!

Let's dive in with the Wikipedia HMM example describing what Bob does on rainy and sunny days. The proposed graph structure has two hidden states, Rainy and Sunny, and three emissions, Walk, Shop, and Clean.

>>> from yahmm import *

>>> # Initialize the model object
>>> model = Model( name="Rainy-Sunny" )

>>> # Initialize the two hidden states, with an appropriate discrete distribution
>>> rainy = State( DiscreteDistribution({ 'walk': 0.1, 'shop': 0.4, 'clean': 0.5 }), name='Rainy' )
>>> sunny = State( DiscreteDistribution({ 'walk': 0.6, 'shop': 0.3, 'clean': 0.1 }), name='Sunny' )

>>> # Add the states to the model
>>> model.add_state( rainy )
>>> model.add_state( sunny )

>>> # Now add the two transitions from the start of the model to the hidden states
>>> model.add_transition( model.start, rainy, 0.6 )
>>> model.add_transition( model.start, sunny, 0.4 )

>>> # Add the transitions from the hidden states to each other
>>> model.add_transition( rainy, rainy, 0.7 )
>>> model.add_transition( rainy, sunny, 0.3 )
>>> model.add_transition( sunny, rainy, 0.4 )
>>> model.add_transition( sunny, sunny, 0.6 )

>>> # Finalize the model structure
>>> model.bake( verbose=True )

Creating any model looks similar to this: first initialize the model, then add the states, and then add the transitions. In an infinite HMM such as this one, there is no need for transitions to an explicit end state, but in a finite HMM you can add transitions to the model's end state, as sketched below. Lastly, you must bake the model, which converts the internal structure from a networkx graph into the sparse matrices that allow for constant-time access.
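
For reference, a finite variant of this model would also route probability into the model's built-in end state before baking. A minimal sketch with illustrative weights, which would replace the hidden-state transitions above so that each state's outgoing probabilities still sum to one:

>>> # Hypothetical finite variant: each hidden state may end the sequence
>>> model.add_transition( rainy, rainy, 0.6 )
>>> model.add_transition( rainy, sunny, 0.3 )
>>> model.add_transition( rainy, model.end, 0.1 )
>>> model.add_transition( sunny, sunny, 0.5 )
>>> model.add_transition( sunny, rainy, 0.4 )
>>> model.add_transition( sunny, model.end, 0.1 )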

Some useful things the model can do now include calculating the forward, backward, and forward-backward emission and transition matrices, finding the Viterbi or maximum a posteriori path, and training on sequences.

>>> day1 = [ 'walk', 'walk', 'shop', 'walk', 'clean', 'walk' ]
>>> day2 = [ 'shop', 'walk', 'clean', 'clean', 'clean' ]
>>> print model.log_probability( day1 )
-6.94068028694
>>> print model.log_probability( day2 )
-5.51357132273
>>> print model.forward( day1 )
[[       -inf        -inf  0.                -inf]
 [-1.42711636 -2.81341072        -inf        -inf]
 [-2.33098457 -4.28308669        -inf        -inf]
 [-3.97720173 -3.94165781        -inf        -inf]
 [-4.58139897 -6.16171209        -inf        -inf]
 [-7.29681647 -5.88309959        -inf        -inf]
 [-7.20149045 -8.4122152         -inf        -inf]]
>>> print model.backward( day1 )
[[-7.33592797 -7.87956493 -7.6638464         -inf]
 [-6.38095741 -6.85804706 -6.67256996        -inf]
 [-5.46075159 -5.50060982 -5.48714645        -inf]
 [-4.20599146 -4.62583218 -4.4654952         -inf]
 [-3.31869616 -3.06080427 -3.13960264        -inf]
 [-1.51412773 -2.16282315 -1.89711998        -inf]
 [-0.51082562 -2.30258509        -inf        -inf]]
>>> trans, ems = model.forward_backward( day1 )
>>> print trans
[[ 1.19239755  0.48046923  0.          0.        ]
 [ 0.52268055  0.23052058  0.          0.        ]
 [ 0.42004498  0.0651686   0.          0.        ]
 [ 0.          0.          0.          0.        ]]
>>> print ems
[[-0.86739348 -2.73077749]
 [-0.85105587 -2.84301622]
 [-1.24251291 -1.62680971]
 [-0.95941485 -2.28183607]
 [-1.87026392 -1.10524245]
 [-0.77163579 -3.77412001]]
>>> print model.viterbi( day1 )
(-8.509453619322063, [
(2, State(Rainy-Sunny-start, None, 305466464)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(1, State(Rainy, DiscreteDistribution({'shop': 0.4, 'clean': 0.5, 'walk': 0.1}), 305466536)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752))])
>>> print model.maximum_a_posteriori( day1 )
(-5.797255350282298, [
(2, State(Rainy-Sunny-start, None, 305466464)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(1, State(Rainy, DiscreteDistribution({'shop': 0.4, 'clean': 0.5, 'walk': 0.1}), 305466536)), 
(0, State(Sunny, DiscreteDistribution({'shop': 0.3, 'clean': 0.1, 'walk': 0.6}), 305466752)), 
(3, State(Rainy-Sunny-end, None, 305466392))])

We can see that both the posterior decoding (maximum a posteriori) and Viterbi methods return the same path, which is not unexpected for such a small HMM.
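
If you just want the sequence of state names rather than the full State objects, a quick sketch using the (index, state) tuples returned above is:

>>> # Extract only the state names along the Viterbi path
>>> logp, path = model.viterbi( day1 )
>>> print [ state.name for index, state in path ]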

Now let's try a small continuous-distribution HMM which models ionic current readings from a source that alternates between two levels. We could model just the ionic current means, but let's also model the segment durations using a multivariate distribution with independent components.

>>> from yahmm import *
>>> model = Model( "stuff" )
>>> high_d = MultivariateDistribution( [ NormalDistribution( 25, 0.5 ), ExponentialDistribution( 2 ) ] )
>>> low_d = MultivariateDistribution( [ NormalDistribution( 10, 0.5 ), ExponentialDistribution( 0.1 ) ] )
>>> s1 = State( high_d, name="High" )
>>> s2 = State( low_d, name="Low" )
>>> model.add_state( s1 )
>>> model.add_state( s2 )
>>> model.add_transition( model.start, s1, 0.9 )
>>> model.add_transition( model.start, s2, 0.1 )
>>> model.add_transition( s1, s1, 0.3 )
>>> model.add_transition( s1, s2, 0.7 )
>>> model.add_transition( s2, s1, 0.2 )
>>> model.add_transition( s2, s2, 0.8 )
>>>
>>> seq = model.sample() 
>>> print seq
[[25.59142420969153, 0.08755865379162744], 
[10.885243661269064, 13.814628095045833], 
[10.377273315125588, 10.705500898626891], 
[10.90843360756541, 28.384614691170952], 
[24.173988910212547, 0.31915020852750725], 
[25.396986380950644, 1.2089825964484309]]
>>> print model.log_probability( seq )
-26.8297631449
>>> model.train( [seq], algorithm='baum-welch' )
Training improvement: 5.97434667321
Training improvement: -7.1054273576e-15
Total Training Improvement:  5.97434667321
>>> print model.log_probability( seq )
-20.8554164717

Here we perform Baum-Welch training on a randomly generated sequence, and see an improvement in how well the model fits the sequence. Note also that each observation in the sequence is now a vector of two values, because the multivariate distribution has two independent features. Thanks to duck typing in Python, all functions work exactly the same whether the observations are tuples, strings, or floats, as long as the underlying distributions know how to handle them.
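
Since train accepts a list of sequences, you can also fit several sequences at once. A minimal sketch, here simply reusing sequences sampled from the model itself:

>>> # Sample a handful of sequences and train on all of them together
>>> sequences = [ model.sample() for i in range( 10 ) ]
>>> model.train( sequences, algorithm='baum-welch' )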

Every feature implemented in YAHMM is documented in one of these wiki pages. Please take a look!