Paul Lott edited this page Jul 29, 2013 · 23 revisions

StochHMM implements standard HMMs and HMMs with duration (for the Viterbi algorithm); support for hidden semi-Markov model architectures and algorithms is planned. It lets researchers integrate additional datasets into their HMMs to improve predictions. Finally, it adapts HMM algorithms to provide stochastic decoding, which lets researchers explore and rank sub-optimal predictions.

To run the StochHMM command-line application, you need a model file and a sequence file in FASTA format.

## Command-line options

### Required command-line options and files

-model <model file>	imports the model file
-seq <sequence file>	imports the sequence file (FASTA format)

### Non-stochastic decoding algorithms

-viterbi	performs Viterbi traceback
-posterior	calculates posterior probabilities
		If no output options are supplied, this returns the posterior scores
		for all states.

	-threshold <score>	returns only states that have a GFF_DESC and whose
			posterior score is greater than or equal to <score>

-nbest <number of paths>	performs the n-best Viterbi algorithm, reporting the
			<number of paths> highest-scoring paths
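The `-viterbi` option reports the single most probable state path. As an illustration only, the underlying dynamic programme can be sketched in Python on a two-state fair/loaded die model. The parameters are an assumption (the occasionally dishonest casino values from Durbin et al., a uniform start distribution, no end state), not the contents of `Dice.hmm`, and `viterbi` is a hypothetical name rather than part of StochHMM.

```python
import math

# Assumed parameters: Durbin et al.'s occasionally dishonest casino with a
# uniform start distribution and no end state. Dice.hmm may differ.
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.10, "LOADED": 0.90}}
EMIT = {"FAIR": {roll: 1 / 6 for roll in "123456"},
        "LOADED": {**{roll: 0.1 for roll in "12345"}, "6": 0.5}}


def viterbi(seq):
    """Return (log P(best path, seq), best state path) for a string of rolls."""
    # score[s]: best log-probability of any path ending in state s so far.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []  # one {state: best predecessor} dict per position after the first
    for roll in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            ptr[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][roll]))
        backptrs.append(ptr)
    # Traceback: start from the best final state and follow the pointers back.
    state = max(STATES, key=score.get)
    best_score = score[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return best_score, path[::-1]
```

Under these assumed parameters, a run of twenty sixes flanked by ordinary rolls, such as `viterbi("1234561234" + "6" * 20 + "1234512345")`, should be decoded as `LOADED` over the sixes and `FAIR` elsewhere. Working in log space avoids the underflow that multiplying hundreds of probabilities would cause.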

### Stochastic decoding

-stochastic <type>
	Types:
		forward		performs stochastic traceback using the forward algorithm
		viterbi		performs stochastic traceback using a modified Viterbi algorithm
		posterior	performs stochastic traceback using the posterior algorithm
-rep <number of tracebacks to sample>
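Of the three types, forward-based stochastic traceback has a standard textbook formulation: compute the forward table once, sample the final state in proportion to its forward probability, then walk backwards, drawing each predecessor `p` of the current state `s` with probability proportional to forward(p) × transition(p → s). Each traceback is then an exact sample from P(path | sequence). This Python sketch assumes the Durbin et al. casino parameters with a uniform start and no end state; the function names are hypothetical, and StochHMM's `viterbi` and `posterior` variants are not shown.

```python
import math
import random

# Assumed parameters (Durbin et al.'s dishonest casino, uniform start, no end
# state); Dice.hmm and StochHMM's own implementation may differ.
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.10, "LOADED": 0.90}}
EMIT = {"FAIR": {roll: 1 / 6 for roll in "123456"},
        "LOADED": {**{roll: 0.1 for roll in "12345"}, "6": 0.5}}


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward(seq):
    """fwd[i][s] = log P(seq[0..i], state at position i = s)."""
    fwd = [{s: math.log(START[s] * EMIT[s][seq[0]]) for s in STATES}]
    for roll in seq[1:]:
        prev = fwd[-1]
        fwd.append({s: math.log(EMIT[s][roll])
                    + logsumexp([prev[p] + math.log(TRANS[p][s]) for p in STATES])
                    for s in STATES})
    return fwd


def draw(log_weights, rng):
    """Pick a key with probability proportional to exp(log weight)."""
    total = logsumexp(list(log_weights.values()))
    r, acc = rng.random(), 0.0
    for key, lw in log_weights.items():
        acc += math.exp(lw - total)
        if r < acc:
            return key
    return key  # only reached through floating-point rounding


def stochastic_traceback(seq, fwd, rng):
    """Sample one state path from P(path | seq), walking backwards."""
    path = [draw(fwd[-1], rng)]
    for i in range(len(seq) - 1, 0, -1):
        nxt = path[-1]
        # P(state_{i-1} = p | state_i = nxt, seq) is proportional to fwd[i-1][p] * TRANS[p][nxt]
        path.append(draw({p: fwd[i - 1][p] + math.log(TRANS[p][nxt]) for p in STATES}, rng))
    return path[::-1]
```

The forward table is computed once and reused, so calling `stochastic_traceback(seq, fwd, rng)` ten times is the analogue of `-rep 10`; only the cheap backward walk is repeated per sample.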

### Output options

-gff			prints the path in GFF format
-path			prints the state path as state numbers (each state's position in the model)
-label			prints the state path as state labels
-hits			prints a hit table: for every position, how often each state occurs across multiple tracebacks
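Assuming a hit table counts, for every position, how many sampled tracebacks passed through each state, it can be sketched in Python as follows. The function names and the tab-separated layout are hypothetical and may not match StochHMM's actual `-hits` output.

```python
from collections import Counter


def hits_table(paths, states):
    """rows[i][s] = number of sampled paths in state s at position i + 1."""
    length = len(paths[0])
    if any(len(p) != length for p in paths):
        raise ValueError("all tracebacks must cover the same sequence")
    rows = []
    for i in range(length):
        column = Counter(path[i] for path in paths)
        rows.append({s: column[s] for s in states})  # Counter returns 0 if absent
    return rows


def format_hits(rows, states):
    """Render the counts as a tab-separated table, one row per position."""
    lines = ["Position\t" + "\t".join(states)]
    for pos, row in enumerate(rows, start=1):
        lines.append(f"{pos}\t" + "\t".join(str(row[s]) for s in states))
    return "\n".join(lines)
```

Dividing each count by the number of tracebacks gives an empirical estimate of the posterior probability of each state at each position.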

Three example model files are provided:

3_16_Eddy.hmm - GC-rich model from Problem 3.16 of
	"Problems and Solutions in Biological Sequence Analysis. M. Borodovsky and S. Ekisheva.
	Cambridge University Press, UK (2006)"

Dice.hmm - Dishonest casino dice model from p. 65 of
	"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin,
	S. Eddy, A. Krogh, and G. Mitchison. Cambridge University Press, UK (1998)"

GC-skew.hmm - SkewR model for predicting R-loop-forming regions in the human genome. See
	"Ginno, P.A. et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human
	CpG island promoters. Mol. Cell, 45, 814–825."

### Dice model examples

#### Print PATH_LABEL using the Viterbi algorithm

#Print Viterbi traceback as State Path Label
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -label
>>Eddy Dice TRACK_NAME:TRACK1	Score: -539.062
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F 
L L L L L L L L L L L L L L L L L L F F F F F F F F F F F F L L L L L L L L L L L L L L L L L L 
L L L L L L L L L L L L L L L L F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F 
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F L L L L L L L L L L L L L 
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F 
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F L L L L L L L L L L L L L L L L L L 
L F F F F F F F F F F F

#### Print GFF using the Viterbi algorithm

#Print Viterbi traceback as GFF (Only states with GFF_DESC will be output)
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -gff
#Score: -539.062
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	1	48	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	49	66	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	67	78	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	79	112	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	113	179	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	180	192	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	193	270	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	271	289	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	290	300	.	+	.
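The GFF lines carry the same information as the label path in the previous example, with runs of identical labels collapsed into 1-based, inclusive intervals. A minimal Python sketch of that conversion, assuming a hypothetical mapping from PATH_LABEL (`F`, `L`) to GFF_DESC (`FAIR`, `LOADED`) inferred from the two outputs; labels missing from the mapping are skipped, mimicking states without a GFF_DESC. The function name is illustrative, not StochHMM's API.

```python
from itertools import groupby

# Hypothetical PATH_LABEL -> GFF_DESC mapping, inferred from the Dice outputs.
GFF_DESC = {"F": "FAIR", "L": "LOADED"}


def path_to_gff(seq_name, labels, gff_desc=GFF_DESC, source="StochHMM"):
    """Collapse runs of identical labels into 1-based, inclusive GFF-style lines."""
    lines = []
    start = 1
    for label, run in groupby(labels):
        end = start + sum(1 for _ in run) - 1
        if label in gff_desc:  # states without a GFF_DESC produce no line
            fields = [seq_name, source, gff_desc[label], start, end, ".", "+", "."]
            lines.append("\t".join(str(f) for f in fields))
        start = end + 1
    return lines
```

Feeding it the run lengths of the label output (starting with 48 `F` and 18 `L`) reproduces the nine GFF lines shown, from `FAIR 1 48` to `FAIR 290 300`.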

#### Print state numbers using the Viterbi algorithm

#Print Viterbi traceback as state position in model
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -path
>>Eddy Dice TRACK_NAME:TRACK1	Score: -539.062
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 0 0 0 0 0 0 0 0 0 0 0 

#### Print posterior scores

$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -posterior

Posterior Probabilities Table
Model:	CASINO DICE MODEL
Sequence:	>Eddy Dice TRACK_NAME:TRACK1
Probability of Sequence from Forward: Natural Log'd	-516.544812
Probability of Sequence from Backward:Natural Log'd	-516.544812
Position	FAIR	LOADED
1	0.812	0.188
2	0.849	0.151
3	0.861	0.139
4	0.850	0.150
5	0.814	0.186
6	0.738	0.262
...(output truncated)
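Each row gives the probability of each state at that position given the whole sequence, and the header reports the sequence log-likelihood from both the forward and the backward pass, which should agree. The forward-backward computation can be sketched in Python as below. It assumes the Durbin et al. casino parameters, a uniform start and no end state, so it will not reproduce the numbers above; `posterior` and `positions_at_or_above` are hypothetical names, the latter only loosely mirroring `-threshold`.

```python
import math

# Assumed parameters (Durbin et al.'s dishonest casino, uniform start, no end
# state); the bundled Dice.hmm may differ.
STATES = ("FAIR", "LOADED")
START = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {"FAIR": {"FAIR": 0.95, "LOADED": 0.05},
         "LOADED": {"FAIR": 0.10, "LOADED": 0.90}}
EMIT = {"FAIR": {roll: 1 / 6 for roll in "123456"},
        "LOADED": {**{roll: 0.1 for roll in "12345"}, "6": 0.5}}


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior(seq):
    """Return (log P(seq) via forward, log P(seq) via backward, posteriors per position)."""
    n = len(seq)
    # Forward: fwd[i][s] = log P(seq[0..i], state_i = s)
    fwd = [{s: math.log(START[s] * EMIT[s][seq[0]]) for s in STATES}]
    for roll in seq[1:]:
        prev = fwd[-1]
        fwd.append({s: math.log(EMIT[s][roll])
                    + logsumexp([prev[p] + math.log(TRANS[p][s]) for p in STATES])
                    for s in STATES})
    # Backward: bwd[i][s] = log P(seq[i+1..] | state_i = s)
    bwd = [None] * n
    bwd[-1] = {s: 0.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        bwd[i] = {s: logsumexp([math.log(TRANS[s][q] * EMIT[q][seq[i + 1]]) + bwd[i + 1][q]
                                for q in STATES])
                  for s in STATES}
    log_fwd = logsumexp(list(fwd[-1].values()))
    log_bwd = logsumexp([math.log(START[s] * EMIT[s][seq[0]]) + bwd[0][s] for s in STATES])
    post = [{s: math.exp(fwd[i][s] + bwd[i][s] - log_fwd) for s in STATES} for i in range(n)]
    return log_fwd, log_bwd, post


def positions_at_or_above(post, state, threshold):
    """1-based positions where P(state | seq) >= threshold."""
    return [i + 1 for i, row in enumerate(post) if row[state] >= threshold]
```

Printing `i`, `row["FAIR"]` and `row["LOADED"]` with three decimals for each position gives a table in the same shape as the output above.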

#### Print GFF using the stochastic Viterbi algorithm (10 samples)

Each reported path is preceded by the number of times that traceback occurred during sampling.

$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -gff
Traceback occurred:	 1
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	1	50	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	51	68	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	69	89	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	90	112	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	113	131	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	132	143	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	144	154	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	155	155	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	156	183	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	184	196	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	197	203	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	204	210	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	211	275	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	276	288	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	289	300	.	+	.

Traceback occurred:	 1
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	1	48	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	49	63	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	64	65	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	66	66	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	67	81	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	82	85	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	86	86	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	87	117	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	118	131	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	132	142	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	143	167	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	168	168	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	169	179	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	180	209	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	210	230	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	231	231	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	232	270	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	LOADED	271	288	.	+	.
Eddy Dice TRACK_NAME:TRACK1	StochHMM	FAIR	289	300	.	+	.

...(output truncated)
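The `Traceback occurred` count groups identical sampled paths. A minimal Python sketch of that bookkeeping, assuming paths are compared position by position; the function names are hypothetical, and StochHMM may order or format its report differently.

```python
from collections import Counter


def tally_tracebacks(paths):
    """Group identical sampled paths; returns [(path, times sampled)], most frequent first."""
    return Counter(tuple(path) for path in paths).most_common()


def format_report(paths):
    """Text report loosely modelled on the 'Traceback occurred' output."""
    blocks = []
    for path, times in tally_tracebacks(paths):
        blocks.append(f"Traceback occurred:\t {times}\n" + " ".join(path))
    return "\n\n".join(blocks)
```

With only 10 samples over 300 rolls, nearly every sampled path is unique, which is why each traceback above occurred once; ranking paths by frequency becomes informative only with many more samples or shorter sequences.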

### Additional examples

$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_16Eddy.fa -viterbi -gff
$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_17Eddy.fa -posterior
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -label
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic posterior -rep 10 -label