Running stochhmm
StochHMM implements standard HMMs and HMMs with duration (for the Viterbi algorithm); hidden semi-Markov model architectures and algorithms are planned for a future release. It lets researchers integrate additional datasets into their HMMs to improve predictions. It also adapts HMM algorithms to provide stochastic decoding, so researchers can explore and rank sub-optimal predictions.
To run the StochHMM command-line application you'll need a model file and a sequence.
##Command-line options:
###Required command-line options and files
-model <model file> import model file
-seq <sequence file> import sequence file in FASTA format
###Non-stochastic Decoding: Different algorithms available for decoding
-viterbi performs Viterbi traceback
-posterior calculates posterior probabilities
If no output options are supplied, this returns the posterior scores
for all of the states.
-threshold <score> returns only the states with a GFF_DESC whose score is
greater than or equal to the threshold.
-nbest <number of paths> performs n-best Viterbi algorithm
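To make concrete what a Viterbi traceback computes, here is a minimal Python sketch on a two-state dice model. The transition and emission values are the textbook "occasionally dishonest casino" numbers and the uniform start distribution is an assumption; none of them are read from Dice.hmm, so the scores will not match StochHMM's output. This illustrates the general algorithm, not StochHMM's implementation.

```python
import math

# Hypothetical model parameters (assumed, not parsed from Dice.hmm).
STATES = ["FAIR", "LOADED"]
START = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {
    "FAIR": {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.10, "LOADED": 0.90},
}
EMIT = {
    "FAIR": {r: 1 / 6 for r in "123456"},
    "LOADED": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def viterbi(seq):
    """Return (natural-log score, most probable state path) for a string of rolls."""
    # score[k] is the best log probability of any path ending in state k.
    score = {k: math.log(START[k]) + math.log(EMIT[k][seq[0]]) for k in STATES}
    back = []  # one dict of best predecessors per position after the first
    for symbol in seq[1:]:
        prev, score, pointers = score, {}, {}
        for k in STATES:
            best = max(STATES, key=lambda j: prev[j] + math.log(TRANS[j][k]))
            pointers[k] = best
            score[k] = (prev[best] + math.log(TRANS[best][k])
                        + math.log(EMIT[k][symbol]))
        back.append(pointers)
    # Trace back from the best final state.
    last = max(STATES, key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


log_score, path = viterbi("666666")
print(log_score, path)
```

Working in log space avoids numerical underflow on long sequences, which is also why StochHMM reports its scores as natural logs.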
###Stochastic Decoding:
-stochastic <Type>
Types:
forward performs stochastic traceback using forward algorithm
viterbi performs stochastic traceback using modified-viterbi algorithm
posterior performs stochastic traceback using posterior algorithm
-rep <number of tracebacks to sample>
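As a rough illustration of what a stochastic traceback over the forward table does, the sketch below samples state paths in proportion to their probability given the sequence. The model values are assumed textbook dice-model numbers (not read from Dice.hmm), the table is kept in plain probability space for brevity (suitable only for short sequences), and the function names are hypothetical. StochHMM's own implementation may differ in detail.

```python
import random

# Hypothetical model parameters (assumed, not parsed from Dice.hmm).
STATES = ["FAIR", "LOADED"]
START = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {
    "FAIR": {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.10, "LOADED": 0.90},
}
EMIT = {
    "FAIR": {r: 1 / 6 for r in "123456"},
    "LOADED": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def forward(seq):
    """Build the forward table: table[t][k] = P(x_1..x_t, state at t is k)."""
    table = [{k: START[k] * EMIT[k][seq[0]] for k in STATES}]
    for symbol in seq[1:]:
        prev = table[-1]
        table.append({
            k: EMIT[k][symbol] * sum(prev[j] * TRANS[j][k] for j in STATES)
            for k in STATES
        })
    return table


def sample_path(table, rng):
    """Draw one state path by walking the forward table from the end."""
    # Pick the final state in proportion to its forward value.
    path = rng.choices(STATES, weights=[table[-1][k] for k in STATES])
    # Going backwards, weight each predecessor j by f[t-1][j] * a[j][k].
    for t in range(len(table) - 1, 0, -1):
        k = path[-1]
        weights = [table[t - 1][j] * TRANS[j][k] for j in STATES]
        path.extend(rng.choices(STATES, weights=weights))
    path.reverse()
    return path


rng = random.Random(0)                 # fixed seed so runs are repeatable
table = forward("166666")
samples = [sample_path(table, rng) for _ in range(10)]   # analogous to -rep 10
for s in samples:
    print(" ".join(state[0] for state in s))
```

Each call to `sample_path` corresponds to one sampled traceback, so the number of calls plays the role of `-rep`.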
###Output options:
-gff prints path in GFF format
-path prints state path according to state number
-label prints state path as labels
-hits prints hit table for multiple tracebacks for all states at each position
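When driving StochHMM from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience function (it is not part of StochHMM) that uses only the options listed above and merely builds the command; it does not run it.

```python
import shlex


def stochhmm_command(model, seq, decoding, output=None, rep=None):
    """Assemble an argv list for the stochhmm binary (hypothetical helper)."""
    argv = ["stochhmm", "-model", model, "-seq", seq]
    argv += decoding.split()        # e.g. "-viterbi" or "-stochastic forward"
    if rep is not None:
        argv += ["-rep", str(rep)]  # number of tracebacks to sample
    if output is not None:
        argv.append(output)         # one of -gff, -path, -label, -hits
    return argv


cmd = stochhmm_command("../examples/Dice.hmm", "../examples/Dice.fa",
                       "-stochastic viterbi", output="-gff", rep=10)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` if the `stochhmm` binary is on the `PATH`.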
A couple of example model files are provided:
3_16Eddy.hmm - GC-rich model from Problem 3.16 of
"Problems and Solutions in Biological Sequence Analysis. M. Borodovsky and S. Ekisheva. Cambridge
University Press, UK (2006)"
Dice.hmm - Dishonest Casino Dice model from pg 65 of
"Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids. R. Durbin, S.
Eddy, A. Krogh, and G. Mitchison. Cambridge University Press, UK (1998)"
GC-skew.hmm - SkewR model for predicting R-loop forming regions in the human genome. See
"Ginno,P.A. et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human
CpG island promoters. Mol. Cell, 45, 814–825."
##Dice Model Examples
Each example shows the command used and the output from StochHMM.
####Print PATH_LABEL using Viterbi algorithm
#Print Viterbi traceback as State Path Label
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -label
>>Eddy Dice TRACK_NAME:TRACK1 Score: -539.062
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
L L L L L L L L L L L L L L L L L L F F F F F F F F F F F F L L L L L L L L L L L L L L L L L L
L L L L L L L L L L L L L L L L F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F L L L L L L L L L L L L L
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F L L L L L L L L L L L L L L L L L L
L F F F F F F F F F F F
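The label path lists one state label per sequence position. A short Python sketch (the function name is hypothetical) shows how such a path collapses into 1-based runs, which is the same information a GFF view of the path presents as start and end coordinates.

```python
from itertools import groupby


def label_segments(labels):
    """Collapse a space-separated label path into (label, start, end) runs."""
    segments, position = [], 1
    for label, run in groupby(labels.split()):
        length = sum(1 for _ in run)
        segments.append((label, position, position + length - 1))
        position += length
    return segments


print(label_segments("F F F L L F"))
# [('F', 1, 3), ('L', 4, 5), ('F', 6, 6)]
```

`str.split()` ignores line breaks, so the wrapped multi-line output can be passed in as a single string.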
####Print GFF using Viterbi algorithm
#Print Viterbi traceback as GFF (Only states with GFF_DESC will be output)
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -gff
#Score: -539.062
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 1 48 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 49 66 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 67 78 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 79 112 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 113 179 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 180 192 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 193 270 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 271 289 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 290 300 . + .
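To summarise a GFF result, one option is to total the positions assigned to each state. The sketch below is a hypothetical helper; it assumes the eight-column layout shown above and splits from the right so that a sequence name containing spaces stays intact.

```python
from collections import defaultdict


def state_coverage(gff_text):
    """Sum the number of positions assigned to each feature (state) label."""
    coverage = defaultdict(int)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "#Score:" comment lines
        # Columns from the right: name, source, feature, start, end,
        # score, strand, frame.
        _name, _source, feature, start, end, *_ = line.rsplit(None, 7)
        coverage[feature] += int(end) - int(start) + 1
    return dict(coverage)


example = """\
#Score: -539.062
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 1 48 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 49 66 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 67 78 . + .
"""
print(state_coverage(example))
# {'FAIR': 60, 'LOADED': 18}
```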
####Print state number using Viterbi algorithm
#Print Viterbi traceback as state position in model
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -viterbi -path
>>Eddy Dice TRACK_NAME:TRACK1 Score: -539.062
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 0 0 0
####Print Posterior scores
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -posterior
Posterior Probabilities Table
Model: CASINO DICE MODEL
Sequence: >Eddy Dice TRACK_NAME:TRACK1
Probability of Sequence from Forward: Natural Log'd -516.544812
Probability of Sequence from Backward:Natural Log'd -516.544812
Position FAIR LOADED
1 0.812 0.188
2 0.849 0.151
3 0.861 0.139
4 0.850 0.150
5 0.814 0.186
6 0.738 0.262
...(output truncated)
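The posterior table can be turned into a per-position call by taking the most probable state at each position. The Python sketch below parses rows in the layout shown above; the `threshold` parameter is a hypothetical stand-in loosely modelled on `-threshold`, and StochHMM's own rule may differ.

```python
def parse_posterior_rows(text):
    """Parse a 'Position STATE...' header plus numeric rows into dicts."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    states = lines[0][1:]
    return [dict(zip(states, map(float, row[1:]))) for row in lines[1:]]


def posterior_decode(rows, threshold=0.0):
    """Pick the most probable state per position, or None below the threshold."""
    calls = []
    for row in rows:
        state = max(row, key=row.get)
        calls.append(state if row[state] >= threshold else None)
    return calls


TABLE = """\
Position FAIR LOADED
1 0.812 0.188
2 0.849 0.151
3 0.861 0.139
4 0.850 0.150
5 0.814 0.186
6 0.738 0.262
"""
rows = parse_posterior_rows(TABLE)
print(posterior_decode(rows))
print(posterior_decode(rows, threshold=0.8))
```

With a threshold of 0.8, position 6 is left uncalled because its best posterior (0.738) falls below the cutoff.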
####Print Posterior decoding as GFF
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -posterior -gff
#Score: 0
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 1 47 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 48 66 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 67 78 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 79 95 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 96 104 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 105 112 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 113 129 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 130 138 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 139 179 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 180 192 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 193 201 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 202 207 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 208 269 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 270 289 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 290 300 . + .
####Print GFF using Stochastic Viterbi algorithm, 10 samples
Reports the number of times each specific traceback path occurred during sampling.
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -gff
Traceback occurred: 1
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 1 50 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 51 68 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 69 89 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 90 112 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 113 131 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 132 143 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 144 154 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 155 155 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 156 183 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 184 196 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 197 203 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 204 210 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 211 275 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 276 288 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 289 300 . + .
Traceback occurred: 1
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 1 48 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 49 63 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 64 65 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 66 66 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 67 81 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 82 85 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 86 86 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 87 117 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 118 131 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 132 142 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 143 167 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 168 168 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 169 179 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 180 209 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 210 230 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 231 231 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 232 270 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM LOADED 271 288 . + .
Eddy Dice TRACK_NAME:TRACK1 StochHMM FAIR 289 300 . + .
...(output truncated)
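The "Traceback occurred" count is the number of sampled paths that were identical. A minimal sketch of that kind of tally, assuming each sampled path is available as a list of state labels (the function name is hypothetical):

```python
from collections import Counter


def count_tracebacks(paths):
    """Count identical state paths; most frequent first."""
    counts = Counter(tuple(path) for path in paths)
    return counts.most_common()


sampled = [
    ["F", "F", "L"],
    ["F", "L", "L"],
    ["F", "F", "L"],
]
for path, occurred in count_tracebacks(sampled):
    print("Traceback occurred:", occurred, " ".join(path))
```

On a 300-position sequence, most sampled paths differ somewhere, which is why each traceback shown above occurred only once in 10 samples.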
####Print Hits table for Stochastic Viterbi sampling
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -hits
Position FAIR LOADED
1 10 0
2 10 0
3 10 0
4 10 0
5 10 0
6 8 2
7 6 4
... (truncated output)
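Each row of the hits table counts how many of the sampled tracebacks passed through each state at that position. Dividing by the row total gives a per-position frequency; the sketch below is a hypothetical helper that assumes the column layout shown above.

```python
def hit_frequencies(text):
    """Convert a hits table into {position: {state: fraction of samples}}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    states = lines[0][1:]
    table = {}
    for row in lines[1:]:
        counts = [int(c) for c in row[1:]]
        total = sum(counts)  # equals the -rep value when every path is counted
        table[int(row[0])] = {s: c / total for s, c in zip(states, counts)}
    return table


HITS = """\
Position FAIR LOADED
1 10 0
5 10 0
6 8 2
7 6 4
"""
freqs = hit_frequencies(HITS)
print(freqs[6])
# {'FAIR': 0.8, 'LOADED': 0.2}
```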
####Additional example commands
$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_16Eddy.fa -viterbi -gff
$ stochhmm -model ../examples/3_16Eddy.hmm -seq ../examples/3_17Eddy.fa -posterior
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic viterbi -rep 10 -label
$ stochhmm -model ../examples/Dice.hmm -seq ../examples/Dice.fa -stochastic posterior -rep 10 -label