Example Models and Sequences
## Dice Model (Dishonest Casino Model)

Dice model from page 65 of R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids," Cambridge University Press, UK (1998).
```
#STOCHHMM MODEL FILE
<MODEL INFORMATION>
======================================================
MODEL_NAME: CASINO DICE MODEL
MODEL_DESCRIPTION: Taken from CH3 Durbin/Eddy
MODEL_CREATION_DATE: August 28, 2009
<TRACK SYMBOL DEFINITIONS>
======================================================
DICE: 1,2,3,4,5,6
<STATE DEFINITIONS>
#############################################
STATE:
NAME: INIT
TRANSITION: STANDARD: P(X)
FAIR: 0.5
LOADED: 0.5
#############################################
STATE:
NAME: FAIR
PATH_LABEL: F
GFF_DESC: FAIR
TRANSITION: STANDARD: P(X)
FAIR: 0.95
LOADED: 0.05
END: 1
EMISSION: DICE: P(X)
ORDER: 0
@1 2 3 4 5 6
0.167 0.167 0.167 0.167 0.167 0.167
#############################################
STATE:
NAME: LOADED
PATH_LABEL: L
GFF_DESC: LOADED
TRANSITION: STANDARD: P(X)
FAIR: 0.1
LOADED: 0.9
END: 1
EMISSION: DICE: P(X)
ORDER: 0
@1 2 3 4 5 6
0.1 0.1 0.1 0.1 0.1 0.5
#############################################
//END
```
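The Viterbi algorithm finds the most probable state path for a sequence of rolls. The minimal Python sketch below shows how it uses the casino parameters. It is an illustration only, not StochHMM's own implementation. It hard-codes the INIT, transition and emission values from the file above, keeping 0.167 as written rather than an exact 1/6. It also assumes `END: 1` lets either state end the sequence at no cost. The function name `viterbi` is hypothetical. The function returns the PATH_LABEL of each roll.

```python
import math

# Parameters copied from the CASINO DICE MODEL file above.
STATES = ["FAIR", "LOADED"]
PATH_LABEL = {"FAIR": "F", "LOADED": "L"}
INIT = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {
    "FAIR": {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.1, "LOADED": 0.9},
}
EMIT = {
    "FAIR": {face: 0.167 for face in "123456"},
    "LOADED": {**{face: 0.1 for face in "12345"}, "6": 0.5},
}


def viterbi(rolls):
    """Return the most probable state path for a string of die faces, as PATH_LABELs.

    END: 1 is treated as a no-op, so either state may end the sequence.
    """
    # Work in log space so long roll sequences do not underflow.
    v = {s: math.log(INIT[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            # Best predecessor for state s at this roll.
            scores = {p: v[p] + math.log(TRANS[p][s]) for p in STATES}
            best_prev = max(scores, key=scores.get)
            ptr[s] = best_prev
            new_v[s] = scores[best_prev] + math.log(EMIT[s][roll])
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    state = max(v, key=v.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return "".join(PATH_LABEL[s] for s in path)


rolls = "1234512345" + "6" * 10 + "1234512345"
print(rolls)
print(viterbi(rolls))
```

The demo string puts a run of ten sixes between two runs of non-sixes. In log space, each six scores about 1.04 nats better in LOADED than in FAIR. The round trip through the 0.05 and 0.1 switching transitions costs only about 5.1 nats. Switching therefore pays off for the whole run of sixes and for nothing outside it.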
## GC-rich Sequence Model

Created from Problem 3.16 of M. Borodovsky and S. Ekisheva, "Problems and Solutions in Biological Sequence Analysis," Cambridge University Press, UK (2006).
```
#STOCHHMM MODEL FILE
<MODEL INFORMATION>
======================================================
MODEL_NAME: GC Model
MODEL_DESCRIPTION: Taken from Problem 3.16 from Problems and Solutions in Biological Sequence Analysis
MODEL_CREATION_DATE: August 28, 2009
<TRACK SYMBOL DEFINITIONS>
======================================================
TRACK1: A,C,G,T
<STATE DEFINITIONS>
####################################################################################################
STATE:
NAME: INIT
TRANSITION: STANDARD: P(X)
H: 0.5
L: 0.5
####################################################################################################
STATE:
NAME: H
GFF_DESC: HIGH
PATH_LABEL: H
TRANSITION: STANDARD: P(X)
H: 0.5
L: 0.5
END: 1
EMISSION: TRACK1: P(X)
ORDER: 0
0.2 0.3 0.3 0.2
####################################################################################################
STATE:
NAME: L
GFF_DESC: LOW
PATH_LABEL: L
TRANSITION: STANDARD: P(X)
H: 0.4
L: 0.6
END: 1
EMISSION: TRACK1: P(X)
ORDER: 0
0.3 0.2 0.2 0.3
####################################################################################################
//END
```
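The GC model has only two states and four symbols, so it is a convenient case for checking the forward algorithm by hand. The forward algorithm computes P(sequence) by summing over every H/L state path. The Python sketch below is not StochHMM's code, and the function name `forward_log_prob` is hypothetical. It assumes the emission columns follow the TRACK1 order A, C, G, T. It also assumes `END: 1` adds no extra end-transition factor.

```python
import math

# Parameters copied from the GC Model file above; TRACK1 symbol order is A,C,G,T.
SYMBOLS = "ACGT"
STATES = ["H", "L"]
INIT = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": dict(zip(SYMBOLS, [0.2, 0.3, 0.3, 0.2])),
    "L": dict(zip(SYMBOLS, [0.3, 0.2, 0.2, 0.3])),
}


def forward_log_prob(seq):
    """Return log P(seq) under the model, summed over all H/L state paths.

    Each column is rescaled to sum to 1 and the log of the scale factor is
    accumulated, which keeps long sequences from underflowing.
    """
    log_p = 0.0
    f = {}
    for i, x in enumerate(seq):
        if i == 0:
            f = {s: INIT[s] * EMIT[s][x] for s in STATES}
        else:
            f = {s: EMIT[s][x] * sum(f[p] * TRANS[p][s] for p in STATES) for s in STATES}
        scale = sum(f.values())
        log_p += math.log(scale)
        f = {s: value / scale for s, value in f.items()}
    return log_p


print(math.exp(forward_log_prob("GC")))
```

Here is the hand calculation for "GC" without scaling. The first column is f1(H) = 0.5 × 0.3 = 0.15 and f1(L) = 0.5 × 0.2 = 0.1. The second column is f2(H) = 0.3 × (0.15 × 0.5 + 0.1 × 0.4) = 0.0345 and f2(L) = 0.2 × (0.15 × 0.5 + 0.1 × 0.6) = 0.027. So P("GC") = 0.0615.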
## SkewR Model for R-loops

This model was created in collaboration with Paul Ginno to predict R-loop regions within the human genome. It is the most stringent model (1M).

[Ginno, P.A. et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell, 45, 814–825.](http://www.sciencedirect.com/science/article/pii/S1097276512000834)
```
#STOCHHMM MODEL FILE
<MODEL INFORMATION>
======================================================
MODEL_NAME: Paul Ginno GC skew 1M COUNT CG_RICH
DESCRIPTION: Model for calculating posterior of GC skew
<TRACK SYMBOL DEFINITIONS>
======================================================
SEQ: A,C,G,T
<AMBIGUOUS SYMBOL DEFINITIONS>
======================================================
SEQ: N[A,C,G,T]
<STATE DEFINITIONS>
#########################################################
STATE:
NAME: INIT
TRANSITION: STANDARD: P(X)
GENOMIC 1
CG_RICH 1
CSKEW 1
GSKEW 1
#########################################################
STATE:
NAME: GENOMIC
PATH_LABEL: N
TRANSITION: STANDARD: P(X)
GENOMIC 0.999998
CG_RICH 6.6666666664883e-07
CSKEW 6.6666666664883e-07
GSKEW 6.6666666664883e-07
END 1
EMISSION: SEQ: COUNTS
ORDER: 3 AMBIGUOUS: AVG
88461993 31994903 41425629 57658048
34832802 18391293 3375279 26668810
38652196 20900775 26295546 28125122
42612733 23294327 33658973 42849422
31851997 22768915 30005254 30550314
24352100 16618422 2865958 22554951
3184227 3240156 3617699 4288328
18787595 19796085 25185015 28125122
44760010 21408642 32081751 28052107
27680707 23075355 3501082 25569772
31701523 24998899 22131107 22554951
21575141 17141870 26507998 26668810
38295527 19208192 20648715 39623438
26501102 18783333 2902270 28052107
29725570 19394463 25211907 30550314
33951341 26611001 24195052 57658048
41514114 17861852 24635147 24195052
32018045 21804407 5463072 26507998
32168411 26661194 31663512 25185015
23264294 20628396 27330588 33658973
25343414 23062038 31678489 25211907
29802042 17638161 5454122 22131107
3451414 4091859 4573566 3617699
16276749 27150674 31663512 26295546
3098337 1965955 4620146 2902270
3240341 5382230 1443462 3501082
3357642 4056814 5454122 2865958
2484485 3007576 5463072 3375279
22242588 15254053 15518076 20648715
31351417 28041207 4620146 32081751
28886192 25108193 31678489 30005254
21223717 26689123 24635147 41425629
42565738 16740273 26689123 26611001
21230408 12546462 3007576 17141870
31947926 17199837 27150674 19796085
18091493 14224710 20628396 23294327
21715586 16010179 25108193 19394463
23023278 15829429 4056814 24998899
2937113 3297986 4091859 3240156
15065105 17199837 26661194 20900775
28697398 12595066 28041207 18783333
22799304 16651536 5382230 23075355
24939410 15829429 17638161 16618422
13649267 12546462 21804407 18391293
20265487 10076378 15254053 19208192
17956652 12595066 1965955 21408642
23952390 16010179 23062038 22768915
16671157 16740273 17861852 31994903
46998722 16671157 21223717 33951341
27095219 13649267 2484485 21575141
23533979 15065105 16276749 18787595
33807354 18091493 23264294 42612733
29295166 23952390 28886192 29725570
28118426 24939410 3357642 31701523
3013954 2937113 3451414 3184227
23533979 31947926 32168411 38652196
36050145 17956652 31351417 26501102
28508070 22799304 3240341 27680707
28118426 23023278 29802042 24352100
27095219 21230408 32018045 34832802
38041332 20265487 22242588 38295527
36050145 28697398 3098337 44760010
29295166 21715586 25343414 31851997
46998722 42565738 41514114 88461993
#########################################################
STATE:
NAME: CG_RICH
PATH_LABEL: R
TRANSITION: STANDARD: P(X)
GENOMIC 0.001
CG_RICH 0.999
END 1
EMISSION: SEQ: COUNTS
ORDER: 3 AMBIGUOUS: AVG
323461993 228994903 331425629 231658048
239832802 225391293 54375279 171668810
296652196 326900775 449295546 210125122
146612733 165294327 232658973 114849422
220851997 381768915 518005254 207550314
355352100 539618422 113865958 354554951
57184227 103240156 135617699 118288328
91787595 290796085 361185015 210125122
356760010 337408642 605081751 230052107
493680707 706075355 125501082 473569772
641701523 687998899 904131107 354554951
126575141 268141870 387507998 171668810
120295527 108208192 105648715 115623438
187501102 272783333 58902270 230052107
216725570 246394463 354211907 207550314
101951341 170611001 155195052 231658048
305514114 183861852 345635147 155195052
585018045 656804407 187463072 387507998
599168411 815661194 1032663512 361185015
106264294 355628396 329330588 232658973
348343414 672062038 1024678489 354211907
984802042 988638161 291454122 904131107
126451414 210091859 266573566 135617699
214276749 891150674 1032663512 449295546
47098337 70965955 158620146 58902270
126240341 222382230 59443462 125501082
104357642 229056814 291454122 113865958
36484485 136007576 187463072 54375279
141242588 169254053 183518076 105648715
548351417 866041207 158620146 605081751
507886192 757108193 1024678489 518005254
118223717 489689123 345635147 331425629
302565738 184740273 489689123 170611001
332230408 327546462 136007576 268141870
475947926 520199837 891150674 290796085
83091493 146224710 355628396 165294327
233715586 408010179 757108193 246394463
590023278 853829429 229056814 687998899
92937113 129297986 210091859 103240156
136065105 520199837 815661194 326900775
477697398 390595066 866041207 272783333
601799304 828651536 222382230 706075355
788939410 853829429 988638161 539618422
151649267 327546462 656804407 225391293
107265487 102076378 169254053 108208192
263956652 390595066 70965955 337408642
354952390 408010179 672062038 381768915
93671157 184740273 183861852 228994903
182998722 93671157 118223717 101951341
172095219 151649267 36484485 126575141
157533979 136065105 214276749 91787595
113807354 83091493 106264294 146612733
186295166 354952390 507886192 216725570
470118426 788939410 104357642 641701523
59013954 92937113 126451414 57184227
157533979 475947926 599168411 296652196
265050145 263956652 548351417 187501102
424508070 601799304 126240341 493680707
470118426 590023278 984802042 355352100
172095219 332230408 585018045 239832802
128041332 107265487 141242588 120295527
265050145 477697398 47098337 356760010
186295166 233715586 348343414 220851997
182998722 302565738 305514114 323461993
#########################################################
STATE:
NAME: CSKEW
PATH_LABEL: C
GFF_DESC: C_SKEW
TRANSITION: STANDARD: P(X)
GENOMIC 0.002
CSKEW 0.998
END 1
EMISSION: SEQ: COUNTS
ORDER: 2
41 58 25 35
78 113 19 39
40 63 34 34
19 41 20 13
60 128 92 34
120 199 63 98
19 76 45 15
35 90 58 40
39 29 30 13
76 80 47 47
34 53 25 20
22 24 23 17
19 34 24 11
40 88 26 39
18 58 29 17
12 38 21 18
#########################################################
STATE:
NAME: GSKEW
PATH_LABEL: G
GFF_DESC: G_SKEW
TRANSITION: STANDARD: P(X)
GENOMIC 0.002
GSKEW 0.998
END 1
EMISSION: SEQ: COUNTS
ORDER: 2
18 17 40 13
17 20 15 34
39 47 98 39
11 13 34 35
21 23 58 20
29 25 45 34
26 47 63 19
24 30 92 25
38 24 90 41
58 53 76 63
88 80 199 113
34 29 128 58
12 22 35 19
18 34 19 40
40 76 120 78
19 39 60 41
#########################################################
//END
```
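The SkewR states use `EMISSION: SEQ: COUNTS` rather than probabilities. Each row of raw counts therefore has to be normalised into a conditional distribution P(next base | preceding ORDER bases). The Python sketch below shows one reading of that layout; it is not StochHMM's parser.

- **Row order (assumed):** rows run over contexts in lexicographic order with the first base most significant (AA, AC, ..., TT for ORDER 2; AAA, ..., TTT for ORDER 3). This is consistent with the reverse-complement symmetry in the tables: the CSKEW count for AA followed by A and the GSKEW count for TT followed by T are both 41.
- **Column order:** columns follow the SEQ order A, C, G, T.
- **Pseudocounts:** none are added.
- **`AMBIGUOUS: AVG` (assumed):** modelled by averaging over every base an N could stand for.

All function names are hypothetical. For brevity the demo uses the 16-row CSKEW table. The same functions apply to the 64-row ORDER 3 tables in GENOMIC and CG_RICH, which are the states that declare `AMBIGUOUS: AVG`.

```python
from itertools import product

SYMBOLS = "ACGT"            # SEQ track symbol order
AMBIGUOUS = {"N": "ACGT"}   # from <AMBIGUOUS SYMBOL DEFINITIONS>: N[A,C,G,T]

# CSKEW emission counts (ORDER: 2), copied from the model file above.
CSKEW_COUNTS = """
41 58 25 35
78 113 19 39
40 63 34 34
19 41 20 13
60 128 92 34
120 199 63 98
19 76 45 15
35 90 58 40
39 29 30 13
76 80 47 47
34 53 25 20
22 24 23 17
19 34 24 11
40 88 26 39
18 58 29 17
12 38 21 18
"""


def parse_counts(text, order):
    """Turn a COUNTS table into {context: {symbol: P(symbol | context)}}.

    Assumes rows run over contexts in lexicographic order (AA, AC, ..., TT for
    ORDER 2) and columns follow SYMBOLS; no pseudocounts are added.
    """
    rows = [[int(n) for n in line.split()] for line in text.strip().splitlines()]
    contexts = ["".join(c) for c in product(SYMBOLS, repeat=order)]
    if len(rows) != len(contexts):
        raise ValueError(f"expected {len(contexts)} rows, got {len(rows)}")
    table = {}
    for context, row in zip(contexts, rows):
        total = sum(row)
        table[context] = {sym: n / total for sym, n in zip(SYMBOLS, row)}
    return table


def expand(s):
    """All concrete strings an ambiguous string could stand for (N -> A, C, G, T)."""
    return ["".join(p) for p in product(*(AMBIGUOUS.get(ch, ch) for ch in s))]


def emission_prob(table, context, symbol):
    """Assumed AVG behaviour: average over every concrete context/symbol pair."""
    probs = [table[c][x] for c in expand(context) for x in expand(symbol)]
    return sum(probs) / len(probs)


def expected_duration(self_transition):
    """Mean run length of a state with geometric duration: 1 / (1 - p)."""
    return 1.0 / (1.0 - self_transition)


cskew = parse_counts(CSKEW_COUNTS, order=2)
print(emission_prob(cskew, "AA", "C"))   # P(C | AA) = 58 / 159
print(emission_prob(cskew, "NA", "C"))   # averaged over contexts AA, CA, GA, TA
print(expected_duration(0.998))          # CSKEW / GSKEW self-transition
```

Reading a self-transition p as a geometric duration gives a mean run length of 1/(1 - p):

- about 500,000 bases for GENOMIC (0.999998)
- 1,000 for CG_RICH (0.999)
- 500 for CSKEW and GSKEW (0.998)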