Example Models and Sequences

Paul Lott edited this page Jul 29, 2013 · 11 revisions

## Dice Model (Dishonest Casino Model)

Dice model from p. 65 of R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids," Cambridge University Press, UK (1998).

#STOCHHMM MODEL FILE
MODEL INFORMATION
======================================================
MODEL_NAME:	CASINO DICE MODEL
MODEL_DESCRIPTION:	Taken from CH3 Durbin/Eddy
MODEL_CREATION_DATE:	August 28,2009

TRACK SYMBOL DEFINITIONS
======================================================
DICE:	1,2,3,4,5,6

STATE DEFINITIONS
#############################################
STATE:	
	NAME:	INIT
TRANSITION:	STANDARD: P(X)
	FAIR:	0.5
	LOADED:	0.5
#############################################
STATE:	
	NAME:	FAIR
	PATH_LABEL:	F
	GFF_DESC:	FAIR
TRANSITION:	STANDARD: P(X)
	FAIR:	0.95
	LOADED:	0.05
	END:	1
EMISSION:	DICE: P(X)
	ORDER:	0
@1	2	3	4	5	6
0.167	0.167	0.167	0.167	0.167	0.167
#############################################
STATE:
	NAME:	LOADED
	PATH_LABEL:	L
	GFF_DESC:	LOADED
TRANSITION:	STANDARD: P(X)
	FAIR:	0.1
	LOADED:	0.9
	END:	1
EMISSION:	DICE: P(X)
	ORDER:	0
@1	2	3	4	5	6	
0.1	0.1	0.1	0.1	0.1	0.5
#############################################
//END
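
To make the parameters above concrete, here is a minimal Python sketch of Viterbi decoding for this two-state model. It is not StochHMM code: the `viterbi` function and the dictionary names are hypothetical, the probabilities are copied by hand from the model file, and the `END: 1` transition is treated as always permitted and therefore left out of the score.

```python
import math

# Parameters copied by hand from the casino model file (assumption: END is ignored).
STATES = ("FAIR", "LOADED")
INIT = {"FAIR": 0.5, "LOADED": 0.5}
TRANS = {
    "FAIR": {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.1, "LOADED": 0.9},
}
EMIT = {
    "FAIR": {str(face): 0.167 for face in range(1, 7)},
    "LOADED": {**{str(face): 0.1 for face in range(1, 6)}, "6": 0.5},
}
LABEL = {"FAIR": "F", "LOADED": "L"}  # the PATH_LABEL values


def viterbi(rolls):
    """Return the most probable path labels for a non-empty string of rolls."""
    # score[s] is the best log probability of any path that ends in state s.
    score = {s: math.log(INIT[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    back = []  # one back-pointer table per roll after the first
    for symbol in rolls[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][symbol]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join(LABEL[s] for s in path)


if __name__ == "__main__":
    print(viterbi("1234512345" + "6" * 10))
```

Working in log space avoids underflow on long roll sequences. A run of sixes that follows mixed rolls should be labelled `L`, since the loaded die emits a six with probability 0.5.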

## GC-rich Sequence Model

Created from Problem 3.16 of M. Borodovsky and S. Ekisheva, "Problems and Solutions in Biological Sequence Analysis," Cambridge University Press, UK (2006).

#STOCHHMM MODEL FILE

<MODEL INFORMATION>
======================================================
MODEL_NAME:	GC Model
MODEL_DESCRIPTION:	Taken from Problem 3.16 from Problems and Solutions in Biological Sequence Analysis
MODEL_CREATION_DATE:	August 28,2009


<TRACK SYMBOL DEFINITIONS>
======================================================
TRACK1:	A,C,G,T

<STATE DEFINITIONS>
####################################################################################################
STATE:	
	NAME:	INIT
TRANSITION:	STANDARD:	P(X)
	H:	0.5
	L:	0.5

####################################################################################################
STATE:	
	NAME:	H
	GFF_DESC:	HIGH
	PATH_LABEL:	H
TRANSITION:	STANDARD:	P(X)
	H:	0.5
	L:	0.5
	END:	1
EMISSION:	TRACK1: P(X)
	ORDER:	0
0.2	0.3	0.3	0.2

####################################################################################################
STATE:
	NAME:	L
	GFF_DESC:	LOW
	PATH_LABEL:	L
TRANSITION:	STANDARD:	P(X)
	H:	0.4
	L:	0.6
	END:	1
EMISSION:	TRACK1: P(X)
	ORDER:	0
0.3	0.2	0.2	0.3

####################################################################################################
//END
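
As an illustration of what these numbers imply, the following Python sketch runs the forward algorithm on the H/L model to get the total probability of a short sequence. It is a hand-written example, not part of StochHMM: the `forward` function is hypothetical, the emission columns are assumed to follow the `TRACK1` order A, C, G, T, and the `END: 1` transition is ignored.

```python
# Parameters copied by hand from the GC model file.
ALPHABET = "ACGT"  # assumed column order of the emission rows (from TRACK1)
INIT = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {
    "H": dict(zip(ALPHABET, (0.2, 0.3, 0.3, 0.2))),
    "L": dict(zip(ALPHABET, (0.3, 0.2, 0.2, 0.3))),
}


def forward(sequence):
    """Return P(sequence), summed over every H/L state path."""
    # alpha[s] = probability of the prefix read so far, ending in state s.
    alpha = {s: INIT[s] * EMIT[s][sequence[0]] for s in INIT}
    for symbol in sequence[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[p] * TRANS[p][s] for p in alpha)
            for s in INIT
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward("GGCA"))
```

For `GGCA` this comes to roughly 0.00384. The sketch multiplies raw probabilities, so it suits short sequences only; a longer input would need scaling or log-space arithmetic.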

## SkewR Model for R-loops

This model was created in collaboration with Paul Ginno to predict R-loop regions within the human genome. This is the most stringent model (1M).

[Ginno, P.A. et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell, 45, 814–825.](http://www.sciencedirect.com/science/article/pii/S1097276512000834)

Ginno, P.A. et al. (2013) GC skew at the 5′ and 3′ ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res.

#STOCHHMM MODEL FILE
MODEL INFORMATION
======================================================
MODEL_NAME:	Paul Ginno GC skew 1M COUNT CG_RICH
DESCRIPTION:	Model for calculating posterior of GC skew

<TRACK SYMBOL DEFINITIONS>
======================================================
SEQ:	A,C,G,T 

<AMBIGUOUS SYMBOL DEFINITIONS>
======================================================
SEQ: N[A,C,G,T]

<STATE DEFINITIONS>
#########################################################
STATE:
	NAME:	INIT
TRANSITION:	STANDARD:	P(X)
	GENOMIC	1
	CG_RICH	1
	CSKEW	1
	GSKEW	1
#########################################################
STATE:
	NAME:	GENOMIC
	PATH_LABEL:	N
TRANSITION:	STANDARD: P(X)
	GENOMIC	0.999998
	CG_RICH	6.6666666664883e-07
	CSKEW	6.6666666664883e-07
	GSKEW	6.6666666664883e-07
	END	1
EMISSION:	SEQ:	COUNTS
	ORDER:	3	AMBIGUOUS:	AVG
88461993	31994903	41425629	57658048	
34832802	18391293	3375279	26668810	
38652196	20900775	26295546	28125122	
42612733	23294327	33658973	42849422	
31851997	22768915	30005254	30550314	
24352100	16618422	2865958	22554951	
3184227	3240156	3617699	4288328	
18787595	19796085	25185015	28125122	
44760010	21408642	32081751	28052107	
27680707	23075355	3501082	25569772	
31701523	24998899	22131107	22554951	
21575141	17141870	26507998	26668810	
38295527	19208192	20648715	39623438	
26501102	18783333	2902270	28052107	
29725570	19394463	25211907	30550314	
33951341	26611001	24195052	57658048	
41514114	17861852	24635147	24195052	
32018045	21804407	5463072	26507998	
32168411	26661194	31663512	25185015	
23264294	20628396	27330588	33658973	
25343414	23062038	31678489	25211907	
29802042	17638161	5454122	22131107	
3451414	4091859	4573566	3617699	
16276749	27150674	31663512	26295546	
3098337	1965955	4620146	2902270	
3240341	5382230	1443462	3501082	
3357642	4056814	5454122	2865958	
2484485	3007576	5463072	3375279	
22242588	15254053	15518076	20648715	
31351417	28041207	4620146	32081751	
28886192	25108193	31678489	30005254	
21223717	26689123	24635147	41425629	
42565738	16740273	26689123	26611001	
21230408	12546462	3007576	17141870	
31947926	17199837	27150674	19796085	
18091493	14224710	20628396	23294327	
21715586	16010179	25108193	19394463	
23023278	15829429	4056814	24998899	
2937113	3297986	4091859	3240156	
15065105	17199837	26661194	20900775	
28697398	12595066	28041207	18783333	
22799304	16651536	5382230	23075355	
24939410	15829429	17638161	16618422	
13649267	12546462	21804407	18391293	
20265487	10076378	15254053	19208192	
17956652	12595066	1965955	21408642	
23952390	16010179	23062038	22768915	
16671157	16740273	17861852	31994903	
46998722	16671157	21223717	33951341	
27095219	13649267	2484485	21575141	
23533979	15065105	16276749	18787595	
33807354	18091493	23264294	42612733	
29295166	23952390	28886192	29725570	
28118426	24939410	3357642	31701523	
3013954	2937113	3451414	3184227	
23533979	31947926	32168411	38652196	
36050145	17956652	31351417	26501102	
28508070	22799304	3240341	27680707	
28118426	23023278	29802042	24352100	
27095219	21230408	32018045	34832802	
38041332	20265487	22242588	38295527	
36050145	28697398	3098337	44760010	
29295166	21715586	25343414	31851997	
46998722	42565738	41514114	88461993

#########################################################
STATE:
	NAME:	CG_RICH
	PATH_LABEL: R
TRANSITION:	STANDARD:	P(X)
	GENOMIC	0.001
	CG_RICH	0.999
	END	1
EMISSION:	SEQ:	COUNTS
	ORDER:	3	AMBIGUOUS:	AVG
323461993	228994903	331425629	231658048	
239832802	225391293	54375279	171668810	
296652196	326900775	449295546	210125122	
146612733	165294327	232658973	114849422	
220851997	381768915	518005254	207550314	
355352100	539618422	113865958	354554951	
57184227	103240156	135617699	118288328	
91787595	290796085	361185015	210125122	
356760010	337408642	605081751	230052107	
493680707	706075355	125501082	473569772	
641701523	687998899	904131107	354554951	
126575141	268141870	387507998	171668810	
120295527	108208192	105648715	115623438	
187501102	272783333	58902270	230052107	
216725570	246394463	354211907	207550314	
101951341	170611001	155195052	231658048	
305514114	183861852	345635147	155195052	
585018045	656804407	187463072	387507998	
599168411	815661194	1032663512	361185015	
106264294	355628396	329330588	232658973	
348343414	672062038	1024678489	354211907	
984802042	988638161	291454122	904131107	
126451414	210091859	266573566	135617699	
214276749	891150674	1032663512	449295546	
47098337	70965955	158620146	58902270	
126240341	222382230	59443462	125501082	
104357642	229056814	291454122	113865958	
36484485	136007576	187463072	54375279	
141242588	169254053	183518076	105648715	
548351417	866041207	158620146	605081751	
507886192	757108193	1024678489	518005254	
118223717	489689123	345635147	331425629	
302565738	184740273	489689123	170611001	
332230408	327546462	136007576	268141870	
475947926	520199837	891150674	290796085	
83091493	146224710	355628396	165294327	
233715586	408010179	757108193	246394463	
590023278	853829429	229056814	687998899	
92937113	129297986	210091859	103240156	
136065105	520199837	815661194	326900775	
477697398	390595066	866041207	272783333	
601799304	828651536	222382230	706075355	
788939410	853829429	988638161	539618422	
151649267	327546462	656804407	225391293	
107265487	102076378	169254053	108208192	
263956652	390595066	70965955	337408642	
354952390	408010179	672062038	381768915	
93671157	184740273	183861852	228994903	
182998722	93671157	118223717	101951341	
172095219	151649267	36484485	126575141	
157533979	136065105	214276749	91787595	
113807354	83091493	106264294	146612733	
186295166	354952390	507886192	216725570	
470118426	788939410	104357642	641701523	
59013954	92937113	126451414	57184227	
157533979	475947926	599168411	296652196	
265050145	263956652	548351417	187501102	
424508070	601799304	126240341	493680707	
470118426	590023278	984802042	355352100	
172095219	332230408	585018045	239832802	
128041332	107265487	141242588	120295527	
265050145	477697398	47098337	356760010	
186295166	233715586	348343414	220851997	
182998722	302565738	305514114	323461993

#########################################################
STATE:
	NAME:	CSKEW
	PATH_LABEL:	C
	GFF_DESC:	C_SKEW
TRANSITION:	STANDARD:	P(X)
	GENOMIC	0.002
	CSKEW	0.998
	END	1
EMISSION:	SEQ:	COUNTS
	ORDER:	2
41	58	25	35
78	113	19	39
40	63	34	34
19	41	20	13
60	128	92	34
120	199	63	98
19	76	45	15
35	90	58	40
39	29	30	13
76	80	47	47
34	53	25	20
22	24	23	17
19	34	24	11
40	88	26	39
18	58	29	17
12	38	21	18

#########################################################
STATE:
	NAME:	GSKEW
	PATH_LABEL:	G
	GFF_DESC:	G_SKEW
TRANSITION:	STANDARD:	P(X)
	GENOMIC	0.002
	GSKEW	0.998
	END	1
EMISSION:	SEQ:	COUNTS
	ORDER:	2
18	17	40	13	
17	20	15	34	
39	47	98	39	
11	13	34	35	
21	23	58	20	
29	25	45	34	
26	47	63	19	
24	30	92	25	
38	24	90	41	
58	53	76	63	
88	80	199	113	
34	29	128	58	
12	22	35	19	
18	34	19	40	
40	76	120	78	
19	39	60	41

#########################################################
//END
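
The `COUNTS` tables above hold raw k-mer counts rather than probabilities. The Python sketch below shows one plausible way to turn such a table into conditional emission probabilities, using the `ORDER: 2` table of the CSKEW state. It rests on assumptions not confirmed by this page: rows are taken to be the preceding contexts in lexicographic order (AA, AC, ..., TT), columns to follow the `SEQ` order A, C, G, T, and no pseudocounts or `AMBIGUOUS` handling are applied. The function names are hypothetical and are not StochHMM API.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed column order, from the SEQ track definition

# ORDER 2 COUNTS table of the CSKEW state, copied from the model file.
CSKEW_COUNTS = [
    [41, 58, 25, 35], [78, 113, 19, 39], [40, 63, 34, 34], [19, 41, 20, 13],
    [60, 128, 92, 34], [120, 199, 63, 98], [19, 76, 45, 15], [35, 90, 58, 40],
    [39, 29, 30, 13], [76, 80, 47, 47], [34, 53, 25, 20], [22, 24, 23, 17],
    [19, 34, 24, 11], [40, 88, 26, 39], [18, 58, 29, 17], [12, 38, 21, 18],
]


def counts_to_probabilities(rows, order):
    """Normalize each count row into P(base | preceding context)."""
    # Assumption: rows are ordered AA, AC, AG, AT, CA, ... for order 2.
    contexts = ["".join(p) for p in product(ALPHABET, repeat=order)]
    if len(rows) != len(contexts):
        raise ValueError("expected %d rows, got %d" % (len(contexts), len(rows)))
    table = {}
    for context, row in zip(contexts, rows):
        total = sum(row)
        table[context] = {base: n / total for base, n in zip(ALPHABET, row)}
    return table


def expected_length(self_transition):
    """Mean run length of a state whose dwell time is geometric."""
    return 1.0 / (1.0 - self_transition)


if __name__ == "__main__":
    table = counts_to_probabilities(CSKEW_COUNTS, order=2)
    print(table["AA"])
    print(expected_length(0.998))
```

Under these assumptions the CSKEW table favours C over G in most contexts, which fits the state's name. The self-transition of 0.998 for CSKEW and GSKEW corresponds to a mean run of about 500 bases.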