Paul Lott edited this page Jan 23, 2013 · 5 revisions

Transitions can be one of three types: STANDARD, LEXICAL, or DURATION.

## STANDARD

Standard transitions have a fixed probability associated with the transition. A standard transition is defined as shown below.

```
TRANSITION:	STANDARD:		P(X)
	INTER:	1.0
```

We can also include multiple transitions under a single TRANSITION definition.

```
TRANSITION:	STANDARD:		P(X)
	INTER:	0.5
	START:	0.5
```
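To make the layout concrete, here is a minimal Python sketch that reads a STANDARD block like the ones above into a mapping from target state to probability. The function name `parse_standard` is hypothetical and this is not StochHMM's own parser; it only assumes the tab-separated `STATE:<tab>probability` lines shown here.

```python
from typing import Dict


def parse_standard(block: str) -> Dict[str, float]:
    """Parse a STANDARD transition block into {target_state: probability}.

    Hypothetical helper for illustration; not StochHMM's own parser.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    # Header looks like "TRANSITION:<tab>STANDARD:<tab><tab>P(X)".
    header = [tok.strip() for tok in lines[0].split(":")]
    if header[:2] != ["TRANSITION", "STANDARD"]:
        raise ValueError("not a STANDARD transition block")
    probs: Dict[str, float] = {}
    for ln in lines[1:]:
        # Each body line looks like "<tab>STATE:<tab>probability".
        state, value = ln.split(":", 1)
        probs[state.strip()] = float(value)  # float() ignores surrounding whitespace
    return probs


block = "TRANSITION:\tSTANDARD:\t\tP(X)\n\tINTER:\t0.5\n\tSTART:\t0.5\n"
print(parse_standard(block))  # {'INTER': 0.5, 'START': 0.5}
```

Because the probability is fixed, the parsed value is used as-is at every position, regardless of the sequence.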

## LEXICAL

Lexical transitions use the sequence to determine the likelihood of transitioning from one state to the next. So, like the emissions, they require sequence counts or frequencies. Because they are sequence dependent, each lexical transition has a table corresponding to a sequence track and that track's alphabet.

```
TRANSITION:	LEXICAL:		P(X)
	INTER:	SEQ
		ORDER:	1
0.996096	0.998151	0.998046	0.997835
0.997860	0.997702	0.998058	0.997916
0.997821	0.997921	0.997690	0.997879
0.997816	0.997893	0.997766	0.996080
```

The transition is lexical, with values provided as probabilities (P(X)). It leads to the INTER state and uses the SEQ track to define the transition probability. The order of dependence is first order, meaning we look at the probability P(X | Y), where Y is the preceding letter.

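If our sequence is ACGT and we are at the C position, we compute P(C | A). With rows indexing the preceding letter and columns the current letter (both in A, C, G, T order), this is the value in row A, column C: P(C | A) = 0.998151.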
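The lookup can be sketched in Python as follows. This assumes the 4×4 table is laid out with rows indexing the preceding letter and columns the current letter, both in A, C, G, T order; the names `ALPHABET`, `TABLE`, and `lexical_transition` are illustrative and not part of StochHMM.

```python
ALPHABET = "ACGT"

# Order-1 table from the example above (assumed: rows = preceding letter,
# columns = current letter, both in ALPHABET order).
TABLE = [
    [0.996096, 0.998151, 0.998046, 0.997835],
    [0.997860, 0.997702, 0.998058, 0.997916],
    [0.997821, 0.997921, 0.997690, 0.997879],
    [0.997816, 0.997893, 0.997766, 0.996080],
]


def lexical_transition(seq: str, pos: int) -> float:
    """Return P(seq[pos] | seq[pos - 1]) from the order-1 table."""
    if pos < 1:
        raise ValueError("an order-1 lookup needs one preceding letter")
    prev = ALPHABET.index(seq[pos - 1])
    cur = ALPHABET.index(seq[pos])
    return TABLE[prev][cur]


print(lexical_transition("ACGT", 1))  # P(C | A) -> 0.998151
```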

TODO://Add Function parsing and printing

To define how ambiguous characters are treated, add the AMBIGUOUS keyword on the ORDER line.

Ambiguous characters can be scored as AVG, MAX, MIN, P(X), or LOG.

```
TRANSITION:	LEXICAL:		P(X)
	INTER:	SEQ
		ORDER:	1	AMBIGUOUS: AVG
0.996096	0.998151	0.998046	0.997835
0.997860	0.997702	0.998058	0.997916
0.997821	0.997921	0.997690	0.997879
0.997816	0.997893	0.997766	0.996080
```
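A rough sketch of what AVG, MAX, and MIN could mean for an ambiguous character: expand it to the letters it may stand for (using the IUPAC codes N, R, and Y as examples), look up every matching table cell, and combine them. This is an assumption about the semantics, using the same hypothetical row/column layout as a plain lexical lookup; the P(X) and LOG options are not modeled because their behavior isn't described on this page.

```python
from statistics import mean

ALPHABET = "ACGT"
# Same order-1 table as above; rows = preceding letter, columns = current letter (assumed).
TABLE = [
    [0.996096, 0.998151, 0.998046, 0.997835],
    [0.997860, 0.997702, 0.998058, 0.997916],
    [0.997821, 0.997921, 0.997690, 0.997879],
    [0.997816, 0.997893, 0.997766, 0.996080],
]
# IUPAC codes: N = any base, R = A or G, Y = C or T.
AMBIGUITY = {"N": "ACGT", "R": "AG", "Y": "CT"}
COMBINE = {"AVG": mean, "MAX": max, "MIN": min}


def ambiguous_transition(prev_ch: str, cur_ch: str, mode: str = "AVG") -> float:
    """Combine every table cell the (possibly ambiguous) letter pair could match."""
    prevs = AMBIGUITY.get(prev_ch, prev_ch)
    curs = AMBIGUITY.get(cur_ch, cur_ch)
    cells = [TABLE[ALPHABET.index(p)][ALPHABET.index(c)] for p in prevs for c in curs]
    return COMBINE[mode](cells)


print(ambiguous_transition("A", "N", "MAX"))  # 0.998151, the largest value in row A
```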

A lexical transition can also link to an emission function, which uses a lexical table and the sequence to calculate the transition probability.

```
TRANSITION:	LEXICAL:		FUNCTION:  PWM2
	INTER:	SEQ
```

This defines a transition to the INTER state using the SEQ track. The transition probability is calculated by the PWM2 function.
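One way to picture the FUNCTION link is a registry that maps the name in the model file to a callable that scores the sequence at a position. Everything below is hypothetical: the `register` decorator, the `REGISTRY` dict, and the two-column PWM with made-up placeholder values stand in for whatever PWM2 actually computes.

```python
import math
from typing import Callable, Dict

# Hypothetical registry: FUNCTION name in the model file -> Python callable.
TransitionFn = Callable[[str, int], float]
REGISTRY: Dict[str, TransitionFn] = {}


def register(name: str) -> Callable[[TransitionFn], TransitionFn]:
    def wrap(fn: TransitionFn) -> TransitionFn:
        REGISTRY[name] = fn
        return fn
    return wrap


# Placeholder 2-column position weight matrix; the values are made up.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


@register("PWM2")
def pwm2(seq: str, pos: int) -> float:
    """Score the len(PWM) letters ending at pos; 0.0 if the window is incomplete."""
    start = pos - len(PWM) + 1
    if start < 0:
        return 0.0
    window = seq[start : pos + 1]
    return math.prod(col[ch] for col, ch in zip(PWM, window))


def transition_prob(function_name: str, seq: str, pos: int) -> float:
    # Dispatch on the FUNCTION name given in the TRANSITION definition.
    return REGISTRY[function_name](seq, pos)
```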

## EXPLICIT DURATION

Explicit duration transitions depend on a length distribution. The length is calculated by tracing back until a set condition is met, and that length is then used to calculate the transition probability. The duration probabilities define the likelihood of each traceback duration.

Here, the transition to ENTER traces back until it reaches a state whose label is “S”. The traceback condition can also be defined as one of the following:

- DIFF_STATE: Trace back until a different state is encountered.
- TO_STATE: Trace back until a state with the given name is encountered.
- TO_LABEL: Trace back until a state with the given label is encountered.
- TO_GFF: Trace back until a state with the given GFF description is encountered.
- TO_START: Trace back to the start of the sequence.
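These traceback conditions can be sketched over a decoded state path. In this hypothetical version (the `State` class and `traceback_length` are illustrative, not StochHMM's API), the current position always counts toward the length, and the walk stops before the first earlier state that satisfies the condition; both conventions are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    name: str
    label: str = ""
    gff: str = ""


def traceback_length(path: List[State], condition: str, target: Optional[str] = None) -> int:
    """Trace back from path[-1] and return the number of positions covered.

    The current position always counts; the walk stops before the first
    earlier state that satisfies the condition (an assumed convention).
    """
    current = path[-1]
    stop: Dict[str, Callable[[State], bool]] = {
        "DIFF_STATE": lambda s: s.name != current.name,
        "TO_STATE": lambda s: s.name == target,
        "TO_LABEL": lambda s: s.label == target,
        "TO_GFF": lambda s: s.gff == target,
        "TO_START": lambda s: False,  # never stops early: runs to the sequence start
    }
    is_stop = stop[condition]
    length = 1
    for state in reversed(path[:-1]):
        if is_stop(state):
            break
        length += 1
    return length


path = [State("START", "S"), State("INTER", "I"), State("INTER", "I"), State("INTER", "I")]
print(traceback_length(path, "DIFF_STATE"))  # 3
```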

```
TRANSITION:	DURATION:	P(X)
	ENTER:	DIFF_STATE
		3	1
		4	0.995
		5	0.50
		10	0.10
		100	0.01
		101	0
```

This defines the transition to the ENTER state. Any duration less than 4 has probability 1.0, and any duration greater than 100 has probability 0.

The values at the first and last positions of the distribution are extended to the extremes. For example, using the transition above, a duration of 2 would also have probability 1.

Likewise, if position 101 had the value 0.1, then every duration from 102 to ∞ would also have probability 0.1.

If you want durations less than 3 to have probability zero, define position 2 with P(X) = 0.
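A minimal sketch of the duration lookup, including the extension of the first and last values to the extremes. The page doesn't say how durations between listed points are handled, so the linear interpolation below is an assumption; `duration_prob` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Tuple

Dist = List[Tuple[int, float]]  # (duration, probability), sorted by duration

# The ENTER distribution from the DURATION example above.
ENTER: Dist = [(3, 1.0), (4, 0.995), (5, 0.50), (10, 0.10), (100, 0.01), (101, 0.0)]


def duration_prob(dist: Dist, length: int) -> float:
    lengths = [d for d, _ in dist]
    if length <= lengths[0]:
        return dist[0][1]  # first value extends down to the shortest durations
    if length >= lengths[-1]:
        return dist[-1][1]  # last value extends up to infinity
    i = bisect_left(lengths, length)
    d1, p1 = dist[i]
    if d1 == length:
        return p1
    d0, p0 = dist[i - 1]
    # ASSUMPTION: linear interpolation between listed durations.
    return p0 + (p1 - p0) * (length - d0) / (d1 - d0)


print(duration_prob(ENTER, 2), duration_prob(ENTER, 4), duration_prob(ENTER, 500))  # 1.0 0.995 0.0
```

Prepending `(2, 0.0)` to `ENTER` makes every duration below 3 score 0, matching the rule above.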

To describe the outgoing transition, we need to define a reciprocal transition:

```
TRANSITION:	DURATION:	P(X)
	ENTER:	DIFF_STATE
		3	1
		4	0.995
		5	0.50
		10	0.10
		100	0.01
		101	0
TRANSITION:	DURATION:	P(X)
	NEXT:		DIFF_STATE
		3	0
		4	0.005
		5	0.50
		10	0.90
		100	0.99
		101	1
```
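Reading “reciprocal” as complementary, so that for each listed duration P(ENTER) + P(NEXT) = 1, the NEXT table can be derived from the ENTER table. That reading is an assumption, and `reciprocal` is a hypothetical helper.

```python
from typing import Dict

# The ENTER distribution from the DURATION example above.
ENTER: Dict[int, float] = {3: 1.0, 4: 0.995, 5: 0.50, 10: 0.10, 100: 0.01, 101: 0.0}


def reciprocal(dist: Dict[int, float]) -> Dict[int, float]:
    """ASSUMPTION: the outgoing probability is 1 minus the incoming one at each duration."""
    # round() trims float noise left by the subtraction (1 - 0.995 is not exactly 0.005).
    return {d: round(1.0 - p, 10) for d, p in dist.items()}


NEXT = reciprocal(ENTER)
print(NEXT)  # {3: 0.0, 4: 0.005, 5: 0.5, 10: 0.9, 100: 0.99, 101: 1.0}
```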

## END TRANSITIONS

The END state has no emissions and no outgoing transitions. However, every state that can transition to the end must define a STANDARD transition to the END state.
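The END rules can be checked with a small validator. The model representation (a dict from state name to `(transition_type, target)` pairs), the `can_end` argument, and `check_end_rules` are hypothetical and are not StochHMM structures.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model: state name -> list of (transition_type, target_state).
Model = Dict[str, List[Tuple[str, str]]]


def check_end_rules(model: Model, can_end: Iterable[str]) -> List[str]:
    """Return human-readable problems with END transitions (empty list if none)."""
    problems = []
    if model.get("END"):
        problems.append("END must not define outgoing transitions")
    for state in sorted(can_end):
        if ("STANDARD", "END") not in model.get(state, []):
            problems.append(f"{state} lacks a STANDARD transition to END")
    return problems


model: Model = {
    "INTER": [("STANDARD", "INTER"), ("STANDARD", "END")],
    "EXON": [("DURATION", "INTER")],
}
print(check_end_rules(model, ["INTER", "EXON"]))  # ['EXON lacks a STANDARD transition to END']
```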