STA 4953 (Spring 2001) Exercise 2A Answers

Name:

Exercise 2A

Full credit will only be given to correct answers with a clear explanation of how they are obtained. Use additional paper as necessary.

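The following is the transition probability matrix of a Markov nucleotide sequence. Fill in the blanks. (Rows give the current base and columns the next base, both in the order A, C, G, T.)

P =
	A	C	G	T
A	0.2	0.3	0.1	___
C	0.4	___	0.1	0.2
G	0.5	0.1	___	0.2
T	___	0.2	0.3	0.4

Solution: Each row of a transition probability matrix must sum to 1, so the blank in each row is 1 minus the sum of the other three entries: 0.4 (row A), 0.3 (row C), 0.2 (row G), and 0.1 (row T).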
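The blanks can also be filled in mechanically, because every row of a transition probability matrix sums to 1. The following is a minimal Python sketch of that idea; the names `P_BLANKS` and `fill_blanks` are illustrative, and the row and column order A, C, G, T is an assumption that matches the worked solution further down.

```python
# Transition matrix from the exercise; None marks a blank entry.
# Assumption: rows and columns are in the order A, C, G, T.
P_BLANKS = [
    [0.2, 0.3, 0.1, None],
    [0.4, None, 0.1, 0.2],
    [0.5, 0.1, None, 0.2],
    [None, 0.2, 0.3, 0.4],
]


def fill_blanks(matrix):
    """Fill the single blank in each row so that the row sums to 1."""
    filled = []
    for row in matrix:
        if sum(v is None for v in row) != 1:
            raise ValueError("expected exactly one blank per row")
        known = sum(v for v in row if v is not None)
        # Round to suppress floating-point noise such as 0.30000000000000004.
        filled.append([round(1.0 - known, 10) if v is None else v for v in row])
    return filled


P = fill_blanks(P_BLANKS)
for base, row in zip("ACGT", P):
    print(base, row)
```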

For a Markov chain X₀, X₁, X₂, ... with transition probability matrix P as in question 1, suppose the probability distribution of X₀ is

x	f₀(x)
A	1/4
C	1/4
G	1/4
T	1/4

That is, the initial nucleotide is equally likely to be any of the four bases.

Work out the probability distribution of X₁. (Hint: Use the Law of Total Probability:

P(E) = SUM_i [P( E and B_i )]
= SUM_i [P(B_i) * P(E | B_i)].)

Solution:
To determine the probability distribution of X₁, find f₁(A), f₁(C), f₁(G), and f₁(T). For f₁(A), condition on the value of X₀:
P(X₁=A) = P(X₁=A | X₀=A) * P(X₀=A) + P(X₁=A | X₀=C) * P(X₀=C) + P(X₁=A | X₀=G) * P(X₀=G) + P(X₁=A | X₀=T) * P(X₀=T)
= (0.2) * (0.25) + (0.4) * (0.25) + (0.5) * (0.25) + (0.1) * (0.25)
= 0.3

f₁(C), f₁(G), and f₁(T) are found similarly to get the probability distribution of X₁:

x	f₁(x)
A	0.300
C	0.225
G	0.175
T	0.300
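The four sums can be computed in one pass. The sketch below is only an illustration of the Law of Total Probability step, not part of the original answer key; it assumes the blanks of P are filled so that each row sums to 1, that rows and columns are ordered A, C, G, T, and the helper name `step` is hypothetical.

```python
BASES = "ACGT"

# Assumption: blanks of P filled so that each row sums to 1.
P = [
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.3, 0.1, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]


def step(dist, matrix):
    """One use of the Law of Total Probability:
    new[j] = sum over i of dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


f0 = [0.25, 0.25, 0.25, 0.25]  # uniform initial distribution
f1 = step(f0, P)
for base, prob in zip(BASES, f1):
    print(base, round(prob, 4))
```

Calling `step(f1, P)` applies the same rule once more and gives the distribution of X₂.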

Then also work out the probability distribution of X₂.
Solution: The probability distribution of X₂ is found in the same way as that of X₁, but conditioning on X₁ and using the values of f₁ found above.

P(X₂=A) = P(X₂=A | X₁=A) * P(X₁=A) + P(X₂=A | X₁=C) * P(X₁=C) + P(X₂=A | X₁=G) * P(X₁=G) + P(X₂=A | X₁=T) * P(X₁=T)
= (0.2) * (0.3) + (0.4) * (0.225) + (0.5) * (0.175) + (0.1) * (0.3)
= 0.2675
f₂(C), f₂(G), and f₂(T) are found similarly to get the probability distribution of X₂:

x	f₂(x)
A	0.2675
C	0.2350
G	0.1775
T	0.3200

Can you suggest a method for finding the probability distribution of X_n?
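Solution: Each step multiplies the row vector of probabilities by P on the right, so applying the step n times gives
(f₀(A), f₀(C), f₀(G), f₀(T)) * Pⁿ = (f_n(A), f_n(C), f_n(G), f_n(T)).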
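As an illustration of the matrix-power method, here is a small Python sketch that computes the distribution of X_n for any n. It is not from the original answer key: it assumes the blanks of P are filled so that each row sums to 1, and the helper names `mat_mul`, `mat_pow`, and `distribution` are hypothetical.

```python
BASES = "ACGT"

# Assumption: blanks of P filled so that each row sums to 1.
P = [
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.3, 0.1, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]


def mat_mul(a, b):
    """Ordinary product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """m raised to the n-th power (n >= 0) by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def distribution(f0, matrix, n):
    """Row vector f0 times matrix^n, i.e. the distribution of X_n."""
    pn = mat_pow(matrix, n)
    size = len(f0)
    return [sum(f0[i] * pn[i][j] for i in range(size)) for j in range(size)]


f0 = [0.25, 0.25, 0.25, 0.25]
for n in (1, 2, 10):
    fn = distribution(f0, P, n)
    print(n, {b: round(p, 4) for b, p in zip(BASES, fn)})
```

For n = 1 and n = 2 the result should agree with the tables for X₁ and X₂ above.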

Construct a Markov chain model for a nucleotide sequence generated according to these rules:

(i). The present nucleotide is equally likely to be A, C, G, or T if the preceding two nucleotides are identical.

(ii). The present nucleotide is twice as likely to be C or G as A or T if the preceding two nucleotides are different. Furthermore, when making a choice between C versus G and A versus T, purines will be used 60% of the time.

Solution: Let Y₀ = (X₀,X₁), Y₁ = (X₁,X₂), ..., Y_(n-1) = (X_(n-1),X_n). Because the next nucleotide depends only on the preceding two, Y₀, Y₁, Y₂, ... is a Markov chain whose 16 states are the ordered pairs AA, AC, ..., TT.

Write out its transition probability matrix.

Solution: Rule (i) tells us that if the preceding two nucleotides are identical, then
P(A) = P(C) = P(G) = P(T) = 0.25.
Rule (ii) tells us that if the preceding two nucleotides are different, then P(strong) = 2 * P(weak), where strong means C or G and weak means A or T. We also know that P(strong) + P(weak) = 1. Solving these two equations simultaneously, we have P(strong) = 2/3 and P(weak) = 1/3.
Rule (ii) also tells us that purines (A, G) will be used 60% of the time. Thus, within each pair, P(purine) = 0.6 and P(pyrimidine) = 0.4.
To find the values for the transition probability matrix, we know from rule (i) that
P_(AA)(AA) = P_(AA)(AC) = P_(AA)(AG) = P_(AA)(AT) = 0.25.

All other values in the row corresponding to AA are 0, because in the model defined above a transition is possible only if the first base of the column state matches the second base of the row state. For example, it is impossible to go from row AA to column CA, because A does not equal C.

Similarly, P_(CC)(CA) = P_(CC)(CC) = P_(CC)(CG) = P_(CC)(CT) = 0.25.

Similarly, P_(GG)(GA) = P_(GG)(GC) = P_(GG)(GG) = P_(GG)(GT) = 0.25.

Similarly, P_(TT)(TA) = P_(TT)(TC) = P_(TT)(TG) = P_(TT)(TT) = 0.25.

Now,

P_(AC)(CA) = P(A and weak) = (0.60) * (1/3) = 0.2

and

P_(AC)(CC) = P(C and strong) = (0.40) * (2/3) = 0.2667

P_(AC)(CG) = P(G and strong) = (0.60) * (2/3) = 0.4

P_(AC)(CT) = P(T and weak) = (0.40) * (1/3) = 0.1333

The remaining probabilities are found similarly.
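The four products above can be checked with exact fractions. This is a hedged sketch rather than part of the original solution: it follows the solution's product rule (group probability times the purine or pyrimidine share), and the names `next_base_prob`, `STRONG`, and `PURINES` are illustrative.

```python
from fractions import Fraction

P_STRONG = Fraction(2, 3)  # C or G, from P(strong) = 2 * P(weak)
P_WEAK = Fraction(1, 3)    # A or T
P_PURINE = Fraction(3, 5)  # A or G are used 60% of the time, rule (ii)

STRONG = {"C", "G"}
PURINES = {"A", "G"}


def next_base_prob(base):
    """P(next nucleotide = base | the preceding two nucleotides differ)."""
    group = P_STRONG if base in STRONG else P_WEAK
    within = P_PURINE if base in PURINES else 1 - P_PURINE
    return group * within


probs = {base: next_base_prob(base) for base in "ACGT"}
for base, p in probs.items():
    print(base, p, round(float(p), 4))
```

Working in fractions shows that the rounded values 0.2667 and 0.1333 stand for 4/15 and 2/15, and that the four probabilities sum to exactly 1.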

The transition probability matrix, P, is shown below. Rows give the current state (the preceding two nucleotides) and columns give the next state, both in the order AA, AC, AG, AT, CA, ..., TT.

	AA	AC	AG	AT	CA	CC	CG	CT	GA	GC	GG	GT	TA	TC	TG	TT
AA	0.25	0.25	0.25	0.25	0	0	0	0	0	0	0	0	0	0	0	0
AC	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0
AG	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0
AT	0	0	0	0	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333
CA	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0	0	0	0	0
CC	0	0	0	0	0.25	0.25	0.25	0.25	0	0	0	0	0	0	0	0
CG	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0
CT	0	0	0	0	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333
GA	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0	0	0	0	0
GC	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0
GG	0	0	0	0	0	0	0	0	0.25	0.25	0.25	0.25	0	0	0	0
GT	0	0	0	0	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333
TA	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0	0	0	0	0
TC	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0	0	0	0	0
TG	0	0	0	0	0	0	0	0	0.2	0.2667	0.4	0.1333	0	0	0	0
TT	0	0	0	0	0	0	0	0	0	0	0	0	0.25	0.25	0.25	0.25
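The 16 x 16 matrix can also be generated from the two rules instead of being typed entry by entry. The following Python sketch is one possible way to do this and is not part of the original answer key; the names `STATES`, `SAME`, `DIFFERENT`, and `build_matrix` are illustrative, and the state order AA, AC, ..., TT is an assumption.

```python
from itertools import product

BASES = "ACGT"
# The 16 states (preceding two nucleotides), in the order AA, AC, ..., TT.
STATES = ["".join(pair) for pair in product(BASES, repeat=2)]

# Next-nucleotide probabilities under rule (i) and rule (ii).
SAME = {base: 0.25 for base in BASES}
DIFFERENT = {"A": 0.2, "C": 4 / 15, "G": 0.4, "T": 2 / 15}


def build_matrix():
    """Return the 16 x 16 transition matrix as a list of rows."""
    matrix = []
    for src in STATES:
        next_probs = SAME if src[0] == src[1] else DIFFERENT
        row = []
        for dst in STATES:
            # A move is possible only if dst starts with the base src ends with.
            row.append(next_probs[dst[1]] if dst[0] == src[1] else 0.0)
        matrix.append(row)
    return matrix


P2 = build_matrix()
row_ac = P2[STATES.index("AC")]
print({dst: round(p, 4) for dst, p in zip(STATES, row_ac) if p > 0})
```

A quick sanity check on any such construction is that every row sums to 1 and has exactly four nonzero entries.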