Exercise 2A
Full credit will only be given to correct answers with a clear explanation of how they are obtained. Use additional paper as necessary.
- The following is the transition probability matrix of a Markov nucleotide sequence. Fill in the blanks.
- For a Markov chain X0, X1, X2, ... with transition probability matrix P as in question 1, suppose the probability distribution of X0 is

x | f0(x)
A | 1/4
C | 1/4
G | 1/4
T | 1/4

That is, the initial nucleotide is equally likely to be any of the four bases.
Work out the probability distribution of X1. (Hint: Use the Law of Total Probability:
P(E) = SUMi [P(E and Bi)] = SUMi [P(Bi) * P(E | Bi)].)
Solution:
To determine the probability distribution of X1, you must find f1(A), f1(C), f1(G), and f1(T).
To find f1(A), compute
P(X1=A) =
P(X1=A | X0=A) * P(X0=A) +
P(X1=A | X0=C) * P(X0=C) +
P(X1=A | X0=G) * P(X0=G) +
P(X1=A | X0=T) * P(X0=T)
= (0.2)*(0.25) + (0.4)*(0.25) + (0.5)*(0.25) + (0.1)*(0.25)
= 0.3
f1(C), f1(G), and f1(T) are found similarly, giving the probability distribution of X1:
x | f1(x)
A | 0.300
C | 0.225
G | 0.175
T | 0.300
Then also work out the probability distribution of X2.
Solution: The probability distribution of X2 is found in the same way as that of X1, but using the values of f1 found above in place of f0.
P(X2=A) =
P(X2=A | X1=A) * P(X1=A) +
P(X2=A | X1=C) * P(X1=C) +
P(X2=A | X1=G) * P(X1=G) +
P(X2=A | X1=T) * P(X1=T)
= (0.2)*(0.3) + (0.4)*(0.225) + (0.5)*(0.175) + (0.1)*(0.3)
= 0.2675
f2(C), f2(G), and f2(T) are found similarly, giving the probability distribution of X2:
x | f2(x)
A | 0.2675
C | 0.2350
G | 0.1775
T | 0.3200
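The two hand calculations for state A can be checked with a few lines of Python. This is only a sketch: it uses the column of P for transitions into A as it appears in the worked arithmetic (0.2, 0.4, 0.5, 0.1); the remaining columns of P are not reproduced here, and the names `p_to_A` and `total_probability` are illustrative choices rather than part of the exercise.

```python
# Transition probabilities into A, P(next = A | current = x), taken from the
# worked solution. The other columns of P are not needed for this check.
p_to_A = {"A": 0.2, "C": 0.4, "G": 0.5, "T": 0.1}

# Distribution of X0 (uniform) and of X1 (from the table for X1).
f0 = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
f1 = {"A": 0.300, "C": 0.225, "G": 0.175, "T": 0.300}


def total_probability(prior, conditional):
    """Law of Total Probability: sum over x of P(B_x) * P(E | B_x)."""
    return sum(prior[x] * conditional[x] for x in prior)


f1_A = total_probability(f0, p_to_A)  # P(X1 = A)
f2_A = total_probability(f1, p_to_A)  # P(X2 = A)
print(round(f1_A, 4), round(f2_A, 4))  # 0.3 0.2675
```

The same function gives f1(C), f1(G), and f1(T) once the corresponding columns of P are supplied.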
Can you suggest a method for finding the probability distribution of Xn?
Solution: Multiply the row vector of initial probabilities by the n-th power of P:
(f0(A), f0(C), f0(G), f0(T)) * P^n = (fn(A), fn(C), fn(G), fn(T))
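As a rough sketch of this method, the product f0 * P^n can be computed by applying the one-step update n times. The functions `step` and `distribution_after` are hypothetical names, and the 2-state matrix `P_toy` is made up purely for illustration; it is not the matrix from question 1.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(f0, P, n):
    """Return f0 * P^n by applying the one-step update n times."""
    dist = list(f0)
    for _ in range(n):
        dist = step(dist, P)
    return dist


# Hypothetical 2-state example (illustration only, not from the exercise).
P_toy = [[0.9, 0.1],
         [0.5, 0.5]]
f0_toy = [0.5, 0.5]

f2_toy = distribution_after(f0_toy, P_toy, 2)
print([round(p, 4) for p in f2_toy])  # [0.78, 0.22]
```

Repeated vector-matrix multiplication avoids forming P^n explicitly, which is usually sufficient when only the distribution of Xn is needed.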
- Construct a Markov chain model for a nucleotide sequence generated according to these rules:
(i). The present nucleotide is equally likely to be A, C, G, T if the preceding two nucleotides are identical.
(ii). The present nucleotide will be twice as likely to be C or G as A or T if the preceding two nucleotides are different. Furthermore, when making a choice between C versus G and A versus T, purines will be used 60% of the time.
Solution:
Let Y0 = (X0,X1), Y1 = (X1,X2), ..., Y_{n-1} = (X_{n-1},X_n). Because the present nucleotide depends only on the preceding two, the sequence of pairs Y0, Y1, ... is a Markov chain whose states are the 16 ordered pairs of nucleotides.
Write out its transition probability matrix.
Solution: Rule (i) tells us that if the preceding two nucleotides are identical, then
P(A) = P(C) = P(G) = P(T) = 0.25.
Rule (ii) tells us that if the preceding two nucleotides are different, then P(strong) = 2 * P(weak), where strong means C or G and weak means A or T. We already know that P(strong) + P(weak) = 1. Solving these two equations simultaneously, we have P(strong) = 2/3 and P(weak) = 1/3.
Rule (ii) also tells us that purines (A, G) will be used 60% of the time. Thus, within each pair, P(purine) = 0.6 and P(pyrimidine) = 0.4.
To find the values for the transition probability matrix, we know from rule (i) that
P(AA)(AA) = P(AA)(AC) = P(AA)(AG) = P(AA)(AT) = 0.25.
All other values in the row corresponding to AA will be 0, because a transition is possible only when the first base of the column state matches the second base of the row state. For example, it is impossible to go from row AA to column CA, because A does not equal C.
Similarly, P(CC)(CA) = P(CC)(CC) = P(CC)(CG) = P(CC)(CT) = 0.25.
Similarly, P(GG)(GA) = P(GG)(GC) = P(GG)(GG) = P(GG)(GT) = 0.25.
Similarly, P(TT)(TA) = P(TT)(TC) = P(TT)(TG) = P(TT)(TT) = 0.25.
Now,
P(AC)(CA) = P(A and weak) = (0.60) * (1/3) = 0.2
P(AC)(CC) = P(C and strong) = (0.40) * (2/3) = 0.2667
P(AC)(CG) = P(G and strong) = (0.60) * (2/3) = 0.4
P(AC)(CT) = P(T and weak) = (0.40) * (1/3) = 0.1333
The remaining probabilities are found similarly.
The transition probability matrix, P, is:
Rows give the current state and columns the next state, both in the order AA, AC, AG, AT, CA, ..., TT.

   | AA     | AC     | AG     | AT     | CA     | CC     | CG     | CT     | GA     | GC     | GG     | GT     | TA     | TC     | TG     | TT
AA | 0.25   | 0.25   | 0.25   | 0.25   | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
AC | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
AG | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0
AT | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333
CA | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
CC | 0      | 0      | 0      | 0      | 0.25   | 0.25   | 0.25   | 0.25   | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
CG | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0
CT | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333
GA | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
GC | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
GG | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.25   | 0.25   | 0.25   | 0.25   | 0      | 0      | 0      | 0
GT | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333
TA | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
TC | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
TG | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.2    | 0.2667 | 0.4    | 0.1333 | 0      | 0      | 0      | 0
TT | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.25   | 0.25   | 0.25   | 0.25
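The 16 x 16 matrix can also be generated directly from rules (i) and (ii), which makes it easy to confirm that every row sums to 1. The following is a minimal sketch under the assumption that states are ordered AA, AC, AG, AT, CA, ..., TT; the names `next_base_probs` and `build_matrix` are illustrative, and exact fractions are used so that values such as 4/15 (about 0.2667) are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"
# Pair states in the order AA, AC, AG, AT, CA, ..., TT (assumed ordering).
STATES = ["".join(pair) for pair in product(BASES, repeat=2)]


def next_base_probs(prev2, prev1):
    """Distribution of the present nucleotide given the preceding two."""
    if prev2 == prev1:  # rule (i): identical bases, uniform choice
        return {b: Fraction(1, 4) for b in BASES}
    # rule (ii): strong (C, G) twice as likely as weak (A, T);
    # within each pair the purine (A or G) is used 60% of the time.
    strong, weak = Fraction(2, 3), Fraction(1, 3)
    purine, pyrimidine = Fraction(3, 5), Fraction(2, 5)
    return {
        "A": weak * purine,
        "C": strong * pyrimidine,
        "G": strong * purine,
        "T": weak * pyrimidine,
    }


def build_matrix():
    """Build the transition matrix of the pair chain, row by row."""
    P = []
    for row in STATES:
        probs = next_base_probs(row[0], row[1])
        # A move from (a, b) to (c, d) is possible only when c == b.
        P.append([probs[col[1]] if col[0] == row[1] else Fraction(0)
                  for col in STATES])
    return P


P = build_matrix()
ac = STATES.index("AC")
print([float(round(p, 4)) for p in P[ac][4:8]])  # [0.2, 0.2667, 0.4, 0.1333]
```

Each row has exactly four nonzero entries, one for each possible present nucleotide.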