Name:  


Exercise 2A

Full credit will only be given to correct answers with a clear explanation of how they are obtained. Use additional paper as necessary.

  1. The following is the transition probability matrix of a Markov nucleotide sequence (rows give the present nucleotide, columns the next one). Fill in the blanks (___).

           A     C     G     T
        A  0.2   0.3   0.1   ___
    P = C  0.4   ___   0.1   0.2
        G  0.5   0.1   0.2   ___
        T  ___   0.2   0.3   0.4
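Every row of a transition probability matrix sums to 1, so each blank equals 1 minus the sum of the known entries in its row. The short Python sketch below fills the blanks that way. The blank positions (marked `None`) are an assumption, chosen to agree with the probabilities used in the solution to question 2; `fill_blanks` is an illustrative name.

```python
# Transition matrix with rows/columns ordered A, C, G, T.
# None marks a blank; the positions are an assumption consistent with question 2.
BASES = "ACGT"
P_partial = [
    [0.2, 0.3, 0.1, None],   # from A
    [0.4, None, 0.1, 0.2],   # from C
    [0.5, 0.1, 0.2, None],   # from G
    [None, 0.2, 0.3, 0.4],   # from T
]

def fill_blanks(rows):
    """Replace the single None in each row so that the row sums to 1."""
    filled = []
    for row in rows:
        known = sum(p for p in row if p is not None)
        missing = round(1.0 - known, 10)  # round away float noise such as 0.30000000000000004
        filled.append([missing if p is None else p for p in row])
    return filled

P = fill_blanks(P_partial)
for base, row in zip(BASES, P):
    print(base, row)
```

The same row-sum check is a quick sanity test for any transition matrix written by hand.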
  2. For a Markov chain X0, X1, X2, ... with transition probability matrix P as in question 1, suppose the probability distribution of X0 is

    x   f0(x)
    A   1/4
    C   1/4
    G   1/4
    T   1/4


    That is, the initial nucleotide is equally likely to be any of the four bases.

    Work out the probability distribution of X1. (Hint: Use the Law of Total Probability: if the events B1, B2, ... are disjoint and together cover every possible outcome, then

    P(E) = SUM over i of P(E and Bi)
         = SUM over i of P(Bi) * P(E | Bi).)


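    Solution:
    To determine the probability distribution of X1, we must find f1(A), f1(C), f1(G) and f1(T). To find f1(A), apply the Law of Total Probability with E = {X1 = A} and Bi = {X0 = A}, {X0 = C}, {X0 = G}, {X0 = T}. The conditional probabilities P(X1=A | X0=x) are the entries in column A of P:
    f1(A) = P(X1=A) = P(X1=A | X0=A) * P(X0=A) + P(X1=A | X0=C) * P(X0=C) + P(X1=A | X0=G) * P(X0=G) + P(X1=A | X0=T) * P(X0=T)
    = (0.2)*(0.25) + (0.4)*(0.25) + (0.5)*(0.25) + (0.1)*(0.25)
    = 0.3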

    f1(C), f1(G), and f1(T) are found similarly to get the probability distribution of X1:



    x f1(x)
    A 0.300
    C 0.225
    G 0.175
    T 0.300
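The calculation above is one application of the Law of Total Probability per base. A minimal Python sketch of the same step follows; the completed matrix is an assumption (blanks filled so that each row sums to 1, in positions that reproduce the numbers in this solution), and `next_distribution` is an illustrative name.

```python
# Completed transition matrix of question 1 (rows = present base X0,
# columns = next base X1), ordered A, C, G, T. The blank positions are an
# assumption, chosen so that the matrix reproduces the numbers in this solution.
BASES = "ACGT"
P = [
    [0.2, 0.3, 0.1, 0.4],  # from A
    [0.4, 0.3, 0.1, 0.2],  # from C
    [0.5, 0.1, 0.2, 0.2],  # from G
    [0.1, 0.2, 0.3, 0.4],  # from T
]

def next_distribution(f, P):
    """Law of Total Probability: f_next(y) = sum over x of P(X = x) * P(next = y | X = x)."""
    n = len(f)
    return [sum(f[x] * P[x][y] for x in range(n)) for y in range(n)]

f0 = [0.25, 0.25, 0.25, 0.25]
f1 = next_distribution(f0, P)
for base, p in zip(BASES, f1):
    print(f"{base} {p:.3f}")
```

Calling `next_distribution(f1, P)` repeats the step once more and gives the distribution of X2.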


    Then also work out the probability distribution of X2.
    Solution: The probability distribution of X2 is found in the same way as that of X1, with the distribution of X1 found above taking the place of the initial distribution.

    f2(A) = P(X2=A) = P(X2=A | X1=A) * P(X1=A) + P(X2=A | X1=C) * P(X1=C) + P(X2=A | X1=G) * P(X1=G) + P(X2=A | X1=T) * P(X1=T)
    = (0.2)*(0.3) + (0.4)*(0.225) + (0.5)*(0.175) + (0.1)*(0.3)
    = 0.2675
    f2(C), f2(G), and f2(T) are found similarly to get the probability distribution of X2:

    x f2(x)
    A 0.2675
    C 0.235
    G 0.1775
    T 0.32


    Can you suggest a method for finding the probability distribution of Xn?
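    Solution: Each step multiplies the current distribution, written as a row vector, by P: (f1(A), f1(C), f1(G), f1(T)) = (f0(A), f0(C), f0(G), f0(T)) * P, then f2 = f1 * P = f0 * P^2, and so on. By induction,

    (fn(A), fn(C), fn(G), fn(T)) = (f0(A), f0(C), f0(G), f0(T)) * P^n,

    where P^n is the n-th power of the matrix P.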
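The formula fn = f0 * P^n can be checked numerically. Below is a minimal pure-Python sketch, assuming the same completed matrix as before (blank positions inferred from the numbers in question 2); the helper names `mat_mul`, `mat_pow` and `distribution_at` are illustrative. The matrix power uses repeated squaring, so only about log2(n) matrix products are needed.

```python
# Completed transition matrix of question 1 (rows = present base, columns = next base),
# ordered A, C, G, T. The blank positions are an assumption consistent with question 2.
P = [
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.3, 0.1, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]

def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(M, n):
    """Compute M**n (n >= 0) by repeated squaring."""
    result = [[1.0 if i == j else 0.0 for j in range(len(M))] for i in range(len(M))]
    base = M
    while n > 0:
        if n & 1:              # include this power of two in the product
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

def distribution_at(f0, P, n):
    """Return fn = f0 * P^n, with f0 treated as a row vector."""
    return mat_mul([f0], mat_pow(P, n))[0]

f0 = [0.25, 0.25, 0.25, 0.25]
for n in (1, 2, 10):
    print(n, [round(p, 4) for p in distribution_at(f0, P, n)])
```

For n = 1 and n = 2 this reproduces the tables for X1 and X2.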
  3. Construct a Markov chain model for a nucleotide sequence generated according to these rules:

    (i). The present nucleotide is equally likely to be A, C, G, T if the preceding two nucleotides are identical.

    (ii). The present nucleotide will be twice as likely to be C or G as A or T if the preceding two nucleotides are different. Furthermore, when making a choice between C versus G and between A versus T, purines (G and A, respectively) will be used 60% of the time.

    Solution: Because the present nucleotide depends on the two preceding nucleotides, we track pairs. Let Y0 = (X0,X1), Y1 = (X1,X2), ..., Yn-1 = (Xn-1,Xn). Then Y0, Y1, Y2, ... is a Markov chain whose 16 states are the dinucleotides AA, AC, ..., TT.

    Write out its transition probability matrix.

    Solution: Rule (i) tells us that if the preceding two nucleotides are identical, then
    P(A) = P(C) = P(G) = P(T) = 0.25.
    Rule (ii) tells us that if the preceding two nucleotides are different, then P(strong) = 2 * P(weak), where "strong" means the next base is C or G and "weak" means it is A or T. We also know that P(strong) + P(weak) = 1. Solving these two equations simultaneously gives P(strong) = 2/3 and P(weak) = 1/3.
    Rule (ii) also tells us that within each pair the purine is chosen 60% of the time: P(G | strong) = P(A | weak) = 0.6, so P(C | strong) = P(T | weak) = 0.4.
    To find the entries of the transition probability matrix, write P(ab)(cd) for the probability of moving from state ab to state cd. Consecutive states share one nucleotide, so a transition from ab to cd is possible only if c = b. From rule (i),
    P(AA)(AA) = P(AA)(AC) = P(AA)(AG) = P(AA)(AT) = 0.25.
    All other entries in the row for AA are 0. For example, it is impossible to go from AA to CA, because the first base of CA (C) does not match the second base of AA (A).
    Similarly, P(CC)(CA) = P(CC)(CC) = P(CC)(CG) = P(CC)(CT) = 0.25,
    P(GG)(GA) = P(GG)(GC) = P(GG)(GG) = P(GG)(GT) = 0.25, and
    P(TT)(TA) = P(TT)(TC) = P(TT)(TG) = P(TT)(TT) = 0.25.
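    Now consider a row whose two bases differ, such as AC. By rule (ii),
    P(AC)(CA) = P(A and weak) = (0.60)*(1/3) = 0.2,
    P(AC)(CC) = P(C and strong) = (0.40)*(2/3) = 0.2667,
    P(AC)(CG) = P(G and strong) = (0.60)*(2/3) = 0.4,
    P(AC)(CT) = P(T and weak) = (0.40)*(1/3) = 0.1333.
    The remaining probabilities are found similarly: every row whose two bases differ has the values 0.2, 0.2667, 0.4, 0.1333 in the four columns that begin with its second base.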
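Working in exact fractions shows that the rounded decimals 0.2, 0.2667, 0.4 and 0.1333 are 1/5, 4/15, 2/5 and 2/15, and that they sum to exactly 1. A short Python sketch using the standard `fractions` module (the variable names are illustrative):

```python
from fractions import Fraction

# Rule (ii): when the two preceding bases differ,
# P(strong: C or G) = 2 * P(weak: A or T), and the two probabilities sum to 1.
p_weak = Fraction(1, 3)
p_strong = 2 * p_weak           # 2/3

# Within each pair, the purine (A among A/T, G among C/G) is chosen 60% of the time.
p_purine = Fraction(3, 5)

next_base = {
    "A": p_purine * p_weak,           # purine of the weak pair
    "C": (1 - p_purine) * p_strong,   # pyrimidine of the strong pair
    "G": p_purine * p_strong,         # purine of the strong pair
    "T": (1 - p_purine) * p_weak,     # pyrimidine of the weak pair
}
for base, p in next_base.items():
    print(base, p, f"{float(p):.4f}")
```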
    The transition probability matrix, P, is given below. Rows are the present state and columns the next state, both in the order AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT.

        AA     AC     AG     AT     CA     CC     CG     CT     GA     GC     GG     GT     TA     TC     TG     TT
    AA  0.25   0.25   0.25   0.25   0      0      0      0      0      0      0      0      0      0      0      0
    AC  0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0
    AG  0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0
    AT  0      0      0      0      0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333
    CA  0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0      0      0      0      0
    CC  0      0      0      0      0.25   0.25   0.25   0.25   0      0      0      0      0      0      0      0
    CG  0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0
    CT  0      0      0      0      0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333
    GA  0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0      0      0      0      0
    GC  0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0
    GG  0      0      0      0      0      0      0      0      0.25   0.25   0.25   0.25   0      0      0      0
    GT  0      0      0      0      0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333
    TA  0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0      0      0      0      0
    TC  0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0      0      0      0      0
    TG  0      0      0      0      0      0      0      0      0.2    0.2667 0.4    0.1333 0      0      0      0
    TT  0      0      0      0      0      0      0      0      0      0      0      0      0.25   0.25   0.25   0.25
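The whole 16 x 16 matrix can be generated directly from rules (i) and (ii), which is a useful check on a hand-written table. The Python sketch below is one way to do it, not the method prescribed by the exercise; the names `SAME`, `DIFFERENT` and `transition_matrix` are illustrative. States are listed in the same order as the table (AA, AC, ..., TT), which is the order `itertools.product` produces.

```python
from itertools import product

BASES = "ACGT"
STATES = ["".join(pair) for pair in product(BASES, repeat=2)]  # AA, AC, ..., TT

# Distribution of the next base given the two preceding bases.
SAME = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}                            # rule (i)
DIFFERENT = {"A": 0.6 / 3, "C": 0.4 * 2 / 3, "G": 0.6 * 2 / 3, "T": 0.4 / 3}   # rule (ii)

def transition_matrix():
    """Entry (xy -> y2z) is P(next base = z) when y2 == y, and 0 otherwise."""
    P = []
    for x, y in STATES:
        probs = SAME if x == y else DIFFERENT
        row = [probs[z] if y2 == y else 0.0 for y2, z in STATES]
        P.append(row)
    return P

P = transition_matrix()
for state, row in zip(STATES, P):
    print(state, " ".join(f"{p:.4f}" for p in row))
```

Each row has exactly four nonzero entries, one for each possible next base, and they sum to 1.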