Name:  
   E-mail:  


Exercise 4B

Full credit will only be given to correct answers with a clear explanation of how they are obtained. Use additional paper as necessary.

  1. In a DNA sequence of length 1000, the base counts are:

    Base Count
    A 270
    C 232
    G 185
    T 313


    Test the hypothesis that the sequence is generated as independent random variables with probability distribution
    f(A) = f(C) = f(G) = f(T) = 1/4


    The null and alternative hypotheses are:
    Ho: f(A) = f(C) = f(G) = f(T) = 1/4
    H1: f(x) does not equal 1/4 for at least one x in {A, C, G, T}.


    using

    (i). Pearson's goodness of fit test

    Solution:
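    X² = sum over i in {A,C,G,T} of (Oi - Ei)² / Ei,
    where Oi is the observed count of base i and Ei = 1000 × 1/4 = 250 is its expected count under Ho.
    X² = (270 - 250)²/250 + (232 - 250)²/250 + (185 - 250)²/250 + (313 - 250)²/250
       = 1.6 + 1.296 + 16.9 + 15.876
       = 35.672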

    Under Ho, the statistic approximately follows a chi-square distribution with 4 - 1 = 3 degrees of freedom (four categories whose counts must sum to 1000). From a right-tailed chi-square table, the critical value for 3 degrees of freedom at significance level α = 0.05 is 7.81. Any value of the test statistic larger than 7.81 leads to rejection of the null hypothesis Ho. Since 35.672 > 7.81, we reject Ho.
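    The Pearson calculation can be checked with a minimal Python sketch that uses only the standard library. The helper chi2_sf_3df is a name introduced here for illustration; it evaluates the closed-form right-tail probability erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2), which holds only for a chi-square distribution with 3 degrees of freedom. If SciPy is available, scipy.stats.chisquare applied to the four counts should give the same statistic.

```python
import math

# Observed base counts from the table in Problem 1.
observed = {"A": 270, "C": 232, "G": 185, "T": 313}
n = sum(observed.values())   # 1000 bases
expected = n / 4             # 250 per base under Ho: f = 1/4

# Pearson statistic: sum of (O - E)^2 / E over the four bases.
x2 = sum((o - expected) ** 2 / expected for o in observed.values())


def chi2_sf_3df(x: float) -> float:
    """Right-tail probability P(X^2 > x) for a chi-square variable with 3 df.

    Closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2); valid only for 3 df.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


CRITICAL_3DF_05 = 7.815      # table value: chi-square, 3 df, alpha = 0.05
p_value = chi2_sf_3df(x2)

print(f"X^2 = {x2:.3f}, p-value = {p_value:.2e}")
print("reject Ho" if x2 > CRITICAL_3DF_05 else "do not reject Ho")
```

    The p-value it reports is far below 0.05, which agrees with the table-based decision.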


    (ii). Likelihood ratio test.

    Solution:
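    G² = 2 × sum over i in {A,C,G,T} of Oi × log(Oi / Ei),
    where log is the natural logarithm and Oi, Ei are the observed and expected (250) counts of base i.
    G² = 2 × [270 × log(270/250) + 232 × log(232/250) + 185 × log(185/250) + 313 × log(313/250)]
       ≈ 2 × [20.779 - 17.336 - 55.704 + 70.344]
       ≈ 36.17

    Under Ho, G² has the same asymptotic chi-square distribution with 3 degrees of freedom, so the critical value is again 7.81. Since 36.17 > 7.81, we reject Ho.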
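    A similar standard-library sketch reproduces the likelihood ratio statistic. Python's math.log with a single argument is the natural logarithm, which is what G² requires, and each term uses its own observed count, so the T term is 313 × log(313/250).

```python
import math

# Observed base counts from the table in Problem 1.
observed = {"A": 270, "C": 232, "G": 185, "T": 313}
expected = sum(observed.values()) / 4   # 250 per base under Ho

# Each term is O * ln(O / E); math.log with one argument is the natural log.
terms = {base: o * math.log(o / expected) for base, o in observed.items()}
g2 = 2 * sum(terms.values())

for base, t in terms.items():
    print(f"{base}: {t:+.3f}")
print(f"G^2 = {g2:.3f}")
print("reject Ho" if g2 > 7.815 else "do not reject Ho")  # 3 df, alpha = 0.05
```

    Swapping math.log for math.log10 would divide G² by ln 10 (about 2.303), a common source of error when the statistic is computed by hand.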

  2. Assume that a DNA sequence conforms to a Markov chain model. Its base and dinucleotide counts are:

    A 246 AA 40 CA 86 GA 92 TA 28
    C 219 AC 74 CC 76 GC 12 TC 56
    G 191 AG 19 CG 26 GG 38 TG 108
    T 344 AT 113 CT 31 GT 49 TT 151


    Estimate the transition probability matrix of this Markov chain.

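               A       C       G       T
    P =  A ( 0.1626  0.3008  0.0772  0.4593 )
         C ( 0.3927  0.3470  0.1187  0.1416 )
         G ( 0.4817  0.0628  0.1990  0.2565 )
         T ( 0.0816  0.1633  0.3149  0.4402 )

    (rows: previous base X; columns: current base Y; each row sums to 1 up to rounding)


    The above probabilities are found using the rule for conditional probability:
    P(A|B) = P(A and B) / P(B)
    applied to this situation:

    pXY = P(current base is Y given that the previous base is X)
        = P(current base is Y | previous base is X)
        = P(current base is Y and previous base is X) / P(previous base is X)
    where X and Y each range over {A, C, G, T}.

    A sequence of 1000 bases contains 999 overlapping dinucleotides. P(current base is Y and previous base is X) is estimated by n(XY)/999, and P(previous base is X) by n(X·)/999, where n(X·) = n(XA) + n(XC) + n(XG) + n(XT) is the number of dinucleotides that begin with X. The 999s cancel, so pXY = n(XY) / n(X·). For A, C and G, n(X·) equals the base count; for T it is 28 + 56 + 108 + 151 = 343 rather than 344, because the last base of the sequence is a T and has no successor.

    For example:

    pAC = P(current base is C given that the previous base is A)
        = P(current base is C | previous base is A)
        = P(current base is C and previous base is A) / P(previous base is A)
        = (74/999) / (246/999)
        = 74/246
        = 0.3008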
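    The whole transition matrix can be reproduced with a short Python sketch in which each dinucleotide count is divided by the total number of dinucleotides that begin with the same base. Using that total (out of 999 dinucleotides) as the denominator for both the joint and the marginal probability gives pAC = 74/246 ≈ 0.3008. The dictionary layout and the helper name transition_matrix are illustrative choices, not part of the exercise; the sketch also identifies the final base of the sequence as the one whose base count exceeds the number of dinucleotides it starts by one.

```python
# Dinucleotide counts n(XY) from Problem 2: outer key = previous base X,
# inner key = current base Y.
counts = {
    "A": {"A": 40, "C": 74, "G": 19, "T": 113},
    "C": {"A": 86, "C": 76, "G": 26, "T": 31},
    "G": {"A": 92, "C": 12, "G": 38, "T": 49},
    "T": {"A": 28, "C": 56, "G": 108, "T": 151},
}
base_counts = {"A": 246, "C": 219, "G": 191, "T": 344}
BASES = "ACGT"


def transition_matrix(counts):
    """Estimate pXY as n(XY) divided by the number of dinucleotides starting with X."""
    P = {}
    for x in BASES:
        row_total = sum(counts[x].values())
        P[x] = {y: counts[x][y] / row_total for y in BASES}
    return P


P = transition_matrix(counts)

# Every base except the final one starts exactly one dinucleotide, so the
# final base is the one whose count exceeds its dinucleotide total by 1.
last_base = [x for x in BASES if base_counts[x] - sum(counts[x].values()) == 1]

print("  " + "".join(f"{y:>8}" for y in BASES))
for x in BASES:
    print(x + " " + "".join(f"{P[x][y]:>8.4f}" for y in BASES))
print("final base:", last_base)
```

    Each printed row sums to 1, and the final base comes out as T.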