Name:
E-mail:


Exercise 4B

Full credit will only be given to correct answers with a clear explanation of how they are obtained. Use additional paper as necessary.

  1. In a DNA sequence of length 1000, the base counts are:

    Base Count
    A 270
    C 232
    G 185
    T 313


    Test the hypothesis that the sequence is generated as independent random variables with probability distribution
    f(A) = f(C) = f(G) = f(T) = 1/4
    using
    (i) Pearson's goodness-of-fit test;
    (ii) the likelihood ratio test.
  2. Assume that a DNA sequence conforms to a Markov chain model. Its base and dinucleotide counts are

    Base  Count    Dinucleotide counts
    A     246      AA  40    CA  86    GA  92    TA  28
    C     219      AC  74    CC  76    GC  12    TC  56
    G     191      AG  19    CG  26    GG  38    TG 108
    T     344      AT 113    CT  31    GT  49    TT 151


    Estimate the transition probability matrix of this Markov chain.

                 A     C     G     T
           A  (                       )
    P  =   C  (                       )
           G  (                       )
           T  (                       )
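
For checking hand calculations, the quantities in both exercises can be computed numerically. The following is a minimal sketch rather than a model answer. The function names are hypothetical. It assumes a 5% significance level, which the sheet does not specify, with 3 degrees of freedom for Exercise 1. For Exercise 2 it assumes the usual maximum likelihood estimate, in which each dinucleotide count is divided by the total number of dinucleotides starting with the same base.

```python
import math

BASES = "ACGT"


def pearson_statistic(observed, expected):
    """Pearson's X^2 = sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_statistic(observed, expected):
    """Likelihood ratio statistic G^2 = 2 * sum of O * ln(O / E).

    Categories with a zero observed count contribute nothing to the sum.
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def estimate_transition_matrix(pair_counts):
    """Estimate P(a -> b) as n_ab divided by the sum of n_ab over b.

    The row total is taken from the dinucleotide counts themselves, so a
    base that appears only at the end of the sequence is not counted.
    """
    matrix = {}
    for a in BASES:
        row_total = sum(pair_counts[a + b] for b in BASES)
        matrix[a] = {b: pair_counts[a + b] / row_total for b in BASES}
    return matrix


# Exercise 1: counts of A, C, G, T in a sequence of length 1000.
base_counts = [270, 232, 185, 313]
n = sum(base_counts)
expected = [n / 4] * 4  # uniform null hypothesis f = 1/4 for each base

x2 = pearson_statistic(base_counts, expected)
g2 = likelihood_ratio_statistic(base_counts, expected)

# Assumed 5% critical value of the chi-square distribution with 3 degrees
# of freedom (4 categories minus 1, no parameters estimated).
CRITICAL_5PCT_DF3 = 7.815

print(f"Pearson X^2 = {x2:.3f}, reject H0: {x2 > CRITICAL_5PCT_DF3}")
print(f"LRT G^2     = {g2:.3f}, reject H0: {g2 > CRITICAL_5PCT_DF3}")

# Exercise 2: dinucleotide counts, keyed by first base then second base.
pair_counts = {
    "AA": 40, "AC": 74, "AG": 19, "AT": 113,
    "CA": 86, "CC": 76, "CG": 26, "CT": 31,
    "GA": 92, "GC": 12, "GG": 38, "GT": 49,
    "TA": 28, "TC": 56, "TG": 108, "TT": 151,
}

P = estimate_transition_matrix(pair_counts)
print("     " + "      ".join(BASES))
for a in BASES:
    print(a, "  ".join(f"{P[a][b]:.4f}" for b in BASES))
```

Each row of the estimated matrix sums to 1 by construction. The T row is normalised by 343 rather than by the base count 344, since one T has no successor in the sequence.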