Exercise 4B
Full credit will only be given to correct answers with a clear
explanation of how they are obtained. Use additional paper as
necessary.
- In a DNA sequence of length 1000, the base counts are:
| Base | Count |
| A | 270 |
| C | 232 |
| G | 185 |
| T | 313 |
Test the hypothesis that the sequence is generated as independent random
variables with probability distribution
f(A) = f(C) = f(G) = f(T) = 1/4
using
(i) Pearson's goodness-of-fit test;
(ii) the likelihood ratio test.
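As a rough check on hand calculations, the two test statistics could be computed as in the sketch below. It assumes the usual forms X^2 = sum (O - E)^2 / E and G = 2 sum O ln(O / E), both compared with a chi-square distribution on 3 degrees of freedom. The 5% significance level, and therefore the critical value 7.815, is an assumption; the exercise does not fix a level. All variable names are illustrative.

```python
import math

# Observed base counts from the exercise.
counts = {"A": 270, "C": 232, "G": 185, "T": 313}

n = sum(counts.values())   # sequence length, 1000
expected = n / 4           # 250 per base under f(A) = f(C) = f(G) = f(T) = 1/4

# Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E.
pearson = sum((o - expected) ** 2 / expected for o in counts.values())

# Likelihood ratio statistic: 2 * sum of O * ln(O / E).
lrt = 2 * sum(o * math.log(o / expected) for o in counts.values())

# Assumed 5% level: upper 5% point of chi-square with 4 - 1 = 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

reject_pearson = pearson > CRITICAL_5PCT_DF3
reject_lrt = lrt > CRITICAL_5PCT_DF3

print(f"Pearson X^2 = {pearson:.3f}, reject: {reject_pearson}")
print(f"Likelihood ratio G = {lrt:.3f}, reject: {reject_lrt}")
```

A written answer should still state the null hypothesis, the degrees of freedom, and the conclusion in words; the code only confirms the arithmetic.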
- Assume that a DNA sequence conforms to a Markov chain model. Its base and
dinucleotide counts are:
| A | 246 | AA | 40 | CA | 86 | GA | 92 | TA | 28 |
| C | 219 | AC | 74 | CC | 76 | GC | 12 | TC | 56 |
| G | 191 | AG | 19 | CG | 26 | GG | 38 | TG | 108 |
| T | 344 | AT | 113 | CT | 31 | GT | 49 | TT | 151 |
Estimate the transition probability matrix of this Markov chain.
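One way to check the estimate is the sketch below. It assumes the standard maximum likelihood estimator p(X -> Y) = n(XY) / sum over Z of n(XZ), where n(XY) is the count of dinucleotide XY. Each row of the count table groups dinucleotides by their second base, so the counts have to be regrouped by first base before normalising. The names `dinuc`, `row_totals` and `P` are illustrative.

```python
bases = "ACGT"

# Dinucleotide counts from the exercise, keyed as first base + second base.
dinuc = {
    "AA": 40, "CA": 86, "GA": 92, "TA": 28,
    "AC": 74, "CC": 76, "GC": 12, "TC": 56,
    "AG": 19, "CG": 26, "GG": 38, "TG": 108,
    "AT": 113, "CT": 31, "GT": 49, "TT": 151,
}

# Number of transitions leaving each base: sum over all possible second bases.
row_totals = {x: sum(dinuc[x + y] for y in bases) for x in bases}

# Estimated transition matrix: P[x][y] estimates the probability of y following x.
P = {x: {y: dinuc[x + y] / row_totals[x] for y in bases} for x in bases}

for x in bases:
    print(x, " ".join(f"{P[x][y]:.3f}" for y in bases))
```

The row total for T is 343 rather than the base count 344, since the final base of the sequence has no successor; dividing by the number of observed transitions keeps every row summing to 1.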