Exercise 4B
Full credit will only be given to correct answers with a clear
explanation of how they are obtained. Use additional paper as
necessary.
- In a DNA sequence of length 1000, the base counts are:
Base | Count
A    | 270
C    | 232
G    | 185
T    | 313
Test the hypothesis that the sequence is generated as independent random
variables with probability distribution f(A) = f(C) = f(G) = f(T) = 1/4, using
(i) Pearson's goodness-of-fit test;
(ii) the likelihood ratio test.
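As a starting point, both statistics can be set up as in the sketch below. It assumes the usual forms: with observed counts O_b and expected counts E_b = 1000 x 1/4 = 250, Pearson's statistic is the sum of (O_b - E_b)^2 / E_b and the likelihood ratio statistic is 2 times the sum of O_b ln(O_b / E_b), each referred to a chi-square distribution with 4 - 1 = 3 degrees of freedom. The function names and the 5% significance level (critical value 7.815) are illustrative choices, not part of the exercise.

```python
import math

# Observed base counts from the exercise.
counts = {"A": 270, "C": 232, "G": 185, "T": 313}
# Null hypothesis: each base has probability 1/4.
null_probs = {base: 0.25 for base in counts}


def pearson_statistic(observed, probs):
    """Sum over categories of (O - E)^2 / E, with E = n * p."""
    n = sum(observed.values())
    return sum(
        (observed[b] - n * probs[b]) ** 2 / (n * probs[b]) for b in observed
    )


def likelihood_ratio_statistic(observed, probs):
    """2 * sum over categories of O * ln(O / E), with E = n * p."""
    n = sum(observed.values())
    return 2.0 * sum(
        o * math.log(o / (n * probs[b])) for b, o in observed.items() if o > 0
    )


# Assumed significance level: 5%, chi-square with 3 degrees of freedom.
CHI2_CRITICAL_3DF_5PCT = 7.815

x2 = pearson_statistic(counts, null_probs)
g = likelihood_ratio_statistic(counts, null_probs)
print(f"Pearson X^2 = {x2:.3f}, reject H0: {x2 > CHI2_CRITICAL_3DF_5PCT}")
print(f"LRT G       = {g:.3f}, reject H0: {g > CHI2_CRITICAL_3DF_5PCT}")
```

A written answer should still show the expected counts, each term of the sums, and the degrees of freedom, since credit depends on the explanation.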
- Assume that a DNA sequence conforms to a Markov chain model. Its base and
dinucleotide counts are:
A | 246 | AA |  40 | CA | 86 | GA | 92 | TA |  28
C | 219 | AC |  74 | CC | 76 | GC | 12 | TC |  56
G | 191 | AG |  19 | CG | 26 | GG | 38 | TG | 108
T | 344 | AT | 113 | CT | 31 | GT | 49 | TT | 151
Estimate the transition probability matrix of this Markov chain.
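A minimal sketch of one way to do this, assuming the maximum likelihood estimator p(x -> y) = n(xy) / sum over z of n(xz), and assuming that the dinucleotide XY means base X followed immediately by base Y. Under that reading the counts of dinucleotides starting with A, C and G sum to the base counts 246, 219 and 191, while those starting with T sum to 343, one less than 344, which is consistent with the sequence ending in T. The names used below are illustrative.

```python
BASES = "ACGT"

# Dinucleotide counts from the exercise, keyed as first base + second base.
dinucleotide_counts = {
    "AA": 40, "AC": 74, "AG": 19, "AT": 113,
    "CA": 86, "CC": 76, "CG": 26, "CT": 31,
    "GA": 92, "GC": 12, "GG": 38, "GT": 49,
    "TA": 28, "TC": 56, "TG": 108, "TT": 151,
}


def estimate_transition_matrix(pair_counts, bases=BASES):
    """Return nested dicts with matrix[x][y] = n(xy) / sum_z n(xz)."""
    matrix = {}
    for first in bases:
        # Number of observed transitions leaving base `first`.
        row_total = sum(pair_counts[first + second] for second in bases)
        matrix[first] = {
            second: pair_counts[first + second] / row_total for second in bases
        }
    return matrix


transition = estimate_transition_matrix(dinucleotide_counts)
print("    " + "      ".join(BASES))
for first in BASES:
    row = " ".join(f"{transition[first][second]:.4f}" for second in BASES)
    print(f"{first}  {row}")
```

Each row of the estimated matrix sums to 1 by construction, which is a quick check on a hand calculation.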