Exercise 4B
Full credit will only be given to correct answers with a clear
explanation of how they are obtained. Use additional paper as
necessary.
-
In a DNA sequence of length 1000, the base counts are:
Base | Count
A    | 270
C    | 232
G    | 185
T    | 313
Test the hypothesis that the sequence is generated as independent random
variables with probability distribution
f(A) = f(C) = f(G) = f(T) = 1/4
using (i) Pearson's goodness of fit test and (ii) the likelihood ratio test.
The null and alternative hypotheses are
H0: f(A) = f(C) = f(G) = f(T) = 1/4
H1: f(x) does not equal 1/4 for some x in {A, C, G, T}.
(i). Pearson's goodness of fit test
Solution:
X^2 = sum_{i in {A,C,G,T}} (O_i - E_i)^2 / E_i
= (270 - 250)^2/250 + (232 - 250)^2/250 + (185 - 250)^2/250 + (313 - 250)^2/250
= 35.672
Using a chi-square (right-tailed) table, look up the critical value for a
chi-square random variable with 3 degrees of freedom (4 categories minus 1)
at a significance level of 0.05 to get 7.81. Any value of the test statistic
(calculated above) that is larger than 7.81 leads to rejection of the null
hypothesis H0. Since 35.672 > 7.81, we reject H0.
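The arithmetic above can be checked with a short script. This is a minimal sketch using only the counts given in the exercise; the variable names are illustrative, and the critical value 7.81 is taken from the table lookup described above rather than computed.

```python
# Observed base counts from the exercise.
observed = {"A": 270, "C": 232, "G": 185, "T": 313}

n = sum(observed.values())      # sequence length, 1000
expected = n / len(observed)    # expected count per base under H0: 1000 * 1/4 = 250

# Pearson statistic: sum over bases of (O - E)^2 / E.
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())

# Critical value for chi-square with 3 degrees of freedom at level 0.05
# (table value quoted in the solution, not computed here).
CRITICAL_VALUE = 7.81
reject_h0 = chi_sq > CRITICAL_VALUE

print(f"X^2 = {chi_sq:.3f}, reject H0: {reject_h0}")
```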
(ii). Likelihood ratio test.
Solution:
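G^2 = 2 * sum_{i in {A,C,G,T}} O_i * log(O_i / E_i)
= 2 * [270 * log(270/250) + 232 * log(232/250) + 185 * log(185/250) + 313 * log(313/250)]
= 36.17
where log is the natural logarithm. Under H0, G^2 is approximately
chi-square distributed with 3 degrees of freedom, so the critical value
7.81 applies here as well. Since 36.17 > 7.81, we reject H0.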
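The likelihood ratio statistic can be checked the same way. The sketch below assumes natural logarithms and reuses the table critical value 7.81; the variable names are illustrative.

```python
import math

# Observed base counts from the exercise.
observed = {"A": 270, "C": 232, "G": 185, "T": 313}

n = sum(observed.values())      # 1000
expected = n / len(observed)    # 250 per base under H0

# Likelihood ratio statistic: 2 * sum over bases of O * ln(O / E).
g_sq = 2 * sum(o * math.log(o / expected) for o in observed.values())

# Same critical value as for Pearson's test (chi-square, 3 df, level 0.05).
CRITICAL_VALUE = 7.81
reject_h0 = g_sq > CRITICAL_VALUE

print(f"G^2 = {g_sq:.2f}, reject H0: {reject_h0}")
```

Both statistics are close to each other here, as expected for a sample of this size, and lead to the same decision.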
-
Assume that a DNA sequence conforms to a Markov chain model. Its base and
dinucleotide counts are
A | 246 | AA |  40 | CA | 86 | GA | 92 | TA |  28
C | 219 | AC |  74 | CC | 76 | GC | 12 | TC |  56
G | 191 | AG |  19 | CG | 26 | GG | 38 | TG | 108
T | 344 | AT | 113 | CT | 31 | GT | 49 | TT | 151
Estimate the transition probability matrix of this Markov chain.
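The transition probabilities are estimated using the rule for
conditional probability:
P(A|B) = P(A and B) / P(B)
Applied to this situation,
pXY = P(current base is Y given that the previous base is X)
= P(current base is Y | previous base is X)
= P(current base is Y and previous base is X) / P(previous base is X)
where X and Y each range over {A, C, G, T}. The joint probability is
estimated by the count of dinucleotide XY divided by 999, the total number
of dinucleotides, and P(previous base is X) by the count of base X divided
by 1000.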
For example:
pAC = P(current base is C given that the previous base is A)
= P(current base is C | previous base is A)
= P(current base is C and previous base is A) / P(previous base is A)
= (74/999) / (246/1000)
= 0.3011
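Applying the same rule to all sixteen dinucleotides gives the full matrix. The following is a minimal sketch that follows the estimator used in the worked example, (count of XY / 999) / (count of X / 1000); the function and variable names are illustrative. Because the joint and marginal probabilities use different denominators, each row sums to approximately 1 rather than exactly 1.

```python
bases = "ACGT"

# Base and dinucleotide counts from the exercise.
base_counts = {"A": 246, "C": 219, "G": 191, "T": 344}
pair_counts = {
    "AA": 40, "AC": 74, "AG": 19, "AT": 113,
    "CA": 86, "CC": 76, "CG": 26, "CT": 31,
    "GA": 92, "GC": 12, "GG": 38, "GT": 49,
    "TA": 28, "TC": 56, "TG": 108, "TT": 151,
}

n_bases = sum(base_counts.values())   # 1000
n_pairs = sum(pair_counts.values())   # 999


def transition_prob(x, y):
    """Estimate pXY = P(current base is y | previous base is x)."""
    joint = pair_counts[x + y] / n_pairs   # P(previous is x and current is y)
    marginal = base_counts[x] / n_bases    # P(previous is x)
    return joint / marginal


# Rows are indexed by the previous base, columns by the current base.
matrix = [[transition_prob(x, y) for y in bases] for x in bases]

print("    " + "      ".join(bases))
for x, row in zip(bases, matrix):
    print(x, " ".join(f"{p:.4f}" for p in row))
```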