Exercise 4A
Full credit will only be given to correct answers with a clear
explanation of how they are obtained. Use additional paper as
necessary.
-
Look up the helpfile of the S-Plus command "sample".
-
Generate a random nucleotide sequence of length 1000 using the independent
model (i.e. the tetrahedral die model) with probabilities P(A) = 0.3, P(C)
= 0.2, P(G) = 0.25, P(T) = 0.25. Name your sequence "myseq".
Solution: myseq<-sample(c("A", "C", "G",
"T"),size=1000,replace=T,prob=c(0.2,0.3,0.25,0.25))
-
Let NA (respectively NC, NG,
and NT) be the random variable representing the number of A's
(respectively C's, G's, and T's) in such a random sequence as generated in
problem #2.
(a). What kind of probability distribution does NA
(respectively NC,
NG, and NT)
have? What are the parameters in that distribution?
(b). What is
the expected value, of NA (respectively NC,
NG, and NT)?
-
Figure out the observed counts of the bases in myseq you
obtained
in problem #2. Fill in the table below.
(Hint: In Exercise3A, you have created an S-Plus object called
"mysequence". Find out what happens when you do
mysequence[mysequence=="A"]. Do the same for "C", "G", and "T".
This
might give you some idea.)
Show the S-Plus command(s) that you used.
-
Generate a random nucleotide sequence of length 1000 according to the
Markov model with the transition probability matrix P given in Problem #1
in Exercise 2A. Assume that the sequence has already attained its
stationary distribution at the beginning. Repeat problem 4 for this
sequence.
Hint: You'll need to use "for"-loops and some "if-else" statements. The
"dimnames" function for matrices may also be useful. Check out some of
the introductory S-Plus documents.)
The expected values were obtained as follows:
Let I(j)A = 1 if the jth base is A; 0 if
the jth base is not A.
Then,
NA = I(1)A + I(2)A
+ ... + I(1000)A
Since the expected value of a sum is equal to the sum of the expected
values, we have
E[NA] = sum(j=1 to 1000)[E(I(j)A)] =
0.2691466 + 0.2691466 + ... + 0.2691466 = 269.1466
The expected counts for C, G, and T are obtained similarly.
Show the S-Plus commands that you used.
|