Exercise 3A
Full credit will only be given to correct answers with a clear
explanation of how they are obtained. Use additional paper as necessary.
Introduction to S-Plus
S-Plus is a comprehensive statistical software package with a
large collection of built-in standard statistical procedures (e.g. ANOVA,
regression). However, the real strength of this package lies in its
versatile, object-oriented programming language, which lets us create new
statistical functions and procedures efficiently to suit our own purposes.
You can start S-Plus on Helix by typing Splus5 (the "5" is because we are
currently using version 5 of S-Plus).
Data Objects: When using S-Plus, think of your data sets as data
objects. You can display any S-Plus object by simply typing its name. The
simplest data object is a vector. Other data objects include matrices, data
frames, and lists. We shall focus only on vectors and matrices for now.
If you are working from a terminal with the X-windows display environment
set up properly, you can get online help from S-Plus by typing
help.start()
at the S-Plus prompt. This opens the help pages in Netscape. If you
are not working from an X-windows terminal, you can get help for any
command, say commandname, by typing help(commandname).
Exercise
A vector is an ordered collection of numbers, character values, logical
values, etc. You can form a vector by combining several elements of the
same type with the "c" function. For example, n<- c(2, 5.6, 8.1, 9.5)
creates a vector named n consisting of the four numbers 2, 5.6, 8.1, and 9.5.
Try the following commands, then examine each created object by typing
its name. Write down the S-Plus output and an interpretation of the
operations performed by each command.
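The requirement that a vector hold elements of the same type can be checked with the mode() function. The sketch below uses S syntax and was checked against R; that S-Plus 5 prints exactly the same results is an assumption. It shows that when c() is given mixed types it does not fail, but coerces every element to a common type (here, character). The object names num, chr, lgl, and mixed are illustrative only.

```r
# c() combines its arguments into a single vector
num <- c(2, 5.6, 8.1, 9.5)     # numeric vector
chr <- c("A", "C", "G", "T")   # character vector
lgl <- c(TRUE, FALSE, TRUE)    # logical vector
mode(num)                      # "numeric"
mode(chr)                      # "character"
mode(lgl)                      # "logical"

# Mixing types: c() coerces all elements to one common type
mixed <- c(1, "A", TRUE)
mode(mixed)                    # "character"
mixed                          # "1" "A" "TRUE"
```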
-
n<- c(2, 5.6, 8.1, 9.5)
Output:
Interpretation:
-
m<-n/sum(n)
Output:
Interpretation:
-
n<-c(-2.3, n, 4,-4.5, 8.5)
Output:
Interpretation:
-
len<- length(n)
Output:
Interpretation:
-
polyA<- rep ("A", length(n))
Output:
Interpretation:
-
mysequence<- c ("C", "T", "T", "A", "G", "C", "A", "G", "G", "T")
Output:
Interpretation:
-
CompStrand<- function (DNAseq){
s<- rep (" ", length(DNAseq))
s[DNAseq == "A"]<-"T"
s[DNAseq == "C"]<-"G"
s[DNAseq == "G"]<-"C"
s[DNAseq == "T"]<-"A"
s<- rev(s)
s
}
Output:
Interpretation:
-
CompStrand (mysequence)
Output:
Interpretation:
-
CompStrand (polyA)
Output:
Interpretation:
-
CompStrand (c("G", "A", "A", "T", "T", "C")). What do you notice when you
compare the complementary strand with the original sequence?
Output:
Interpretation:
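Since a named vector can be indexed by names, the four assignments inside CompStrand can be collapsed into one lookup table. The function name CompStrand2 is hypothetical and not part of the exercise; this is a sketch checked in R, and we assume (without having tested it) that S-Plus 5 handles named-vector indexing the same way. One behavioral difference: any symbol other than A, C, G, or T becomes NA here, whereas CompStrand leaves a blank " ".

```r
# Hypothetical alternative to CompStrand using a named lookup vector
CompStrand2 <- function(DNAseq) {
  # each name is a base; each value is the base it pairs with
  pairs <- c(A = "T", C = "G", G = "C", T = "A")
  s <- pairs[DNAseq]     # look up the partner of every base at once
  as.vector(rev(s))      # reverse the order, as CompStrand does; drop names
}
```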
-
A matrix is a rectangular array of numbers. In S-Plus, a matrix is
constructed using the "matrix" command. Try typing the following:
P<-matrix(c(0.2, 0.4, 0.5, 0.1, 0.3, 0.3, 0.1, 0.2, 0.1, 0.1, 0.2, 0.3,
0.4, 0.2, 0.2, 0.4), 4, 4)
Then type P at the S-Plus prompt and record the results.
Output:
Verify that P is the transition probability matrix you had in Problem 1 of
Exercise 2A. Note that S-Plus fills in the entries of a matrix column by
column. The first number after the matrix entries in the command (here 4)
sets the number of rows in the matrix, and the second number (also 4) sets
the number of columns.
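To see why column-by-column filling matters for P, the sketch below builds a matrix from the same 16 numbers both ways. The byrow argument of matrix() and the names entries and Q are additions for illustration; the code was written for R, which shares this syntax with S, so identical S-Plus 5 behavior is assumed. Only the column-filled P has rows summing to 1, as a transition probability matrix must.

```r
entries <- c(0.2, 0.4, 0.5, 0.1, 0.3, 0.3, 0.1, 0.2, 0.1, 0.1, 0.2, 0.3,
             0.4, 0.2, 0.2, 0.4)
P <- matrix(entries, 4, 4)                 # filled column by column (default)
Q <- matrix(entries, 4, 4, byrow = TRUE)   # same numbers, filled row by row

P[1, ]            # first row of P: 0.2 0.3 0.1 0.4
Q[1, ]            # first row of Q: 0.2 0.4 0.5 0.1
all(P == t(Q))    # TRUE: for a square matrix, row filling gives the transpose

apply(P, 1, sum)  # row sums of P: all 1, as required
apply(Q, 1, sum)  # row sums of Q: 1.2 0.9 0.7 1.2, not a transition matrix
```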
-
Find out about the "solve" function in S-Plus by typing
help(solve). Use it to find the stationary distribution of the
Markov nucleotide sequence with transition probability matrix P. You
might want to refer back to Problem 2 of Exercise 2B, where you set
up the system of linear equations that must be solved to obtain the
stationary distribution.
The stationary distribution is
Record below the sequence of S-Plus commands used.
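One common way to use solve() here is sketched below, assuming your system from Exercise 2B has this form (yours may be arranged differently). The stationary distribution pi satisfies pi P = pi with entries summing to 1. Transposing gives (t(P) - I) pi = 0, a singular system, so the sketch replaces one of its four equations by the normalization condition. The names A, b, and statdist are illustrative; the code was checked in R, and identical behavior in S-Plus 5 is assumed.

```r
P <- matrix(c(0.2, 0.4, 0.5, 0.1, 0.3, 0.3, 0.1, 0.2, 0.1, 0.1, 0.2, 0.3,
              0.4, 0.2, 0.2, 0.4), 4, 4)

# pi %*% P == pi  is equivalent to  (t(P) - I) %*% pi == 0
A <- t(P) - diag(4)
# These 4 equations are linearly dependent, so overwrite the last one
# with the condition pi1 + pi2 + pi3 + pi4 = 1
A[4, ] <- rep(1, 4)
b <- c(0, 0, 0, 1)

statdist <- solve(A, b)   # solves A %*% statdist == b
statdist
statdist %*% P            # should reproduce statdist (up to rounding)
```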
-
What does the S-Plus command t(P) do?
Output:
Interpretation:
-
Type "eigen(t(P))" at the S-Plus prompt. From the output, pick
out the eigenvector corresponding to the eigenvalue 1. Can you obtain
the stationary distribution from this eigenvector? How?
Answer:
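One possible way to turn the eigen() output into a probability vector is sketched below in S syntax; it was checked in R, and matching S-Plus 5 behavior is an assumption. It selects the eigenvalue closest to 1, takes the real part of the matching eigenvector (eigen() may return complex values when some other eigenvalues are complex), and rescales so the entries sum to 1. The rescaling is needed because, in R, eigen() normalizes eigenvectors to unit length rather than unit sum, and their sign is arbitrary. The names e, k, v, and statdist are illustrative.

```r
P <- matrix(c(0.2, 0.4, 0.5, 0.1, 0.3, 0.3, 0.1, 0.2, 0.1, 0.1, 0.2, 0.3,
              0.4, 0.2, 0.2, 0.4), 4, 4)

e <- eigen(t(P))                  # eigenvalues and eigenvectors of t(P)
e$values                          # one of these should equal 1
k <- order(Mod(e$values - 1))[1]  # index of the eigenvalue closest to 1
v <- Re(e$vectors[, k])           # its eigenvector (imaginary part is 0)
statdist <- v / sum(v)            # rescale so the entries sum to 1
statdist
```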
-
Why do we have to use "eigen(t(P))" instead of
"eigen(P)" in the previous problem?
Answer:
-
Go to the following links to read some introductory S-Plus documents:
SPLUS - TUTORIAL
Introductory Course on SPlus
More helpful links may be found at
http://helix.biostat.utsa.edu/~kgarnett/bioinformatics/splus.html