STA 4953 Test I
Introduction to Bioinformatics
February 22, 2001, 3:30-4:45 p.m.
Part I. [9 points] Probability models for DNA
The tetrahedral die model
A random nucleotide sequence is generated, each base independently, according to the
probability distribution
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35).
When a quadruplet made up of 4 nucleotide bases from this sequence is
observed, what is the probability that
- it contains at least one A?
- all four bases are identical?
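Both probabilities follow from independence. A quadruplet contains no A only if each of its four bases is C, G or T, so P(at least one A) = 1 - (1 - 0.25)^4. Four identical bases means AAAA, CCCC, GGGG or TTTT, so P(all identical) = 0.25^4 + 0.2^4 + 0.2^4 + 0.35^4. The short Python sketch below evaluates both; the function names are illustrative and not part of the exam.

```python
# Base probabilities for the tetrahedral die model; bases are independent.
f = {"A": 0.25, "C": 0.2, "G": 0.2, "T": 0.35}


def p_at_least_one(base: str, k: int = 4) -> float:
    """P(at least one `base` among k independent bases) = 1 - (1 - f(base))**k."""
    return 1.0 - (1.0 - f[base]) ** k


def p_all_identical(k: int = 4) -> float:
    """P(all k bases are the same) = sum over bases b of f(b)**k."""
    return sum(p ** k for p in f.values())


print(f"P(at least one A) = {p_at_least_one('A'):.8f}")
print(f"P(all identical)  = {p_all_identical():.8f}")
```

Worked by hand, these are 1 - 0.75^4 = 0.68359375 and 0.00390625 + 0.0016 + 0.0016 + 0.01500625 = 0.0221125.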
The Markov chain model
A random nucleotide sequence is generated by a Markov chain with
transition probability matrix
$$
P = \begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & 0.2 & 0.3 & 0.1 & 0.4 \\
C & 0.4 & 0.3 & 0.1 & 0.2 \\
G & 0.5 & 0.1 & 0.2 & 0.2 \\
T & 0.1 & 0.2 & 0.3 & 0.4
\end{array}
$$
If the probability distribution of the first base is
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35),
what is the probability of getting an A in the third base? Describe one
way to find the stationary distribution of this Markov chain.
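If the first base has distribution π1 = (0.25, 0.2, 0.2, 0.35), the third base has distribution π3 = π1 P^2, and P(third base = A) is its first entry. For the stationary distribution, one way is to solve πP = π together with π(A) + π(C) + π(G) + π(T) = 1; another is power iteration, repeatedly multiplying a starting distribution by P until it stops changing, which converges here because every entry of P is positive. The plain-Python sketch below takes the power-iteration route; the helper names `step`, `distribution_at` and `stationary` are hypothetical.

```python
# Transition matrix P; rows = current base, columns = next base, order A, C, G, T.
BASES = ["A", "C", "G", "T"]
P = [
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.3, 0.1, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.1, 0.2, 0.3, 0.4],
]


def step(dist, P):
    """One Markov step: returns the row vector dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(first, P, position):
    """Marginal distribution of the base at `position` (1-based)."""
    dist = list(first)
    for _ in range(position - 1):
        dist = step(dist, P)
    return dist


def stationary(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat pi <- pi P from a uniform start until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


first = [0.25, 0.2, 0.2, 0.35]
third = distribution_at(first, P, 3)
print("P(third base = A) =", round(third[0], 6))
print("stationary:", dict(zip(BASES, (round(x, 5) for x in stationary(P)))))
```

By hand, π2 = π1 P = (0.265, 0.225, 0.19, 0.32), so P(third base = A) = 0.265(0.2) + 0.225(0.4) + 0.19(0.5) + 0.32(0.1) = 0.27. Solving πP = π exactly gives π = (123, 106, 83, 145)/457 ≈ (0.269, 0.232, 0.182, 0.317), which the iteration reproduces.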
Part II [6 points] Statistical Inference
Hypothesis Test
A DNA sequence has the following nucleotide and dinucleotide counts:
Base   Count   Dinucleotide counts
A      246     AA  40   AC  74   AG  19   AT 113
C      219     CA  86   CC  76   CG  26   CT  31
G      191     GA  92   GC  12   GG  38   GT  49
T      344     TA  28   TC  56   TG 108   TT 151
Suppose the nucleotide bases are generated independently. Test the
hypothesis that the base probability distribution is
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35).
[Hint: Either the Pearson statistic
$$X^2 = \sum_{i \in \{A,C,G,T\}} \frac{(O_i - E_i)^2}{E_i}$$
or the likelihood ratio statistic
$$G^2 = 2 \sum_{i \in \{A,C,G,T\}} O_i \log\left(\frac{O_i}{E_i}\right)$$
may be used. The critical value $\chi^2_{0.05}$ with 3 degrees of freedom is 7.815.]
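Under the null hypothesis each expected count is E_i = n f(i), with n = 246 + 219 + 191 + 344 = 1000, giving E = (250, 200, 200, 350); four categories with one sum constraint leave 3 degrees of freedom. The Python sketch below computes both statistics from the hint. It assumes the "log" in G^2 is the natural logarithm, the convention under which G^2 is compared with a chi-square distribution.

```python
import math

# Observed base counts and the hypothesized distribution from the exam.
observed = {"A": 246, "C": 219, "G": 191, "T": 344}
hypothesized = {"A": 0.25, "C": 0.2, "G": 0.2, "T": 0.35}
CRITICAL_3DF_05 = 7.815  # chi-square critical value (3 df, alpha = 0.05) from the hint

n = sum(observed.values())  # total number of bases
expected = {b: n * p for b, p in hypothesized.items()}

# Pearson statistic: sum over bases of (O - E)^2 / E.
x2 = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in observed)

# Likelihood-ratio statistic: 2 * sum over bases of O * ln(O / E).
g2 = 2 * sum(observed[b] * math.log(observed[b] / expected[b]) for b in observed)

for name, stat in (("X2", x2), ("G2", g2)):
    decision = "reject H0" if stat > CRITICAL_3DF_05 else "do not reject H0"
    print(f"{name} = {stat:.4f} -> {decision} at the 5% level")
```

Here X^2 = 16/250 + 361/200 + 81/200 + 36/350 ≈ 2.377 and G^2 ≈ 2.33, both well below 7.815, so the hypothesized distribution is not rejected at the 5% level.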
Estimation
Suppose the Strong/Weak (S/W) classification for this DNA sequence
(S = C or G, W = A or T) conforms to a Markov chain. Estimate the
transition probability matrix.
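With S = {C, G} and W = {A, T}, each of the 16 dinucleotide counts falls into one of the four S/W transitions, and the usual maximum-likelihood estimate of a transition probability is the transition count divided by its row total. The Python sketch below does that collapsing and normalizing; the variable names are illustrative.

```python
# Dinucleotide counts from the table: key = first base + second base.
dinuc = {
    "AA": 40, "AC": 74, "AG": 19, "AT": 113,
    "CA": 86, "CC": 76, "CG": 26, "CT": 31,
    "GA": 92, "GC": 12, "GG": 38, "GT": 49,
    "TA": 28, "TC": 56, "TG": 108, "TT": 151,
}
# Strong (S) = C or G, Weak (W) = A or T.
SW = {"A": "W", "C": "S", "G": "S", "T": "W"}
STATES = "WS"  # row/column order of the estimated matrix

# Collapse the 16 dinucleotide counts into 4 S/W transition counts.
counts = {(x, y): 0 for x in STATES for y in STATES}
for pair, c in dinuc.items():
    counts[(SW[pair[0]], SW[pair[1]])] += c

# Maximum-likelihood estimate: p(x -> y) = n(x, y) / sum over z of n(x, z).
p_hat = {}
for x in STATES:
    row_total = sum(counts[(x, y)] for y in STATES)
    for y in STATES:
        p_hat[(x, y)] = counts[(x, y)] / row_total

for x in STATES:
    print(x, "  ".join(f"{x}->{y}: {p_hat[(x, y)]:.4f}" for y in STATES))
```

The collapsed counts are WW = 332, WS = 257, SW = 258 and SS = 152 (999 transitions in all), so with rows and columns ordered W, S the estimated matrix has rows W: (332/589, 257/589) ≈ (0.564, 0.436) and S: (258/410, 152/410) ≈ (0.629, 0.371).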
Part III [5 points] Select the best answer and write the letter in the
provided space.
- The process of making an RNA copy of DNA is called
A. Transcription
B. Translation
C. Moderation
D. Replication
E. Gobilization
- The process of reading an amino acid sequence from an RNA molecule is
called
A. Transcription
B. Translation
C. Repudiation
D. Replication
E. Cross-market capitalization
- A protein molecule is made up of
A. nucleotide bases
B. A, C, G, and U
C. chromosomes
D. cells
E. amino acids
- The base guanine is always paired with
A. Adenine
B. Guanine
C. Cytosine
D. Thymine
E. Guanine is never paired with another base in a molecule of DNA
- Which of the DNA sequences below is a palindrome?
A. TAC
B. TCTCTCT
C. AAAAAAAAAA
D. ACGT
E. MASDP
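In molecular biology a DNA palindrome is usually a sequence that equals its own reverse complement (it reads the same 5′ to 3′ on both strands, as many restriction-enzyme sites do), not a string that merely reads the same backwards. Assuming that definition, the small Python check below tests each option; the helper names are hypothetical. Under plain string reversal, options B and C would also qualify.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and complement each base (A<->T, C<->G)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def is_dna_palindrome(seq: str) -> bool:
    """True if seq equals its reverse complement; False for empty or non-DNA strings."""
    if not seq or any(b not in COMPLEMENT for b in seq):
        return False
    return seq == reverse_complement(seq)


for option, seq in zip("ABCDE", ["TAC", "TCTCTCT", "AAAAAAAAAA", "ACGT", "MASDP"]):
    print(option, seq, is_dna_palindrome(seq))
```

Only ACGT passes, since its reverse complement is again ACGT. An odd-length sequence can never pass, because its middle base would have to equal its own complement.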
Part IV Extra Credit Problems [1 point extra credit]
- Hershey and Chase differentiated between DNA and protein by:
A. labeling the DNA with phosphorus-32, proteins with sulfur-35
B. labeling the DNA with sulfur-35, proteins with phosphorus-32
C. labeling the DNA with cesium, proteins with chloride
D. labeling the DNA with carbon-14, proteins with hydrogen-3
- In Avery's experiment, the ability of an extract of heat-killed, smooth,
disease-causing bacteria to transform rough, non-disease-causing bacteria
was blocked by treatment with
A. proteinase
B. DNase
C. RNase
D. Jerry Springer
E. Calcitonin