Name:  
   E-mail:  


STA 4953 Test I
Introduction to Bioinformatics
February 22, 2001, 3:30-4:45 p.m.



Part I. [9 points] Probability models for DNA



The tetrahedral die model
The bases of a random nucleotide sequence are generated independently according to the probability distribution
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35)
When a quadruplet made up of 4 nucleotide bases from this sequence is observed, what is the probability that
  1. it contains at least one A?
  2. all four bases are identical?
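
One possible way to check answers to the two questions above is sketched below in Python. It assumes only that the four bases of the quadruplet are independent draws from the stated distribution; the function names are hypothetical and chosen for illustration.

```python
# Base frequencies from the tetrahedral die model stated above.
freqs = {"A": 0.25, "C": 0.2, "G": 0.2, "T": 0.35}

def prob_at_least_one(base, k, f=freqs):
    """P(at least one `base` among k independent draws), via the complement rule."""
    return 1.0 - (1.0 - f[base]) ** k

def prob_all_identical(k, f=freqs):
    """P(all k independent draws show the same base), summed over the four bases."""
    return sum(p ** k for p in f.values())

p_one_a = prob_at_least_one("A", 4)   # 1 - 0.75**4
p_same = prob_all_identical(4)        # 0.25**4 + 0.2**4 + 0.2**4 + 0.35**4
print(p_one_a, p_same)
```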


The Markov chain model
A random nucleotide sequence is generated by a Markov chain with transition probability matrix
   P  =        A    C    G    T
          A ( 0.2  0.3  0.1  0.4 )
          C ( 0.4  0.3  0.1  0.2 )
          G ( 0.5  0.1  0.2  0.2 )
          T ( 0.1  0.2  0.3  0.4 )

If the probability distribution of the first base is
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35),
what is the probability that the third base is an A? Describe one way to find the stationary distribution of this Markov chain.
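
A minimal Python sketch of one approach follows. It assumes the rows of P are indexed by the current base and the columns by the next base, in the order A, C, G, T. The distribution of the third base is then the initial distribution multiplied by P twice. The stationary distribution is approximated here by repeated multiplication (power iteration), which is only one of several valid methods; solving the linear system pi P = pi with the entries of pi summing to 1 is another. The helper names are hypothetical.

```python
BASES = "ACGT"

# Transition matrix: P[i][j] = probability of moving from base i to base j.
P = [
    [0.2, 0.3, 0.1, 0.4],  # from A
    [0.4, 0.3, 0.1, 0.2],  # from C
    [0.5, 0.1, 0.2, 0.2],  # from G
    [0.1, 0.2, 0.3, 0.4],  # from T
]
initial = [0.25, 0.2, 0.2, 0.35]

def step(dist, P):
    """One step of the chain: row vector `dist` times matrix P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(position, initial, P):
    """Distribution of the base at a 1-indexed position in the sequence."""
    dist = list(initial)
    for _ in range(position - 1):
        dist = step(dist, P)
    return dist

def stationary_by_iteration(P, tol=1e-12, max_iter=10000):
    """Approximate the stationary distribution by iterating from a uniform start."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("power iteration did not converge")

third = distribution_at(3, initial, P)
pi = stationary_by_iteration(P)
print("P(third base is A) =", third[BASES.index("A")])
print("stationary distribution ~", dict(zip(BASES, pi)))
```

Power iteration converges for this matrix because every entry is positive, so the chain is irreducible and aperiodic.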


Part II [6 points] Statistical Inference

Hypothesis Test

A DNA sequence has the following nucleotide and dinucleotide counts:

Base  Count    Dinucleotide counts
A     246      AA  40   AC 74   AG  19   AT 113
C     219      CA  86   CC 76   CG  26   CT  31
G     191      GA  92   GC 12   GG  38   GT  49
T     344      TA  28   TC 56   TG 108   TT 151

Suppose the nucleotide bases are generated independently. Test the hypothesis that the base probability distribution is
(f(A), f(C), f(G), f(T)) = (0.25, 0.2, 0.2, 0.35).
    [Hint: Either the Pearson statistic

    X^2 = \sum_{i \in \{A,C,G,T\}} (O_i - E_i)^2 / E_i

    or the likelihood ratio statistic

    G^2 = 2 \sum_{i \in \{A,C,G,T\}} O_i \log(O_i / E_i)

    may be used. The critical value \chi^2_{0.05} with 3 degrees of freedom is 7.815.]
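
The two statistics in the hint can be evaluated with a short Python sketch such as the one below. It assumes the expected count for each base is the sequence length (the sum of the four nucleotide counts) times the hypothesized probability, and that the logarithm is the natural logarithm; the variable names are illustrative only.

```python
import math

# Observed nucleotide counts from the table above.
observed = {"A": 246, "C": 219, "G": 191, "T": 344}
# Hypothesized base probabilities under the null hypothesis.
null_probs = {"A": 0.25, "C": 0.2, "G": 0.2, "T": 0.35}

n = sum(observed.values())
expected = {b: n * p for b, p in null_probs.items()}

# Pearson statistic: sum of (O - E)^2 / E over the four bases.
x2 = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in observed)

# Likelihood ratio statistic: 2 * sum of O * ln(O / E) over the four bases.
g2 = 2.0 * sum(observed[b] * math.log(observed[b] / expected[b]) for b in observed)

CRITICAL_VALUE = 7.815  # chi-square, 3 degrees of freedom, 5% level (from the hint)
reject_pearson = x2 > CRITICAL_VALUE
reject_lr = g2 > CRITICAL_VALUE
print(f"X^2 = {x2:.4f}, G^2 = {g2:.4f}")
print("reject H0 (Pearson):", reject_pearson, "| reject H0 (LR):", reject_lr)
```

The degrees of freedom are 3 because there are four categories and the hypothesized probabilities are fully specified, so no parameters are estimated from the data.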


Estimation
Suppose the Strong/Weak (S/W) classification for this DNA sequence conforms to a Markov chain. Estimate the transition probability matrix
   P  =   ( P_SS   P_SW )
          ( P_WS   P_WW ).
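
One common estimator is sketched below in Python. It assumes the usual coding in which C and G are Strong (S) and A and T are Weak (W), and it uses the maximum likelihood estimate for a Markov chain: each transition count divided by the total number of transitions leaving that state. Both the coding and the choice of estimator are assumptions of this sketch, not something the problem statement fixes.

```python
# Dinucleotide counts from the table above.
dinucleotides = {
    "AA": 40, "AC": 74, "AG": 19,  "AT": 113,
    "CA": 86, "CC": 76, "CG": 26,  "CT": 31,
    "GA": 92, "GC": 12, "GG": 38,  "GT": 49,
    "TA": 28, "TC": 56, "TG": 108, "TT": 151,
}

# Assumed coding: C and G are Strong, A and T are Weak.
CLASS_OF = {"A": "W", "T": "W", "C": "S", "G": "S"}

# Pool the 16 dinucleotide counts into 4 S/W transition counts.
counts = {"S": {"S": 0, "W": 0}, "W": {"S": 0, "W": 0}}
for pair, c in dinucleotides.items():
    counts[CLASS_OF[pair[0]]][CLASS_OF[pair[1]]] += c

# Maximum likelihood estimate: normalize each row by its total.
P_hat = {
    src: {dst: c / sum(row.values()) for dst, c in row.items()}
    for src, row in counts.items()
}

for src in "SW":
    print(src, {dst: round(P_hat[src][dst], 4) for dst in "SW"})
```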


Part III [5 points] Select the best answer and write the letter in the provided space.

  1. The process of making an RNA copy of DNA is called
      A. Transcription
      B. Translation
      C. Moderation
      D. Replication
      E. Gobilization
  2. The process of reading an amino acid sequence from an RNA molecule is called
      A. Transcription
      B. Translation
      C. Repudiation
      D. Replication
      E. Cross-market capitalization
  3. A protein molecule is made up of
      A. nucleotide bases
      B. A, C, G, and U
      C. chromosomes
      D. cells
      E. amino acids
  4. The base guanine is always paired with
      A. Adenine
      B. Guanine
      C. Cytosine
      D. Thymine
      E. Guanine is never paired with another base in a molecule of DNA
  5. Which of the DNA sequences below is a palindrome?
      A. TAC
      B. TCTCTCT
      C. AAAAAAAAAA
      D. ACGT
      E. MASDP


Part IV [1 point extra credit] Extra Credit Problems

  1. Hershey and Chase differentiated between DNA and protein by:
      A. labeling the DNA with phosphorus-32, proteins with sulfur-35
      B. labeling the DNA with sulfur-35, proteins with phosphorus-32
      C. labeling the DNA with cesium, proteins with chloride
      D. labeling the DNA with carbon-14, proteins with hydrogen-3
  2. In Avery's experiment, the ability of an extract of heat-killed, smooth, disease-causing bacteria to transform rough, non-disease-causing bacteria was blocked by treatment with
      A. proteinase
      B. DNase
      C. RNase
      D. Jerry Springer
      E. Calcitonin