Name:  


STA 4953 Test II
Introduction to Bioinformatics
April 5, 2001, 3:30-4:45 p.m.



Please show your work in detail. Use additional paper as necessary.

Part I [8 points] Substitution Matrix
The observed relative frequencies $q_{ij}$ of nucleotide pairs in a database of DNA blocks are:

        A       C       G       T
A    0.15    0.05    0.04    0.06
C            0.10    0.05    0.11
G                    0.18    0.04
T                            0.22


Construct a BLOSUM matrix and display it in the table below.
    A   C   G   T
A   3  -2  -3  -3
C       3  -2  -1
G           3  -4
T               2


The scores above were derived as follows:

To find the expected frequencies, use the formulas

$$p_{ij} = \begin{cases} p_i^2 & \text{if } i = j,\\ 2\,p_i\,p_j & \text{if } i \neq j, \end{cases}$$

where $p_i = q_{ii} + \tfrac{1}{2}\sum_{j \neq i} q_{ij}$.

Expected Relative Frequencies ($p_{ij}$), using $p_A = 0.225$, $p_C = 0.205$, $p_G = 0.245$ and $p_T = 0.325$:

          A         C         G         T
A  0.050625   0.09225   0.11025   0.14625
C            0.042025   0.10045   0.13325
G                      0.060025   0.15925
T                                0.105625

As a check, the elements of the table sum to 1.
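The background and expected frequencies above can be reproduced with a short Python sketch. It assumes the $q_{ij}$ table is stored as an upper triangle keyed by ordered base pairs. The names `background` and `expected` are illustrative only and are not part of the original exam.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

# Observed pair frequencies q_ij from Part I, stored as an upper triangle (i <= j).
Q = {
    ("A", "A"): 0.15, ("A", "C"): 0.05, ("A", "G"): 0.04, ("A", "T"): 0.06,
    ("C", "C"): 0.10, ("C", "G"): 0.05, ("C", "T"): 0.11,
    ("G", "G"): 0.18, ("G", "T"): 0.04,
    ("T", "T"): 0.22,
}


def background(q):
    """p_i = q_ii + (1/2) * sum over j != i of q_ij."""
    p = {b: 0.0 for b in BASES}
    for (i, j), freq in q.items():
        if i == j:
            p[i] += freq
        else:
            # An off-diagonal pair contributes half its mass to each base.
            p[i] += freq / 2
            p[j] += freq / 2
    return p


def expected(p):
    """p_ij = p_i^2 when i == j, and 2 * p_i * p_j otherwise."""
    return {
        (i, j): p[i] ** 2 if i == j else 2 * p[i] * p[j]
        for i, j in combinations_with_replacement(BASES, 2)
    }


p = background(Q)
e = expected(p)
print({b: round(v, 3) for b, v in p.items()})  # -> {'A': 0.225, 'C': 0.205, 'G': 0.245, 'T': 0.325}
print(round(sum(e.values()), 10))  # -> 1.0
```

The factor 2 for $i \neq j$ appears because the table records unordered pairs, so an A/C pair can arise as either (A, C) or (C, A).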

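To find the BLOSUM scores (in half-bit units), use the formula

$$s_{ij} = 2\log_2\frac{q_{ij}}{p_{ij}} = \frac{2\ln(q_{ij}/p_{ij})}{\ln 2}$$

and round to the nearest integer. For example, $s_{AA} = 2\log_2(0.15/0.050625) \approx 3.13$, which rounds to 3.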
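The rounding step can be checked with a minimal Python sketch that takes the $q_{ij}$ and $p_{ij}$ values directly from the two tables above. The helper name `half_bit_score` is illustrative only.

```python
import math

# Observed (q_ij) and expected (p_ij) pair frequencies, copied from the
# Part I tables and keyed by the pair in ACGT order (upper triangle).
Q = {"AA": 0.15, "AC": 0.05, "AG": 0.04, "AT": 0.06, "CC": 0.10,
     "CG": 0.05, "CT": 0.11, "GG": 0.18, "GT": 0.04, "TT": 0.22}
P = {"AA": 0.050625, "AC": 0.09225, "AG": 0.11025, "AT": 0.14625,
     "CC": 0.042025, "CG": 0.10045, "CT": 0.13325, "GG": 0.060025,
     "GT": 0.15925, "TT": 0.105625}


def half_bit_score(q: float, p: float) -> int:
    """s_ij = 2 * log2(q_ij / p_ij), rounded to the nearest integer."""
    return round(2 * math.log2(q / p))


# CC is the closest call: 2 * log2(0.10 / 0.042025) is about 2.501, so it rounds to 3.
S = {pair: half_bit_score(Q[pair], P[pair]) for pair in Q}
print(S)
# -> {'AA': 3, 'AC': -2, 'AG': -3, 'AT': -3, 'CC': 3, 'CG': -2, 'CT': -1, 'GG': 3, 'GT': -4, 'TT': 2}
```

The printed scores match the BLOSUM table entry for entry.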

Part II [8 points] Sequence Alignment
Using the substitution matrix constructed in Part I and the gap penalty function w(k) = -1 - k, where k is the length of the gap, find the best local alignment(s) between the following pair of sequences:

AGATCCAC
ATGCAC


To arrive at the best local alignment, the following matrix was constructed (minus the arrows):

Thus, the best local alignment, with score 3 + 2 - 2 + 3 + 3 + 3 = 12, is
ATCCAC
ATGCAC
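A minimal Python sketch of the local (Smith-Waterman) recurrence reproduces this result. It assumes the gap penalty is added to the score as w(k) = -1 - k and allows a gap of any length k at every cell. This is the general-gap-function form that one fills in by hand and costs O(n·m·(n+m)) time. Because w(k) is affine, Gotoh's algorithm would give the same scores in O(n·m). The names `score`, `gap` and `local_align` are hypothetical.

```python
# Part I substitution matrix, stored once per unordered pair (it is symmetric).
PAIR_SCORES = {"AA": 3, "AC": -2, "AG": -3, "AT": -3, "CC": 3,
               "CG": -2, "CT": -1, "GG": 3, "GT": -4, "TT": 2}


def score(a: str, b: str) -> int:
    """Look up s(a, b) regardless of argument order."""
    return PAIR_SCORES[a + b if a <= b else b + a]


def gap(k: int) -> int:
    """Gap penalty w(k) = -1 - k for a gap of length k (added to the score)."""
    return -1 - k


def local_align(x: str, y: str) -> tuple[int, str, str]:
    """Best local alignment of x and y; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    back = {}  # (i, j) -> (di, dj), the step back to the predecessor cell
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal: align x[i-1] with y[j-1].
            cands = [(H[i - 1][j - 1] + score(x[i - 1], y[j - 1]), (1, 1))]
            # Vertical: the last k letters of x face a gap of length k.
            cands += [(H[i - k][j] + gap(k), (k, 0)) for k in range(1, i + 1)]
            # Horizontal: the last k letters of y face a gap of length k.
            cands += [(H[i][j - k] + gap(k), (0, k)) for k in range(1, j + 1)]
            val, step = max(cands)
            if val > 0:  # cells with no positive score stay at the local-alignment floor of 0
                H[i][j] = val
                back[(i, j)] = step
                if val > best:
                    best, best_cell = val, (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_cell
    while H[i][j] > 0:
        di, dj = back[(i, j)]
        if di and dj:
            top.append(x[i - 1])
            bottom.append(y[j - 1])
        elif di:
            top.append(x[i - di:i])
            bottom.append("-" * di)
        else:
            top.append("-" * dj)
            bottom.append(y[j - dj:j])
        i, j = i - di, j - dj
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("AGATCCAC", "ATGCAC"))  # -> (12, 'ATCCAC', 'ATGCAC')
```

Ties are broken by Python's tuple ordering, so the sketch returns a single optimal alignment. Listing every co-optimal local alignment would require keeping all maximizing pointers for each cell.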


Part III [4 points] Genome Project

  1. What is a genomic library?



  2. What is an EST (Expressed Sequence Tag)?



  3. A piece of DNA which carries another piece of DNA by allowing its replication and selection is called a:
    A. Volkswagen
    B. nice guy
    C. insert
    D. vector
    E. bacterial virus

  4. A long DNA sequence made up of the aligned sequences from several smaller pieces of DNA is called a:
    A. genome
    B. contig
    C. chromosome
    D. library
    E. RNA



Part IV [2 points] Extra Credit

  1. Here are the recognition sites and cleavage positions of four restriction enzymes:
    BlaHI     GC|ATGC
    EcoRI     GA|ATTC
    HindIII   AA|GCTT
    BgmII     AC|ATGT

    Which of these enzyme combinations will not create compatible (complementary) sticky ends?
    A. BlaHI and EcoRI
    B. EcoRI and HindIII
    C. BlaHI and BgmII
    D. EcoRI and BgmII
    E. None of the above.
  2. Which is the sequence read from the following gel?
    A    C    G    T
   ___
             ___
        ___
                  ___
             ___
   ___
             ___
        ___
             ___
   ___
                  ___


    A. TTAGGGCCTGA
    B. TAGCGAGTCGA
    C. AGCTGCGGGTA
    D. GGGCATGCTGA
    E. TAGGGCGTCGA
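One way to reason about item 1 of Part IV is to derive each enzyme's single-stranded overhang from its recognition site and cut position, and then compare the overhangs. The Python sketch below makes two assumptions. First, it assumes the sites are palindromic, as all four listed ones are. Second, it assumes two ends are compatible when their overhangs are of the same type (5' or 3') and are reverse complements of each other. The sketch uses the cut positions exactly as listed in the question. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sticky_end(site: str) -> tuple[str, str]:
    """Return (overhang type, overhang) for a site written like 'G|AATTC'.

    Assumes a palindromic site, so the bottom strand is cut at the mirror
    position: after len(site) - cut bases, in top-strand coordinates.
    """
    cut = site.index("|")
    seq = site.replace("|", "")
    if seq != revcomp(seq):
        raise ValueError(f"{seq} is not palindromic")
    mirror = len(seq) - cut
    if cut < mirror:
        return "5'", seq[cut:mirror]
    if cut > mirror:
        return "3'", seq[mirror:cut]
    return "blunt", ""


def compatible(site_a: str, site_b: str) -> bool:
    """Sticky ends ligate if they share an overhang type and their overhangs pair."""
    kind_a, over_a = sticky_end(site_a)
    kind_b, over_b = sticky_end(site_b)
    return kind_a == kind_b != "blunt" and over_b == revcomp(over_a)


ENZYMES = {"BlaHI": "GC|ATGC", "EcoRI": "GA|ATTC",
           "HindIII": "AA|GCTT", "BgmII": "AC|ATGT"}

for a, b in [("BlaHI", "EcoRI"), ("EcoRI", "HindIII"),
             ("BlaHI", "BgmII"), ("EcoRI", "BgmII")]:
    print(a, b, sticky_end(ENZYMES[a]), sticky_end(ENZYMES[b]),
          compatible(ENZYMES[a], ENZYMES[b]))
```

Because a palindromic site always yields a self-complementary overhang, the reverse-complement test reduces here to checking that two enzymes leave the same overhang.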