Name:  



Exercise 8B: BLOSUM Matrix
STA 4953 (Spring 2001)
Due 3/29/2001


You are given a "database" consisting of 2 DNA blocks:

A C A _ G      A C G G C T
A C C C C      T C G G A T
A T G T A      T C G G A A
A C G T A
A C T T A


Construct a BLOSUM80 matrix for this database. Show your work in detail.

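Since we are constructing a BLOSUM80 matrix, we cluster the sequences within each block that share 80% or more identical bases. In the first block, the third and fourth sequences match at 4 of 5 positions (80%), as do the fourth and fifth, so the last three sequences form one cluster. In the second block, the last two sequences match at 5 of 6 positions (83%), so they form one cluster. Thus, the given blocks will be grouped as follows: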

Block 1                  Block 2
A C A _ G                A C G G C T
A C C C C                T C G G A T   (green)
A T G T A   (red)        T C G G A A   (green)
A C G T A   (red)
A C T T A   (red)
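
The grouping step can be sketched in Python. This is a minimal illustration and not part of the original exercise: the function names `identical_enough` and `cluster` are made up here, a gap is assumed never to count as a match, and the clustering is assumed to be single-linkage (a sequence joins a cluster if it reaches the threshold with any one member).

```python
def identical_enough(a, b, percent=80):
    """True if at least `percent` % of the aligned columns carry the same base."""
    # Assumption: a column containing a gap ('_') never counts as a match.
    matches = sum(x == y and x != "_" for x, y in zip(a, b))
    # Integer comparison avoids floating-point issues at exactly 80%.
    return 100 * matches >= percent * len(a)


def cluster(seqs, percent=80):
    """Single-linkage clustering; returns lists of sequence indices."""
    clusters = []
    for i in range(len(seqs)):
        # Every existing cluster that has a member close enough to sequence i.
        linked = [c for c in clusters
                  if any(identical_enough(seqs[i], seqs[j], percent) for j in c)]
        merged = [i]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


block1 = ["ACA_G", "ACCCC", "ATGTA", "ACGTA", "ACTTA"]
block2 = ["ACGGCT", "TCGGAT", "TCGGAA"]

print(cluster(block1))  # [[0], [1], [2, 3, 4]]
print(cluster(block2))  # [[0], [1, 2]]
```

Indices are zero-based, so `[2, 3, 4]` is the last three sequences of the first block and `[1, 2]` is the last two sequences of the second block.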


To determine the BLOSUM80 matrix for this database, we must construct 4 tables: Table of Counts; Table of Relative Frequencies; Table of Expected Frequencies; BLOSUM80 matrix.

To construct the Table of Counts, we will count the number of pairs for each possible base pairing: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT. When counting, the clustered sequences in each block (red in the first block, green in the second) are treated as a single sequence: each of the three red sequences carries weight 1/3, and each of the two green sequences carries weight 1/2.
After clustering, the first block contains 3 sequences, so the number of possible pairs per ungapped column is 3!/(2! * 1!) = 3. In the column with a gap only 2 sequences remain, so the number of possible pairs is 2!/(2! * 0!) = 1. Since there are 4 ungapped columns, we have 4 * 3 = 12 possible pairs for the ungapped columns. Since there is one column with a gap, we have 1 * 1 = 1 possible pair for this column. Thus, we have 12 + 1 = 13 total possible base pairings for the first block.
After clustering, the second block contains 2 sequences in 6 columns, so there are 6 * 2!/(2! * 0!) = 6 * 1 = 6 possible base pairings.
Hence, there is a total of 13 + 6 = 19 possible pairings in the database.
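
As a quick check of this arithmetic, the pair totals can be recomputed with `math.comb` (Python 3.8 or later). The variable names are illustrative only; the cluster and column numbers are the ones stated in the text.

```python
from math import comb

# Sequences per block after the 80% grouping.
clusters_block1 = 3   # two single sequences + one cluster of three
clusters_block2 = 2   # one single sequence + one cluster of two

# Block 1: 4 ungapped columns, plus 1 gapped column where one sequence drops out.
pairs_block1 = 4 * comb(clusters_block1, 2) + 1 * comb(clusters_block1 - 1, 2)
# Block 2: 6 ungapped columns.
pairs_block2 = 6 * comb(clusters_block2, 2)

total_pairs = pairs_block1 + pairs_block2
print(pairs_block1, pairs_block2, total_pairs)  # 13 6 19
```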

Counts for AA: Block 1: 1 + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 3
               Block 2: 0
               Total = 3
Counts for AC: Block 1: 1 + (1/3) + (1/3) + (1/3) = 2
               Block 2: (1/2) + (1/2) = 1
               Total = 3
Counts for AG: Block 1: (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 5/3
               Block 2: 0
               Total = 5/3
Counts for AT: Block 1: (1/3)
               Block 2: (1/2) + (1/2) + (1/2) = 3/2
               Total = (1/3) + (3/2) = 11/6
Counts for CC: Block 1: 1 + (1/3) + (1/3) + (1/3) + (1/3) = 7/3
               Block 2: (1/2) + (1/2) = 1
               Total = 10/3
Counts for CG: Block 1: (1/3) + (1/3) + 1 = 5/3
               Block 2: 0
               Total = 5/3
Counts for CT: Block 1: (1/3) + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 6/3 = 2
               Block 2: 0
               Total = 2
Counts for GG: Block 1: 0
               Block 2: (1/2) + (1/2) + (1/2) + (1/2) = 2
               Total = 2
Counts for GT: 0
Counts for TT: Block 1: 0
               Block 2: 1/2
               Total = 1/2

Table of Counts
      A       C       G       T
A     3
C     3       10/3
G     5/3     5/3     2
T     11/6    2       0       1/2
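
The hand counts can be reproduced with a short script. This is a sketch under stated assumptions: the helper name `weighted_pair_counts` is invented for illustration, the cluster memberships are entered by hand (last three sequences of block 1, last two of block 2), each sequence is weighted by 1/(size of its cluster), pairs inside a cluster are not counted, and any pair involving a gap is skipped.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def weighted_pair_counts(seqs, clusters):
    """Count base pairs between clusters, column by column."""
    counts = defaultdict(Fraction)
    for col in range(len(seqs[0])):
        for c1, c2 in combinations(clusters, 2):
            for i in c1:
                for j in c2:
                    a, b = seqs[i][col], seqs[j][col]
                    if "_" in (a, b):          # skip pairs involving a gap
                        continue
                    pair = "".join(sorted(a + b))   # "CA" and "AC" are the same pair
                    counts[pair] += Fraction(1, len(c1) * len(c2))
    return counts


block1 = ["ACA_G", "ACCCC", "ATGTA", "ACGTA", "ACTTA"]
block2 = ["ACGGCT", "TCGGAT", "TCGGAA"]

totals = defaultdict(Fraction)
for seqs, clusters in [(block1, [[0], [1], [2, 3, 4]]),
                       (block2, [[0], [1, 2]])]:
    for pair, n in weighted_pair_counts(seqs, clusters).items():
        totals[pair] += n

for pair in sorted(totals):
    print(pair, totals[pair])
print("sum", sum(totals.values()))  # sum 19
```

GT never occurs between clusters, so it does not appear in the printed list; its count is 0.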

To find the observed relative frequencies, divide each observed count by the total number of possible pairs in the database (19).

Observed Relative Frequencies (q_ij)
      A            C            G        T
A     3/19
C     3/19         (10/3)/19
G     (5/3)/19     (5/3)/19     2/19
T     (11/6)/19    2/19         0        (1/2)/19
The elements of the table should sum to 1.
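
A brief sketch of this normalization, using exact fractions so the sum-to-1 check is not affected by rounding. The dictionary layout and variable names are illustrative choices; the counts are copied from the Table of Counts.

```python
from fractions import Fraction as F

# Pair counts from the Table of Counts.
counts = {
    "AA": F(3), "AC": F(3), "AG": F(5, 3), "AT": F(11, 6),
    "CC": F(10, 3), "CG": F(5, 3), "CT": F(2),
    "GG": F(2), "GT": F(0), "TT": F(1, 2),
}

total = sum(counts.values())                 # 19 possible pairs
q = {pair: n / total for pair, n in counts.items()}

print(q["AA"])          # 3/19
print(sum(q.values()))  # 1
```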

To find the expected frequencies, use the formulas

p_ij = p_i^2        if i = j
p_ij = 2 p_i p_j    if i is not equal to j

where p_i = q_ii + (1/2) * [sum over j not equal to i of q_ij].

Expected Relative Frequencies (p_ij)
      A        C        G        T
A     .1082
C     .2308    .1231
G     .1270    .1354    .0372
T     .0837    .0893    .0491    .0162
The elements of the table should sum to 1.
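
The expected frequencies can be recomputed as follows. This is a sketch, with names such as `marginal` chosen only for illustration; the observed frequencies are the counts from the Table of Counts divided by 19.

```python
from fractions import Fraction as F

# Observed relative frequencies q_ij (counts divided by 19).
q = {
    "AA": F(3) / 19, "AC": F(3) / 19, "AG": F(5, 3) / 19, "AT": F(11, 6) / 19,
    "CC": F(10, 3) / 19, "CG": F(5, 3) / 19, "CT": F(2) / 19,
    "GG": F(2) / 19, "GT": F(0), "TT": F(1, 2) / 19,
}
bases = "ACGT"


def marginal(base):
    """p_i = q_ii + (1/2) * sum of q_ij over j != i."""
    same = q[base + base]
    other = sum(v for pair, v in q.items() if base in pair and pair != base + base)
    return same + other / 2


p = {b: marginal(b) for b in bases}

expected = {}
for i, a in enumerate(bases):
    for b in bases[i:]:
        expected[a + b] = p[a] ** 2 if a == b else 2 * p[a] * p[b]

for pair, value in expected.items():
    print(pair, round(float(value), 4))
print(sum(expected.values()))  # 1
```

With exact fractions the marginals come out as p_A = 25/76, p_C = 20/57, p_G = 11/57 and p_T = 29/228, which sum to 1.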

To find the BLOSUM80 scores, use the formula

2 * s_ij = 2 * ln(q_ij / p_ij) / ln(2) = 2 * log2(q_ij / p_ij)

and round to the nearest integer.

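BLOSUM80 Matrix
      A     C     G           T
A     1
C     -1    1
G     -1    -1    3
T     0     0     -999,999    1

The G-T entry is -999,999 because no G-T pairs were observed (q_GT = 0), so the log-odds score is negative infinity; -999,999 stands in for it.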
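
The whole scoring step can be checked with a few lines of Python. This is a sketch rather than a reference implementation: it starts from the Table of Counts, uses floating point, and the constant name `NEG_INF_STANDIN` is an illustrative label for the value used when a pair is never observed.

```python
from math import log2

# Pair counts from the Table of Counts.
counts = {
    "AA": 3, "AC": 3, "AG": 5 / 3, "AT": 11 / 6,
    "CC": 10 / 3, "CG": 5 / 3, "CT": 2,
    "GG": 2, "GT": 0, "TT": 1 / 2,
}
total = sum(counts.values())
q = {pair: n / total for pair, n in counts.items()}

# Marginal base frequencies: p_i = q_ii + (1/2) * sum of q_ij over j != i.
p = {
    b: q[b + b] + sum(v for pair, v in q.items() if b in pair and pair != b + b) / 2
    for b in "ACGT"
}

NEG_INF_STANDIN = -999_999  # stand-in for an unobserved pair (log of 0)


def score(pair):
    """Rounded score 2 * log2(q_ij / p_ij)."""
    a, b = pair
    expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
    if q[pair] == 0:
        return NEG_INF_STANDIN
    return round(2 * log2(q[pair] / expected))


scores = {pair: score(pair) for pair in q}
for pair, s in scores.items():
    print(pair, s)
```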