Exercise 8B: BLOSUM Matrix
STA 4953 (Spring 2001)
Due 3/29/2001
If you are given a "database" consisting of 2 DNA blocks:
Block 1 | Block 2 |
A C A _ G | A C G G C T |
A C C C C | T C G G A T |
A T G T A | T C G G A A |
A C G T A | |
A C T T A | |
Construct a BLOSUM80 matrix for this database. Show your work in detail.
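Since we are constructing a BLOSUM80 matrix, we will group sequences in
a block that have 80% or more identical bases. In the first block, ATGTA
and ACGTA share 4 of 5 bases (80%), as do ACGTA and ACTTA, so these three
sequences form one group. In the second block, TCGGAT and TCGGAA share 5
of 6 bases (83%) and form one group. No other pair reaches 80%. Thus, the
given blocks will be grouped as follows:
Block 1 | Block 2 |
Group 1: A C A _ G | Group 1: A C G G C T |
Group 2: A C C C C | Group 2: T C G G A T |
Group 3: A T G T A | Group 2: T C G G A A |
Group 3: A C G T A | |
Group 3: A C T T A | |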
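The grouping step can be sketched in Python. This is a minimal sketch, not a prescribed method: it assumes the first block holds the five 5-base sequences and the second block the three 6-base sequences, that a sequence joins a group when it is at least 80% identical to any member (single linkage), and that a gap never counts as a match. The names `is_similar` and `cluster` are illustrative only.

```python
def is_similar(a, b, percent=80):
    """True if aligned sequences a and b are at least `percent` % identical.

    Assumption: a gap ("_") never counts as a match. Integer arithmetic
    avoids floating-point trouble at exactly 80%.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "_")
    return matches * 100 >= percent * len(a)


def cluster(block, percent=80):
    """Single-linkage grouping: a sequence joins every group it is similar to."""
    clusters = []
    for seq in block:
        linked, rest = [], []
        for group in clusters:
            if any(is_similar(seq, member, percent) for member in group):
                linked.append(group)
            else:
                rest.append(group)
        # Merge all linked groups together with the new sequence.
        merged = [member for group in linked for member in group] + [seq]
        clusters = rest + [merged]
    return clusters


block1 = ["ACA_G", "ACCCC", "ATGTA", "ACGTA", "ACTTA"]
block2 = ["ACGGCT", "TCGGAT", "TCGGAA"]
print(cluster(block1))
print(cluster(block2))
```

Under these assumptions the first block collapses to 3 groups and the second to 2 groups, which is what the pair counting that follows relies on.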
To determine the BLOSUM80 matrix for this database, we must construct 4
tables: Table of Counts; Table of Relative Frequencies; Table of Expected
Frequencies; BLOSUM80 matrix.
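To construct the Table of Counts, we will count the number of pairs for
each possible base pairing: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT. When
counting, each group of clustered sequences (highlighted in red and green
in the original) is counted as 1 sequence, so each member of a group of 3
carries a weight of 1/3 and each member of a group of 2 a weight of 1/2.
Pairs within the same group are not counted.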
The number of possible pairs per column without gaps for the first block
is 3!/(2!*1!) = 3 (3 grouped sequences, choose 2). The number of possible
pairs per column with gaps for the first block is 2!/(2!*0!) = 1 (only 2
grouped sequences remain). Since there are 4 ungapped columns, we have
4*3 = 12 possible pairs for the ungapped columns. Since there is one
column with a gap, we have 1*1 = 1 possible pair for this column. Thus,
we have 12 + 1 = 13 total possible base pairings for the first block.
For the second block, there are 6 * 2!/(2!*0!) = 6 * 1 = 6 possible base
pairings.
Hence, there is a total of 13 + 6 = 19 possible pairings in the database.
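The pair totals can be checked with a few lines of Python. This is only a sketch: the per-column lists below are an assumption that encodes 3 grouped sequences in the first block (with one of them gapped in the fourth column) and 2 grouped sequences in the second block.

```python
from math import comb

# Number of grouped sequences present (not gapped) in each column.
# Assumption: block 1 has 3 groups, one of which has a gap in column 4;
# block 2 has 2 groups and no gaps.
block1_columns = [3, 3, 3, 2, 3]
block2_columns = [2, 2, 2, 2, 2, 2]

# Each column contributes "n choose 2" pairs.
pairs_block1 = sum(comb(n, 2) for n in block1_columns)
pairs_block2 = sum(comb(n, 2) for n in block2_columns)
total_pairs = pairs_block1 + pairs_block2
print(pairs_block1, pairs_block2, total_pairs)  # 13 6 19
```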
Counts for AA: Block 1: 1 + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 3
Block 2: 0
Total = 3
Counts for AC: Block 1: 1 + (1/3) + (1/3) + (1/3) = 2
Block 2: (1/2) + (1/2) = 1
Total = 3
Counts for AG: Block 1: (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 5/3
Block 2: 0
Total = 5/3
Counts for AT: Block 1: (1/3)
Block 2: (1/2) + (1/2) + (1/2) = 3/2
Total = (1/3) + (3/2) = 11/6
Counts for CC: Block 1: 1 + (1/3) + (1/3) + (1/3) + (1/3) = 7/3
Block 2: (1/2) + (1/2) = 1
Total = 10/3
Counts for CG: Block 1: (1/3) + (1/3) + 1 = 5/3
Block 2: 0
Total = 5/3
Counts for CT: Block 1: (1/3) + (1/3) + (1/3) + (1/3) + (1/3) + (1/3) = 6/3 = 2
Block 2: 0
Total = 2
Counts for GG: Block 1: 0
Block 2: (1/2) + (1/2) + (1/2) + (1/2) = 2
Total = 2
Counts for GT: 0
Counts for TT: Block 1: 0
Block 2: 1/2
Total = 1/2
Table of Counts
A | 3 | | | |
C | 3 | 10/3 | | |
G | 5/3 | 5/3 | 2 | |
T | 11/6 | 2 | 0 | 1/2 |
| A | C | G | T |
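The weighted counting can be reproduced with exact fractions. The sketch below is one possible way to do it, under stated assumptions: the groups are the ones implied by the 1/3 and 1/2 weights (three grouped sequences in block 1, two in block 2), every cross-group sequence pair gets weight 1/(size of group 1 * size of group 2), and any pair touching a gap is skipped. The function name `count_pairs` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Each block is a list of groups; each group is a list of aligned sequences.
blocks = [
    [["ACA_G"], ["ACCCC"], ["ATGTA", "ACGTA", "ACTTA"]],
    [["ACGGCT"], ["TCGGAT", "TCGGAA"]],
]


def count_pairs(blocks):
    """Weighted base-pair counts, column by column, between different groups."""
    counts = defaultdict(Fraction)  # Fraction() is 0
    for groups in blocks:
        width = len(groups[0][0])
        for col in range(width):
            for g1, g2 in combinations(groups, 2):
                weight = Fraction(1, len(g1) * len(g2))
                for a in g1:
                    for b in g2:
                        x, y = a[col], b[col]
                        if "_" in (x, y):
                            continue  # assumption: gapped pairs are not counted
                        counts["".join(sorted((x, y)))] += weight
    return dict(counts)


counts = count_pairs(blocks)
for pair in sorted(counts):
    print(pair, counts[pair])
print("total", sum(counts.values()))
```

GT never appears as a key because no G is ever paired with a T across groups, which corresponds to the 0 entry in the Table of Counts.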
To find the observed relative frequencies, divide the number of observed
counts by the number of total possible pair counts.
Observed Relative Frequencies (qij)
A | 3/19 | | | |
C | 3/19 | (10/3)/19 | | |
G | (5/3)/19 | (5/3)/19 | 2/19 | |
T | (11/6)/19 | 2/19 | 0 | (1/2)/19 |
| A | C | G | T |
The elements of the table should sum to 1.
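As a quick check of the normalization, the counts can be divided by their total with exact fractions. This is a small sketch; the dictionary simply restates the totals worked out above, and the variable names are illustrative.

```python
from fractions import Fraction

# Weighted pair counts, restated from the worked counts.
counts = {
    "AA": Fraction(3), "AC": Fraction(3), "AG": Fraction(5, 3),
    "AT": Fraction(11, 6), "CC": Fraction(10, 3), "CG": Fraction(5, 3),
    "CT": Fraction(2), "GG": Fraction(2), "GT": Fraction(0),
    "TT": Fraction(1, 2),
}

total = sum(counts.values())
q = {pair: c / total for pair, c in counts.items()}

print("total pairs:", total)
print("sum of q:", sum(q.values()))
for pair, value in q.items():
    print(pair, value, round(float(value), 4))
```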
To find the expected frequencies, use the formulas
$$p_{ij} = \begin{cases} p_i^2 & \text{if } i = j \\ 2 p_i p_j & \text{if } i \neq j \end{cases}$$
where $p_i = q_{ii} + \frac{1}{2} \sum_{j \neq i} q_{ij}$.
Expected Relative Frequencies (pij)
A | .1082 | | | |
C | .2308 | .1231 | | |
G | .1270 | .1354 | .0372 | |
T | .0837 | .0893 | .0491 | .0162 |
| A | C | G | T |
The elements of the table should sum to 1.
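The expected frequencies can be recomputed from the observed ones. The following is a sketch under the definitions given here (the marginal of a base is its diagonal frequency plus half of its off-diagonal frequencies); the helper name `marginal` is illustrative, and the observed counts are restated so the block runs on its own.

```python
from fractions import Fraction

counts = {
    "AA": Fraction(3), "AC": Fraction(3), "AG": Fraction(5, 3),
    "AT": Fraction(11, 6), "CC": Fraction(10, 3), "CG": Fraction(5, 3),
    "CT": Fraction(2), "GG": Fraction(2), "GT": Fraction(0),
    "TT": Fraction(1, 2),
}
total = sum(counts.values())
q = {pair: c / total for pair, c in counts.items()}
bases = "ACGT"


def marginal(base):
    """p_i = q_ii + (sum of q_ij over j != i) / 2."""
    off_diagonal = sum(v for pair, v in q.items()
                       if base in pair and pair != base * 2)
    return q[base * 2] + off_diagonal / 2


p = {b: marginal(b) for b in bases}

# Expected pair frequencies for the lower triangle (i <= j).
expected = {}
for i, a in enumerate(bases):
    for b in bases[i:]:
        expected[a + b] = p[a] ** 2 if a == b else 2 * p[a] * p[b]

for pair, value in expected.items():
    print(pair, round(float(value), 4))
print("sum:", sum(expected.values()))
```

Because the marginals sum to 1, the ten expected values also sum to exactly 1, which is a useful check on hand-rounded table entries.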
To find the BLOSUM80 scores, use the formula
$$2 s_{ij} = 2 \cdot \frac{\ln(q_{ij}/p_{ij})}{\ln 2}$$
and round to the nearest integer.
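BLOSUM80
A | 1 | | | |
C | -1 | 1 | | |
G | -1 | -1 | 3 | |
T | 0 | 0 | -999,999 | 1 |
| A | C | G | T |
The GT pair was never observed (qij = 0), so its log-odds score is
negative infinity; -999,999 stands in for that value.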
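The final scoring step can be sketched end to end in Python. This is an illustrative sketch rather than a reference implementation: it restates the observed counts, recomputes the marginals, and returns negative infinity for a pair that was never observed (the table uses -999,999 as a stand-in). The function name `score` is an assumption of this sketch.

```python
import math
from fractions import Fraction

counts = {
    "AA": Fraction(3), "AC": Fraction(3), "AG": Fraction(5, 3),
    "AT": Fraction(11, 6), "CC": Fraction(10, 3), "CG": Fraction(5, 3),
    "CT": Fraction(2), "GG": Fraction(2), "GT": Fraction(0),
    "TT": Fraction(1, 2),
}
total = sum(counts.values())
q = {pair: c / total for pair, c in counts.items()}

# Marginal base frequencies: diagonal term plus half of the off-diagonal terms.
p = {}
for base in "ACGT":
    off_diagonal = sum(v for pair, v in q.items()
                       if base in pair and pair != base * 2)
    p[base] = q[base * 2] + off_diagonal / 2


def score(pair):
    """Rounded 2 * log2(observed / expected) for one base pair."""
    a, b = pair
    expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
    if q[pair] == 0:
        return -math.inf  # never observed: log-odds is negative infinity
    return round(2 * math.log2(float(q[pair] / expected)))


scores = {pair: score(pair) for pair in q}
for pair, value in scores.items():
    print(pair, value)
```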