STA 4953 (Spring 2001)
Exercise 6A, Due 3/8/2001
Introduction to GCG
GCG is a comprehensive genetic sequence analysis software package, first
developed in the 1970's at the University of Wisconsin. It is, therefore,
also referred to as the Wisconsin Package.
Starting GCG and using genhelp
To start GCG on helix, type gcg at the helix prompt.
Descriptions of all GCG programs can be found by running genhelp.
When you are viewing pages of this manual, use <CTRL>B to go back to
the previous page.
Alternatively, you can visit the online manual. The first time you
enter this site, go to http://www.gcg.com/genhelp/ and register.
You will need to remember your username and password and use them each
time you visit this site. At the bottom of the page, click UNIX.
In this exercise, we shall learn to use a very simple program called
"composition" from GCG. This program analyzes the base composition of a
nucleic acid sequence or the amino acid composition of a protein
sequence. Find a description of composition in the Genhelp manual
(accessible when logged onto GCG by typing "genhelp" or on the online
manual) and print a copy.
In order to run composition, you may have to reformat your sequence.
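To make concrete what a composition count is, here is a minimal Python sketch that is independent of GCG. It is not the GCG program and does not read GCG-format files; the function name `symbol_counts` and the example sequences are made up for illustration.

```python
from collections import Counter


def symbol_counts(sequence):
    """Count how often each symbol (base or amino acid) occurs in a sequence."""
    # Ignore whitespace and case so that wrapped or lowercase input is tolerated.
    cleaned = "".join(sequence.split()).upper()
    return Counter(cleaned)


# Hypothetical example sequences, not real hemoglobin data.
dna_counts = symbol_counts("ATGGTGCTGTCT")
protein_counts = symbol_counts("MVLSPADK")

# Print one line per base with its count.
for base in sorted(dna_counts):
    print(base, dna_counts[base])
```

The same function works for protein sequences because it simply tallies characters; it makes no attempt to check that the symbols are valid bases or amino acid codes.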
Reformatting Sequence Files
GCG programs require input sequence files in a particular format.
Sequences retrieved from databases like GenBank or obtained by
biologists in their labs are not necessarily in the right format to be
read by GCG. These sequences need to be reformatted.
Look up descriptions of the programs "fromgenbank" and "reformat" in
the Genhelp manual (accessible when logged onto GCG by typing
"genhelp") or on the online manual.
Use the descriptions of "fromgenbank" and "reformat" to convert the
GenBank hemoglobin files you retrieved in Exercise #5 to GCG format.
Also, convert the *.na and *.aa files you prepared in Exercise #5 to
GCG format.
Run "composition" on the *.na and *.aa files that you have reformatted
to GCG format above.
Exercise
Based on the outputs of the composition program on the appropriate
sequence data, answer the following questions:
1. What are the total base counts in the nucleic acid sequences
   (those you retrieved from GenBank in Exercise #5 and ran composition
   on) of the alpha1, alpha2, beta, and delta genes of hemoglobin? What
   are the relative frequencies of the bases? (You need to do some
   simple calculations to get the relative frequencies:
   relative frequency = (specific base count)/(total base count).)
2. Repeat Question 1 for dinucleotides.
3. Do hemoglobin molecules contain all 20 different kinds of amino
   acids? If not, write down the name(s), along with the 3-letter and
   1-letter codes, of the missing amino acid(s). (A codon table is a
   helpful reference.)
4. For each of the alpha, beta, and delta chains of hemoglobin, answer
   the following questions (a codon table is a helpful reference):
   - Which amino acid is the most abundant in the chain?
   - What are the codons coding for that amino acid?
   - From the nucleic acid sequence, locate the codons coding for that
     amino acid (you will probably need to look at the annotations in
     the GenBank file to locate the beginning of the coding sequence)
     and write down the count for each codon.
Chain      | Alpha 1 | Alpha 2 | Beta | Delta
Amino Acid |         |         |      |
Codons     |         |         |      |
Count      |         |         |      |
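The calculations asked for in the exercise can be sketched in Python as follows. This is only an illustration under stated assumptions: the counts and the short coding sequence are made up (they are not hemoglobin data), dinucleotides are counted as overlapping pairs, which may or may not match how composition reports them, and the coding sequence is assumed to start at the first base of the start codon. All function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw counts into relative frequencies (count / total count)."""
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def dinucleotide_counts(sequence):
    """Count overlapping dinucleotides (assumption: positions 1-2, 2-3, ...)."""
    seq = sequence.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def codon_counts(coding_sequence, codons):
    """Count the given codons, reading in frame from the first base."""
    seq = coding_sequence.upper()
    # Split into consecutive triplets; a trailing partial codon is dropped.
    in_frame = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    tally = Counter(in_frame)
    return {codon: tally[codon] for codon in codons}


# Made-up base counts, as they might be read off a composition output.
base_counts = {"A": 20, "C": 30, "G": 30, "T": 20}
print(relative_frequencies(base_counts))

# Made-up coding sequence: ATG GTG CTG GTT GTG TAA
cds = "ATGGTGCTGGTTGTGTAA"
print(relative_frequencies(dinucleotide_counts(cds)))

# The four valine codons in the standard genetic code.
print(codon_counts(cds, ["GTT", "GTC", "GTA", "GTG"]))
```

Counting codons in frame matters: simply searching the whole sequence for a triplet such as GTG would also match occurrences that straddle two codons, which is why the exercise asks you to locate the beginning of the coding sequence first.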