What is a Gene?

 

 

            A core desire in most of us is to understand what we are and how we became this way.  More and more, it is becoming clear that we, in all of our complexity, are the outcome of a set of plans written in the simple four-letter code of the genetic material, DNA.  The DNA of our chromosomes is not read as one large unit, but in tens of thousands of pieces called ‘genes’, each of which carries out a specific function.  Much as a Monet painting is made up of thousands of small brushstrokes that combine to form the full image, we are the product of the expression and interaction of an estimated 20,000 protein-coding genes.  To understand the complex whole, we must first understand what individual genes are and how they contribute to the final outcome.

            Most of us are familiar with genetics, if only informally.  We are all aware of family resemblances, of how one person may have his father’s nose and his mother’s hair.  We even know that complex traits such as personality or temperament can be passed from parent to child.  For the most part, though, people tend to think of this transmission as a blending, like mixing different colors of paint to form a new, unique color.  Until the work of early geneticists such as Gregor Mendel came to be appreciated, even scientists thought of traits this way.  When Mendel crossed two true-breeding pea plants with contrasting characteristics, only one of the two traits appeared in the next generation.  This showed that one form of a gene, or allele, could be dominant over another.  More remarkably, when the plants of this first offspring generation, the F1, were crossed with each other, the next generation showed both the dominant and the recessive forms.  This demonstrated for the first time that traits did not mix together but were inherited as discrete, indivisible units.  Mendel’s mathematical analysis of how these traits were transmitted showed that these units, or genes, were passed from one generation to the next in an orderly, predictable fashion.
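
            Mendel’s F1 hybrids, crossed with each other, gave roughly three offspring showing the dominant trait for every one showing the recessive trait.  A minimal Python sketch of that bookkeeping (a Punnett square) follows.  It assumes a single gene with two alleles, written "A" (dominant) and "a" (recessive), with complete dominance.  The function names cross and phenotypes are illustrative only and are not from any library.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a one-gene cross (a Punnett square).

    Each parent is a two-letter genotype such as "Aa". Every gamete carries
    one of the parent's two alleles, so the square is every pairing of one
    allele from each parent.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so "aA" and "Aa" count as one genotype (uppercase sorts first).
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotypes(genotypes: Counter) -> Counter:
    """Collapse genotypes to traits: one dominant (uppercase) allele is enough."""
    traits = Counter()
    for genotype, count in genotypes.items():
        trait = "dominant" if any(a.isupper() for a in genotype) else "recessive"
        traits[trait] += count
    return traits


f1 = cross("AA", "aa")  # true-breeding dominant x true-breeding recessive
f2 = cross("Aa", "Aa")  # F1 hybrids crossed with each other

print(dict(f1))              # {'Aa': 4}
print(dict(phenotypes(f1)))  # {'dominant': 4}
print(dict(f2))              # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(phenotypes(f2)))  # {'dominant': 3, 'recessive': 1}
```

            Each offspring receives one allele from each parent.  This is why the recessive form reappears in the F2 even though no F1 plant showed it.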

            The next questions were where the genes were located and what they were made of.  Early microscopists had noted small bodies called chromosomes in the nucleus, which appeared to be divided between the daughter cells during cell division.  However, it was not until around 1910 that chromosomes were clearly shown to carry genes.  In that work, Thomas Hunt Morgan demonstrated that, in the fruit fly Drosophila, inheritance of the white-eye allele followed inheritance of the X chromosome.

            Chemical analysis showed that chromosomes were a roughly even mixture of proteins and nucleic acids, specifically deoxyribonucleic acid, or DNA.  Early chemists found that DNA was made of three basic components: a pentose sugar, a phosphate group, and a nitrogenous base.  The sugar is called deoxyribose because the hydroxyl group at the 2’ carbon of ribose is missing.  The bases are of two forms, named for the parent compounds they resemble.  Adenine and guanine are purines, which have a two-ring structure, while thymine and cytosine are pyrimidines, which have a single ring.  Early chemists mistakenly measured equal quantities of each base, which contributed to the prevailing belief that DNA played only a structural role in the chromosomes.  Scientists at the time could not imagine that something as complex as a human being could be specified by just four relatively similar bases.  It seemed far more sensible that the more varied proteins, built from twenty different amino acids with widely differing physical properties, must provide the variability and complexity seen in living organisms.  It took several decades of work to establish that DNA is the carrier of genetic information and that the proteins in the chromosomes provide support.

            The critical experiments were done by two different groups over a span of about sixteen years, from 1928 to 1944.  Frederick Griffith made the initial observation that when bacteria were killed by heating, something was left behind that could still transmit genetic information.  Specifically, a non-virulent strain lacking an external polysaccharide coat could be ‘transformed’ into a virulent, coated form by material from heat-killed virulent bacteria.  Oswald Avery and his colleagues then performed the key experiment: treating an extract of the virulent bacteria with different enzymes to determine what this ‘transforming principle’ was.  Treatment with RNase or protease, enzymes which destroy RNA and protein respectively, did not affect transformation.  In contrast, DNase, which destroys DNA, completely blocked it.  The scientific community, however, was unimpressed because of its prejudice in favor of proteins.  Further experiments that used ultracentrifugation and electrophoresis to characterize the purified DNA made only small inroads into the prevailing view.

            But that leaves the question of how four bases, repeated in many combinations, can act as a blueprint for living organisms.  The so-called Central Dogma describes how the information in DNA is put to use.  It holds that heredity is mapped out by DNA but carried out in large part by proteins.  DNA is a repository of information but cannot carry out any functions on its own; other molecules must read it to create the functional units.  This means that the simple, four-letter language of nucleic acids must be converted into the richer, 20-letter language of the amino acids, the subunits of proteins.  The first step is that DNA is transcribed into a different nucleic acid, ribonucleic acid, or RNA.  To transcribe is to make a faithful copy.  The word traces back to the scribes who used an original religious text as a template to make numerous copies for distribution.  The letters would not change, though the paper or ink might differ.  In the same way, transcription uses one of the two DNA strands as a template to build a complementary RNA.  The RNA’s sequence matches the other (coding) strand, with two exceptions: the sugar-phosphate backbone uses ribose, and the pyrimidine base uracil is used in place of thymine.  A single type of RNA is not sufficient to carry out the next step, the translation of the RNA into protein.  To translate is to change from one language to another.  In this case, RNA acts as the intermediate that translates from the nucleic acid language to the amino acid language.
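
            Base pairing during transcription can be made concrete in a few lines of Python.  This sketch treats each strand as a plain string written 5’ to 3’.  It ignores the sugar-phosphate backbone entirely, so the ribose/deoxyribose difference does not appear.  The sequence and function names are invented for illustration.  Because the RNA is built by pairing against the template strand, it ends up matching the other, coding strand letter for letter, with U in place of T.

```python
# Watson-Crick pairing between DNA bases.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Pairing used when RNA is built on a DNA template: adenine pairs with uracil.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, also written 5' -> 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def transcribe(template: str) -> str:
    """Build RNA (5' -> 3') on a DNA template strand given 5' -> 3'.

    The template is read 3' -> 5', so we walk it backwards and pair each base.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template))


coding = "ATGGCCTTTTAA"               # made-up coding strand
template = reverse_complement(coding)
rna = transcribe(template)

print(template)  # TTAAAAGGCCAT
print(rna)       # AUGGCCUUUUAA
# The RNA matches the coding strand, except that U replaces T.
assert rna == coding.replace("T", "U")
```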

            Three different kinds of RNA are required to carry out this process.  Messenger RNA (mRNA) is the copy of the gene that supplies the three-base triplets, or codons, that specify amino acids.  Transfer RNA (tRNA) brings each amino acid and reads the matching triplet.  Finally, ribosomal RNA (rRNA), as part of the ribosome, brings the other two RNAs together in the correct order.  All three RNAs are transcribed from DNA, but in eukaryotes they are made largely by different RNA polymerases.
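
            The division of labor among the three RNAs can be loosely mimicked in code.  In the Python sketch below, the codon dictionary stands in for the tRNAs, which pair each triplet with its amino acid.  The loop stands in for the ribosome stepping along the mRNA three bases at a time.  Only a handful of the 64 codons are included, and translation is assumed to start at the first base of the message.  Both are simplifications for illustration, and the name translate is hypothetical.

```python
# A small subset of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "AAA": "Lys", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA in triplets from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Raises KeyError for codons left out of this partial table.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGGCCUUUUAA"))  # ['Met', 'Ala', 'Phe']
print(translate("AUGAAAUGGUGA"))  # ['Met', 'Lys', 'Trp']
```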

            Mendel could only identify a gene as an indivisible unit that conferred a trait or characteristic on an individual organism.  We now know that a gene is not indivisible but is a long stretch of DNA, often covering several thousand base pairs.  The gene can be divided into two main components: the coding region and the promoter.  The coding region is the sequence that must be transcribed into mRNA and translated into the final functional protein.  The average protein is around 300 amino acids long, and each amino acid is encoded by three bases.  It would therefore seem that this region should be relatively short, under one thousand base pairs.  However, the strange tricks of Mother Nature greatly expand this size.  The primary transcript, the RNA molecule first transcribed from a gene, is much larger than the final mature mRNA.  A major cause of this is intervening sequences, or introns, which are interspersed between the expressed sequences, or exons.  To create the correct mRNA with the appropriate open reading frame, the introns must be removed in a process known as splicing.  Even then, the mRNA contains more nucleic acid than would be minimally required to encode the protein.  There is usually a region at each end of the molecule that is transcribed but not translated.  These regions are known as the 5’ untranslated leader and the 3’ untranslated trailer (the 5’ and 3’ UTRs).  While much of this sequence may be without function, it is becoming clear that important functions are carried out by this ‘extra’ RNA.  Short signal sequences in the RNA can set the rate at which the message is degraded, the level of translation, or whether the RNA will be sequestered away.
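
            Both the length estimate and splicing are easy to sketch in Python.  The code below assumes three bases per amino acid plus one three-base stop codon.  It also assumes the exon boundaries are already known and passed in as (start, end) positions; in the cell, these boundaries are found by recognizing signals in the RNA.  The function names and the short transcript, including its intron, are invented for illustration.

```python
def coding_length_bp(amino_acids: int) -> int:
    """Bases needed to encode a protein: three per amino acid plus a stop codon."""
    return 3 * amino_acids + 3


def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons of a primary transcript, discarding the introns between them.

    `exons` lists (start, end) positions, 0-based with `end` exclusive,
    in the order the exons appear in the transcript.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)


print(coding_length_bp(300))  # 903, i.e. under one thousand base pairs

exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCAG", "UUUUAA"
primary = exon1 + intron + exon2           # 23 bases
mature = splice(primary, [(0, 6), (17, 23)])

print(len(primary), len(mature))  # 23 12
print(mature)                     # AUGGCCUUUUAA
```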

            An mRNA must have very specific beginning and end points.  If transcription starts at the wrong site, the proper start point for translation may be missing from the RNA, or it may be preceded by a properly formatted but incorrect start signal.  The message must contain all of the necessary coding sequence, but continuing beyond it wastes energy and may introduce inappropriate signals for degradation or sequestration.  The two ends of the mRNA are dictated by important signal sequences.
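
            The consequences of a wrong transcription start site can be shown with another short Python sketch.  As a simplification, it assumes that translation begins at the first AUG found when scanning from the 5’ end.  Reading then continues in steps of three until a stop codon.  The three messages are invented and differ only in where transcription began, and the name first_orf is hypothetical.

```python
# A few codons from the standard genetic code (the full table has 64).
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe",
    "CCA": "Pro", "CCU": "Pro", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def first_orf(mrna: str) -> list[str]:
    """Translate from the first AUG (scanning 5' -> 3') to the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal anywhere in this message
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


correct = "CCAUGGCCUUUUAA"   # transcription starts just upstream of the real AUG
too_late = correct[3:]       # starts past the A of AUG, so the start is lost
too_early = "AUG" + correct  # extra upstream AUG, out of frame with the real one

print(first_orf(correct))    # ['Met', 'Ala', 'Phe']
print(first_orf(too_late))   # []
print(first_orf(too_early))  # ['Met', 'Pro', 'Trp', 'Pro', 'Phe']
```

            The last case never reaches the intended stop codon, because the upstream AUG puts the reading frame out of step with the real coding sequence.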

            Transcription begins at a specific point, which defines the 5’ end of the RNA.  The DNA sequences that determine this start site and the level of transcription are known as the promoter.

 

 

production of RNA must be tightly regulated for beginning, end and level.

 

regulation is done by specific DNA sequences

 

the three RNAs

 

translation into protein

 

structure of proteins

 

enzymatic, structural functions of proteins