Cells are the smallest organized structural units
able to maintain an individual, albeit limited, life
span while carrying out a wide variety of functions.
Cells have evolved on earth during the past 3.5 billion
years, presumably originating from suitable early molecular
aggregations. Each cell originates from another living
cell, as postulated by R. Virchow in 1855 (“omnis cellula
e cellula”). The living world consists of two basic
types of cells: prokaryotic cells, which carry their
functional information in a circular genome without
a nucleus, and eukaryotic cells, which contain their
genome in individual chromosomes in a nucleus and
have a well-organized internal structure. Cells communicate with
each other by means of a broad repertoire of molecular
signals. Great progress has been made since 1839,
when cells were first recognized as the “elementary
particles of organisms” by M. Schleiden and T. Schwann.
Today we understand most of the biological processes
of cells at the molecular level.
Eukaryotic cells
A eukaryotic cell consists of cytoplasm and a nucleus.
It is enclosed by a plasma membrane. The cytoplasm
contains a complex system of inner membranes that
form cellular structures (organelles). The main
organelles are the mitochondria (in which important
energy-delivering chemical reactions take place),
the endoplasmic reticulum (consisting of a series
of membranes in which glycoproteins and lipids are
formed), the Golgi apparatus (for certain transport
functions), and peroxisomes (for the formation or
degradation of certain substances). Eukaryotic cells
contain lysosomes, in which numerous proteins, nucleic
acids, and lipids are broken down. Centrioles, small
cylindrical particles made up of microtubules, play
an essential role in cell division. Ribosomes are
the sites of protein synthesis.
Nucleus of the Cell
The eukaryotic cell nucleus contains the genetic
information. It is enclosed by an inner and an outer
membrane, which contain pores for the transport
of substances between the nucleus and the cytoplasm.
The nucleus contains a nucleolus and a fibrous matrix
with different DNA–protein complexes.
Plasma membrane of the cell
The environment of cells, whether blood or other
body fluids, is water-based, and the chemical processes
inside a cell involve water-soluble molecules. In
order to maintain their integrity, cells must prevent
water and other molecules from flowing in or out
uncontrolled. This is accomplished by a water-resistant
membrane composed of bipartite molecules of fatty
acids, the plasma membrane. These molecules are
phospholipids arranged in a double layer (bilayer)
with a fatty interior. The plasma membrane itself
contains numerous molecules that traverse the lipid
bilayer once or many times to perform special functions.
Different types of membrane proteins can be distinguished:
(i) transmembrane proteins used as channels for
transport of molecules into or out of the cell, (ii)
proteins connected with each other to provide stability,
(iii) receptor molecules involved in signal transduction,
and (iv) molecules with enzyme function to catalyze
internal chemical reactions in response to an external
signal. (Figure redrawn from Alberts et al., 1998.)
Comparison of animal and plant cells
Plant and animal cells have many similar characteristics.
One fundamental difference is that plant cells contain
chloroplasts for photosynthesis. In addition, plant
cells are surrounded by a rigid wall of cellulose
and other polymeric molecules and contain vacuoles
for water, ions, sugar, nitrogen-containing compounds,
or waste products. Vacuoles are permeable to water
but not to the other substances enclosed in the
vacuoles. (Figures in A, B and D adapted from de
Duve, 1984.)
References
Alberts, B. et al.: Essential Cell Biology. An Introduction
to the Molecular Biology of the Cell.
Garland Publishing Co., New York, 1998.
de Duve, C.: A Guided Tour of the Living Cell. Vol.
I and II. Scientific American Books, Inc., New
York, 1984.
Lodish, H. et al.: Molecular Cell Biology (with an
animated CD-ROM). 4th ed. W.H. Freeman &
Co., New York, 2000.
Some Types of Chemical Bonds
Close to 99% of the weight of a living cell is composed
of just four elements: carbon (C), hydrogen (H), nitrogen
(N), and oxygen (O). Almost 50% of the atoms are hydrogen
atoms; about 25% are carbon, and 25% oxygen. Apart from
water (about 70% of the weight of the cell) almost all
components are carbon compounds. Carbon, a small atom
with four electrons in its outer shell, can form four
strong covalent bonds with other atoms. But most importantly,
carbon atoms can combine with each other to build chains
and rings, and thus large complex molecules with specific
biological properties.
Compounds of hydrogen (H), oxygen (O), and carbon (C)
Four simple combinations of these atoms occur frequently
in biologically important molecules: hydroxyl (—OH;
alcohols), methyl (—CH3), carboxyl (—COOH), and
carbonyl (C=O; aldehydes and ketones) groups. They
impart to the molecules characteristic chemical
properties, including possibilities to form compounds.
Acids and esters
Many biological substances contain a carbon–oxygen
bond with weak acidic or basic (alkaline) properties.
The degree of acidity is expressed by the pH value,
which indicates the concentration of H+ ions in
a solution, ranging from 10⁻¹ mol/L (pH 1, strongly
acidic) to 10⁻¹⁴ mol/L (pH 14, strongly alkaline).
Pure water contains 10⁻⁷ moles H+ per liter (pH
7.0). An ester is formed when an acid reacts with
an alcohol. Esters are frequently found in lipids
and phosphate compounds.
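The relationship between H+ concentration and pH quoted above can be checked with a short calculation. The following Python sketch assumes the usual definition of pH as the negative base-10 logarithm of the H+ concentration in mol/L; the function names are illustrative only.

```python
import math


def ph_from_concentration(h_molar: float) -> float:
    """Return the pH for an H+ concentration given in mol/L."""
    if h_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(h_molar)


def concentration_from_ph(ph: float) -> float:
    """Return the H+ concentration in mol/L for a given pH."""
    return 10.0 ** (-ph)


# The three reference points mentioned in the text.
for conc in (1e-1, 1e-7, 1e-14):
    print(f"[H+] = {conc:.0e} mol/L -> pH {ph_from_concentration(conc):.1f}")
```

Each step of one pH unit corresponds to a tenfold change in H+ concentration.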
Carbon–nitrogen bonds (C—N)
C—N bonds occur in many biologically important molecules:
in amino groups, amines, and amides, especially
in proteins. Of paramount significance are the amino
acids (cf. p. 30), which are the subunits of proteins.
All proteins have a specific role in the functioning
of an organism.
References
Alberts, B. et al.: Molecular Biology of the Cell.
3rd ed. Garland Publishing Co., New York,
1994.
Koolman, J., Röhm, K.-H.: Color Atlas of Biochemistry.
Thieme, Stuttgart – New York, 1996.
Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
Co., New York, 1995.
Carbohydrates
Carbohydrates in their various chemical forms and their
derivatives are an important group of biomolecules for
genetics. They provide the basic framework of DNA and
RNA. Their flexibility makes them especially suitable
for transferring genetic information from cell to cell.
Along with nucleic acids, lipids, and proteins, carbohydrates
are one of the most important classes of biomolecules.
Their main functions can be classified into three groups:
(i) to deliver and store energy, (ii) to help form DNA
and RNA, the information-carrying molecules (see pp.
34 and 38), (iii) to help form cell walls of bacteria
and plants. Carbohydrates are often bound to proteins
and lipids. As polysaccharides, carbohydrates are important
structural elements of the cell walls of bacteria
and plants. They form cell surface structures (receptors)
used in conducting signals from cell to cell. Combined
with numerous proteins and lipids, carbohydrates are
important components of numerous cell structures. Finally,
they function to transfer and store energy in intermediary
metabolism.
Monosaccharides
Monosaccharides (simple sugars) are aldehydes (—C=O,
—H) or ketones (>C=O) with two or more hydroxy groups
(general structural formula (CH2O)n). The aldehyde
or ketone group can react with one of the hydroxy
groups to form a ring. This is the usual configuration
of sugars that have five or six C atoms (pentoses
and hexoses). The C atoms are numbered. The D- and
the L-forms of sugars are mirror-image isomers of
the same molecule. The naturally occurring forms
are the D-(dextro) forms. These further include
α- and β-forms as stereoisomers. In the cyclic
forms the C atoms of sugars are not on a plane,
but three-dimensionally take the shape of a chair
or a boat. The β-D-glucopyranose configuration (glucose)
is the energetically favored one, since all the axial
positions are occupied by H atoms. The arrangement
of the —OH groups can differ, so that stereoisomers
such as mannose or galactose are formed.
Disaccharides
These are compounds of two monosaccharides. The
aldehyde or ketone group of one can bind to an α-hydroxy
or a β-hydroxy group of the other. Sucrose and lactose
are frequently occurring disaccharides.
Derivatives of sugars
When certain hydroxy groups are replaced by other
groups, sugar derivatives are formed. These occur
especially in polysaccharides. In a large group
of genetically determined syndromes, complex polysaccharides
cannot be degraded owing to reduced or absent enzyme
function (mucopolysaccharidoses, mucolipidoses)
(see p. 356).
Polysaccharides
Short (oligosaccharides) and long chains of sugars
and sugar derivatives (polysaccharides) form essential
structural elements of the cell. Complex oligosaccharides
with bonds to proteins or lipids are part of cell
surface structures, e.g., blood group antigens.
Lipids (Fats)
Lipids usually occur as large molecules (macromolecules).
They are essential components of membranes and precursors
of other important biomolecules, such as steroids for
the formation of hormones and other molecules for transmitting
intercellular signals. In addition to fatty acids, compounds
with carbohydrates (glycolipids), phosphate groups (phospholipids),
and other molecules are especially important. A special
characteristic is their pronounced polarity, with a
hydrophilic (water-attracting) and a hydrophobic (water-repelling)
region. This makes lipids especially suited for forming
the outer limits of the cell (cell membrane).
Fatty acids
Fatty acids are composed of a hydrocarbon chain
with a terminal carboxylic acid group. Thus, they
are polar, with a hydrophilic (—COOH) and a hydrophobic
end (—CH3), and differ in the length of the chain
and its degree of saturation. When one or more double
bonds occur in the chain, the fatty acid is referred
to as unsaturated. A double bond makes the chain
relatively rigid and causes a kink. Fatty acids
form the basic framework of many important macromolecules.
The free carboxyl group (—COOH) of a fatty acid
is ionized (—COO⁻).
Lipids
Fatty acids can combine with other groups of molecules
to form other types of lipids. As water-insoluble
(hydrophobic) molecules, they are soluble only in
organic solvents. The carboxyl group can enter into
an ester or an amide bond. Triglycerides are compounds
of fatty acids with glycerol. Glycolipids (lipids
with sugar residues) and phospholipids (lipids with
a phosphate group attached to an alcohol derivative)
are the structural bases of important macromolecules.
Their intracellular degradation requires numerous
enzymes; inherited defects of these enzymes
lead to numerous genetically determined
diseases. Sphingolipids are an important group of
molecules in biological membranes. Here, sphingosine,
instead of glycerol, is the fatty acid-binding molecule.
Sphingomyelin and gangliosides contain sphingosine.
Gangliosides make up 6% of the central nervous system
lipids. They are degraded by a series of enzymes.
Genetically determined disorders of their catabolism
lead to severe diseases, e.g., Tay–Sachs disease
due to defective degradation of ganglioside GM2
(deficiency of β-N-acetylhexosaminidase).
Lipid aggregates
Owing to their bipolar properties, fatty acids can
form lipid aggregates in water. The hydrophilic
ends are attracted to their aqueous surroundings;
the hydrophobic ends protrude from the surface of
the water and form a surface film. If completely
under the surface, they may form a micelle, compact
and dry within. Phospholipids and glycolipids can
form two-layered membranes (lipid membrane bilayer).
These are the basic structural elements of cell
membranes, which prevent molecules in the surrounding
aqueous solution from invading the cell.
Other lipids: steroids
Steroids are small molecules consisting of four
different rings of carbon atoms. Cholesterol is
the precursor of five major classes of steroid hormones:
progestagens, glucocorticoids, mineralocorticoids,
androgens, and estrogens. Each of these hormone
classes is responsible for important biological
functions such as maintenance of pregnancy, fat
and protein metabolism, maintenance of blood volume
and blood pressure, and development of sex characteristics.
Nucleotides and Nucleic Acids
Nucleotides participate in almost all biological processes.
They are the subunits of DNA and RNA, the molecules
that carry genetic information (see p. 34). Nucleotide
derivatives are involved in the biosynthesis of numerous
molecules; they convey energy, are part of essential
coenzymes, and regulate numerous metabolic functions.
Since all these functions are based on genetic information
of the cells, nucleotides represent a central class
of molecules for genetics. Nucleotides are composed
of three integral parts: phosphates, sugars, and purine
or pyrimidine bases.
Phosphate groups
Phosphate groups may occur alone (monophosphates),
in twos (diphosphates) or in threes (triphosphates).
They are normally bound to the hydroxy group of
the C atom in position 5 of a five-C-atom sugar
(pentose).
Sugar residues
The sugar residues in nucleotides are usually derived
from either ribose (in ribonucleic acid, RNA) or
deoxyribose (in deoxyribonucleic acid, DNA). A base
bound to the respective sugar is called a ribonucleoside
or a deoxyribonucleoside.
Nucleotide bases of pyrimidine
Cytosine (C), thymine (T), and uracil (U) are the
three pyrimidine nucleotide bases. They differ from
each other in their side chains (—NH2 on C4 in cytosine,
—CH3 on C5 in thymine, O on C4 in uracil) and in
the presence or absence of a double bond between
N3 and C4 (present in cytosine).
Nucleotide bases of purine
Adenine (A) and guanine (G) are the two nucleotide
bases of purine. They differ in their side chains
and a double bond (between N1 and C6).
Amino Acids
Amino acids are the basic structural units of proteins.
A defined linear sequence of the amino acids and a specific
three-dimensional structure confer quite specific physicochemical
properties to each protein. An amino acid consists of
a “central” carbon with one bond to an amino group (—NH2),
one to a carboxyl group (—COOH), one to a hydrogen atom,
and the fourth to a variable side chain. Amino acids
are ionized in neutral solutions, since the amino group
takes on a proton (—NH3⁺) and the carboxyl group dissociates
(—COO⁻). The side chain determines the distinguishing
characteristics of an amino acid, including the size,
form, electrical charge or hydrogen-bonding ability,
and the total specific chemical reactivity. Amino acids
can be differentiated according to whether they are
neutral or not neutral (basic or acidic) and whether
they have a polar or nonpolar side chain. Each amino
acid has its own three-letter and one-letter abbreviations.
Essential amino acids in vertebrates are His, Ile, Leu,
Lys, Met, Phe, Thr, Trp, and Val.
Neutral amino acids, nonpolar side chains
All neutral amino acids have a —COO⁻ and an —NH3⁺
group. The simplest amino acids have a simple
aliphatic side chain. For glycine this is merely
a hydrogen atom (—H); for alanine it is a methyl
group (—CH3). Larger side chains occur on valine,
leucine, and isoleucine. These larger side chains
are hydrophobic (water-repellent) and make their
respective amino acids less water-soluble than do
hydrophilic (water-attracting) chains. Proline has
an aliphatic side chain that, unlike in other amino
acids, is bound to both the central carbon and to
the amino group, so that a ringlike structure is
formed. Aromatic side chains occur in phenylalanine
(a phenyl group bound via a methylene group, —CH2—)
and tryptophan (an indole ring bound via a methylene
group). These amino acids are very hydrophobic.
Two amino acids contain sulfur (S) atoms. In cysteine
this is in the form of a sulfhydryl group (—SH);
in methionine it is a thioether (—S—CH3). Both are
hydrophobic. The sulfhydryl group in cysteine is
very reactive and participates in forming disulfide
bonds (—S—S—). These play an important role in stabilizing
the three-dimensional forms of proteins.
Hydrophilic amino acids, polar side chains
Serine, threonine, and tyrosine contain hydroxyl
groups (—OH). Thus, they can be regarded as hydroxylated
forms of nonpolar amino acids (for example, serine of alanine
and tyrosine of phenylalanine). The hydroxyl
groups make them hydrophilic and more reactive than
the nonhydroxylated forms. Asparagine and glutamine
both contain an amino and an amide group. At physiological
pH their side chains are polar but uncharged.
Charged amino acids
These amino acids have either two ionized amino
groups (basic) or two carboxyl groups (acidic).
Basic amino acids (positively charged) are arginine,
lysine, and histidine. Histidine has an imidazole
ring and can be uncharged or positively charged,
depending on its surroundings. It is frequently
found in the reactive centers of proteins, where
it takes part in alternating bonds (e.g., in the
oxygen-binding region of hemoglobin). Aspartic acid
and glutamic acid each have two carboxyl groups
(—COOH) and are thus (as a rule) acidic. Seven of
the 20 amino acids have readily ionizable side
chains, making them highly reactive (Asp, Glu, His,
Cys, Tyr, Lys, Arg).
Proteins
Proteins are involved in practically all chemical processes
in living organisms. Their universal significance is
apparent in that, as enzymes, they drive chemical reactions
in living cells. Without enzymatic catalysis, the macromolecules
involved would not react spontaneously. All enzymes
are the products of one or more genes. Proteins also
serve to transport small molecules, ions, or metals.
Proteins have important functions in cell division during
growth and in cell and tissue differentiation. They
control the coordination of movements by regulating
muscle cells and the production and transmission of
impulses within and between nerve cells. They control
blood homeostasis (blood clotting) and immune defense.
They carry out mechanical functions in skin, bone, blood
vessels, and other areas.
Joining of amino acids
The basic units of proteins, amino acids, can be
joined together very easily owing to their dipolar
ionization (zwitterions). The carboxyl group of
one amino acid binds to the amino group of the next
(a peptide bond, sometimes also referred to as an
amide bond). When many amino acids are bound together
by peptide bonds, they form a polypeptide chain.
Each polypeptide chain has a defined direction,
determined by the amino group (—NH2) at one end
and the carboxyl group (—COOH) at the other. By
convention, the amino group represents the beginning,
and the carboxyl group the end of a peptide chain.
Primary structure of a protein
The determination of the amino acid sequence of insulin by Frederick Sanger in 1955
was a landmark accomplishment. It showed for the
first time that a protein, in genetic terms a gene
product, has a precisely defined amino acid sequence.
The amino acid sequence yields important information
about the function and evolutionary origin of a
protein. The primary structure of a protein is its
linear, one-dimensional amino acid sequence.
Like many other proteins, insulin is synthesized
from precursor molecules: preproinsulin and proinsulin.
Preproinsulin consists of 110 amino acids including
24 amino acids of a leader sequence at the amino
end. The leader sequence directs the molecule to
the correct site in the cell and is then removed
to yield proinsulin. This is converted to insulin
by removal of the connecting peptide (C peptide)
consisting of amino acids 31–65. Amino acids 1–30
form the B chain; the remaining (66–86) amino acids
form the A chain. The A and the B chains are connected
by two disulfide bridges joining the cysteines in
position 7 and position 20 of the A chain to those
of positions 7 and 19, respectively, of the B chain.
The A chain contains a disulfide bridge between
positions 6 and 11. The positions of the cysteines
reflect the spatial arrangements of the amino acids,
called the secondary structure.
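The processing steps described above amount to simple bookkeeping on residue positions. The sketch below uses only the lengths and ranges stated in the text; the placeholder list of position numbers stands in for the actual residues, and the names `SEGMENTS`, `segment_length`, and `excise` are hypothetical helpers chosen for illustration.

```python
# Lengths and residue ranges as given in the text (1-based, inclusive,
# numbered along proinsulin after removal of the leader sequence).
PREPROINSULIN_LENGTH = 110
LEADER_LENGTH = 24
PROINSULIN_LENGTH = PREPROINSULIN_LENGTH - LEADER_LENGTH

SEGMENTS = {
    "B chain": (1, 30),
    "C peptide": (31, 65),
    "A chain": (66, 86),
}


def segment_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def excise(residues, start, end):
    """Return the residues in a 1-based inclusive range."""
    return residues[start - 1:end]


# Placeholder "sequence": residue position numbers, not real amino acids.
proinsulin = list(range(1, PROINSULIN_LENGTH + 1))

for name, (start, end) in SEGMENTS.items():
    piece = excise(proinsulin, start, end)
    print(f"{name}: residues {piece[0]}-{piece[-1]}, "
          f"{segment_length(start, end)} amino acids")
```

The three segments together account for all residues of proinsulin, and removing the C peptide leaves the A and B chains of mature insulin.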
Secondary structural units, the α helix and the β sheet
Two basic structural units of globular proteins are the α helix
and the flat β pleated sheet. Panel
C shows a schematic drawing of a unit of one α helix
between two β sheets, called a βαβ unit (Figure
redrawn from Stryer, 1995).
Tertiary structure of insulin
All functional proteins assume a well-defined three-dimensional
structure. This structure is defined by the sequence
of amino acids and their physicochemical properties.
Tertiary structure is defined by the spatial arrangement
of amino acid residues that are far apart in the
linear sequence. The quaternary structure is the
folding of the protein resulting in a specific three-dimensional
spatial arrangement of the subunits and the nature
of their contacts. The correct quaternary structure
ensures proper function. (Figure from Koolman &
Röhm, 1996).
References
Koolman, J., Röhm, K.-H.: Color Atlas of Biochemistry.
Thieme, Stuttgart–New York,
1996.
Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
Co., New York, 1995.
DNA as Carrier of Genetic Information
Although DNA was discovered in 1869 by Friedrich Miescher
as a new, acidic, phosphorus-containing substance made
up of very large molecules that he named “nuclein”,
its biological role was not recognized. In 1889 Richard
Altmann introduced the term “nucleic acid”. By 1900
the purine and pyrimidine bases were known. Twenty years
later, the two kinds of nucleic acids, RNA and DNA,
were distinguished. An incidental but precise observation
(1928) and relevant investigations (1944) indicated
that DNA could be the carrier of genetic information.
The observation of Griffith
In 1928 the English microbiologist Fred Griffith
made a remarkable observation. While investigating
various strains of Pneumococcus, he determined that
mice injected with strain S (smooth) died (1). On
the other hand, animals injected with strain R (rough)
lived (2). When he inactivated the lethal S strain
by heat, there were no sequelae, and the animal
survived (3). Surprisingly, a mixture of the nonlethal
R strain and the heat-inactivated S strain had a
lethal effect like the S strain (4). And he found
normal living pneumococci of the S strain in the
animal’s blood. Apparently, cells of the R strain
were changed into cells of the S strain (transformed).
For a time, this surprising result could not be
explained and was met with skepticism. Its relevance
for genetics was not apparent.
The transforming principle is DNA
Griffith’s findings formed the basis for investigations
by Avery, MacLeod, and McCarty (1944). Avery and
co-workers at the Rockefeller Institute in New York
elucidated the chemical basis of the transforming
principle. From cultures of an S strain (1) they
produced an extract of lysed cells (cell-free extract)
(2). After all its proteins, lipids, and polysaccharides
had been removed, the extract still retained the
ability to transform pneumococci of the R strain
to pneumococci of the S strain (transforming principle)
(3). With further studies, Avery and co-workers
determined that this was attributable to the DNA alone.
Thus, the DNA must contain the corresponding
genetic information. This explained Griffith’s observation.
Heat inactivation had left the DNA of the bacterial
chromosomes intact. The section of the chromosome
with the gene responsible for capsule formation
(S gene) could be released from the destroyed S cells
and be taken up by some R cells in subsequent cultures.
After the S gene was incorporated into its DNA,
an R cell was transformed into an S cell (4). Page
90 shows how bacteria can take up foreign DNA so
that some of their genetic attributes will be altered
correspondingly.
Genetic information is transmitted by DNA alone
The final evidence that DNA, and no other molecule,
transmits genetic information was provided by Hershey
and Chase in 1952. They labeled the capsular protein
of bacteriophages (see p. 88) with radioactive sulfur
(35S) and the DNA with radioactive phosphorus (32P).
When bacteria were infected with the labeled bacteriophage,
only 32P (DNA) entered the cells, and not the 35S
(capsular protein). The subsequent formation of
new, complete phage particles in the cell proved
that DNA was the exclusive carrier of the genetic
information needed to form new phage particles,
including their capsular protein. Next, the structure
and function of DNA needed to be clarified. The
genes of all cells and some viruses consist of DNA,
a long-chained threadlike molecule.
References
Avery, O.T., MacLeod, C.M., McCarty, M.: Studies
on the chemical nature of the substance inducing
transformation of pneumococcal
types. J. Exp. Med. 79:137–158, 1944.
Griffith, F.: The significance of pneumococcal
types. J. Hyg. 27:113–159, 1928.
Hershey, A.D., Chase, M.: Independent functions
of viral protein and nucleic acid in
growth of bacteriophage. J. Gen. Physiol.
36:39–56, 1952.
Judson, H.F.: The Eighth Day of Creation.
Makers of the Revolution in Biology. Expanded
Edition. Cold Spring Harbor Laboratory
Press, New York, 1996.
McCarty, M.: The Transforming Principle. Discovering
that Genes are made of DNA. W.W.
Norton & Co., New York–London, 1985.
DNA and Its Components
The information for the development and specific functions
of cells and tissues is stored in the genes. A gene
is a portion of the genetic information, definable according
to structure and function. Genes lie on chromosomes
in the nuclei of cells. They consist of a complex long-chained
molecule, deoxyribonucleic acid (DNA). In the following,
the constituents of the DNA molecule will be presented.
DNA is a nucleic acid. Its chemical components are nucleotide
bases, a sugar (deoxyribose), and phosphate groups.
They determine the three-dimensional structure of DNA,
from which it derives its functional consequence.
Nucleotide bases
The nucleotide bases in DNA are heterocyclic molecules
derived from either pyrimidine or purine. Five bases
occur in the two types of nucleic acids, DNA and
RNA. The purine bases are adenine (A) and guanine
(G). The pyrimidine bases are thymine (T) and cytosine
(C) in DNA. In RNA, uracil (U) is present instead
of thymine. The nucleotide bases are part of a subunit
of DNA, the nucleotide. This consists of one of
the four nucleotide bases, a sugar (deoxyribose),
and a phosphate group. The nitrogen atom in position
9 of a purine or in position 1 of a pyrimidine is
bound to the carbon in position 1 of the sugar (N-glycosidic
bond). Ribonucleic acid (RNA) differs from DNA in
two respects: it contains ribose instead of deoxyribose
(unlike the latter, ribose has a hydroxyl group
on the position 2 carbon atom) and uracil (U) instead
of thymine. Uracil does not have a methyl group
at position C5.
Nucleotide chain
DNA is a polymer of deoxyribonucleotide units. The
nucleotide chain is formed by joining a hydroxyl
group on the sugar of one nucleotide to the phosphate
group attached to the sugar of the next nucleotide.
The sugars linked together by the phosphate groups
form the invariant part of the DNA. The variable
part is in the sequence of the nucleotide bases
A, T, C, and G. A DNA nucleotide chain is polar.
The polarity results from the way the sugars are
attached to each other. The phosphate group at position
C5 (the 5′ carbon) of one sugar joins to the hydroxyl
group at position C3 (the 3′ carbon) of the next
sugar by means of a phosphate diester bridge. Thus,
one end of the chain has a 5′ triphosphate group
free and the other end has a 3′ hydroxy group free
(5′ end and 3′ end, respectively). By convention,
the sequence of nucleotide bases is written in the
5′ to 3′ direction.
Spatial relationship
The chemical structure of the nucleotide bases determines
a defined spatial relationship. Within the double
helix, a purine (adenine or guanine) always lies
opposite a pyrimidine (thymine or cytosine). Three
hydrogen-bond bridges are formed between cytosine
and guanine, and two between thymine and adenine.
Therefore, only guanine and cytosine or adenine
and thymine can lie opposite and pair with each
other (complementary base pairs G–C and A–T). Other
spatial relationships are not usually possible.
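The pairing rules above (three hydrogen bonds per G–C pair, two per A–T pair) allow the number of hydrogen bonds holding a duplex together to be counted directly from the sequence of one strand. The following is a minimal sketch; the function name and the example sequences are assumptions made for illustration.

```python
# Hydrogen bonds contributed by the base pair that each base forms
# (A pairs with T: 2 bonds; G pairs with C: 3 bonds).
HYDROGEN_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in the duplex formed by a strand and its complement."""
    total = 0
    for base in strand.upper():
        if base not in HYDROGEN_BONDS_PER_PAIR:
            raise ValueError(f"unexpected base: {base!r}")
        total += HYDROGEN_BONDS_PER_PAIR[base]
    return total


print(count_hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3
print(count_hydrogen_bonds("GGCC"))  # four G-C pairs
```

For the same length, a duplex rich in G–C pairs therefore contains more hydrogen bonds than one rich in A–T pairs.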
DNA double strand
DNA forms a double strand. As a result of the
spatial relationships of the nucleotide bases, a
cytosine will always lie opposite to a guanine
and a thymine to an adenine. The sequence of
the nucleotide bases on one strand of DNA (in
the 5′ to 3′ direction) is complementary to the
nucleotide base sequence (or simply the base
sequence) of the other strand in the 3′ to 5′
direction. The specificity of base pairing is the
most important structural characteristic of
DNA.
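Because the two strands are complementary and run in opposite directions, the sequence of one strand fully determines the other. A minimal Python sketch of this rule follows, assuming sequences are written 5′ to 3′ as plain strings of A, C, G, and T; the function name is illustrative.

```python
# Translation table implementing the pairing rules A-T and G-C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5' to 3'.

    The input is read 5' to 3'. Complementing each base gives the
    partner strand in 3' to 5' order, so the result is reversed.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


top = "ATGC"
bottom = reverse_complement(top)
print(f"5'-{top}-3'")
print(f"3'-{bottom[::-1]}-5'")
```

Applying the function twice returns the original sequence, which reflects the fact that each strand is the complement of the other.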
DNA Structure
In 1953, James Watson and Francis Crick recognized that
DNA must exist as a double helix. This structure explains
both important functional aspects: replication and the
transmission of genetic information. The elucidation
of the structure of DNA is considered as the beginning
of the development of modern genetics. With it, gene
structure and function can be understood at the molecular
level.
DNA as a double helix
The double helix is the characteristic structural
feature of DNA. The two helical polynucleotide chains
are wound around each other along a common axis.
The nucleotide base pairs (bp), either A–T or G–C,
lie within. The diameter of the helix is 20 Å (2 × 10⁻⁶
mm). Neighboring bases lie 3.4 Å apart. The helical
structure repeats itself at intervals of 34 Å, or
every ten base pairs. Because of the fixed spatial
relationship of the nucleotide bases within the
double helix and opposite each other, the two chains
of the double helix are exactly complementary. The
form illustrated here is the so-called B form (B-DNA).
Under certain conditions, DNA can also assume other
forms (Z-DNA, A-DNA, see p. 41).
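The dimensions given for the B form make it possible to estimate the length and number of turns of a DNA segment from its size in base pairs. The sketch below uses only the values in the text (3.4 Å per base pair, ten base pairs per turn) and the conversion 1 Å = 0.1 nm; the function names and the example of 1000 base pairs are illustrative assumptions.

```python
RISE_PER_BASE_PAIR_ANGSTROM = 3.4   # distance between neighboring bases
BASE_PAIRS_PER_TURN = 10            # one helical repeat, 34 angstroms
ANGSTROM_PER_NANOMETER = 10.0


def helix_length_angstrom(n_base_pairs: int) -> float:
    """Length along the helix axis of n base pairs of B-DNA, in angstroms."""
    return n_base_pairs * RISE_PER_BASE_PAIR_ANGSTROM


def helical_turns(n_base_pairs: int) -> float:
    """Number of complete helical repeats spanned by n base pairs."""
    return n_base_pairs / BASE_PAIRS_PER_TURN


def angstrom_to_nm(value: float) -> float:
    return value / ANGSTROM_PER_NANOMETER


n = 1000
print(f"{n} bp: {angstrom_to_nm(helix_length_angstrom(n)):.0f} nm, "
      f"{helical_turns(n):.0f} turns")
```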
Replication
Since the nucleotide chains lying opposite each
other within the double helix are strictly complementary,
each can serve as a pattern (template) for the formation
(replication) of a new chain when the helix is opened. DNA
replication is semiconservative, i.e., in each daughter molecule
one strand is newly formed and one parental strand is retained.
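Semiconservative replication can be illustrated with a small model in which each parental strand serves as the template for a new complementary strand. This is a simplified sketch, not a description of the enzymatic process; strands are plain strings written 5′ to 3′, and the function and key names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Return two daughter duplexes from one parental duplex.

    Each daughter keeps one parental strand and receives one newly
    synthesized strand complementary to it.
    """
    strand_a, strand_b = duplex
    daughter_1 = {"parental": strand_a, "new": reverse_complement(strand_a)}
    daughter_2 = {"parental": strand_b, "new": reverse_complement(strand_b)}
    return [daughter_1, daughter_2]


parent = ("ATGC", reverse_complement("ATGC"))
for daughter in replicate(parent):
    print(daughter)
```

Both daughters carry the same pair of sequences as the parent, but in each of them only one strand is old.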
Denaturation and renaturation
The noncovalent hydrogen bonds between the nucleotide
base pairs are weak. Nevertheless, DNA is stable
at physiological temperatures because it is a very
long molecule. The two complementary strands can
be separated (denaturation) by means of relatively weak
chemical reagents (e.g., alkali, formamide, or urea)
or by careful heating. The resulting single-stranded
molecules are relatively stable. With cooling, complementary
single strands can reunite to form double-stranded
molecules (renaturation). Noncomplementary single
strands do not unite. This is the basis of an important
method of identifying nucleic acids: With a single
strand of defined origin, it can be determined with
which other single strand it will bind (hybridize).
The hybridization of complementary segments of DNA
is an important principle in the analysis of genes.
Transmission of genetic information
Genetic information lies in the sequence of nucleotide
base pairs (A–T or G–C). A sequence of three base
pairs represents a codeword (codon) for an amino
acid. The codon sequence determines a corresponding
sequence of amino acids. These form a polypeptide
(gene product). The sequence of the nucleotide bases
is first transferred (transcription) from one DNA
strand to a further information-bearing molecule
(mRNA, messenger RNA). Then the nucleotide base
sequence of the mRNA serves as a template for a sequence
of amino acids corresponding to the order of the
codons (translation). A gene can be defined as a
section of DNA responsible for the formation of
a polypeptide (one gene, one polypeptide). One or
more polypeptides form a protein. Thus, several
genes may be involved in the formation of a protein.
References
Crick, F.: What Mad Pursuit. A Personal View of
Scientific Discovery. Basic Books, Inc., New
York, 1988.
Judson, H.F.: The Eighth Day of Creation. Makers of
the Revolution in Biology. Expanded Edition.
Cold Spring Harbor Laboratory Press,
New York, 1996.
Stent, G.S., ed.: The Double Helix. Weidenfeld &
Nicolson, London, 1981.
Watson, J.D.: The Double Helix. A Personal Account
of the Structure of DNA. Atheneum,
New York, 1968.
Watson, J.D., Crick, F.H.C.: Molecular structure
of nucleic acid. Nature 171:737–738, 1953.
Watson, J.D., Crick, F.H.C.: Genetic implications
of the structure of DNA. Nature 171:964–967, 1953.
Wilkins, M.F.H., Stokes, A.R., Wilson, H.R.:
Molecular structure of DNA. Nature
171:738–740, 1953.
Alternative DNA Structures
Gene expression and transcription can be influenced
by changes of DNA topology. However, this type of control
of gene expression is relatively universal and nonspecific.
Thus, it is more suitable for permanent suppression
of transcription, e.g., in genes that are expressed
only in certain tissues or are active only during the
embryonic period and later become permanently inactive.
Three forms of DNA
The DNA double helix does not occur as a single
structure, but rather represents a structural family
of different types. The original classic form, determined
by Watson and Crick in 1953, is B-DNA. The essential
structural characteristic of B-DNA is the formation
of two grooves, one large (major groove) and one
small (minor groove). There are at least two further,
alternative forms of the DNA double helix, Z-DNA
and the rare form A-DNA. While B-DNA forms a right-handed
helix, Z-DNA shows a left-handed conformation. This
leads to a greater distance (0.77 nm) between the
base pairs than in B-DNA and a zigzag form (thus
the designation Z-DNA). A-DNA is rare. It exists
only in the dehydrated state and differs from the
B form by a 20-degree rotation of the perpendicular
axis of the helix. A-DNA has a deep major groove
and a flat minor groove (Figures from Watson et al.,
1987).
Major and minor grooves in B-DNA
The base pairing in DNA (adenine–thymine and guanine–cytosine)
leads to the formation of a large and a small groove
because the glycosidic bonds to deoxyribose (dRib)
are not diametrically opposed. In B-DNA, the purine
and pyrimidine rings lie 0.34 nm apart. DNA has
ten base pairs per turn of the double helix. The
distance from one complete turn to the next is 3.4
nm. In this way, localized curves arise in the double
helix. The result is a somewhat larger and a somewhat
smaller groove.
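The dimensions quoted above are related by simple arithmetic, shown here as a short sketch. The two constants come from the text; the function names are hypothetical.

```python
RISE_PER_BP_NM = 0.34   # distance between adjacent base pairs (from the text)
BP_PER_TURN = 10        # base pairs per helical turn (from the text)

def pitch_nm():
    """Length of one complete helical turn in nanometers."""
    return RISE_PER_BP_NM * BP_PER_TURN

def contour_length_nm(n_base_pairs):
    """Length of a B-DNA segment of n base pairs in nanometers."""
    return n_base_pairs * RISE_PER_BP_NM

def helical_turns(n_base_pairs):
    """Number of helical turns in a segment of n base pairs."""
    return n_base_pairs / BP_PER_TURN

print(pitch_nm())               # about 3.4 nm per turn
print(contour_length_nm(1000))  # about 340 nm for 1000 base pairs
print(helical_turns(1000))
```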
Transition from B-DNA to Z-DNA
B-DNA is a perfect regular double helix except that
the base pairs opposite each other do not lie exactly
at the same level. They are twisted in a propeller-like
manner. In this way, DNA can easily be bent without
causing essential changes in the local structures.
In Z-DNA the sugar–phosphate skeleton has a zigzag
pattern; the single Z-DNA groove has a greater density
of negatively charged molecules. Z-DNA may occur
in limited segments in vivo. A segment of B-DNA
consisting of GC pairs can be converted into Z-DNA
when the bases are rotated 180 degrees. Normally,
Z-DNA is thermodynamically relatively unstable.
However, transition to Z-DNA is facilitated when
cytosine is methylated in position 5 (C5). The modification
of DNA by methylation of cytosine is frequent in
certain regions of DNA of eukaryotes. There are
specific proteins that bind to Z-DNA, but their
significance for the regulation of transcription
is not clear.
References
Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
Co., New York, 1995.
Watson, J.D. et al.: Molecular Biology of the
Gene. 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
DNA Replication
DNA synthesis involves a highly coordinated action of
many proteins. Precision and speed are required. The
two new DNA chains are assembled at a rate of about
1000 nucleotides per second in E. coli. The principal
enzymatic proteins are polymerases, which carry out
template-directed synthesis; helicases, which separate
the two strands to generate the replication fork (see
D); primases, which initiate chain synthesis at preferred
sites; initiation proteins, which recognize the origin
of replication; and proteins that remodel the
double helix. The entire complex is called the replisome.
In their paper elucidating the structure of DNA, Watson
and Crick (1953) noted in closing, “It has not escaped
our attention that this structure immediately suggests
a copying mechanism for the genetic material,” at that
time an unsolved problem. Although biochemically complex,
DNA replication is genetically relatively simple. During
replication, each strand of DNA serves as a template
for the formation of a new strand (semiconservative
replication).
Prokaryote replication begins at a single site
In prokaryote cells, replication begins at a defined
point in the ring-shaped bacterial chromosome, the
origin of replication (1). From here, new DNA is
formed at the same speed in both directions until
the DNA has been completely duplicated and two chromosomes
are formed. Replication can be visualized by autoradiography
after the newly replicated DNA has incorporated
tritium (3H)-labeled thymidine (2).
Eukaryote replication begins at several sites
DNA synthesis occurs during a defined phase of the
cell cycle (S phase). This would take a very long
time if there were only one starting point. However,
replication of eukaryotic DNA begins at numerous
sites (replicons) (1). It proceeds in both directions
from each replicon until neighboring replicons fuse
(2) and all of the DNA is duplicated (3). The electron
micrograph (4) shows replicons at three sites.
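Why several starting points matter can be estimated with a rough calculation. The sketch below assumes evenly spaced origins that all start at the same time and forks that move at a constant rate. The rate of 1000 nucleotides per second is the E. coli figure quoted earlier; the chromosome size of about 4.6 million base pairs for E. coli and all numbers in the second example are assumptions chosen for illustration, not values from the text.

```python
def replication_time_s(length_bp, origins, fork_rate_nt_per_s):
    """Idealized time to duplicate a DNA molecule.

    Each origin produces two forks moving in opposite directions,
    so the DNA is shared equally among 2 * origins forks.
    """
    if origins < 1:
        raise ValueError("at least one origin is required")
    forks = 2 * origins
    return length_bp / (forks * fork_rate_nt_per_s)

# Bacterial chromosome, single origin (genome size is an assumption).
print(replication_time_s(4_600_000, 1, 1000) / 60, "minutes")

# Hypothetical eukaryotic chromosome with slower forks.
print(replication_time_s(100_000_000, 1, 50), "seconds with one origin")
print(replication_time_s(100_000_000, 1000, 50), "seconds with 1000 origins")
```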
Scheme of replication
New DNA is synthesized in the 5′ to 3′ direction,
but not in the 3′ to 5′ direction. A new nucleotide
cannot be attached to the 5′ end of the new nucleotide
chain. Only at the 3′ end can nucleotides be attached
continuously. New DNA at the 5′ end is replicated
in small segments. This represents an obstacle at
the end of a chromosome (telomere, see p. 180).
Replication fork
At the replication fork, each of the two DNA strands
serves as a template for the synthesis of new DNA.
First, the double helix at the replication fork
region is unwound by an enzyme system (helicases and
topoisomerases). Since the parent strands are antiparallel, DNA replication
can proceed continuously in only one DNA strand
(5′ to 3′ direction) (leading strand). Along the
3′ to 5′ strand (lagging strand), the new DNA is
formed in small segments of 1000–2000 bases (Okazaki
fragments). In this strand a short piece of RNA
is required as a primer to start replication. This
is formed by an RNA polymerase (primase). The RNA
primer is subsequently removed; DNA is inserted
into the gap by polymerase I and, finally, the DNA
fragments are linked by DNA ligase. The enzyme responsible
for DNA synthesis (DNA polymerase III) is complex
and comprises several subunits. There are different
enzymes for the leading and lagging strands in eukaryotes.
During replication, mistakes are eliminated by a
complex proof-reading mechanism that removes any
incorrectly incorporated bases and replaces them
with the correct ones.
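The discontinuous synthesis on the lagging strand can be modeled in a simplified way. In this sketch the RNA primers, their removal, and gap filling are left out; only the order of fragment synthesis and the final ligation are shown. The function names and the short sequence are hypothetical.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Complementary strand, written 5' to 3'."""
    return "".join(PAIRING[base] for base in reversed(strand))

def okazaki_fragments(template, fragment_len):
    """Fragments in order of synthesis, each written 5' to 3'.

    `template` is written 5' to 3' in the direction of fork movement,
    so the fork exposes it chunk by chunk from its 5' end, and each
    chunk is copied backwards, away from the fork.
    """
    chunks = [template[i:i + fragment_len]
              for i in range(0, len(template), fragment_len)]
    return [reverse_complement(chunk) for chunk in chunks]

def ligate(fragments):
    """Join fragments; the one made last lies at the 5' end of the new strand."""
    return "".join(reversed(fragments))

template = "ATGGCATTCG"
fragments = okazaki_fragments(template, 4)
print(fragments)
print(ligate(fragments))
```

After ligation the lagging strand is identical to what continuous synthesis would have produced, namely the full complement of the template.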
References
Cairns, J.: The bacterial chromosome and its
manner of replication as seen by autoradiography.
J. Mol. Biol. 6:208–213, 1963.
Lodish, H. et al.: Molecular Cell Biology. 4th ed.
Scientific American Books, W.H. Freeman &
Co., New York, 2000.
Marx, J.: How DNA replication originates.
Science 270:1585–1587, 1995.
Meselson, M., Stahl, F.W.: The replication of
DNA in Escherichia coli. Proc. Natl. Acad. Sci.
44:671–682, 1958.
Watson, J.D. et al.: Molecular Biology of the
Gene, 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
The Flow of Genetic Information: Transcription and Translation
The information contained in the nucleotide sequence
of a gene must be converted into useful biological function.
This is accomplished by proteins, either directly, by
being involved in a biochemical pathway, or indirectly,
by regulating the activity of a gene. The flow of genetic
information is unidirectional and requires two major
steps: transcription and translation. First, the information
of the coding sequences of a gene is transcribed into
an intermediary RNA molecule, which is synthesized in
sequences that are precisely complementary to those
of the coding strand of DNA (transcription). During
the second major step the sequence information in the
messenger RNA molecule (mRNA) is translated into a corresponding
sequence of amino acids (translation). The length and
sequence of the amino acid chain specified by a gene
result in a polypeptide with a biological function
(gene product).
Transcription
First, the nucleotide sequence of one strand of
DNA is transcribed into a complementary molecule
of RNA (messenger RNA, mRNA). The DNA helix is opened
by a complex set of proteins. The DNA strand in
the 3′ to 5′ direction (coding strand) serves as
the template for the transcription into RNA, which
is synthesized in the 5′ to 3′ direction. It is
called the RNA sense strand. RNA transcribed under
experimental conditions from the opposing DNA strand
is called antisense RNA.
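Transcription can be sketched as a base-by-base copying rule. The example assumes that the template DNA strand is given as a string read in its 3′ to 5′ direction, so that the RNA comes out 5′ to 3′. The sequence and the name transcribe are hypothetical.

```python
# Pairing of template DNA bases with RNA bases (uracil replaces thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (5' to 3') complementary to a DNA template read 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

template = "TACCCAAGT"       # template strand, 3' to 5'
opposite = "ATGGGTTCA"       # opposite DNA strand, 5' to 3'

mrna = transcribe(template)
print(mrna)
# The mRNA has the same sequence as the opposite DNA strand, with U for T.
print(mrna.replace("U", "T") == opposite)
```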
Translation
During translation the sequence of codons made up
of the nucleotide bases in mRNA is converted into
a corresponding sequence of amino acids. Translation
occurs in a reading frame, which is defined at the
start of translation (start codon). Amino acids
are joined in the sequence determined by the mRNA
nucleotide bases by a further class of RNA, transfer
RNA (tRNA). Each amino acid has its own tRNA, which
has a region that is complementary to its codon
of the mRNA (anticodon). The codons 1, 2, 3, and
4 of the mRNA are translated into the amino acid
sequence methionine (Met), glycine (Gly), serine
(Ser), and isoleucine (Ile), etc. Codon 1 is always
AUG (start codon).
Stages of translation
Translation (protein synthesis) in eukaryotes occurs
outside of the cell nucleus in ribosomes in the
cytoplasm. Ribosomes consist of subunits of numerous
associated proteins and RNA molecules (ribosomal
RNA, rRNA; p. 204). Translation begins with initiation
(1): an initiation complex comprising mRNA, a ribosome,
and tRNA is formed. This requires a number of initiation
factors (IF1, IF2, IF3, etc.). Then elongation (2)
follows: a further amino acid, determined by the
next codon, is attached. A three-phase elongation
cycle develops, with codon recognition, formation of a peptide
bond with the next amino acid residue, and movement
(translocation) of the ribosome three nucleotides
further in the 3′ direction of the mRNA. Translation
ends with termination (3), when one of three mRNA
stop codons (UAA, UGA, or UAG) is reached. The polypeptide
chain formed leaves the ribosome, which dissociates
into its subunits. The biochemical processes of
the stages shown here have been greatly simplified.
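The three stages can be mirrored in a short program. This is a sketch only: it starts at the first AUG it finds, reads one codon per step, and stops at a stop codon, ignoring ribosomes, tRNAs, and initiation factors. The codon table is the standard genetic code written compactly (bases in the order U, C, A, G, amino acids as single letters, * for stop); the names and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Translate from the first AUG to the first stop codon."""
    start = mrna.find("AUG")            # initiation at the start codon
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # elongation, 3 nucleotides per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                  # termination at UAA, UAG, or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("GGAUGGGUUCAAUUUAAGC"))  # Met-Gly-Ser-Ile in single-letter code
```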
Structure of transfer RNA (tRNA)
Transfer RNA has a characteristic, cloverleaf-like
structure, illustrated here by yeast phenylalanine
tRNA (1). It has three single-stranded loop regions
and four double-stranded “stem” regions. The three-dimensional
structure (2) is complex, but various functional
areas can be differentiated, such as the recognition
site (anticodon) for the mRNA codon and the binding
site for the respective amino acid (acceptor stem)
on the 3′ end (acceptor end).
References
Brenner, S., Jacob, F., Meselson, M.: An unstable
intermediate carrying information from
genes to ribosomes for protein synthesis.
Nature 190:576–581, 1961.
Ibba, M., Söll, D.: Quality control mechanisms
during translation. Science 286:1893–1897,
1999.
Watson, J.D. et al.: Molecular Biology of the
Gene. 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
Genes and Mutation
The double helix structure of DNA is the basis of both
replication and transcription as seen in the preceding
pages. The information transmitted during replication
and transcription is arranged in units called genes.
The term gene was introduced in 1909 by the Danish biologist
Wilhelm Johannsen (along with the terms genotype and
phenotype). Until it was realized that a gene consists
of DNA, it was defined in somewhat abstract terms as
a factor (Mendel’s term) that confers certain heritable
properties to a plant or an animal. However, it was
not apparent how mutations could be related to the structure
of a gene. The discovery that mutations also occur in
bacteria and other microorganisms paved the way to understanding
their nature (see p. 84). The organization of genes
differs in prokaryotes and eukaryotes as shown below.
Transcription in prokaryotes and eukaryotes
Transcription differs in unicellular organisms without
a nucleus, such as bacteria (prokaryotes, 1), and
in multicellular organisms (eukaryotes, 2), which
have a cell nucleus. In prokaryotes, the mRNA serves
directly as a template for translation. The sequences
of DNA and mRNA correspond in a strict 1:1 relationship,
i.e., they are colinear. Translation begins even
before transcription has completely ended. In contrast,
a primary transcript of RNA (precursor mRNA) is formed
first in eukaryotic cells. This is a preliminary
form of the mature mRNA. The mature mRNA is formed
when the noncoding sections are removed from the
primary transcript, before it leaves the nucleus
to act as a template for forming a polypeptide (RNA
processing). The reason for these important differences
is that functionally related genes generally lie
together in prokaryotes and that noncoding segments
(introns) are present in the genes of eukaryotes
(see p. 50).
DNA and mutation
Coding DNA and its corresponding polypeptide are
colinear. An alteration (mutation) of the DNA base
sequence may lead to a different codon. The position
of the resulting change in the sequence of amino
acids corresponds to the position of the mutation
(1). Panel B shows the gene for the protein tryptophan
synthetase A of E. coli bacteria and mutations at
four positions. At position 22, phenylalanine (Phe)
has been replaced by leucine (Leu); at position
49, glutamic acid (Glu) by glutamine (Gln); at position
177, Leu by arginine (Arg). Every mutation has a
defined position. Whether it leads to incorporation
of another amino acid depends on how the corresponding
codon has been altered. Different mutations at one
position (one codon) in different DNA molecules
are possible (2). Two different mutations have been
observed at position 211: glycine (Gly) to arginine
(Arg) and Gly to glutamic acid (Glu). Normally (in
the wildtype), codon 211 is GGA and codes for glycine
(3). A mutation of GGA to AGA leads to a codon for
arginine; a mutation to GAA leads to a codon for
glutamic acid (4).
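The two mutations at codon 211 can be compared position by position. The sketch uses only the three codons named in the text, a small subset of the genetic code; the function name compare_codons is hypothetical.

```python
# Subset of the genetic code needed for codon 211 (DNA notation).
CODONS = {"GGA": "Gly", "AGA": "Arg", "GAA": "Glu"}

def compare_codons(wild_type, mutant):
    """Return (changed positions, wild-type amino acid, mutant amino acid).

    Positions are counted 1 to 3 within the codon.
    """
    changed = [i + 1 for i in range(3) if wild_type[i] != mutant[i]]
    return changed, CODONS[wild_type], CODONS[mutant]

print(compare_codons("GGA", "AGA"))  # first base changed, Gly becomes Arg
print(compare_codons("GGA", "GAA"))  # second base changed, Gly becomes Glu
```

Two different single-base changes in the same codon thus explain the two different amino acid replacements observed at one position.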
Types of mutation
Basically, there are three different types of mutation
involving single nucleotides (point mutation): substitution
(exchange), deletion (loss), and insertion (addition).
With substitution, the consequences depend on how a
codon has been altered. Two types of substitution
are distinguished: transition (exchange of one purine
for another purine or of one pyrimidine for another)
and transversion (exchange of a purine for a pyrimidine,
or vice versa). A substitution may alter a codon
so that a wrong amino acid is present at this site
but has no effect on the reading frame (missense
mutation), whereas a deletion or insertion causes
a shift of the reading frame (frameshift mutation).
Thus the sequences that follow no longer code for
a functional gene product. A substitution that creates
a stop codon ends translation prematurely (nonsense mutation).
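The distinctions made above can be illustrated with a small sketch. It classifies a substitution as transition or transversion and shows how a deletion or an insertion of one nucleotide changes every codon that follows. The sequence and function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(old, new):
    """Classify a single-base substitution."""
    if old == new:
        raise ValueError("not a substitution")
    pair = {old, new}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # purine for purine, pyrimidine for pyrimidine
    return "transversion"        # purine for pyrimidine or vice versa

def codons(sequence):
    """Split a sequence into complete triplets, starting at position 0."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

wild_type = "ATGGGATCA"
deletion = wild_type[:3] + wild_type[4:]         # one G lost
insertion = wild_type[:3] + "C" + wild_type[3:]  # one C added

print(substitution_type("G", "A"), substitution_type("G", "C"))
print(codons(wild_type))
print(codons(deletion))    # codons after the deletion are shifted
print(codons(insertion))   # codons after the insertion are shifted
```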
References
Alberts, B. et al.: Molecular Biology of the Cell.
3rd ed. Garland Publishing, New York, 1994.
Alberts, B. et al.: Essential Cell Biology. An Introduction
to the Molecular Biology of the Cell.
Garland Publishing, New York, 1998.
Lodish, H. et al.: Molecular Cell Biology. 4th ed.
Scientific American Books, W.H. Freeman &
Co., New York, 2000.
Watson, J.D. et al.: Molecular Biology of the
Gene, 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
Genetic Code
The genetic code is the set of biological rules by which
DNA nucleotide base pair sequences are translated into
corresponding sequences of amino acids. Genes do not
code for proteins directly, but do so through a messenger
molecule (messenger RNA, mRNA). A codeword (codon) for
an amino acid consists of a sequence of three nucleotide
base pairs (triplet codon). The genetic code also includes
sequences for the beginning (start codon) and for the
end (stop codon) of the coding region. The genetic code
is universal; the same codons are used by different
organisms.
Genetic code in mRNA for all amino acids
Each codon corresponds to one amino acid, but one
amino acid may be coded for by different codons
(redundancy of the code). For example, there are
two possibilities to code for the amino acid phenylalanine:
UUU and UUC, and there are six possibilities to
code for the amino acid serine: UCU, UCC, UCA, UCG,
AGU, and AGC. Many amino acids are determined by more
than one codon. The greatest variation is in the
third position (at the 3′ end of the triplet). The
genetic code was elucidated in 1966 by analyzing
how triplets transmit information from the genes
to proteins. mRNA added to bacteria could be directly
converted into a corresponding protein. Synthetic
RNA polymers such as polyuridylate (poly(U)), polyadenylate
(poly(A)), and polycytidylate (poly(C)) could be
directly translated into polyphenylalanine, polylysine,
and polyproline in extracts of E. coli bacteria.
This showed that UUU must code for phenylalanine,
AAA for lysine, and CCC for proline. By further
experiments with mixed polymers of different proportions
of two or three nucleotides, the genetic code was
determined for all amino acids and all nucleotide
compositions.
Abbreviated code
Sequences of amino acids are designated with the
single-letter abbreviations (“alphabetic code”).
The start codon is AUG (methionine). Stop codons
are UAA, UAG, and UGA. The only amino acids that
are encoded by a single codon are methionine (AUG)
and tryptophan (UGG).
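The redundancy of the code can be checked directly against a codon table. The sketch below builds the standard genetic code from a compact string (bases ordered U, C, A, G; single-letter amino acids; * for stop) and lists the codons for selected amino acids. The helper name codons_for is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_for(amino_acid):
    """All codons that specify the given single-letter amino acid (* = stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]

print(codons_for("F"))   # phenylalanine: two codons
print(codons_for("S"))   # serine: six codons
print(codons_for("*"))   # the three stop codons

# Amino acids encoded by a single codon only.
single = sorted(aa for aa in set(AMINO)
                if aa != "*" and len(codons_for(aa)) == 1)
print(single)
```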
Open reading frame (ORF)
A segment of a nucleotide sequence can correspond
to one of three reading frames (e.g., A, B, or C);
however, only one is correct (open reading frame).
In the example shown, the reading frames B and C
are interrupted by a stop codon after three and
five codons, respectively. Thus they cannot serve
as reading frames for a coding sequence. On the
other hand, A must be the correct reading frame:
It begins with the start codon AUG and yields a
sequence without stop codons (open reading frame).
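The search for the open reading frame can be written down as a simple test of all three frames. This sketch treats a frame as open only if it begins with AUG and contains no stop codon; the example sequence is made up and is not the one in the plate, and the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def read_frame(mrna, offset):
    """Complete triplets of `mrna` starting at position `offset` (0, 1, or 2)."""
    return [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]

def is_open(codons):
    """True if the frame starts with AUG and is not interrupted by a stop codon."""
    return bool(codons) and codons[0] == "AUG" and not STOP_CODONS & set(codons)

def open_frames(mrna):
    """Offsets of all reading frames that are open."""
    return [offset for offset in range(3) if is_open(read_frame(mrna, offset))]

mrna = "AUGAUAGCUGACCGA"
for offset in range(3):
    print(offset, read_frame(mrna, offset))
print(open_frames(mrna))
```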
Coding by several different nucleotide sequences
Since the genetic code has redundancy, it is possible
that different nucleotide sequences code for the
same amino acid sequence. However, the differences
are limited to one (or at most two) positions of
a given triplet codon.
References
Alberts, B. et al.: Essential Cell Biology. An Introduction
to the Molecular Biology of the Cell.
Garland Publishing, New York, 1998.
Crick, F.H.C. et al.: General nature of the genetic
code for proteins. Nature 192:1227–1232,
1961.
Lodish, H. et al.: Molecular Cell Biology. 4th ed.
Scientific American Books, W.H. Freeman &
Co., New York, 2000.
Rosenthal, N.: DNA and the genetic code. New
Eng. J. Med. 331:39–41, 1995.
Singer, M., Berg, P.: Genes and Genomes: a
changing perspective. Blackwell Scientific
Publications, Oxford–London, 1991.
The Structure of Eukaryotic Genes
Eukaryotic genes consist of coding and noncoding segments
of DNA, called exons and introns, respectively. At first
glance it seems to be an unnecessary burden to carry
DNA without obvious functions within a gene. However,
it has been recognized that this has great evolutionary
advantages. When parts of different genes are rearranged
on new chromosomal sites during evolution, new genes
may be constructed from parts of previously existing
genes.
Exons and introns
In 1977, it was unexpectedly found that the DNA of
a eukaryotic gene is longer than its corresponding
mRNA. The reason is that certain sections of the
initially formed primary RNA transcript are removed
before translation occurs. Electron micrographs
show that DNA and its corresponding transcript (RNA)
are of different lengths (1). When mRNA and its
complementary single-stranded DNA are hybridized,
loops of single-stranded DNA arise because mRNA hybridizes
only with certain sections of the single-stranded
DNA. In (2), seven loops (A to G) and eight hybridizing
sections are shown (1 to 7 and the leading section
L). Of the total 7700 DNA base pairs of this gene
(3), only 1825 hybridize with mRNA. A hybridizing
segment is called an exon. An initially transcribed
DNA section that is subsequently removed from the
primary transcript is an intron. The size and arrangement
of exons and introns are characteristic for every
eukaryotic gene (exon/intron structure). (Electron
micrograph from Watson et al., 1987).
Intervening DNA sequences (introns)
In prokaryotes, DNA is colinear with mRNA and contains
no introns (1). In eukaryotes, mature mRNA is complementary
to only certain sections of DNA because the latter
contains introns (2). (Figure adapted from Stryer,
1995).
Basic eukaryotic gene structure
Exons and introns are numbered in the 5′ to 3′ direction
of the coding strand. Both exons and introns are
transcribed into a precursor RNA (primary transcript).
The first and the last exons usually contain sequences
that are not translated. These are called the 5′
untranslated region (5′ UTR) of exon 1 and the 3′
UTR at the 3′ end of the last exon. The noncoding
segments (introns) are removed from the primary
transcript and the exons on either side are connected
by a process called splicing. Splicing must be very
precise to avoid an undesirable change of the correct
reading frame. Introns almost always start with
the nucleotides GT in the 5′ to 3′ strand (GU in
RNA) and end with AG. The sequences at the 5′ end
of the intron beginning with GT are called the splice
donor site and at the 3′ end, ending with AG, are
called the splice acceptor site. Mature mRNA is
modified at the 5′ end by adding a stabilizing structure
called a “cap” and by adding many adenines at the
3′ end (polyadenylation) (see p. 50).
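The removal of introns and joining of exons can be sketched as a string operation. The example assumes that the intron positions are already known and only checks the GU...AG rule before removing them; capping and polyadenylation are not modeled. The sequences and the function name splice are hypothetical.

```python
def splice(primary_transcript, introns):
    """Remove introns and join the flanking exons.

    `introns` is a list of (start, end) positions, 0-based and
    end-exclusive. Each intron must begin with GU and end with AG.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = primary_transcript[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} violates the GU-AG rule")
        mature.append(primary_transcript[position:start])  # exon before the intron
        position = end
    mature.append(primary_transcript[position:])           # last exon
    return "".join(mature)

exon1, intron, exon2 = "AUGGCU", "GUAAGUCCUAACAG", "GAUUAA"
primary = exon1 + intron + exon2
start = len(exon1)
end = start + len(intron)

print(splice(primary, [(start, end)]))
```

A shift of either splice position by a single nucleotide would change the reading frame of everything downstream, which is why splicing has to be so precise.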
Splicing pathway in GU–AG introns
RNA splicing is a complex process mediated by a
large RNA-containing protein complex called a spliceosome.
This consists of five types of small nuclear RNA
molecules (snRNA) and more than 50 proteins (small
nuclear riboprotein particles). The basic mechanism
of splicing schematically involves autocatalytic
cleavage at the 5′ end of the intron resulting in
lariat formation. This is an intermediate circular
structure formed by connecting the 5′ terminus (GU)
to a base (A) within the intron. This site is called
the branch site. In the next stage, cleavage at
the 3′ site releases the intron in lariat form.
At the same time the right exon is ligated (spliced)
to the left exon. The lariat is debranched to yield
a linear intron and this is rapidly degraded. The
branch site identifies the 3′ end for precise cleavage
at the splice acceptor site. It lies 18–40 nucleotides
upstream (in 5′ direction) of the 3′ splice site.
(Figure adapted from Strachan and Read, 1999).
References
Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
2000.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
Co., New York, 1995.
Watson, J.D. et al.: Molecular Biology of the
Gene, 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
DNA Sequencing
Knowledge of the nucleotide sequence of a gene provides
important information about its structure, function,
and evolutionary relationship to other similar genes
in the same or different organisms. Thus, the development
in the 1970s of relatively simple methods for sequencing
DNA has had a great impact on genetics. Two basic methods
for DNA sequencing have been developed: a chemical cleavage
method (A. M. Maxam and W. Gilbert, 1977) and an enzymatic
method (F. Sanger, 1977). A brief outline of the underlying
principles follows.
Sequencing by chemical degradation
This method utilizes base-specific cleavage of DNA
by certain chemicals. Four different chemicals are
used in four reactions, one for each base. Each
reaction produces a set of DNA fragments of different
sizes. The sizes of the fragments in a reaction
mixture are determined by positions in the DNA of
the nucleotide that has been cleaved. A double-stranded
or single-stranded fragment of DNA to be sequenced
is processed to obtain a single strand labeled with
a radioactive isotope at the 5′ end (1). This DNA
strand is treated with one of the four chemicals
for one of the four reactions. Here the reaction
at guanine sites (G) by dimethyl sulfate (DMS) is
shown. Dimethyl sulfate attaches a methyl group
to the purine ring of G nucleotides. The amount
of DMS used is limited so that on average just one
G nucleotide per strand is methylated, not the others
(shown here in four different positions of G). When
a second chemical, piperidine, is added, the nucleotide
purine ring is removed and the DNA molecule is cleaved
at the phosphodiester bond just upstream of the
site without the base. The overall procedure results
in a set of labeled fragments of defined sizes according
to the positions of G in the DNA sample being sequenced.
Similar reactions are carried out for the other
three bases (A, T, and C, not shown). The four reaction
mixtures, one for each of the bases, are run in
separate lanes of a polyacrylamide gel (electrophoresis).
Each of the four lanes represents one of the four
bases G, A, T, or C. The smallest fragment will
migrate the farthest downward, the next a little
less far, etc. One can then read the sequence in
the direction opposite to migration to obtain the
sequence in the 5′ to 3′ direction (here TAGTCGCAGTACCGTA).
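How the sequence is recovered from the four lanes can be sketched as follows. The model is idealized: every position is cleaved in some molecule, each cleavage destroys the affected nucleotide, and only the 5′-labeled fragment is visible. The function names are hypothetical; the sequence is the one read in the plate.

```python
def cleavage_lanes(sequence):
    """Lengths of labeled fragments in each base-specific reaction.

    Cleavage at 0-based position i leaves a 5'-labeled fragment
    of i nucleotides, recorded in the lane of the cleaved base.
    """
    lanes = {base: [] for base in "GATC"}
    for i, base in enumerate(sequence):
        lanes[base].append(i)
    return lanes

def read_gel(lanes):
    """Read the bands from the shortest fragment to the longest."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

sample = "TAGTCGCAGTACCGTA"
lanes = cleavage_lanes(sample)
print(lanes["G"])        # fragment lengths in the G reaction
print(read_gel(lanes))   # sequence in the 5' to 3' direction
```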
Sequencing by chain termination
This method, now much more widely used than the chemical
cleavage method, rests on the principle that DNA
synthesis is terminated when instead of a normal
deoxynucleotide (dATP, dTTP, dGTP, dCTP), a dideoxynucleotide
(ddATP, ddTTP, ddGTP, ddCTP) is used. A dideoxynucleotide
(ddNTP) is an analogue of the normal dNTP. It differs
by lack of a hydroxyl group at the 3′ carbon position.
When a dideoxynucleotide is incorporated during
DNA synthesis, no bond between its 3′ position and
the next nucleotide is possible because the ddNTP
lacks the 3′ hydroxyl group. Thus, synthesis of
the new chain is terminated at this site. The DNA
fragment to be sequenced has to be single-stranded
(1). DNA synthesis is initiated using a primer and
one of the four ddNTPs labeled with 32P in the phosphate
groups or, for automated sequencing, with a fluorophore
(see next plate). Here an example of chain termination
using ddATP is shown (3). Wherever an adenine (A)
occurs in the sequence, the dideoxyadenosine triphosphate
will cause termination of the new DNA chain being
synthesized. This will produce a set of different
DNA fragments whose sizes are determined by the
positions of the adenine residues occurring in the
fragment to be sequenced. Similar reactions are
done for the other three nucleotides. The four parallel
reactions will yield a set of fragments with defined
sizes according to the positions of the nucleotides
where the new DNA synthesis has been terminated.
The fragments are separated according to size by
gel electrophoresis as in the chemical method. The
sequence gel is read in the direction from small
fragments to large fragments to derive the nucleotide
sequence in the 5′ to 3′ direction. An example of
an actual sequencing gel is shown between panels
A and B.
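The chain termination principle can be sketched in the same spirit. The model assumes that termination occurs at every possible position in some molecule and counts chain lengths from the first nucleotide added to the primer. The template sequence is hypothetical, chosen so that the new strand matches the sequence used in the chemical example; the function names are hypothetical as well.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template_3_to_5):
    """Strand synthesized 5' to 3' on a template read 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)

def terminated_lengths(template_3_to_5, dd_base):
    """Lengths of the chains ended by the given dideoxynucleotide."""
    strand = new_strand(template_3_to_5)
    return [i + 1 for i, base in enumerate(strand) if base == dd_base]

def read_sequence(template_3_to_5):
    """Combine the four reactions and read from small to large fragments."""
    lanes = {base: terminated_lengths(template_3_to_5, base) for base in "ACGT"}
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

template = "ATCAGCGTCATGGCAT"
print(terminated_lengths(template, "A"))  # chains ending with ddATP
print(read_sequence(template))            # new strand, 5' to 3'
```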
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Rosenthal, N.: Fine structure of a gene—DNA
sequencing. New Eng. J. Med. 332:589–591,
1995.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Automated DNA Sequencing
Large-scale DNA sequencing requires automated procedures
based on fluorescence labeling of DNA and suitable detection
systems. In general, a fluorescent label can be used
either directly or indirectly. Direct fluorescent labels,
as used in automated sequencing, are fluorophores. These
are molecules that emit a distinct fluorescent color
when exposed to UV light of a specific wavelength. Examples
of fluorophores used in sequencing are fluorescein,
which fluoresces pale green when exposed to a wavelength
of 494 nm; rhodamine, which fluoresces red at 555 nm;
and aminomethylcoumarin acetic acid, which fluoresces
blue at 399 nm. In addition, a combination of different
fluorophores can be used to produce a fourth color.
Thus, each of the four bases can be distinctly labeled.
Another approach is to use PCR-amplified products (thermal
cycle sequencing, see A). This has the advantage that
double-stranded rather than single-stranded DNA can
be used as the starting material. And since small amounts
of template DNA are sufficient, the DNA to be sequenced
does not have to be cloned beforehand.
Thermal cycle sequencing
The DNA to be sequenced is contained in vector DNA
(1). The primer, a short oligonucleotide with a
sequence complementary to the site of attachment
on the single-stranded DNA, is used as a starting
point. For sequencing short stretches of DNA, a
universal primer is sufficient. This is an oligonucleotide
that will bind to vector DNA adjacent to the DNA
to be sequenced. However, if the latter is longer
than about 750 bp, only part of it will be sequenced.
Therefore, additional internal primers are required.
These anneal to different sites and amplify the
DNA in a series of contiguous, overlapping chain
termination experiments (2). Here, each primer determines
which region of the template DNA is being sequenced.
In thermal cycle sequencing (3), only one primer
is used to carry out PCR reactions, each with one
dideoxynucleotide (ddA, ddT, ddG, or ddC) in the
reaction mixture. This generates a series of different
chain-terminated strands, each dependent on the
position of the particular nucleotide base where
the chain is being terminated (4). After many cycles
and with electrophoresis, the sequence can be read
as shown in the previous plate. One advantage of
thermal cycle sequencing is that double-stranded
DNA can be used as starting material. (Illustration
based on Figures 4.5 and 4.6 in Brown, 1999).
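The need for internal primers can be estimated with a small calculation. The read length of about 750 bp is taken from the text; the overlap of 50 bp between neighboring reads and the function name are assumptions made for this sketch.

```python
def primer_start_positions(insert_len, read_len=750, overlap=50):
    """Start positions of the reads needed to cover an insert.

    The first read starts at the universal primer (position 0); each
    further internal primer is placed so that its read overlaps the
    previous one by `overlap` base pairs.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be smaller than the read length")
    positions = [0]
    while positions[-1] + read_len < insert_len:
        positions.append(positions[-1] + read_len - overlap)
    return positions

print(primer_start_positions(600))    # universal primer is sufficient
print(primer_start_positions(2000))   # two internal primers are added
```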
Automated DNA sequencing (principle)
Automated DNA sequencing involves four fluorophores,
one for each of the four nucleotide bases. The resulting
fluorescent signal is recorded at a fixed point
when DNA passes through a capillary containing an
electrophoretic gel. The base-specific fluorescent
labels are attached to appropriate dideoxynucleotide
triphosphates (ddNTP). Each ddNTP is labeled with
a different color, e.g., ddATP green, ddCTP blue,
ddGTP yellow, and ddTTP red (1). (The actual colors
for each nucleotide may be different.) All chains
terminated at an adenine (A) will yield a green
signal; all chains terminated at a cytosine (C)
will yield a blue signal, and so on. The sequencing
reactions based on this kind of chain termination
at labeled nucleotides (2) are carried out automatically
in sequencing capillaries (3). During electrophoretic
migration, the ddNTP-labeled chains in the gel
in the capillary pass in front of a laser beam focused
on a fixed position. The laser induces a fluorescent
signal that is dependent on the specific label representing
one of the four nucleotides. The sequence is electronically
read and recorded and is visualized as alternating
peaks in one of the four colors, representing the
alternating nucleotides in their sequence positions.
In practice the peaks do not necessarily show the
same maximal intensity as in the schematic diagram
shown here. (Illustration based on Brown, 1999,
and Strachan and Read, 1999).
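The readout principle described above can be illustrated with a minimal sketch. It assumes the example dye colors given in the text and an idealized run in which every chain length is represented exactly once; the strand "GATTACA" and the function names are hypothetical and serve only for illustration.

```python
# Dye assignment copied from the example in the text; actual kits may differ.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def terminated_chains(new_strand):
    """List (length, dye color) for each chain ending in a labeled ddNTP."""
    return [(i + 1, DYE_COLOR[base]) for i, base in enumerate(new_strand)]


def read_capillary(chains):
    """Read bases in the order the chains reach the detector (shortest first)."""
    return "".join(COLOR_TO_BASE[color] for _, color in sorted(chains))


chains = terminated_chains("GATTACA")  # hypothetical newly synthesized strand
chains.reverse()  # the order in the reaction tube does not matter
sequence = read_capillary(chains)
print(sequence)
```

Sorting by chain length stands in for electrophoresis here: the shortest chain passes the laser first, so the order of colors gives the order of bases.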
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Rosenthal, N.: Fine structure of a gene—DNA
sequencing. New Eng. J. Med. 332:589–591,
1995.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Wilson, R.K., et al.: Development of an automated
procedure for fluorescent DNA
sequencing. Genomics 6:626–636, 1990.
DNA Cloning
To obtain sufficient amounts of a specific DNA
sequence (e.g., a gene of interest) for study, it
must be selectively amplified. This is accomplished
by DNA cloning, which produces a homogeneous
population of DNA fragments from
a mixture of very different DNA molecules or
from all the DNA of the genome. Here procedures
are required to identify DNA from the
correct region in the genome, to separate it
from other DNA, and to multiply (clone) it selectively.
Identification of the correct DNA fragment
utilizes the specific hybridization of complementary
single-stranded DNA (molecular
hybridization). A short segment of single-stranded
DNA, a probe, originating from the
sequence to be studied, will hybridize to its
complementary sequences after these have
been denatured (made single-stranded, see
Southern blot analysis, p. 62). After the hybridized
sequence has been separated from
other DNA, it can be cloned. The selected DNA
sequences can be amplified in two basic ways:
in cells (cell-based cloning) or by cell-free cloning
(see polymerase chain reaction, PCR, p. 66).
Cell-based DNA cloning
Cell-based DNA cloning requires four initial steps.
First, a collection of different DNA fragments (here
labeled 1, 2, and 3) is obtained from the desired
DNA (target DNA) by cleaving it with a restriction
enzyme (see p. 64). Since fragments resulting from
restriction enzyme cleavage have a short single-stranded
end of a specific sequence at both ends, they can
be ligated to other DNA fragments that have been
cleaved with the same enzyme. The fragments produced
in step 1 are joined to DNA fragments containing
the origin of replication (OR) of a replicon, which
enables them to replicate (2). In addition, a fragment
may be joined to a selectable marker, e.g., a DNA
sequence containing an antibiotic resistance gene.
The recombinant DNA molecules are transferred into
host cells (bacterial or yeast cells). Here the
recombinant DNA molecules can replicate independently
of the host cell genome (3). Usually the host cell
takes up only one (although occasionally more than
one) foreign DNA molecule. The host cells transformed
by recombinant (foreign) DNA are grown in culture
and multiplied (propagation, 4). Selective growth
of one of the cell clones allows isolation of one
type of recombinant DNA molecule (5). After further
propagation, a homogeneous population of recombinant
DNA molecules is obtained (6). A collection of different
fragments of cloned DNA is called a clone library
(7, see DNA libraries). In cell-based cloning, the
replicon-containing DNA molecules are referred to
as vector molecules. (Figure adapted from Strachan
and Read, 1999)
A plasmid vector for cloning
Many different vector systems exist for cloning
DNA fragments of different sizes. Plasmid vectors
are used to clone small fragments. The experiment
is designed in such a way that incorporation of
the fragment to be cloned changes the plasmid’s
antibiotic resistance to allow selection for these
recombinant plasmids. A plasmid vector that was formerly
in frequent use (pBR322) is presented. This plasmid
contains recognition sites for the restriction enzymes
PstI, EcoRI, and SalI in addition to genes for ampicillin
and tetracycline resistance (1). If a foreign DNA
fragment is incorporated into the plasmid at the
site of the EcoRI recognition sequence, then tetracycline
and ampicillin resistance will be retained (2).
If the enzyme PstI is used to incorporate the fragment
to be cloned, ampicillin resistance is lost (the bacterium
becomes ampicillin sensitive), but tetracycline
resistance is retained. If the enzyme SalI is used
to incorporate the fragment, tetracycline resistance
disappears (the bacterium becomes tetracycline sensitive),
but ampicillin resistance is retained. Thus, depending
on how the fragment has been incorporated, recombinant
plasmids containing the DNA fragment to be cloned
can be distinguished from nonrecombinant plasmids
by altered antibiotic resistance. Cloning in plasmids
(bacteria) has become less important since yeast
artificial chromosomes (YACs) have become available
for cloning relatively large DNA fragments.
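The selection logic for pBR322 described above can be written as a small lookup. This is only a sketch of the reasoning in the text: the table records which resistance gene each insertion site interrupts, and the function name is hypothetical.

```python
# Resistance gene interrupted by an insert at each site, as stated in the text.
# EcoRI lies outside both resistance genes, so nothing is lost there.
SITE_DISRUPTS = {"EcoRI": None, "PstI": "ampicillin", "SalI": "tetracycline"}


def resistance_after_insert(site):
    """Return the set of antibiotics the recombinant plasmid still resists."""
    resistances = {"ampicillin", "tetracycline"}
    disrupted = SITE_DISRUPTS[site]
    if disrupted is not None:
        resistances.discard(disrupted)
    return resistances


for site in SITE_DISRUPTS:
    print(site, sorted(resistance_after_insert(site)))
```

A bacterium carrying a plasmid with an insert at the PstI site, for example, would grow on tetracycline but not on ampicillin, which distinguishes it from one carrying the nonrecombinant plasmid.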
cDNA Cloning
cDNA is a single-stranded segment of DNA that is complementary
to the mRNA of a coding DNA segment or of a whole gene.
It can be used as a probe (cDNA probe as opposed to
a genomic probe) for the corresponding gene because
it is complementary to coding sections (exons) of the
gene. If the gene has been altered by structural rearrangement
at a corresponding site, e.g., by deletion, the normal
and mutated DNA can be differentiated. Thus, the preparation
and cloning of cDNA is of great importance. From the
cDNA sequence, essential inferences can be made about
a gene and its gene product.
Preparation of cDNA
cDNA is prepared from mRNA. Therefore, a tissue
is required in which the respective gene is transcribed
and mRNA is produced in sufficient quantities. First,
mRNA is isolated. Then a primer is attached so that
the enzyme reverse transcriptase can form complementary
DNA (cDNA) from the mRNA. Since mRNA contains poly(A)
at its 3′ end, a primer of poly(T) can be attached.
From here, the enzyme reverse transcriptase can
start forming cDNA in the 5′ to 3′ direction. The
RNA is then removed by ribonuclease. The cDNA serves
as a template for the formation of a new strand
of DNA. This requires the enzyme DNA polymerase.
The result is a double strand of DNA, one strand
of which is complementary to the original mRNA.
To this DNA, single sequences (linkers) are attached
that are complementary to the single-stranded ends
produced by the restriction enzyme to be used. The
same enzyme is used to cut the vector, e.g., a plasmid,
so that the cDNA can be incorporated for cloning.
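The two synthesis steps above amount to taking complements of an antiparallel strand. The following minimal sketch assumes a short, hypothetical mRNA with a poly(A) tail; it ignores primers, linkers, and the enzymes themselves and only tracks the sequences.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """cDNA written 5' to 3': the complement of the mRNA read from its 3' end."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """DNA strand complementary to the cDNA, also written 5' to 3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


mrna = "AUGGCCUAA" + "A" * 6        # hypothetical message with a poly(A) tail
cdna = first_strand_cdna(mrna)      # begins with T's copied from the poly(A) tail
duplex_partner = second_strand(cdna)
print(cdna)
print(duplex_partner)
```

The second strand has the same sequence as the original mRNA, with T in place of U, which is why one strand of the final double-stranded DNA is complementary to the mRNA.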
Cloning vectors
The cell-based cloning of DNA fragments of different
sizes is facilitated by a wide variety of vector
systems. Plasmid vectors are used to clone small
DNA fragments in bacteria. Their main disadvantage
is that only 5–10 kb of foreign DNA can be cloned.
A plasmid cloning vector that has taken up a DNA
fragment (recombinant vector), e.g., pUC8 with 2.7
kb of DNA, must be distinguished from one that has
not. In addition, an ampicillin resistance gene
(Amp+) serves to distinguish bacteria that have
taken up plasmids from those that have not. Several
unique restriction sites in the plasmid DNA segment
where a DNA fragment might be inserted serve as
markers along with a marker gene, such as the lacZ
gene. The uptake of a DNA fragment by a plasmid
vector disrupts the plasmid's marker gene. Thus,
in the recombinant plasmid the enzyme β-galactosidase
will not be produced by the disrupted lacZ gene,
whereas in the plasmid without a DNA insert (nonrecombinant)
the enzyme is produced by the still intact lacZ
gene. The activity of the gene and the presence
or absence of the enzyme are determined by observing
a difference in color of the colonies in the presence
of an artificial substrate sugar. β-Galactosidase
splits an artificial sugar (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)
that is similar to lactose, the natural substrate
for this enzyme, into two components, one
of which is blue. Thus, bacterial colonies containing
nonrecombinant plasmids with an intact lacZ gene
are blue. In contrast, colonies that have taken
up a recombinant vector remain pale white. The latter
are grown in a medium containing ampicillin (the
selectable marker for the uptake of plasmid vectors).
Subsequently, a clone library can be constructed.
(Figure adapted from Brown, 1999)
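The two markers described above (ampicillin resistance for plasmid uptake, lacZ for the presence of an insert) can be combined into one decision table. The sketch below assumes, for simplicity, that colonies are plated on a medium containing both ampicillin and the artificial substrate; the function name is hypothetical.

```python
def colony_phenotype(has_plasmid, has_insert):
    """Predict what is seen on a plate with ampicillin and the artificial sugar."""
    if not has_plasmid:
        return "no growth"   # no ampicillin resistance gene
    if has_insert:
        return "white"       # lacZ disrupted, no beta-galactosidase
    return "blue"            # lacZ intact, substrate is split


for plasmid, insert in [(False, False), (True, False), (True, True)]:
    print(plasmid, insert, colony_phenotype(plasmid, insert))
```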
cDNA cloning
Only bacteria that have incorporated a recombinant
plasmid become ampicillin resistant. Recombinant
plasmids, which contain the gene for ampicillin
resistance, transform ampicillin-sensitive bacteria
into ampicillin-resistant bacteria. In an ampicillin-containing
medium, only bacteria that contain the recombinant
plasmid with the desired DNA fragment can grow.
By further replication in these bacteria, the fragment
can be cloned until there is enough material to
be studied. (Figures after Watson et al., 1987).
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Watson, J.D., et al.: Molecular Biology of the
Gene, 3rd ed. Benjamin/Cummings Publishing
Co., Menlo Park, California, 1987.
DNA Libraries
A DNA library is a collection of DNA fragments that
in their entirety represent the genome, that is, a particular
gene being sought and all remaining DNA. It is the starting
point for cloning a gene of unknown chromosomal location.
To produce a library, the total DNA is digested with
a restriction enzyme, and the resulting fragments are
incorporated into vectors and replicated in bacteria.
A sufficient number of clones must be present so that
every segment is represented at least once. This is
a question of the size of the genome being investigated
and the size of the fragments. Plasmids and phages are
used as vectors. For larger DNA fragments, yeast cells
may be employed. There are two different types of libraries:
genomic DNA and cDNA.
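How many clones are "sufficient" depends on genome size and fragment size, as noted above. One commonly used estimate, which is not given in the text and is added here as an assumption, is N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that a given segment is present. The genome and insert sizes below are illustrative round numbers.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Estimate the number of clones so that a segment is present with the given probability."""
    fraction = insert_bp / genome_bp          # share of the genome in one clone
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


# Illustrative values: a 3 x 10^9 bp genome with 20 kb or 1000 kb inserts.
small_inserts = clones_needed(3_000_000_000, 20_000)
large_inserts = clones_needed(3_000_000_000, 1_000_000)
print(small_inserts, large_inserts)
```

The estimate makes the trade-off concrete: larger fragments, such as those cloned in yeast, reduce the number of clones a library needs by a similar factor.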
Genomic DNA library
Clones of genomic DNA are copies of DNA fragments
from all of the chromosomes (1). They contain coding
and noncoding sequences. Restriction enzymes are
used to cleave the genomic DNA into many fragments.
Here four fragments are schematically shown, containing
two genes, A and B (2). These are incorporated into
vectors, e.g., into phage DNA, and are replicated
in bacteria. The complete collection of recombinant
DNA molecules, containing all DNA sequences of a
species or individual, is called a genomic library.
To find a particular gene, a screening procedure
is required.
cDNA library
Unlike a genomic library, which is complete and
contains coding and noncoding DNA, a cDNA library
consists only of coding DNA sequences. This specificity
offers considerable advantages over genomic DNA.
However, it requires that mRNA be available and
does not yield information about the structure of
the gene. mRNA can be obtained only from cells in
which the respective gene is transcribed, i.e.,
in which mRNA is produced (1). In eukaryotes, the
RNA formed during transcription (primary transcript)
undergoes splicing to form mRNA (2, see p. 50).
Complementary DNA (cDNA) is formed from mRNA by
the enzyme reverse transcriptase (3). The cDNA can
serve as a template for synthesis of a complementary
DNA strand, so that complete double-stranded DNA
can be formed (cDNA clone). Its sequence corresponds
to the coding sequences of the gene exons. Thus
it is well suited for use as a probe (cDNA probe).
The subsequent steps, incorporation into a vector
and replication in bacteria, correspond to those
of the procedure to produce a genomic library. cDNA
clones can only be obtained from coding regions of an
active (mRNA-producing) gene; thus, the cDNA clones
of different tissues differ according to genetic
activity. Since cDNA clones correspond to the coding
sequences of a gene (exons) and contain no noncoding
sections (introns), cloned cDNA is the preferred
starting material when further information about
a gene product is sought by analyzing the gene.
The sequence of amino acids in a protein can be
determined from cloned and sequenced cDNA. Also,
large amounts of a protein can be produced by having
the cloned gene expressed in bacteria or yeast cells.
Screening of a DNA library
Bacteria that have taken up the vectors can grow
on an agar-coated Petri dish, where they form colonies
(1). A replica imprint of the culture is taken on
a membrane (2), and the DNA that sticks to the membrane
is denatured with an alkaline solution (3). DNA
of the gene segment being sought can then be identified
by hybridization with a radioactively (or otherwise)
labeled probe (4). After hybridization, a signal
appears on the membrane at the site of the gene
segment (5). DNA complementary to the labeled probe
is located here; its exact position in the culture
corresponds to that of the signal on the membrane
(6). A sample is taken from the corresponding area
of the culture (5). It will contain the desired
DNA segment, which can now be further replicated
(cloned) in bacteria. By this means, the desired
segment can be enriched and is available for subsequent
studies.
References
Rosenthal, N.: Stalking the gene—DNA libraries.
New Eng. J. Med. 331:599–600, 1994.
Watson, J.D. et al.: Recombinant DNA. 2nd ed.
Scientific American Books, New York, 1992.
Restriction Analysis by Southern Blot Analysis
Restriction endonucleases are DNA-cleaving enzymes with
defined sequences as targets (see next plate). They
are often simply called restriction enzymes. Since each
enzyme cleaves DNA only at its specific recognition
sequence, the total DNA of an individual present in
nucleated cells can be cut into pieces of manageable
and defined size in a reproducible way. Individual DNA
fragments can then be selected, ligated into suitable
vectors, multiplied, and examined. Owing to the uneven
distribution of recognition sites, the DNA fragments
differ in size. A starting mixture of DNA fragments
is sorted according to size. Two procedures detect target
DNA or RNA fragments after they have been arranged by
size in gel electrophoresis—the Southern blot hybridization
for DNA (named after E. Southern who developed this
method in 1975) and the Northern blot hybridization for
RNA (a word play on Southern, not named after a Dr.
Northern). Immunoblotting (Western blot) detects proteins
by an antibody-based procedure.
Southern blot hybridization
The analysis starts with total DNA (1). The DNA
is isolated and cut with restriction enzymes (2).
One of the not yet identified fragments contains
the gene being sought or part of the gene. The fragments
are sorted by size in a gel (usually agarose) in
an electric field (electrophoresis) (3). The smaller
the fragment, the faster it migrates; the larger,
the slower it migrates. Next, the blot is carried
out: The fragments contained in the gel are transferred
to a nitrocellulose or nylon membrane (4). There
the DNA is denatured (made single-stranded) with
alkali and fixed to the membrane by moderate heating
(about 80 °C) or UV cross-linkage. The sample is incubated
with a probe of complementary single-stranded DNA
(genomic DNA or cDNA) from the gene (5). The probe
hybridizes solely with the complementary fragment
being sought, and not with others (6). Since the
probe is labeled with radioactive 32P, the fragment
being sought can be identified by placing an X-ray
film on the membrane, where it appears as a black
band on the film after development (autoradiogram)
(6). The size, corresponding to position, is determined
by running DNA fragments of known size in the electrophoresis.
Restriction fragment length polymorphism (RFLP)
In about every 100 base pairs of a DNA segment,
the nucleotide sequence differs in some individuals
(DNA polymorphism). As a result, the recognition
sequence of a restriction enzyme may be present
on one chromosome but not the other. In this case
the restriction fragment sizes differ at this site
(restriction fragment length polymorphism, RFLP).
An example is shown for two 5 kb (5000 base pair)
DNA segments. In one, a restriction site in the
middle is present (allele 1); in the other (allele
2) it is absent. With a Southern blot, it can be
determined whether in this location an individual
is homozygous 1–1 (two alleles 1, no 5 kb fragment),
heterozygous 1–2 (one allele each, 1 and 2), or
homozygous 2–2 (two alleles 2). If the mutation
being sought lies on the chromosome carrying the
5 kb fragment, the presence of this fragment indicates
presence of the mutation. The absence of this fragment
would indicate that the mutation is absent. It is
important to understand that the RFLP itself is
unrelated to the mutation. It simply distinguishes
DNA fragments of different sizes from the same region.
These can be used as markers to distinguish alleles
in a segregation analysis (see p. 134). In addition
to RFLPs, other types of DNA polymorphism can be
detected by Southern blot hybridization, although
polymerase chain reaction-based analysis of microsatellites
is now used more frequently.
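The genotype assignment in the 5 kb example above can be sketched as follows. The sketch assumes that the restriction site lies exactly in the middle, so that allele 1 yields two co-migrating 2.5 kb fragments, and that the probe detects both of them; the function name is hypothetical.

```python
def rflp_genotype(bands_kb):
    """Infer the genotype from the band sizes (in kb) seen on the Southern blot."""
    bands = set(bands_kb)
    if bands == {2.5}:
        return "1-1"   # both chromosomes are cut, no 5 kb fragment
    if bands == {2.5, 5.0}:
        return "1-2"   # one chromosome cut, one uncut
    if bands == {5.0}:
        return "2-2"   # neither chromosome is cut
    raise ValueError(f"unexpected band pattern: {sorted(bands)}")


for pattern in ([2.5], [2.5, 5.0], [5.0]):
    print(pattern, rflp_genotype(pattern))
```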
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Housman, D.: Human DNA polymorphism. New
Engl. J. Med. 332:318–320, 1995.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Restriction Mapping
Restriction endonucleases (restriction enzymes) are
DNA-cutting enzymes. They are obtained from bacteria,
which produce the enzymes as protection from foreign
DNA. A given enzyme recognizes a specific sequence of
4–8 (usually 6) nucleotides (a restriction site) where
it cleaves the DNA. The sizes of the DNA fragments produced
depend on the distribution of the restriction sites.
More than 400 different types of restriction enzymes
have been isolated.
DNA cleavage by restriction nucleases
The cleavage patterns (recognition sequences) of
three frequently used restriction enzymes, EcoRI,
HindIII, and HpaI, are presented. For EcoRI and
HindIII the cut is “palindromic,” i.e., the cut
is asymmetric around an axis on which mirror-image
complementary single-stranded DNA segments arise.
Each corresponds to its opposite-lying strand in
the reverse direction. Therefore, they can be joined
to a DNA fragment whose ends contain complementary
single-stranded sequences. HpaI cuts both strands
so that no single-stranded ends are formed. Frequently
cutting and seldom cutting enzymes can be distinguished
according to the frequency of occurrence of their
recognition sites.
Examples of restriction enzymes
The recognition sequences of some restriction enzymes
are shown. The names of the enzymes are derived
from those of the bacteria in which they occur,
e.g., EcoRI from Escherichia coli Restriction enzyme
I, etc. Some enzymes have a cutting site with limited
specificity. In HindII it suffices that the two
middle nucleotides are a pyrimidine and a purine
(GTPyPuAC), and it does not matter whether the former
is thymine (T) or cytosine (C), and whether the
latter is adenine (A) or guanine (G). Such a recognition
site occurs frequently and produces many relatively
small fragments, whereas enzymes that cut very infrequently
produce few and large DNA fragments.
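The limited specificity of HindII (GTPyPuAC) can be made explicit by listing the concrete sequences it covers and searching for them. This is a minimal sketch; the test sequence is hypothetical and only one strand is examined.

```python
import re
from itertools import product

PYRIMIDINES = "CT"
PURINES = "AG"


def hindii_sites():
    """All concrete six-base sequences that match GTPyPuAC."""
    return sorted("GT" + py + pu + "AC" for py, pu in product(PYRIMIDINES, PURINES))


HINDII_PATTERN = re.compile("GT[CT][AG]AC")


def find_sites(dna):
    """Zero-based start positions of HindII recognition sequences in dna."""
    return [match.start() for match in HINDII_PATTERN.finditer(dna)]


print(hindii_sites())
print(find_sites("AAGTCGACTTGTTAACGG"))  # hypothetical sequence
```

Because four different sequences are accepted, such a site is expected about four times as often as a fully specified six-base site, in line with the smaller fragments mentioned above.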
Restriction fragments
In a given DNA segment, the recognition sequence
of a restriction enzyme occurs irregularly. Thus,
the distances between restriction sites differ.
DNA fragments of various sizes (restriction fragments)
result from digestion with a restriction enzyme.
A given restriction enzyme will cleave a given segment
of DNA into a series of DNA fragments of characteristic
sizes. This leads to a pattern that can be employed
for diagnostic purposes.
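The relation between irregularly spaced sites and fragment sizes can be sketched for a linear DNA segment. The example assumes EcoRI (recognition sequence GAATTC, cut after the first G); the test sequence and function name are hypothetical, and only one strand is considered.

```python
def digest(dna, site, cut_offset):
    """Return the fragment lengths after cutting linear dna at every site."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)   # position of the cut within dna
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 47-base segment containing two EcoRI sites.
segment = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
fragments = digest(segment, "GAATTC", cut_offset=1)
print(fragments)
```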
Determination of the locations of restriction sites
Since the fragment sizes reflect the relative positions
of the cutting sites, they can be used to characterize
a DNA segment (restriction map). If a 10-kb DNA
segment cut by two enzymes, A and B, yields three
fragments, of 2 kb, 3 kb, and 5 kb, then the relative
location of the cutting sites can be determined
by using enzymes A and B alone in further experiments.
If enzyme A yields two fragments of 3 kb and 7 kb,
and enzyme B two fragments of 2 kb and 8 kb, then
the two cutting sites of enzymes A and B must lie
5 kb apart. To the left of the restriction site
of enzyme A are 3 kb; to the right of the restriction
site of enzyme B, 2 kb (1 kb = 1000 base pairs).
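The reasoning in this example can be checked by trying every placement of the two sites that is consistent with the single digests and keeping those that reproduce the double digest. The sketch assumes a linear segment in which each enzyme cuts exactly once; the function name is hypothetical.

```python
def map_two_single_cutters(length, frags_a, frags_b, frags_double):
    """Return (position of A, position of B) pairs, measured from the left end."""
    solutions = []
    for pos_a in frags_a:            # the A site lies one A-fragment from the left
        for pos_b in frags_b:        # likewise for the B site
            cuts = sorted({pos_a, pos_b})
            bounds = [0] + cuts + [length]
            predicted = sorted(r - l for l, r in zip(bounds, bounds[1:]))
            if predicted == sorted(frags_double):
                solutions.append((pos_a, pos_b))
    return solutions


# Values from the text, in kb.
maps = map_two_single_cutters(10, (3, 7), (2, 8), (2, 3, 5))
print(maps)
```

The two solutions returned are mirror images of the same map: in both, the sites of enzymes A and B lie 5 kb apart, with 3 kb beyond the A site and 2 kb beyond the B site.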
Restriction map
A given DNA segment can be characterized by
the distribution pattern of restriction sites. In
the example shown, a DNA segment is characterized
by the distribution of the cutting sites
for enzymes E (EcoRI) and H (HindIII). The individual
sites are separated by intervals defined
by the size of the fragments after digestion with
the enzyme. A restriction map is a linear
sequence of restriction sites at defined intervals
along the DNA. Restriction mapping is of considerable
importance in medical genetics and
evolutionary research.
DNA Amplification by Polymerase Chain Reaction (PCR)
The introduction of cell-free methods for multiplying
DNA fragments of defined origin (DNA amplification)
in 1985 ushered in a new era in molecular genetics (the
principle of PCR is contained in earlier publications).
This fundamental technology has spread dramatically with
the development of automated equipment used in basic
and applied research.
Polymerase chain reaction (PCR)
PCR is a cell-free, rapid, and sensitive method
for cloning DNA fragments. A standard reaction and
a wide variety of PCR-based methods have been developed
to assay for polymorphisms and mutations. Standard
PCR is an in vitro procedure for amplifying defined
target DNA sequences, even from very small amounts
of material or material of ancient origin. Selective
amplification requires some prior information about
DNA sequences flanking the target DNA. Based on
this information, two oligonucleotide primers of
about 15–25 nucleotides in length are designed. The
primers are complementary to sequences outside the
3′ ends of the target site and bind specifically
to these. PCR is a chain reaction because newly
synthesized DNA strands act as templates for further
DNA synthesis for about 25–35 subsequent cycles.
Theoretically each cycle doubles the amount of DNA
amplified. At the end, at least 10⁵ copies of the
specific target sequence are present. This can be
visualized as a distinct band of a specific size
after gel electrophoresis. Each cycle, involving
three precisely time-controlled and temperature-controlled
reactions in automated thermal cyclers, takes about
1–5 min. The three steps in each cycle are (1) denaturation
of double-stranded DNA, at about 93–95 °C for human
DNA, (2) primer annealing at about 50–70 °C depending
on the expected melting temperature of the duplex
DNA, and (3) DNA synthesis using heat-stable DNA
polymerase (from microorganisms living in hot springs,
such as Thermus aquaticus, Taq polymerase),
typically at about 70–75 °C. At each subsequent cycle
the template (shown in blue) and the DNA newly synthesized
during the preceding cycle (shown in red) act as
templates for another round of synthesis. The first
cycle results in newly synthesized DNA of varied
lengths (shown with an arrow) at the 3′ ends because
synthesis is continued beyond the target sequences.
The same happens during subsequent cycles, but the
variable strands are rapidly outnumbered by new
DNA of fixed length at both ends because synthesis
cannot proceed past the terminus of the primer at
the opposite template DNA.
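The theoretical doubling per cycle can be put into numbers with a short sketch. The efficiency parameter is an assumption added for illustration (1.0 means perfect doubling); real reactions fall short of this, which is why the yield quoted above is stated as a minimum.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Copies after the given number of cycles; efficiency 1.0 doubles each cycle."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target_copies, start_copies=1, efficiency=1.0):
    """Smallest number of cycles that yields at least target_copies."""
    cycles = 0
    while pcr_copies(start_copies, cycles, efficiency) < target_copies:
        cycles += 1
    return cycles


print(pcr_copies(1, 25))          # theoretical yield from one template after 25 cycles
print(cycles_to_reach(1e5))       # cycles needed for 10^5 copies at perfect doubling
print(cycles_to_reach(1e5, efficiency=0.7))
```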
cDNA amplification and RT-PCR
A partially known amino acid sequence of a polypeptide
can be used to obtain the sequence information required
for PCR. From its mRNA one can derive cDNA (see
complementary DNA, p. 58) and determine the sequence
of the sense and the antisense strand to prepare
appropriate oligonucleotide primers (1). When different
RNAs are available in small amounts, rapid PCR-based
methods are employed to amplify cDNA from different
exons of a gene. cDNA is obtained by reverse transcriptase
from mRNA, which is then removed by alkaline hydrolysis
(2). After a complementary new DNA strand has been
synthesized, the DNA can be amplified by PCR (3).
Reverse transcriptase PCR (RT-PCR) can be used when
the known exon sequences are widely separated within
a gene. With rapid amplification of cDNA ends (RACE-PCR),
the 5′ and 3′ end sequences can be isolated from
cDNA.
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Erlich, H.A., Gelfand, D., Sninsky, J.J.: Recent
advances in the polymerase chain reaction.
Science 252:1643–1651, 1991.
Erlich, H.A., Arnheim, N.: Genetic analysis with
the polymerase chain reaction. Ann. Rev.
Genet. 26:479–506, 1992.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Volkenandt, M., Löhr, M., Dicker, A.P.: Gen-Amplification
durch Polymerase-Kettenreaktion. Dtsch. Med. Wschr.
17:670–676, 1990.
White, T.J., Arnheim, N., Erlich, H.A.: The polymerase
chain reaction. Trends Genet. 5:185–189, 1989.
Changes in DNA
When it was recognized that changes (mutations) in genes
occur spontaneously (T. H. Morgan, 1910) and can be
induced by X-rays (H. J. Muller, 1927), the mutation
theory of heredity became a cornerstone of early genetics.
Genes were defined as mutable units, but the question
of what genes and mutations are remained open. Today we know
that mutations are changes in the structure of DNA and
their functional consequences. The study of mutations
is important for several reasons. Mutations cause diseases,
including all forms of cancer. They can be induced by
chemicals and by irradiation. Thus, they represent a
link between heredity and environment. And without mutations,
well-organized forms of life would not have evolved.
The following two plates summarize the chemical nature
of mutations.
Error in replication
The synthesis of a new strand of DNA occurs by semiconservative
replication based on complementary base pairing
(see DNA replication). Errors in replication occur
at a rate of about 1 in 10⁵. This rate is reduced
to about 1 in 10⁷ to 10⁹ by proofreading mechanisms.
When an error in replication occurs before the next
cell division (here referred to as the first division
after the mutation), e.g., a cytosine (C) might
be incorporated instead of an adenine (A) at the
fifth base pair as shown here, the resulting mismatch
will be recognized and eliminated by mismatch repair
(see DNA repair) in most cases. However, if the
error is undetected and allowed to stand, the next
(second) division will result in a mutant molecule
containing a CG instead of an AT pair at this position.
This mutation will be perpetuated in all daughter
cells. Depending on its location within or outside
of the coding region of a gene, functional consequences
due to a change in a codon could result.
Mutagenic alteration of a nucleotide
A mutation may result when a structural change of
a nucleotide affects its base-pairing capability.
The altered nucleotide is usually present in one
strand of the parent molecule. If this leads to
incorporation of a wrong base, such as a C instead
of a T in the fifth base pair as shown here, the
next (second) round of replication will result in
two mutant molecules.
Replication slippage
A different class of mutations does not involve
an alteration of individual nucleotides, but results
from incorrect alignment between allelic or nonallelic
DNA sequences during replication. When the template
strand contains short tandem repeats, e.g., CA repeats
as in microsatellites (see DNA polymorphism and
Part II, Genomics), the newly replicated strand
and the template strand may shift their positions
relative to each other. With replication or polymerase
slippage, leading to incorrect pairing of repeats,
some repeats are copied twice or not at all, depending
on the direction of the shift. One can distinguish
forward slippage (shown here) and backward slippage
with respect to the newly replicated strand. If
the newly synthesized DNA strand slips forward,
a region of nonpairing remains in the parental strand.
Forward slippage results in an insertion. Backward
slippage of the new strand results in deletion.
Microsatellite instability is a characteristic feature
of hereditary nonpolyposis cancer of the colon (HNPCC).
HNPCC genes are localized on human chromosomes at
2p15–22 and 3p21.3. About 15% of all colorectal,
gastric, and endometrial carcinomas show microsatellite
instability. Replication slippage must be distinguished
from unequal crossing-over during meiosis. This
is the result of recombination between adjacent,
but not allelic, sequences on nonsister chromatids
of homologous chromosomes (Figures redrawn from
Brown, 1999).
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Dover, G.A.: Slippery DNA runs on and on and
on ... Nature Genet. 10:254–256, 1995.
Lewin, B.: Genes VII. Oxford University Press,
Oxford, 2000.
Rubinstein, D.C., et al.: Microsatellite evolution
and evidence for directionality and variation
in rate between species. Nature Genet.
10:337–343, 1995.
Strachan, T.A., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publ., Oxford,
1999.
Vogel, F., Rathenberg, R.: Spontaneous mutation
in man. Adv. Hum. Genet. 5:223–318, 1975.
Mutation Due to Different Base Modifications
Mutations can result from chemical or physical events
that lead to base modification. When they affect the
base-pairing pattern, they interfere with replication
or transcription. Chemical substances able to induce
such changes are called mutagens. Mutagens cause mutations
in different ways. Spontaneous oxidation, hydrolysis,
uncontrolled methylation, alkylation, and ultraviolet
irradiation result in alterations that modify nucleotide
bases. DNA-reactive chemicals change nucleotide bases
into different chemical structures or remove a base.
Deamination and methylation
Cytosine, adenine, and guanine contain an amino
group. When this is removed (deamination), a modified
base with a different base-pairing pattern results.
Nitrous acid typically removes the amino group.
This also occurs spontaneously at a rate of 100
bases per genome per day (Alberts et al., 1994,
p. 245). Deamination of cytosine removes the amino
group in position 4 (1). The resulting molecule
is uracil (2). This pairs with adenine rather than
guanine. Normally this change is efficiently repaired
by uracil-DNA glycosylase. Deamination at the RNA
level occurs in RNA editing (see Expression of genes).
Methylation of the carbon atom in position 5 of
cytosine results in 5-methylcytosine, containing
a methyl group in position 5 (3). Deamination of
5-methylcytosine will result in a change to thymine,
containing an oxygen in position 4 instead of an
amino group (4). This mutation will not be corrected
because thymine is a natural base. Adenine (5) can
be deaminated in position 6 to form hypoxanthine,
which contains an oxygen in this position instead
of an amino group (6), and which pairs with cytosine
instead of thymine. The resulting change after DNA
replication is a cytosine instead of a thymine in
the mutant strand.
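The deamination events described above can be summarized as a lookup from the original base to its product and to the base with which the product pairs. This is only a tabulation of the statements in the text; the dictionary and function names are hypothetical.

```python
# Original base -> (deamination product, base the product pairs with).
DEAMINATION = {
    "cytosine": ("uracil", "adenine"),
    "5-methylcytosine": ("thymine", "adenine"),
    "adenine": ("hypoxanthine", "cytosine"),
}

# Normal pairing partner of each original base.
USUAL_PARTNER = {
    "cytosine": "guanine",
    "5-methylcytosine": "guanine",
    "adenine": "thymine",
}


def deamination_effect(base):
    """Describe the product of deamination and the resulting change in pairing."""
    product, new_partner = DEAMINATION[base]
    return {
        "product": product,
        "pairs_with": new_partner,
        "instead_of": USUAL_PARTNER[base],
    }


for base in DEAMINATION:
    print(base, deamination_effect(base))
```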
Depurination
About 5000 purine bases (adenine and guanine) are
lost per day from DNA in each cell (depurination)
owing to thermal fluctuations. Depurination of DNA
involves hydrolytic cleavage of the N-glycosyl linkage
of deoxyribose to the guanine nitrogen in position
9. This leaves a depurinated sugar. The loss of
a base pair will lead to a deletion after the next
replication if not repaired in time (see DNA repair).
Alkylation
Alkylation is the introduction of a methyl or an
ethyl group into a molecule. The alkylation of guanine
involves the replacement of the hydrogen bond to
the oxygen atom in position 6 by a methyl group,
to form 6-methylguanine. This can no longer pair
with cytosine. Instead, it will pair with thymine.
Thus, after the next replication the opposite cytosine
(C) is replaced by a thymine (T) in the mutant daughter
molecule. Important alkylating agents are ethylnitrosourea
(ENU), ethylmethane sulfonate (EMS), dimethylnitrosamine,
and N-methyl-N-nitro-N-nitrosoguanidine.
Nucleotide base analogue
Base analogs are purines or pyrimidines that are
similar enough to the regular nucleotide DNA bases
to be incorporated into the new strand during replication.
5-Bromodeoxyuridine (5-BrdU) is an analog of thymine.
It contains a bromine atom instead of the methyl
group in position 5. Thus, it can be incorporated
into the new DNA strand during replication. However,
the presence of the bromine atom causes ambiguous
and often wrong base pairing.
UV-light-induced thymine dimers
Ultraviolet irradiation at 260 nm wavelength induces
covalent bonds between adjacent thymine residues
at carbon positions 5 and 6. If located within a
gene, this will interfere with replication and transcription
unless repaired. Another important type of UV-induced
change is a photoproduct consisting of a covalent
bond between the carbons in positions 4 and 6 of
two adjacent nucleotides, the 4–6 photoproduct (not
shown). (Figures redrawn from Lewin, 2000).
DNA Polymorphism
Genetic polymorphism is the existence of variants with
respect to a gene locus (alleles), a chromosome structure
(e.g., size of centromeric heterochromatin), a gene
product (variants in enzymatic activity or binding affinity),
or a phenotype. The term DNA polymorphism refers to
a wide range of variations in nucleotide base composition,
length of nucleotide repeats, or single nucleotide variants.
DNA polymorphisms are important as genetic markers to
identify and distinguish alleles at a gene locus and
to determine their parental origin.
Single nucleotide polymorphism (SNP)
These allelic variants differ in a single nucleotide
at a specific position. At least one in a thousand
DNA bases differs among individuals (1). The detection
of SNPs does not require gel electrophoresis. This
facilitates large-scale detection. A SNP can be
visualized in a Southern blot as a restriction fragment
length polymorphism (RFLP) if the difference in
the two alleles corresponds to a difference in the
recognition site of a restriction enzyme (see Southern
blot, p. 62).
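How a SNP becomes visible as an RFLP can be illustrated with a small sketch. It assumes the enzyme EcoRI (recognition site GAATTC, cut after the first G) and two invented toy alleles; the function name and sequences are hypothetical, and only one strand is searched, which is sufficient here because the site is palindromic.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return the fragment lengths."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Invented alleles: they differ in a single nucleotide inside the site.
allele_1 = "A" * 4 + "GAATTC" + "A" * 10   # recognition site present
allele_2 = "A" * 4 + "GAGTTC" + "A" * 10   # SNP (A -> G) destroys the site

print(fragment_lengths(allele_1, "GAATTC", 1))  # [5, 15]
print(fragment_lengths(allele_2, "GAATTC", 1))  # [20]
```

The allele that has lost the recognition site yields one long fragment instead of two shorter ones, which is the size difference that a Southern blot would display.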
Simple sequence length polymorphism (SSLP)
These allelic variants differ in the number of tandemly
repeated short nucleotide sequences in noncoding
DNA. Short tandem repeats (STRs) consist of units
of 1, 2, 3, or 4 base pairs repeated from 3 to about
10 times. Typical short tandem repeats are CA repeats
in the 5' to 3' strand, i.e., alternating CG and
AT base pairs in the double strand. Each allele
is defined by the number of CA repeats, e.g., 3
and 5, as shown (1). These are also called microsatellites.
The size differences due to the number of repeats
are determined by PCR. Variable number of tandem
repeats (VNTR), also called minisatellites, consist
of repeat units of 20–200 base pairs (2).
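Defining an allele by its number of CA repeats amounts to counting consecutive copies of the repeat unit. The following sketch does this with a regular expression; the flanking sequences and the function name are invented for the example.

```python
import re

def longest_repeat_count(sequence, unit):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Two hypothetical alleles with 3 and 5 CA repeats between identical flanks.
allele_a = "GGT" + "CA" * 3 + "TTG"
allele_b = "GGT" + "CA" * 5 + "TTG"

print(longest_repeat_count(allele_a, "CA"))  # 3
print(longest_repeat_count(allele_b, "CA"))  # 5
```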
Detection of SNP by oligonucleotide hybridization analysis
Oligonucleotides, short stretches of about 20 nucleotides
with a complementary sequence to the single-stranded
DNA to be examined, will hybridize completely only
if perfectly matched. If there is a difference of
even one base, such as due to an SNP, the resulting
mismatch can be detected because the DNA hybrid
is unstable and gives no signal.
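The all-or-nothing behavior of an allele-specific oligonucleotide can be sketched as a test for perfect complementarity. This is a deliberate simplification that treats any single mismatch as "no signal"; the sequences and function names are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hybridizes(probe, target):
    """True only if the probe is perfectly complementary to a stretch of target."""
    return reverse_complement(probe) in target

target = "TTGACCGTAAGCTAGGCTTAACG"                 # single-stranded DNA to examine
probe = reverse_complement(target[2:22])           # 20-nucleotide oligonucleotide
snp_target = target[:10] + "A" + target[11:]       # same DNA with one base changed

print(hybridizes(probe, target))      # True
print(hybridizes(probe, snp_target))  # False
```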
Detection of STRs by PCR
Short tandem repeats (STRs) can be detected by the
polymerase chain reaction (PCR). The allelic regions
of a stretch of DNA are amplified; the resulting
DNA fragments of different sizes are subjected to
electrophoresis; and their sizes are determined.
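Once the fragment sizes are known, the number of repeats follows by simple arithmetic, provided the length of the amplified flanking sequence (including the primers) is known. A minimal sketch, with an assumed flank of 40 bp and invented fragment sizes:

```python
def repeats_from_fragment(fragment_bp, flank_bp, unit_bp):
    """Infer the repeat number from the size of a PCR fragment."""
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp:
        raise ValueError("fragment size is inconsistent with the assumed flanks")
    return repeat_bp // unit_bp

# Hypothetical CA-repeat locus: 40 bp of flanking sequence, 2 bp repeat unit.
for size in (46, 50):
    print(size, "bp ->", repeats_from_fragment(size, flank_bp=40, unit_bp=2), "repeats")
```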
CEPH families
An important step in gene identification is the
analysis of large families by linkage analysis of
polymorphic marker loci on a specific chromosomal
region near a locus of interest. Large families
are of particular value. DNA from such families
has been collected by the Centre pour l’Étude du
Polymorphisme Humain (CEPH) in Paris, now called
the Centre Jean Dausset, after the founder. Immortalized
cell lines are stored from each family. A CEPH family
consists of four grandparents, the two parents,
and eight children. If four alleles are present
at a given locus they are designated A, B, C, and
D. Starting with the grandparents, the inheritance
of each allele through the parents to the grandchildren
can be traced (shown here as a schematic pattern
in a Southern blot). Of the four grandparents shown,
three are heterozygous (AB, CD, BC) and one is homozygous
(CC). Since the parents are heterozygous for different
alleles (AD the father and BC the mother), all eight
children are heterozygous (BD, AB, AC, or CD).
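The genotypes stated for this family can be checked by enumerating every combination of one allele from each parent. The sketch below writes a genotype as a two-letter string; the function name is invented for the example.

```python
from itertools import product

def offspring_genotypes(father, mother):
    """Return all genotypes obtainable from one allele of each parent."""
    return sorted({"".join(sorted(a + b)) for a, b in product(father, mother)})

# Paternal grandparents AB and CD can have an AD child (the father).
print(offspring_genotypes("AB", "CD"))  # ['AC', 'AD', 'BC', 'BD']
# Maternal grandparents BC and CC can have a BC child (the mother).
print(offspring_genotypes("BC", "CC"))  # ['BC', 'CC']
# Children of father AD and mother BC.
print(offspring_genotypes("AD", "BC"))  # ['AB', 'AC', 'BD', 'CD']
```

Because the parents share no allele, no child of this mating can be homozygous, and the parental origin of each allele can be read directly from the genotype.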
References
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Collins, F.S., Guyer, M.S., Chakravarti, A.: Variations
on a theme: cataloguing human DNA
sequence variation. Science 282:682–689,
1998.
Deloukas, P., Schuler, G., Gyapay, G., et al.: A
physical map of 30,000 human genes.
Science 282:744–746, 1998.
Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
2000.
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Recombination
Recombination lends the genome flexibility. Without
genetic recombination, the genes on each individual
chromosome would remain fixed in their particular position.
Changes could occur by mutation only, which would be
hazardous. Recombination provides the means to achieve
extensive restructuring, eliminate unfavorable mutations,
maintain and spread favorable mutations, and endow each
individual with a unique set of genetic information.
This greatly enhances the evolutionary potential of
the genome. Recombination must occur between precisely
corresponding sequences (homologous recombination) to
ensure that not one base pair is lost or added. The
newly combined (recombined) stretches of DNA must retain
their original structure in order to function properly.
Two types of recombination can be distinguished: (1)
generalized or homologous recombination, which in eukaryotes
occurs at meiosis (see p. 116) and (2) site-specific
recombination. A third process, transposition, utilizes
recombination to insert one DNA sequence into another
without regard to sequence homology. Here we consider
homologous recombination, a complex biochemical reaction
between two duplexes of DNA that can involve any pair
of homologous sequences. The necessary enzymes
are not considered. Two general models can be distinguished,
recombination initiated from a single-strand DNA break
and recombination initiated from a double-strand break.
Recombination initiated by single-strand breaks
This model assumes that the process starts with
breaks at corresponding positions of one of the
strands of homologous DNA (same sequences of different
parental origin) (1). A nick is made by a single-strand-breaking
enzyme (endonuclease) in each molecule at the corresponding
site (2), but see below. This allows the free ends
of one nicked strand to join with the free ends
of the other nicked strand, from the other molecule,
to form single-strand exchanges between the two
duplex molecules at the recombination joint (3).
The recombination joint moves along the duplex (branch migration)
(4). This is an important feature because it ensures
that sufficient distance for the second nick is
present in each of the other strands (5). After
the two other strands have joined and gaps have
been sealed (6), a reciprocal recombinant molecule
is generated (7). Recombination involving DNA duplexes
requires topological changes, i.e., either the molecules
must be free to rotate or the restraint must be
relieved in some other way. This model has an unresolved
difficulty: How is it assured that the single-strand
nicks shown in step 2 occur at precisely the same
position in the two double helix DNA molecules?
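Why the position of the nicks matters can be shown with a deliberately crude sketch that reduces each duplex to a single string and a reciprocal exchange to swapping the segments beyond the break. The sequences and the function name are invented; strand exchange and branch migration are not modeled.

```python
def reciprocal_exchange(dna_1, dna_2, nick_1, nick_2):
    """Join the left part of each molecule to the right part of the other."""
    return dna_1[:nick_1] + dna_2[nick_2:], dna_2[:nick_2] + dna_1[nick_1:]

paternal = "A" * 10
maternal = "G" * 10

# Nicks at corresponding positions: both products keep the original length.
print([len(p) for p in reciprocal_exchange(paternal, maternal, 4, 4)])  # [10, 10]
# Nicks at different positions: one product loses bases, the other gains them.
print([len(p) for p in reciprocal_exchange(paternal, maternal, 4, 6)])  # [8, 12]
```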
Recombination initiated by double-strand breaks
The current model for recombination is based on
initial double-strand breaks in one of the two homologous
DNA molecules (1). Both strands are cleaved by an
endonuclease, and the break is enlarged to a gap
by an exonuclease that removes the new 5' ends of
the strands at the break and leaves 3' single-stranded
ends (2). One free 3' end recombines with a homologous
strand of the other molecule (3). This generates
a D loop consisting of a displaced strand from the
“donor” duplex. The D loop is extended by repair
synthesis until the entire gap of the recipient
molecule is closed (4). This displaced strand anneals
to the single-stranded complementary homologous
sequences of the recipient strand and closes the
gap (5). DNA repair synthesis from the other 3'
end closes the remaining gap (6). The integrity
of the two molecules is restored by two rounds of
single-strand repair synthesis. In contrast to the
single-strand exchange model, the double-strand breaks
result in heteroduplex DNA in the entire region
that has undergone recombination. An apparent disadvantage
is the temporary loss of information in the gaps
after the initial cleavage. However, the ability
to retrieve this information by resynthesis from
the other duplex avoids permanent loss. (Figures
redrawn from Lewin, 2000).
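The recovery of the information lost in the gap can be sketched by filling the gap of the recipient with the corresponding stretch of the other duplex. Each duplex is reduced to a single string here, and the sequences and the function name are invented; the example only shows where the restored sequence comes from.

```python
def gap_repair(recipient, donor, gap_start, gap_end):
    """Fill the gap in `recipient` with the corresponding stretch of `donor`."""
    return recipient[:gap_start] + donor[gap_start:gap_end] + recipient[gap_end:]

# Two homologous sequences that differ at one position (index 4).
recipient = "ACGTAGGCTA"
donor = "ACGTCGGCTA"

repaired = gap_repair(recipient, donor, 3, 6)
print(repaired)           # ACGTCGGCTA
print(len(repaired) == len(recipient))
```

No base pair is lost, but any difference between the two molecules inside the gap is resolved in favor of the molecule that served as the template.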
References
Alberts, B. et al.: Essential Cell Biology. An Introduction
to the Molecular Biology of the Cell.
Garland Publishing, New York, 1998.
Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
1999.
Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
2000.
Transposition
Aside from homologous recombination, the overall stability
of the genome is interrupted by mobile sequences called
transposable elements or transposons. There are different
classes of distinct DNA sequences that are able to transport
themselves to other locations within the genome. This
process utilizes recombination but does not result in
an exchange. Rather, a transposon moves directly from
one site of the genome to another without an intermediary
such as plasmid or phage DNA (see section on prokaryotes).
This results in rearrangements that create new sequences
and change the functions of target sequences. Transposons
may be a major source of evolutionary changes in the
genome. In some cases they cause disease when inserted
into a functioning gene. Three examples are presented
below: insertion sequences (IS), transposons (Tn), and
retroelements transposing via an RNA intermediate.
Insertion sequences (IS) and transposons (Tn)
A characteristic feature of IS transposition is
the presence of a pair of short direct repeats of
target DNA at either end. The IS itself carries
inverted repeats of about 9–13 bp at both ends and
depending on the particular class consists of about
750–1500 bp, which contain a single long coding
region for transposase (the enzyme responsible for
transposition of mobile sequences). Target selection
is either random or at particular sites. The presence
of inverted terminal repeats and the short direct
repeats of host DNA result in a characteristic structure
(1). Transposons carry in addition a central region
with genetic markers unrelated to transposition,
e.g., antibiotic resistance (2). They are flanked
either by direct repeats (same direction) or by
inverted repeats (opposite direction, 3).
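The characteristic structure of an inserted IS element, short direct repeats of target DNA outside and inverted terminal repeats inside, can be expressed as a simple check. The sequences below are invented toy examples and the function names are hypothetical; the repeat lengths (5 bp direct, 9 bp inverted) are chosen only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def has_is_structure(region, direct_len, inverted_len):
    """Check for flanking direct repeats and inverted terminal repeats."""
    left_direct = region[:direct_len]
    right_direct = region[-direct_len:]
    element = region[direct_len:-direct_len]
    left_inverted = element[:inverted_len]
    right_inverted = element[-inverted_len:]
    return (left_direct == right_direct
            and right_inverted == reverse_complement(left_inverted))

target_repeat = "ATGCA"          # duplicated host DNA (direct repeat)
terminal_repeat = "GGTTCAGAC"    # left inverted terminal repeat
coding_region = "T" * 12         # stand-in for the transposase gene

region = (target_repeat + terminal_repeat + coding_region
          + reverse_complement(terminal_repeat) + target_repeat)

print(has_is_structure(region, 5, 9))                      # True
print(has_is_structure(region[:-5] + "CCCCC", 5, 9))       # False
```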
Replicative and nonreplicative transposition
With replicative transposition (1) the original
transposon remains in place and creates a new copy
of itself, which inserts into a recipient site elsewhere.
Thus, this mechanism leads to an increase in the
number of copies of the transposon in the genome.
This type involves two enzymatic activities: a transposase
acting on the ends of the original transposon and
resolvase acting on the duplicated copies. In nonreplicative
transposition (2) the transposing element itself
moves as a physical entity directly to another site.
The donor site is either repaired (in eukaryotes)
or may be destroyed (in bacteria) if more than one
copy of the chromosome is present.
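The difference between the two modes lies in what happens to the copy number. In the following sketch a genome is reduced to an ordered list of named loci, and the element is inserted in front of a chosen locus; all names are invented for the example.

```python
def replicative(genome, element, before):
    """Insert a new copy of `element` in front of `before`; the original stays."""
    new = list(genome)
    new.insert(new.index(before), element)
    return new

def nonreplicative(genome, element, before):
    """Move `element` itself from its donor site to in front of `before`."""
    new = list(genome)
    new.remove(element)                      # element leaves the donor site
    new.insert(new.index(before), element)   # and lands at the recipient site
    return new

genome = ["geneA", "Tn", "geneB", "geneC"]

print(replicative(genome, "Tn", "geneC"))     # ['geneA', 'Tn', 'geneB', 'Tn', 'geneC']
print(nonreplicative(genome, "Tn", "geneC"))  # ['geneA', 'geneB', 'Tn', 'geneC']
```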
Transposition of retroelements
Retrotransposition requires synthesis of an RNA
copy of the inserted retroelement. Retroviruses
including the human immunodeficiency virus and RNA
tumor viruses are important retroelements (see p.
100 and p. 314). The first step in retrotransposition
is the synthesis of an RNA copy of the inserted
retroelement, followed by reverse transcription
up to the polyadenylation sequence in the 3' long
terminal repeat (3' LTR). Three important classes
of mammalian transposons that undergo or have undergone
retrotransposition through an RNA intermediary are
shown. Endogenous retroviruses (1) are sequences
that resemble retroviruses but cannot infect new
cells and are restricted to one genome. Nonviral
retrotransposons (2) lack LTRs and usually other
parts of retroviruses. Both types contain reverse
transcriptase and are therefore capable of independent
transposition. Processed pseudogenes (3) or retropseudogenes
lack reverse transcriptase and cannot transpose
independently. They comprise two groups: low-copy-number
processed pseudogenes transcribed by RNA
polymerase II and high-copy-number mammalian
SINE sequences, such as the human Alu and the mouse
B1 repeat families.
Trinucleotide Repeat Expansion
The human genome contains tandem repeats of trinucleotides. Normally
they occur in groups of 5–35 repeats. When their number
exceeds a certain threshold and they occur in a gene
or close to it, they cause diseases. Once the normal,
variable length has expanded, the increased number of
repeats tends to increase even further when passed through
the germline or during mitosis. Thus, trinucleotide
expansions form a class of unstable mutations, to date
observed in humans only.
Different types of trinucleotide repeats and their expansions
Trinucleotide repeats can be distinguished according
to their localization with respect to a gene. Expansions
are greater outside genes and more moderate within
coding regions. In several severe neurological diseases,
abnormally expanded CAG repeats are part of the
gene. CAG repeats encode a series of glutamines
(polyglutamine tracts). Within a normal number of
repeats, which varies according to the gene involved,
the gene functions normally (1). However, an expanded
number of repeats leads to an abnormal gene product
with altered function. Trinucleotide repeats also
occur in noncoding regions of a gene (2). Fairly
common types are CGG and GCC repeats. The increase
in the number of these repeats can be drastic, up
to 1000 or more repeats. The first stages of expansion
usually do not lead to clinical signs of a disease,
but they do predispose to increased expansion of
the repeat in the offspring of a carrier (premutation).
Unstable trinucleotide repeats in different diseases
Disorders due to expansion of trinucleotide repeats
can be distinguished according to the type of trinucleotide
repeat, i.e., the sequence of the three nucleotides,
their location with respect to the gene involved,
and their clinical features. All involve the central
or the peripheral nervous system. Type I trinucleotide
diseases are characterized by CAG trinucleotide
expansion within the coding region of different
genes. The triplet CAG codes for glutamine. About
20 CAG repeats occur normally in these genes, so
that about 20 glutamines occur in the gene product.
In the disease state the number of glutamines is
greatly increased in the protein. Hence, they are
collectively referred to as polyglutamine disorders.
Type II trinucleotide diseases are characterized
by expansion of CTG, GAA, GCC, or CGG trinucleotides
within a noncoding region of the gene involved,
either at the 5' end (GCC in fragile X syndrome
type A, FRAXA), at the 3' end (CGG in FRAXE; CTG
in myotonic dystrophy), or in an intron (GAA in Friedreich
ataxia). A brief review of these disorders is given
on p. 394.
Principle of laboratory diagnosis of unstable trinucleotide repeats
The laboratory diagnosis compares the sizes of the
trinucleotide repeats in the two alleles of the
gene examined. One can distinguish very large expansions
of repeats outside coding sequences (50 to more
than 1000 repeats) and moderate expansion within
coding sequences (20 to 100–200). The figure shows
11 lanes, each representing one person: normal controls
in lanes 1–3; confirmed patients in lanes 4–6; and
a family with an affected father (lane 7), an affected
son (lane 10), the unaffected mother (lane 11),
and two unaffected children: a son (lane 8) and
a daughter (lane 9). Size markers are shown at the
left. Each lane of the polyacrylamide gel shows the
(CAG)n repeat of the Huntington locus, amplified
by polymerase chain reaction, as a band of
defined size. Each person shows the two alleles.
In the affected persons the band representing one
allele lies above the threshold in the expanded
region (in practice the bands are somewhat blurred
because the exact repeat size varies in DNA from
different cells).
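The reading of such a gel can be sketched as a comparison of each person's two alleles with a threshold. All repeat numbers below are invented, and the cutoff of 36 repeats is an assumption made for this illustration only, since no number is stated here; only the lane assignments of the family follow the description above.

```python
# Assumed cutoff for this illustration; the text does not state a number.
THRESHOLD = 36

def classify(alleles, threshold=THRESHOLD):
    """Return 'expanded' if either allele reaches the threshold, else 'normal'."""
    return "expanded" if max(alleles) >= threshold else "normal"

# Hypothetical (CAG)n repeat numbers of the two alleles in the family lanes.
lanes = {
    7: (18, 44),   # affected father
    8: (18, 20),   # unaffected son
    9: (18, 17),   # unaffected daughter
    10: (46, 17),  # affected son
    11: (17, 20),  # unaffected mother
}

for lane, alleles in sorted(lanes.items()):
    print(lane, alleles, classify(alleles))
```

In this invented family each child carries one paternal and one maternal allele, and the expanded allele of the affected son is slightly larger than that of his father, in keeping with the tendency of expanded repeats to increase further.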
References
Strachan, T., Read, A.P.: Human Molecular
Genetics. 2nd ed. Bios Scientific Publishers,
Oxford, 1999.
Warren, S. T.: The expanding world of trinucleotide
repeats. Science 271: 1374–1375,
1996.
Rosenberg, R.N.: DNA-triplet repeats and neurologic
disease. New Eng. J. Med. 335:1222–1224, 1996.
Zoghbi, H.Y.: Spinocerebellar ataxia and other
disorders of trinucleotide repeats, pp. 913–
920, In: Principles of Molecular Medicine,
J.C. Jameson, ed. Humana Press, Totowa, NJ,
1998.
DNA Repair
Life would not be possible without the ability to repair
damaged DNA. Since replication errors, including mismatch,
and harmful exogenous factors are everyday problems
for a living organism, a broad repertoire of repair
genes has evolved in prokaryotes and eukaryotes. The
following types of DNA repair can be distinguished by
their basic mechanisms: (1) excision repair to remove
a damaged DNA site, such as a strand with a thymine
dimer; (2) mismatch repair to correct errors of replication
by excising a stretch of single-stranded DNA containing
the wrong base; (3) repair of UV-damaged DNA during
replication; and (4) transcription-coupled repair in
active genes.
Excision repair
The damaged strand of DNA is distorted and can be
recognized by a set of three proteins, the UvrA,
UvrB, and UvrC endonucleases in prokaryotes and
XPA, XPB, and XPC in human cells. This DNA strand
is cleaved on both sides of the damage by an endonuclease
protein complex, and a stretch of about 12 or 13
nucleotides in prokaryotes and 27 to 29 nucleotides
in eukaryotes is removed. DNA repair synthesis restores
the missing stretch and a DNA ligase closes the
gap.
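The sequence of excision, repair synthesis, and ligation can be summarized in a short sketch. It assumes that both strands are written position by position in register, that damaged bases are marked with "X", and that a patch of 12 nucleotides is removed as in prokaryotes; the sequences and function names are invented.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the partner strand, written position by position."""
    return "".join(PAIR[base] for base in strand)

def excision_repair(damaged, template, site, patch=12):
    """Remove `patch` nucleotides around `site` and resynthesize from `template`."""
    start = max(0, site - patch // 2)
    end = min(len(damaged), start + patch)
    resynthesized = complement(template[start:end])   # DNA repair synthesis
    return damaged[:start] + resynthesized + damaged[end:]  # ligation

intact = "GCATCGATTACGGCTAGCATCGGA"
template = complement(intact)                 # undamaged partner strand
damaged = intact[:7] + "XX" + intact[9:]      # thymine dimer at positions 7 and 8

repaired = excision_repair(damaged, template, site=7)
print(repaired == intact)
```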
Mismatch repair
Mismatch repair corrects errors of replication.
However, the newly synthesized DNA strand containing
the wrong base must be distinguished from the parent
strand, and the site of a mismatch identified. The
former is based on a difference in methylation in
prokaryotes. The daughter strand is undermethylated
at this stage. E. coli has three mismatch repair
systems: long patch, short patch, and very short
patch. The long patch system can replace 1 kb DNA
and more. It requires three repair proteins, MutH,
MutL, and MutS, which have the human homologues
hMSH1, hMLH1, and hMSH2. Mutations in their respective
genes lead to cancer due to defective mismatch repair.
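The two tasks named above, locating the mismatch and deciding which strand to correct, can be sketched as follows. The sketch assumes that strands are written position by position in register and that methylation is recorded as a simple flag per strand; the sequences and function names are invented, and the excision of a long patch is reduced to replacing the mismatched base.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mismatches(strand_1, strand_2):
    """Return the positions at which the two strands do not pair correctly."""
    return [i for i, (a, b) in enumerate(zip(strand_1, strand_2)) if PAIR[a] != b]

def mismatch_repair(strands):
    """Correct the undermethylated (daughter) strand using the parent strand.

    `strands` holds two (sequence, is_methylated) tuples.
    """
    (seq_a, meth_a), (seq_b, meth_b) = strands
    if meth_a == meth_b:
        raise ValueError("cannot tell the parent strand from the daughter strand")
    parent, daughter = (seq_a, seq_b) if meth_a else (seq_b, seq_a)
    fixed = list(daughter)
    for i in find_mismatches(parent, daughter):
        fixed[i] = PAIR[parent[i]]
    return parent, "".join(fixed)

parent = "ATGCCGTA"
faulty_daughter = "TACGACAT"   # wrong base at position 4 (A instead of G)

print(find_mismatches(parent, faulty_daughter))                     # [4]
print(mismatch_repair([(faulty_daughter, False), (parent, True)]))  # ('ATGCCGTA', 'TACGGCAT')
```

If both strands carried the same methylation status, the sketch refuses to act, which mirrors the point that repair depends on distinguishing the two strands.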
Replication repair of UV-damaged DNA
DNA damage interferes with replication, especially
in the leading strand. Large stretches remain unreplicated
beyond the damaged site (in the 3' direction of
the new strand) unless swiftly repaired. The lagging
strand is not affected as much because Okazaki fragments
(about 100 nucleotides in length) of newly synthesized
DNA are also formed beyond the damaged site. This
leads to an asymmetric replication fork and single-stranded
regions of the leading strand. Aside from repair
by recombination, the damaged site can be bypassed.
Double-strand repair by homologous recombination
Double-strand damage is a common consequence of
γ radiation. An important human pathway for mediating
repair requires three proteins, encoded by the genes
ATM, BRCA1, and BRCA2. Their names are derived from important
diseases that result from mutations in these genes:
ataxia telangiectasia (see p. 334) and hereditary
predisposition to breast cancer (BRCA1 and BRCA2,
see p. 328). ATM, a member of a protein kinase family,
is activated in response to DNA damage (1). Its
active form phosphorylates BRCA1 at specific sites
(2). Phosphorylated BRCA1 induces homologous recombination
in cooperation with BRCA2 and mRAD51, the mammalian
homologue of E. coli RecA repair protein (3). This
is required for efficient DNA double-break repair.
Phosphorylated BRCA1 may also be involved in transcription
and transcription-coupled DNA repair (4). (Figure
redrawn from Ventikaraman, 1999).
References
Buermeyer, A.B. et al.: Mammalian DNA mismatch
repair. Ann. Rev. Genet. 33:533–564,
1999.
Cleaver, J.E.: Stopping DNA replication in its
tracks. Science 285:212–213, 1999.
Cortez D., et al.: Requirement of ATM-dependent
phosphorylation of Brca1 in the DNA
damage response to double-strand breaks.
Science 286:1162–1166, 1999.
Masutani, C., et al.: The XPV (xeroderma pigmentosum
variant) gene encodes human
DNA polymerase η. Nature 399:700–704,
1999.
Sancar, A.: Excision repair invades the territory
of mismatch repair. Nature Genet. 21:247–
249, 1999.
Ventikaraman A.R.: Breast cancer genes and
DNA repair. Science 286:1100–1101, 1999.
Xeroderma Pigmentosum
Xeroderma pigmentosum (XP) is a heterogeneous group
of genetically determined skin disorders due to unusual
sensitivity to ultraviolet light. They are manifested
by dryness and pigmentation of the exposed regions of
skin (xeroderma pigmentosum = “dry, pigmented skin”).
The exposed areas of skin also show a tendency to develop
tumors. The causes are different genetic defects of
DNA repair. Repair involves mechanisms similar to those
involved in transcription and replication. The necessary
enzymes are encoded by at least a dozen genes, which
are highly conserved in bacteria, yeast, and mammals.
Clinical phenotype
The skin changes are limited to UV-exposed areas
(1 and 2). Unexposed areas show no changes. Thus
it is important to protect patients from UV light.
An especially important feature is the tendency
for multiple skin tumors to develop in the exposed
areas (3). These may even occur in childhood or
early adolescence. The types of tumors are the same
as those occurring in healthy individuals after
prolonged UV exposure.
Cellular phenotype
The UV sensitivity of cells can be demonstrated
in vitro. When cultured fibroblasts from the skin
of patients are exposed to UV light, the cells show
a distinct dose-dependent decrease in survival rate
compared with normal cells (1). Different degrees
of UV sensitivity can be demonstrated. The short
segment of new DNA normally formed during excision
repair can be demonstrated by culturing cells in
the presence of [3H]thymidine and exposing them
to UV light. The DNA synthesis induced for DNA repair
can be made visible in autoradiographs. Since [3H]thymidine
is incorporated during DNA repair, these bases are
visible as small dots caused by the isotope on the
film (2). In contrast, xeroderma (XP) cells show
markedly decreased or almost absent repair synthesis.
(Photograph of Bootsma & Hoeijmakers, 1999).
Genetic complementation in cell hybrids
If skin cells (fibroblasts) from normal persons
and from patients (XP) are fused (cell hybrids)
in culture and exposed to UV light, the cellular
XP phenotype will be corrected (1). Normal DNA repair
occurs. Also, hybrid cells from two different forms
of XP show normal DNA synthesis (2) because cells
with different repair defects correct each other
(genetic complementation). However, if the mutant
cells have the same defect (3), they are not
able to correct each other (4) because they belong
to the same complementation group. At present about
ten complementation groups are known in xeroderma
pigmentosum. They differ clinically in terms of
severity and central nervous system involvement.
Each complementation group is based on a mutation
at a different gene locus. Several of these genes
have been cloned and show homology with repair genes
of other organisms, including yeast and bacteria.
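The logic of the complementation test can be written out in a few lines. The sketch assumes that each cell line has a defect in exactly one repair gene (or none, for normal cells); the cell line and gene names are invented placeholders.

```python
def complements(defect_1, defect_2):
    """Hybrid cells repair normally unless both parents lack the same gene."""
    return defect_1 is None or defect_2 is None or defect_1 != defect_2

def complementation_groups(cell_lines):
    """Group cell lines that fail to correct each other in hybrids."""
    groups = []
    for name, defect in cell_lines.items():
        for group in groups:
            if not complements(defect, cell_lines[group[0]]):
                group.append(name)   # same defect as this group
                break
        else:
            groups.append([name])    # corrected by all groups so far: new group
    return groups

# Hypothetical patient cell lines and their (normally unknown) defective gene.
cell_lines = {"XP1": "geneA", "XP2": "geneB", "XP3": "geneA"}

print(complements("geneA", None))          # True: normal cells correct XP cells
print(complements("geneA", "geneB"))       # True: different defects
print(complements("geneA", "geneA"))       # False: same complementation group
print(complementation_groups(cell_lines))  # [['XP1', 'XP3'], ['XP2']]
```

In the laboratory the defective gene is of course not known in advance; only the outcome of each fusion is observed, and the groups are inferred from which pairs fail to correct each other.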
References
Berneburg, M. et al.: UV damage causes uncontrolled
DNA breakage in cells from patients
with combined features of XP-D and Cockayne
syndrome. Embo J. 19:1157–1166,
2000.
Bootsma, D.A., Hoeijmakers, J.H.J.: The genetic
basis of xeroderma pigmentosum. Ann.
Génét. 34:143–150, 1991.
Cleaver, J.E., et al.: A summary of mutations in
the UV-sensitive disorders: xeroderma pigmentosum,
Cockayne syndrome, and trichothiodystrophy.
Hum. Mutat. 14:9–22,
1999.
Cleaver, J.E.: Common pathways for ultraviolet
skin carcinogenesis in the repair and replication
defective groups of xeroderma pigmentosum.
J. Dermatol. Sci. 23:1–11, 2000.
de Boer, J., Hoeijmakers J.H.: Nucleotide excision
repair and human syndromes. Carcinogenesis
21:453–460, 2000.
Hanawalt, P.C.: Transcription-coupled repair
and human diseases. Science 266:1957–
1958, 1994.
Sancar, A.: Mechanisms of DNA excision repair.
Science 266:1954–1956, 1994.
Taylor, E.M., et al.: Xeroderma pigmentosum
and trichothiodystrophy are associated
with different mutations in the XPD
(ERCC2). Proc. Natl. Acad. Sci. 94: 8658–
8663, 1997.