Collins, Allen G. 1999. Molecules and evolutionary history.
In: Springer D. and J. Scotchmoor (eds.) Evolution: investigating the evidence. Paleontological Society Papers, volume 3.
This publication by the Paleontological Society aims at providing high school teachers with readable reviews of a variety of topics related to evolutionary history.
PALEONTOLOGISTS LEARN and tell the history of life; it is our job. You might suspect that paleontologists spend most of their time studying fossils. While fossils are an important source of information for the paleontologist, other types of evidence can also tell us about biological history. For instance, the rocks themselves provide important information, especially about past climates. It makes perfect sense that organisms are more easily understood if you know the environment in which they lived. A third important source of information is all around us. The organisms alive today are the current products of the various processes of evolution that have been at work for more than three billion years. Organisms carry the legacy of their histories with them, in their anatomy, behavior, and genes. By studying and comparing living organisms, we learn about the past. Advances in technology have made the abundant historical information contained in biological molecules, chiefly genes and their RNA and protein products, easier to obtain. Thus, it is not too surprising to see today’s paleontologist setting about his or her business with a rock hammer in one hand and a pipettor in the other.
Many different things can be learned about the history of life from molecules. The most important lesson lies in their ability to unveil how organisms are related to one another, i.e. life’s phylogeny (see chapters in this volume that deal with phylogeny). Our understanding of evolutionary relationships has been revolutionized in a very short period, 20 years or so, spurred on by the study of molecules. For instance, molecules have shown us that there are two distinct types of prokaryotes (Archaea and Bacteria). Furthermore, you might be surprised to learn that you are more closely related to one type of prokaryote than to the other, as will be discussed later. While interesting on its own, phylogeny is also very useful. Biological questions are easier to figure out if you know something about the phylogeny of the organisms concerned. For instance, you might want to know how an adaptation such as insect wings came about. In this case, it would be helpful to compare the anatomy and behavior of insects to the anatomy and behavior of those organisms that are most closely related to insects. But, what organisms are most closely related to insects? Phylogeny guides the paleontologist to make comparisons that best unveil the answers to biological questions of all sorts.
Technological advances have put molecules within the grasp of historical biologists and molecules have proven useful for tasks other than phylogenetic reconstruction. As you will read below, some molecules have been used as clocks to date the origination of groups of organisms. Other molecules are revealing how body plans and structures of multicellular organisms are formed. These molecules hold the exciting promise of exposing how body plans first evolved. Molecules are even found in rocks, as fossils that mark the presence and/or activity of organisms in the past.
How does molecular evidence reveal phylogeny?
Figure 1. Patterns that can be seen in a few molecular characters taken from the 18S ribosomal RNA gene sequences of ten hydrozoan species (taken from the author’s unpublished data). Differences in the molecular sequence characters are in boxes and highlighted with shading. From the patterns present in these data, one might recognize three groups of hydrozoans based on similarity of the sequences (group A: hydrozoans 1 and 2; group B: 3, 4, and 5; group C: 6, 7, 8, 9, and 10). Further, one might suppose that groups A and B inherited the characters that they uniquely share as a result of common history, suggesting that A and B are more closely related to each other than either is to group C. Full phylogenetic analyses include so much data that the alternative grouping possibilities are immense. Computer programs are necessary to carry out phylogenetic analyses to completion.
DNA, RNA, and proteins have potential to reveal evolutionary relationships for three reasons. First, nucleic acids and proteins are composed of linear strings of numerous smaller parts, nucleotides and amino acids respectively. Each nucleotide or amino acid in a molecular sequence is a character, albeit not a very colorful one, that can be used to describe an organism. Second, these molecules are replicated from generation to generation, but not perfectly. Changes of various sorts in the genomes of all organisms happen all the time. Some of these mutations are inherited by descendants. Finally, all living organisms on Earth share some history as a common lineage. That is, living organisms are all connected by ancestor-to-descendant relationships. Thus, by comparing the nucleotide and/or amino acid sequences of different organisms, it is possible to identify characters that two or more organisms share as a result of their common history. For example, Figure 1 shows a handful of molecular characters for ten hydrozoan jellyfish. By eyeballing these data, you might detect some patterns that possibly indicate shared history. Recognizing patterns such as these is the beginning of a phylogenetic analysis. However, the data and the alternative grouping possibilities are so numerous that computer programs, as described below, are needed to carry phylogenetic analyses to completion.
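The "eyeballing" step described above can be sketched in a few lines of Python. This is only an illustration: the species labels and sequences below are invented for the example (they are not the data in Figure 1), and counting raw differences is merely the pattern-spotting stage, not a full phylogenetic analysis.

```python
from itertools import combinations

# Invented, pre-aligned sequences for illustration only (NOT the Figure 1 data).
toy_sequences = {
    "sp1": "ACGTTAGC",
    "sp2": "ACGTTAGC",
    "sp3": "ACGTCAGC",
    "sp4": "ACGTCAGC",
    "sp5": "TCGACAGG",
    "sp6": "TCGACAGG",
}


def count_differences(seq_a, seq_b):
    """Count aligned positions (characters) at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_differences(sequences):
    """Return {(name_x, name_y): number of differing characters} for every pair."""
    return {
        (x, y): count_differences(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    for (x, y), diffs in pairwise_differences(toy_sequences).items():
        print(f"{x} vs {y}: {diffs} differing characters")
```

In this toy data set, sp1 and sp2 are identical, as are sp3 and sp4, and those two pairs differ from each other at fewer positions than either does from sp5 and sp6. That is the kind of pattern one might take as a first hint of shared history.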
It should be emphasized that besides molecules, anatomical, physiological, behavioral, embryological and other variable and inherited characters are also useful for phylogenetic inference. Today’s emphasis on using molecular characters (for phylogenetic analysis) is due in part to technological advances that have made it possible to gather numerous molecular characters inexpensively. Another reason that molecules are so commonly used is probably that they are fashionable. Fortunately, the current spate of molecular phylogenies is spurring on phylogenetic analyses based on non-molecular characters. Technological advances, for instance in image acquisition and analysis, are also making non-molecular characters more readily available. All types of data that have the potential to reveal phylogenetic history should be investigated.
Choosing the right molecule for revealing phylogeny.
Computer programs are a must for molecular analyses.
Is there any assurance that molecular analyses work?
The results of molecular phylogenetic analyses.
Consider the case of animals, plants, and fungi. Traditionally, plants and fungi were grouped together. Today, most fungal collections reside at botanical institutions as an historical consequence of this view. Later, as fungi became better characterized, they were placed as one of the five great kingdoms of life, on a par with plants and animals. Later still, as phylogenetic thinking was beginning to take hold, some morphological characteristics hinted that fungi and animals may be more closely related to each other (Cavalier-Smith, 1987). Not surprisingly, this phylogenetic hypothesis was tested with molecular sequences. Ribosomal RNA sequences corroborated the link between fungi and animals (Wainwright et al., 1993). Since then, evidence from several other genes has been reported that also suggests that animals and fungi are more closely related to each other than either group is to plants (Baldauf and Palmer, 1993; Borchiellini et al., 1998). Unless or until contradictory information is brought into view, it will be accepted that this hypothesis best represents the true evolutionary relationship of these three groups.
Some phylogenetic questions generate considerable controversy. One example involves the use of fossil DNA. Under circumstances where fossils are preserved without free water (examples would be extreme cold or in amber), DNA may not degrade. Recently, DNA from a fossil mammoth was extracted and an attempt was made to determine how it is related to the two living species of elephants, Asian and African (Ozawa et al., 1997). These researchers “confirmed” that the mammoth was more closely related to the Asian elephant than it is to the African elephant, in accordance with morphological data. Less than a year later, a second group of researchers reported that fossil DNA that they had extracted from a mammoth indicated the contrary (Noro et al., 1998). Disagreements such as this are sometimes used to conclude that molecular sequences are not good at revealing phylogeny. While they may be frustrating, contradictions are normal events in the progression of scientific knowledge.
Molecular phylogenies do not only focus on events of the distant past. Some have practical importance to our lives. Recently, a molecular phylogenetic analysis was used to suggest that the virus which causes AIDS in humans, HIV, is derived from a similar virus that exists harmlessly in chimpanzees (Gao et al., 1999). Moreover, the phylogenetic results were so robust (well-supported) that they allowed the researchers to strongly suggest that a specific subspecies of chimpanzee from western equatorial Africa is the host of the strain of HIV that causes AIDS in humans. Interestingly, chimpanzees are hunted for food in this region of Africa, providing a likely mechanism of cross-species transmission of the virus to humans.
Another recent phylogenetic study, which included chimpanzees and humans, attempted a new classification of primates based on molecular, morphological, and fossil data (Goodman et al., 1998). Among the interesting conclusions of this study (for humans anyway) was that humans and chimpanzees ought to be given the same generic name. This argument rests on the fact that the degree of sequence divergence between other primate species of the same genus is equivalent to or exceeds that observed for chimpanzees and humans. Our generic name, Homo, is older and thus has precedence over the generic name of chimpanzees, Pan. Had it gone the other way, just imagine Carolus Linnaeus, the type specimen of our species, rolling in his grave on learning that he had suddenly become Pan sapiens.
Molecular phylogenies and systematics.
Because textbooks are not keeping up with our rapid gain in knowledge of evolutionary relationships, teachers are put in an unfortunate position. Teachers can hardly be expected to consult the primary literature to get the latest phylogenies or classifications. Our views of evolutionary relationships, and the classification schemes based on them, are changing so rapidly that textbooks are quickly outmoded. So, what are teachers to do? Here are a few recommendations.
Molecular phylogenies and biogeography.
How can these ideas be tested? Branch points on a phylogeny represent events of speciation, and speciation is what generates new species. Thus, a necessary first step to evaluating these hypotheses was to determine how cowrie species are related to each other. To this end, Meyer created a comprehensive phylogeny using the sequences of two genes (Meyer, 1998). He then mapped geographic distributions from living and fossil cowrie species onto this phylogeny. The resulting pattern was mosaic, but a clear picture began to emerge as he incorporated the geologic history of the region. What he found was that Hypothesis 3 is unlikely to be responsible for the high diversity seen in the TWP, because his phylogeny showed that very few closely related species have ranges that overlap in the TWP. Hypothesis 2 could also be ruled out as the primary explanation, especially over longer periods of geologic time (greater than three million years), since his phylogeny indicated that species living on the periphery of the TWP were typically ancient lineages that had remained isolated for many millions of years. Finally, Hypothesis 1, that species had been preferentially generated within the western Pacific, appeared to be very likely. His phylogeny revealed that one particular group of cowries had diversified into many species in the relatively recent past. Sea level changes associated with ice ages during the last 2.5 million years have apparently isolated small basins in the TWP, providing an ideal setting for speciation to occur.
Dating divergences and the molecular clock.
When a molecular clock estimate is made for the origin of a group of organisms that has very little or no fossil record, arguments can be made about techniques and assumptions. The debates become far more interesting, however, when molecular clocks are used to date the origin of a group with a relatively robust fossil record. Fossils and molecular sequences are independent lines of evidence that ideally would corroborate each other. Molecular clock estimates for the origin of groups usually predate the earliest fossil evidence for the group. In a certain respect, one would expect this. The first fossil of a group would be a minimum age estimate of when that group first evolved. However, it is sometimes difficult to invoke such an explanation for the discrepancies observed between molecular clock estimates and first fossil estimates. Such is the case with a number of groups, e.g., primates, birds, mammals, animals, flowering plants, and plants.
As an example, consider the animals. Molecular clock estimates for the divergence of early animal lineages range from 1,500 to 700 million years before present (Runnegar, 1982; Wray et al., 1996; Nikoh et al., 1997; Gu, 1998; Ayala et al., 1998; Bromham et al., 1998). The oldest fossil evidence of animals is roughly 600 million years old (Brasier and McIlroy, 1998; Li et al., 1998). At present, there is a disparity of roughly 100 to 900 million years between the molecular clock estimates for when the major animal clades originated and the oldest fossil evidence that definitively demonstrates their existence. Two opposing possibilities could explain this disparity. There is either a hidden period of animal history or there is a systematic bias in the molecular clock estimates. For many paleontologists, it is hard to imagine an adequate explanation for the absence of animal fossils over hundreds of millions of years. The leading explanation offered by the proponents of molecular dates invokes the idea that animals were too small to be fossilized during this extended period. It can be countered that many fossils of small animals, while rare, are known. As for a bias in the molecular dates, little work has been done to address this possibility. Thus, we are currently in a state of partial ignorance concerning molecular clocks. A certain amount of caution in applying, interpreting, and evaluating molecular clocks is warranted.
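The size of the disparity quoted above follows directly from the published figures. As a small check of the arithmetic, using only the numbers given in the text (the function and variable names are my own):

```python
# Figures quoted in the text, in millions of years before present.
CLOCK_ESTIMATES_MYA = (700, 1500)   # range of molecular clock estimates
OLDEST_FOSSIL_MYA = 600             # approximate age of oldest animal fossils


def disparity_range(clock_estimates, oldest_fossil):
    """Return (smallest, largest) gap between clock estimates and the fossil date."""
    youngest, oldest = min(clock_estimates), max(clock_estimates)
    return youngest - oldest_fossil, oldest - oldest_fossil


if __name__ == "__main__":
    low, high = disparity_range(CLOCK_ESTIMATES_MYA, OLDEST_FOSSIL_MYA)
    print(f"Disparity: roughly {low} to {high} million years")
```

Even the smallest gap, 100 million years, is a long stretch of time for animals to have existed without leaving recognizable fossils, which is why the disagreement attracts so much attention.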
A few words on the molecules behind development.
A few words on fossil molecules, biomarkers.
Other biomarkers, preserved in fossil animal skeletons such as shells, have been used to infer ecological interactions of the past (CoBabe, in press_a, in press_b). Work of this sort is extremely promising because it strongly integrates ecology and evolution. For example, by determining the presence and ratio of certain biomarkers in a given snail shell, one can deduce whether the snail was an herbivore or a carnivore. Moreover, one can determine whether the snail was a generalized feeder or specialized in one food source. This information could be used to make connections between changes in diet and changes in morphology or the environment through time. Biomarkers incorporated into skeletal material have also been used to infer the presence of intercellular chemosymbionts in animals. This valuable information about the lives of past organisms could not have been ascertained without the use of biomarkers.
Perhaps it should be mentioned briefly that not all biomarkers are molecules. Paleontological studies employ isotopes in a variety of ways, sometimes as biomarkers. Just recently, isotopes have been used to infer that life existed at least 200 million years prior to the oldest fossil remains, which are known from strata dating to 3,500 million years (Rosing, 1999). These researchers looked at the ratio of two isotopes of carbon, carbon-12 and carbon-13, in 3,700 million year old strata, and found that the rocks were enriched in carbon-12. They inferred the presence of life because sediment formed on sea bottoms today is similarly enriched in carbon-12 in areas rich with bacterial plankton.
Rooting the tree of life, a clever use of molecules.
Figure 2. The three major divisions of life, Archaea, Bacteria, and Eukaryota (used with permission from the website of the University of California Museum of Paleontology: http://www.ucmp.berkeley.edu/alllife/threedomains.html).
The method by which this result was determined is not exactly straightforward, but it required an elegant and novel use of molecules, so it is worth outlining. The key to resolving the relationships of Archaea, Bacteria, and Eukaryota was to establish the root of the tree of life. The root of a phylogenetic tree is the place on the tree that represents the last common ancestor of all organisms being considered. Figure 3 shows how alternative possibilities for placing a root on the tree of life imply alternative relationships among Archaea, Bacteria, and Eukaryota. The general method for determining the root of phylogenetic trees is to use an outgroup (one or more organisms that are more distantly related to the organisms in question). Figure 4 shows how an outgroup is used to place a root on a phylogenetic tree. In the case of the tree of all life, however, there is no outgroup because the only possibilities would be non-living things, which are not related to life by definition. Rooting the tree of life was a conundrum until molecules were cleverly put to use to answer it.
Figure 3. Three alternative possibilities for the root (the point which represents the last common ancestor) on the tree of life. Without a root, it is not possible to tell which two of the three groups (Archaea, Bacteria, and Eukaryota) are most closely related. The three alternative placements of the root on the tree imply three separate possibilities for the evolutionary relationships of the three groups.
There is strong molecular evidence to support the assertion that Archaea and Eukaryota share a more recent common ancestor than either group does with the Bacteria (possibility #2 in Fig. 3). In order to understand how this relationship was determined, it is necessary to know a little about the process of gene duplication. Some mutations involve the duplication of portions of the genome, which may result in the creation of a redundant copy of a gene. After the duplication event, the two genes evolve separately. The first step to solving the root of life was to find a gene that was duplicated prior to the last common ancestor of everything alive today.
Figure 4. Illustration of how an outgroup, one or more organisms that are less closely related to the organisms under consideration, is normally used to root a branching diagram. Without an outgroup to root the tree, it is not possible to tell which two of the three organisms (bird, jellyfish, and butterfly) are most closely related. By including a fern as an outgroup in the analysis, the tree can be rooted, and the relatedness between the bird, jellyfish, and butterfly is revealed.
Following along with Figure 5, let’s step through the logic of how such a gene can resolve this puzzle. Consider a gene, GeneA, which was duplicated (giving A1 and A2) prior to the last common ancestor of everything alive today. Next, suppose the last common ancestor inherits the two forms of the gene, GeneA1 and GeneA2, and subsequently passes them on to all of its descendants. The two forms of GeneA, A1 and A2, will then be present in any living organism (Fig. 5A).
Figure 5. Illustration of how an anciently duplicated gene could be used to determine the root of the tree of life.
If you were to build a tree using the sequences from Archaea, Bacteria, and Eukaryota for either GeneA1 or GeneA2, then the three groups would be revealed as distinct from one another. However, without a root there would be no way of knowing which two of the three groups shared the most recent common ancestor. All is not lost, however, because we know that GeneA1 and GeneA2 are related to each other. Further, we know that the gene that gave rise to GeneA1 and GeneA2 (denoted by a gray star in Fig. 5) predates the last common ancestor of all life (denoted by a gray circle in Fig. 5).
The next step is to build a gene tree using the sequences of both GeneA1 and GeneA2. Finally, we can reason that the root of this gene tree should be placed on the branch that connects the GeneA1 side of the tree to the GeneA2 side. This is because the root represents the common gene ancestor, which existed prior to the last common ancestor of the three groups of life.
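The rooting logic can be sketched as a short Python program. Everything here is an illustrative assumption: the unrooted gene tree is typed in by hand (with a shape chosen to match the result discussed in this section) rather than inferred from sequences, and the node names such as `n1` and `m1` are arbitrary labels. The sketch finds the one branch that separates all GeneA1 copies from all GeneA2 copies, places the root there, and reads off which two groups are closest on each side.

```python
# Hypothetical unrooted gene tree, written as a list of branches (edges).
# Leaves are named "<group>_<gene copy>"; n1, m1, n2, m2 are internal nodes.
EDGES = [
    ("n1", "Bacteria_A1"), ("n1", "m1"),
    ("m1", "Archaea_A1"), ("m1", "Eukaryota_A1"),
    ("n1", "n2"),
    ("n2", "Bacteria_A2"), ("n2", "m2"),
    ("m2", "Archaea_A2"), ("m2", "Eukaryota_A2"),
]


def build_adjacency(edges):
    """Turn a list of branches into {node: [neighbouring nodes]}."""
    tree = {}
    for u, v in edges:
        tree.setdefault(u, []).append(v)
        tree.setdefault(v, []).append(u)
    return tree


def leaves_from(tree, node, parent):
    """Leaves reachable from `node` without crossing back through `parent`."""
    onward = [n for n in tree[node] if n != parent]
    if not onward:  # nothing beyond this node, so it is a leaf
        return [node]
    found = []
    for n in onward:
        found.extend(leaves_from(tree, n, node))
    return found


def find_duplication_edge(tree):
    """Find the branch with only A1 copies on one side and only A2 on the other."""
    for u in tree:
        for v in tree[u]:
            side_u = leaves_from(tree, u, v)
            side_v = leaves_from(tree, v, u)
            if all(leaf.endswith("_A1") for leaf in side_u) and all(
                leaf.endswith("_A2") for leaf in side_v
            ):
                return u, v
    return None


def rooted_subtree(tree, node, parent):
    """Nested-tuple view of one side of the root, with gene labels stripped."""
    onward = [n for n in tree[node] if n != parent]
    if not onward:
        return node.split("_")[0]
    return tuple(rooted_subtree(tree, n, node) for n in onward)


def closest_pair(subtree):
    """Return the two groups that share the most recent common ancestor."""
    for child in subtree:
        if isinstance(child, tuple):
            return frozenset(child)
    return None


if __name__ == "__main__":
    tree = build_adjacency(EDGES)
    a1_side, a2_side = find_duplication_edge(tree)
    print("GeneA1 half:", rooted_subtree(tree, a1_side, a2_side))
    print("GeneA2 half:", rooted_subtree(tree, a2_side, a1_side))
```

Each half of the rooted gene tree acts as the outgroup for the other half. If both halves show the same pair of groups as closest relatives, the two gene copies corroborate each other.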
It was through this ingenious method that two groups of scientists independently concluded that the Archaea and Eukaryota are more closely related to each other than either is to the Bacteria (Gogarten et al., 1989; Iwabe et al., 1989). Since then, several studies that have relied on different pairs of anciently duplicated gene sequences have all reached the same conclusion (Brown and Doolittle, 1995; Lawson et al., 1996; Gribaldo and Cammarano, 1998). It is difficult to imagine that Halobacterium, a salt-loving single-celled organism without organelles or a nucleus, shares more history with you than it does with true bacteria, but it appears to be true.
Using molecular sequence data to derive phylogeny has become a widespread practice because these data are reasonably easy to obtain. The relative ease of collecting molecular sequences is largely due to a key technological innovation, the polymerase chain reaction (PCR). When a biological problem is recognized that can be addressed by a molecular phylogeny, appropriate tissue samples must be gathered. After that, a few simple tricks are performed back in the laboratory to extract historical information from the molecules.
This section should provide a general understanding of how molecular sequence data are obtained and is not a guide to performing this type of work. There are numerous variants on each of the basic steps outlined here, including ones using household items that can be easily used in the classroom. Detailed descriptions of a bevy of molecular techniques useful to the systematist can be found in a comprehensive book edited by Hillis et al. (1996b).
Extracting nucleic acids from tissue.
Once extracted, DNA can be visualized by placing a tiny amount in a gel and applying a current. Since DNA is negatively charged, it will migrate towards a positive charge at a speed that is inversely related to the size of the DNA fragment. Thus, after current has been applied for a period, smaller pieces of DNA will have moved farther through the gel than larger pieces. The next step is to stain the gel with a substance called ethidium bromide. Ethidium bromide sticks to DNA, and has the handy characteristic of fluorescing under UV light. Thus, you can take a photo of DNA in an ethidium bromide stained gel under UV light (Fig. 6). Having large pieces of extracted DNA maximizes your chances of successful PCR.
Figure 6. Partial photograph of a gel containing DNA that has been stained with ethidium bromide. Samples were loaded at the left, and moved through the gel to the right when a current was applied.
Amplifying the target gene by PCR.
The PCR mix is put into a machine that cycles (25 to 35 times) through three temperatures. During the time that the PCR mix is at the first temperature, which is very high (92 °C to 95 °C), double-stranded DNA splits (denatures) into single strands. At the second temperature of the cycle, which is much cooler (37 °C to 60 °C), single strands of DNA bind (anneal) to complementary pieces of DNA. During this annealing stage, the primers preferentially find their match because they are much smaller than other pieces of DNA in the solution. During the third stage, DNA is replicated. This process occurs at 72 °C, the temperature at which the enzyme optimally extends new DNA strands by binding free nucleotides. This enzyme is active at 72 °C because it is derived from a heat-loving bacterium. The discovery of this enzyme was an essential step in the development of PCR. Similarly acting enzymes from most organisms would be active at lower temperatures and destroyed by the high temperature needed for DNA denaturation. After the third stage, the cycle begins again. During each cycle, the concentration of the target gene increases, which enhances the efficiency of the annealing process in step two. In this way, the number of copies of the target gene grows exponentially. In order to check the results after the PCR is completed, a portion of the PCR product is run through a gel, stained with ethidium bromide, and photographed as described above (Fig. 6).
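To get a feel for what exponential growth means here, consider a simple sketch in Python. It assumes an idealized reaction in which every copy of the target is duplicated in every cycle; the `efficiency` parameter is a hypothetical knob of my own for modeling a less-than-perfect reaction, not a quantity taken from any particular protocol.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency is the assumed fraction of copies duplicated per cycle
    (1.0 = perfect doubling, 0.0 = no amplification).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # The cycle counts come from the text; one starting copy is an assumption.
    for cycles in (25, 30, 35):
        print(f"{cycles} cycles: {pcr_copies(1, cycles):,.0f} copies")
    # Perfect doubling from a single copy gives 33,554,432 copies after
    # 25 cycles and 34,359,738,368 copies after 35 cycles.
```

Real reactions fall short of this ideal, since primers and nucleotides are eventually used up, but the sketch shows why a few dozen cycles are enough to turn a trace of DNA into a visible band on a gel.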
Deriving the sequence of nucleotides.
The second stage in “reading” the sequence is to run the partial DNA strands through a gel. The fragments move through the gel at a speed that is inversely related to their length, so the shortest fragments travel fastest. The first fragment to reach the end of the gel is one nucleotide long. It is followed by the fragment that is two nucleotides long, and so forth. Near the bottom end of the gel is a laser that shines on each fragment as it passes by. A sensor records the characteristic signal given off by the final nucleotide of the fragment: a G, A, T, or C. The sequence of the target gene is recorded nucleotide by nucleotide. The process is rarely perfect, so genes are usually sequenced in both directions and the complementary sequences are compared to check for discrepancies. The discrepancies are then resolved by visually inspecting digital images recorded by the sensor. Accuracy of molecular sequences is extremely important for later analyses. To paraphrase an esteemed colleague, you cannot make chicken salad from chicken “excrement”.
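The comparison of reads from both directions can be sketched as follows. The two short reads are invented for the example, and the sketch assumes they cover exactly the same stretch of the gene; real reads usually have to be trimmed and aligned first.

```python
# Translation table pairing each nucleotide with its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_discrepancies(forward_read, reverse_read):
    """List (position, forward base, reverse-derived base) where the reads disagree.

    Positions are 0-based and counted along the forward read.
    """
    reoriented = reverse_complement(reverse_read)
    if len(reoriented) != len(forward_read):
        raise ValueError("reads must cover the same stretch of sequence")
    return [
        (i, f, r)
        for i, (f, r) in enumerate(zip(forward_read, reoriented))
        if f != r
    ]


if __name__ == "__main__":
    forward = "ATGGCCTA"        # invented forward read
    reverse = "TAGGCGAT"        # invented reverse read containing one miscall
    for position, f_base, r_base in find_discrepancies(forward, reverse):
        print(f"position {position}: forward says {f_base}, reverse implies {r_base}")
```

A program like this only flags where the two reads disagree. Deciding which read is correct still requires going back to the signal recorded by the sensor.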
I hope I have shown that biological molecules are an enormous source of information about the history of life. Many biological questions have already been clarified using molecules (I have shared just a few), but countless questions still remain unaddressed. Current and future generations of scientists have a great deal of work ahead of them. But this is fortunate, because this type of work is incredibly fun. Many historical questions require time spent travelling and working in the field, where other fascinating questions arise. Solving these questions is a challenging task, requiring creative and synthetic thinking. But it is a rewarding endeavor because these problems are tractable, all the more so given our growing knowledge of biological molecules. And so, we are able to share what we have learned about the history of life. In fact, being a historical biologist is so enjoyable that it is more like play than labor.
I thank a number of people who have reviewed and improved this manuscript, including J. A. Johnson, H. H. Hamilton, M. Stefanski, J. W. Valentine and two anonymous reviewers. As always, I am indebted to the University of California Museum of Paleontology for providing support. Finally, I am grateful to M. L. Sogin and J. D. Silberman who taught me how to pipette and think at the laboratory bench.
Avise, J. C. 1994. Molecular Markers, Natural History and Evolution. Chapman & Hall, New York, 511 p.
© Paleobio.org   Updated: June 2003   Contact: jen-AT-paleobio or allen-AT-paleobio