
The 2000 Volen Center Scientific Retreat

Bruce Birren, Ph.D.
Whitehead Institute
Massachusetts Institute of Technology
Center for Genome Research
Cambridge, Massachusetts

The Human Genome Project:
Not a Thousand People with Pipettes

Traditionally, the science of genetics concerned itself with the study of one gene at a time or, on occasion, the interaction of a very few genes. However, we increasingly understand that the regulation of gene function and expression is governed by the complex interaction of the entire set of proteins expressed in any given cell. Hence a single-gene view necessarily limits our ability to understand complex biological systems. Despite a century's worth of research in genetics, it was only with the advent of high-throughput DNA sequencing that we were able to generate a complete list of the genes present in any organism. Our ability to sequence entire genomes thus represents a dramatic new turn in biology, and both the technical advances that make this possible and the implications of this wealth of data bear some reflection.

The history of research involving the nematode C. elegans shows three distinct phases of work, each representing increasing power to understand basic biological processes. In the first, the genetic era, we identified mutants with readily observed phenotypes, including the large class with uncoordinated movement. In the molecular era, individual genes could be cloned and sequenced, and transformation allowed the direct study of gene function. In the genomic era, however, when the entire genome sequence is known, we suddenly recognize that the complete cast of genetic players contains previously unimagined elements. Similarly, despite the intense study of the fruit fly Drosophila over the past 70 years, only 2,500 genes had been identified by traditional means by last year, when the genome sequence became available. With the completion of the sequence, we not only learned that there were 13,600 genes but could also classify them into functional categories and immediately recognize genes capable of driving biological processes for which no evidence had ever been seen. The sequence also shows that many functions thought to be performed by a single protein are in fact carried out by a family of related proteins, which has obvious implications for our attempts to understand any one of them.

One of the most challenging technical aspects of the Human Genome Project has been to scale up what had originally been a complex series of laboratory steps performed by highly trained scientists. The first requirement was to understand in depth the laboratory processes, the causes of variation in those processes, and the main drivers of cost and data quality. From this analysis we were able to design simple, robust procedures that could be performed in an automated fashion. In this way we were able to generate large amounts of high-quality data in a highly predictable manner. In a large-scale project that takes place over a sustained period, a great deal of the work occurs after the learning curve has been climbed. Thus the energy invested in thoroughly understanding the process is repaid through increased efficiency and economies of scale. Automated data capture and analysis are equally essential to the success of a large project.

A central goal of the Human Genome Project has been not only to obtain the human genome sequence but, in so doing, to develop a sequencing process that will allow us to efficiently obtain the sequence of any large genome. Making further genome sequence easier to obtain will fundamentally change the way we approach biological research. For example, having the complete gene list of an organism allows us to study gene expression by examining all genes simultaneously, rather than one gene at a time. Each new organism's DNA sequence is beginning to reveal the full depth of the diversity of life on earth, a difficult task given that only a tiny minority of species can be cultured in the laboratory. Comparative genomics, in which sequence from one organism is compared with that of another, not only allows us to recognize the evolutionary history of species and their component biochemical pathways but also provides an important new opportunity to identify the regulatory signals embedded in DNA sequence. Non-coding sequences that have remained similar over evolutionary time can imply conserved function associated with gene regulation. We continue to develop faster and cheaper ways to sequence DNA, with the understanding that these data and the associated technologies will form the foundation of biological research in the next few decades.
