After the First Decade of Metagenomics, Adolescent Growth Spurt Anticipated
But computational and other bottlenecks still need to be addressed
September 29, 2008
WALNUT CREEK, CA—Mostly hidden from the scrutiny of the naked eye, microbes have been said to run the world. The challenge is how best to characterize them, given that less than one percent of the estimated hundreds of millions of microbial species can be cultured in the laboratory. The answer is metagenomics, an increasingly popular approach for extracting the genomes of uncultured microorganisms and discerning their specific metabolic capabilities directly from environmental samples. Now, some ten years after the term was coined, metagenomics is going mainstream and already paying provocative dividends, according to a “Q&A” News and Views article by the U.S. Department of Energy Joint Genome Institute (DOE JGI) microbial ecology program head Philip Hugenholtz and MIT researcher Gene Tyson, published in the September 25, 2008, edition of the journal Nature.
“By employing the techniques of metagenomics we can go beyond the identification of specific players to creating an inventory of the genes in that environment,” said Hugenholtz. “We find that genes occurring more frequently in a particular community seem to confer attributes beneficial for maintenance of the function of that particular ecological niche.”
Hugenholtz and Tyson were part of the team assembled by University of California, Berkeley geochemist Jillian Banfield to investigate microbial communities associated with the acid mine drainage of Iron Mountain in far Northern California in 2004. In the dank recesses of the mine, protected by moon suits from the highly acidic effluent, the researchers scooped up pink biofilm growing on the surface of acid mine drainage streams. By extracting the nucleic acid from the sample and applying DOE JGI’s powerful DNA sequencing resources to it, the Banfield team was able to reconstruct the metabolic profiles of the organisms living under such inhospitable conditions, a task like putting many Humpty-Dumpties back together again. Their findings, published in Nature 428, 37–43 (01 Feb 2004), showed that reconstructing the genomes of dominant populations from the environment was feasible and that the imprints of evolutionary selection could be discerned in these genomes.
Since this pioneering work, DOE JGI has gone on to characterize many other metagenomes, with newly selected targets in the sequencing queue at its Production Genomics Facility in Walnut Creek, Calif. These range from the hindguts of termites, plumbed for microbes that produce cellulose-degrading enzymes, to microbial communities in the cow rumen, the foregut of the tammar wallaby, and the crop of the hoatzin, the Amazon stinkbird. Beyond guts, the DOE JGI, through its Community Sequencing Program (CSP), is enabling metagenomic explorations of Lake Washington near Seattle, Antarctica’s Lake Vostok, and the Great Salt Lake, in addition to the hypersaline mats at Guerrero Negro, Baja California. A video podcast of the Lake Vostok CSP project is featured on the DOE JGI site. Nature features an audio podcast on its site that includes an interview with Hugenholtz.
Responding to the steadily increasing need to manage and interpret the terabases and terabytes of metagenomic data now bubbling up into the public domain, DOE JGI launched the Integrated Microbial Genomes with Microbiome Samples (IMG/M) data management and analysis system, developed in collaboration with Berkeley Lab’s Biological Data Management and Technology Center. IMG/M provides tools for analyzing the functional capability of microbial communities based on the DNA sequence of the metagenome in question.
“Metagenomic tools are becoming more widely available and improving at a steady pace,” said Hugenholtz. “But, there are still computational and other bottlenecks to be addressed, such as the high percentage of uncharacterized genes emerging from metagenomic studies.”
In the Nature piece, Hugenholtz and Tyson go on to cite the emergence of next-generation sequencing technologies, which are already creating a deluge of data that has outstripped the computational power available to cope with it.
“Nevertheless, it’s not necessary to compare all the data to glean useful biological insights,” Hugenholtz said. “What we can capture will help steer the direction toward a relevant data subset to investigate. At least with metagenomics, we have the environmental genetic blueprints awaiting our interpretation. We are still far from capturing and characterizing the dazzling diversity of the microbial life on earth—but at least we have hit upon the gold standard for scratching the surface.”
The U.S. Department of Energy Joint Genome Institute, supported by the DOE Office of Science, unites the expertise of five national laboratories—Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest—along with the Stanford Human Genome Center to advance genomics in support of the DOE missions related to clean energy generation and environmental characterization and cleanup. DOE JGI’s Walnut Creek, CA, Production Genomics Facility provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.
The Biological Data Management and Technology Center (BDMTC) at Lawrence Berkeley National Laboratory serves as a source of expertise in, and provides support for, data management and bioinformatics tool development projects for several organizations in the San Francisco Bay Area. The Center enables collaborating organizations to share experience, expertise, technology, and results across projects, employing industry practices in developing data management systems and bioinformatics tools while maintaining high academic standards for the underlying data generation, interpretation, and analysis methods and algorithms.