
InTheLoop | 09.29.2008

The weekly newsletter for Berkeley Lab Computing Sciences employees

September 29, 2008

JGI Expects Growth Spurt in Metagenomics

Some ten years after the term was coined, metagenomics is going mainstream and already paying provocative dividends, according to a “Q&A” News and Views piece by U.S. Department of Energy Joint Genome Institute (DOE JGI) microbial ecology program head Philip Hugenholtz and MIT researcher Gene Tyson, published in the September 25 edition of the journal Nature.

“By employing the techniques of metagenomics we can go beyond the identification of specific players to creating an inventory of the genes in that environment,” said Hugenholtz. “We find that genes occurring more frequently in a particular community seem to confer attributes beneficial for maintenance of the function of that particular ecological niche.”

Responding to the steadily increasing need to manage and interpret the terabases and terabytes of metagenomic data now bubbling up into the public domain, DOE JGI launched the Integrated Microbial Genomes with Microbiome Samples (IMG/M) data management and analysis system, developed in collaboration with Berkeley Lab’s Biological Data Management and Technology Center. IMG/M provides tools for analyzing the functional capability of microbial communities based on the DNA sequence of the metagenome in question.

“Metagenomic tools are becoming more widely available and improving at a steady pace,” said Hugenholtz. “But, there are still computational and other bottlenecks to be addressed, such as the high percentage of uncharacterized genes emerging from metagenomic studies.” In the Nature piece, Hugenholtz and Tyson go on to cite the emergence of next-generation sequencing technologies, which are already creating a deluge of data that has outstripped the computational power available to cope with it.

NERSC Users Group Will Meet on Thursday and Friday

The NERSC Users Group (NUG) will meet Thursday and Friday, October 2–3, in Room 238 at the Oakland Scientific Facility. The meeting will also be available by web conference over the Internet and via teleconference. Both onsite and remote participants must pre-register at NERSC Users Group Meetings October 2–3, 2008.

Thursday will be training day with the following topics:

  • Franklin Quad Core Update/Differences — Helen He
  • File Transfer Best Practices — David Turner
  • Enabling Grid File Transfers: The NERSC CA — Shreyas Cholia
  • Franklin IO: Systems Overview — Richard Gerber
  • Franklin IO: Best Practices for Application Performance — Katie Antypas
  • Accelerating X Windows with NX — Janet Jacobsen
  • Franklin Profiling and Performance Tools — Jonathan Carter
  • Debugging with DDT — David Lecomber, Allinea Software
  • DDT and Tools Hands-On (Continues all day Friday)

The business meeting will be held on Friday:

  • Welcome, logistics, and intro of new NUGEX members — Stephane Ethier, PPPL
  • Science at NERSC — Kathy Yelick
  • DOE Update — Yukiko Sekine, DOE
  • NERSC Global Filesystem Future Directions — Shane Canon
  • Cray XT4 Franklin Update — Bill Kramer
  • NERSC Workload Analysis — Harvey Wasserman
  • NERSC I/O Analysis — Andrew Uselton
  • Defining Best Practices for Network Tuning — Brent Draney
  • Grid Services at NERSC — Shreyas Cholia
  • NERSC-6 Procurement Update — Bill Kramer
  • 2007/2008 User Survey Results — Francesca Verdier
  • NERSC Web Services: Gathering Feedback and Future Directions — Jen Jasper
  • Q&A, open discussions — NUG Members, chaired by Stephane Ethier
  • Machine room tour — Howard Walter

Cloud Computing Seminar Series Starts Wednesday

Keith Jackson of the Data Intensive Systems Group in CRD’s Advanced Computing for Science Department has initiated a Cloud Computing Seminar Series which starts this Wednesday, October 1, between 10 and 11 a.m. in 50B-2222 with a video link to OSF Room 254. The seminar group will meet on the first Wednesday of each month.

This seminar group is meant to share experiences with, and spur discussion of, the use of cloud computing and software as a service in a scientific environment. At the initial meeting, Keith will give a brief presentation on his goals in organizing this activity and then open the floor up to a discussion of goals and suggested topics. Most meetings after that will consist of a presentation by someone on their experiences with a particular technology or tool, e.g., Amazon EC2, Google App Engine, Eucalyptus, NanoHub, etc., and then an open discussion of how these technologies might be of use in our environment.

A mailing list is available for those who want to join the seminar group.

Retirement Party for Frank Hale Tomorrow

Please join us at Luka’s Tap Room at noon tomorrow, September 30, to wish Frank Hale all the best for his retirement. Frank got his first job as a computing consultant in 1975 and was the first person hired in the User Services Group when NERSC moved to Berkeley in 1996. He took a year-and-a-half hiatus from User Services to be Computing Sciences’ liaison to DOE. In addition to regular stints as a consultant for NERSC users, Frank is also responsible for much of the third-party software supporting NERSC’s user community. Before joining NERSC, Frank spent 11 years with Berkeley Lab’s Earth Sciences Division.

Please RSVP to Jonathan Carter by midday today. Luka’s is located at 2221 Broadway, Oakland, at the corner of West Grand and Broadway.

ESnet Welcomes New UNIX System Administrator

From ESnet Department Head Steve Cotter:

In case you were wondering who that other person was roaming the halls of Building 50 looking lost (other than me), it was our new hire Deb Heller-Evans. She’s joining Stan’s team as a UNIX system administrator and will be residing in Don’s old office [50A-3113]. Deb brings a wealth of experience with her from her time at LLNL and in the commercial sector at Bank of America, Teradyne and Silicon Graphics. If you get a chance, swing by her desk over the next couple days and introduce yourself, take her to lunch or walk her over to the cafeteria and treat her to a cup of coffee.

Deb — as I said when we met earlier today, we’re excited to have you on board. You’re joining a great team!

Volunteers Needed for Two Job Fairs, October 2 and 7

Bernadette Cu-Todd, the Senior Recruiting Consultant for Computing Sciences, needs volunteers to help staff the booth at two upcoming job fairs:

  • Thursday, October 2: Masters and PhDs Job Fair at UC Berkeley, noon–4 p.m.
  • Tuesday, October 7: Stanford University Fall Career Fair, 11 a.m.–3 p.m.

For more information or to volunteer, please email Bernadette at BCu-Todd@lbl.gov.

UC Berkeley Workshop on Using Python for Scientific Computing

Fernando Perez, a research scientist at the Helen Wills Neuroscience Institute at UC Berkeley, will be holding a two-day introductory workshop covering the use of the Python programming language for scientific computing on October 9 and October 16, 2008. The workshop is targeted at the level of a graduate student in engineering or the sciences. A working knowledge of basic programming is assumed, as well as familiarity with calculus, basic linear algebra, FFTs, and other similar topics. Participation is limited to 20 people. For more information, see Python Fundamentals.

This Week’s Seminar Schedule

Wednesday, October 1, 2:00–3:30 p.m., Berkeley Wireless Research Center, 2108 Allston Way, Suite 200, Berkeley

The IBM Global Technology Outlook and the 50 Billion Transistor Challenge
Michael Rosenfield, Director, VLSI Systems, IBM Research

The Global Technology Outlook (GTO) examines in great depth the current trajectories of new technologies in the lab and marketplace, concentrating on trends that could be disruptive or the harbingers of change. It has proved remarkably prescient and has allowed IBM to make sound decisions and investments in future technology directions. It also has sought to anticipate some effects those technology trends might have on specific industries. I will present key portions of the 2008 and prior GTOs covering technology and systems trends, new methodologies for chip design, application-optimized systems, and a brief overview of the 2008 GTO, in the context of a workshop recently held at IBM Research: “The 50 Billion Transistor Challenge.”

Silicon CMOS technology is now widely expected to scale to at least 11 nm and possibly beyond, resulting in the availability of over 50 billion transistors on a reasonably sized chip for enterprise systems. What will we do with 50 billion transistors on a chip and how will we design chips with this many transistors?

Thursday, October 2, 11 a.m.–12:30 p.m., 521 Cory Hall (Hogan Room), UCB

ParLab Seminar
Spiral: Generating Software and Hardware Implementations for Linear Transforms
Franz Franchetti, Carnegie Mellon University

Spiral is a program and hardware design generation system for linear transforms such as the discrete Fourier transform, discrete cosine transforms, filters, and others. For a user-selected transform, Spiral autonomously generates different algorithms, represented in a declarative form as mathematical formulas, and their implementations to find the best match to the given target platform. Besides the search, Spiral performs deterministic optimizations on the formula level, effectively restructuring the code in ways impractical at the code or design level.

In this talk, we give a short overview of Spiral. We then explain how Spiral generates efficient programs for parallel platforms, including vector architectures, shared and distributed memory platforms, and GPUs, as well as hardware designs (Verilog) and automatically partitioned software/hardware implementations. Like all optimizations in Spiral, parallelization and partitioning are performed at a high abstraction level of algorithm representation, using rewriting systems. We also discuss how Spiral is currently being extended beyond its original problem domain, using coding algorithms (Viterbi decoding and JPEG 2000 encoding) and image formation (synthetic aperture radar, SAR) as examples.

Friday, October 3, 10:30 a.m. to noon, Building 50 Auditorium

Earth Sciences Division Distinguished Scientist Seminar Series
Numerical Methods for Large-Scale Experimental Design
Eldad Haber, Emory University

While experimental design for well-posed inverse linear problems has been well studied, covering a vast range of well-established design criteria and optimization algorithms, its ill-posed counterpart is a rather new topic. The ill-posed nature of the problem entails the incorporation of regularization techniques. The consequent nonstochastic error introduced by regularization needs to be taken into account when choosing an experimental design. We discuss different ways to define an optimal design that controls both an average total error of regularized estimates and a measure of the total cost for the design. We also introduce a numerical framework that efficiently implements such designs and natively allows for the solution of large-scale problems. To illustrate the possible applications of the methodology, we consider a borehole tomography example and a two-dimensional function recovery problem.

Friday, October 3, 12:30–1:30 p.m., Building 50 Auditorium

Environmental Energy Technologies Division Distinguished Lecturer Series
Extrapolate the Past ... Or Invent the Future
Vinod Khosla, Khosla Ventures

Vinod Khosla is the founder of Khosla Ventures, whose mission is to “assist great entrepreneurs determined to build companies with lasting significance.” Khosla was a co-founder of Daisy Systems and founding Chief Executive Officer of Sun Microsystems, where he pioneered open systems and commercial RISC processors.

Sun was funded by Kleiner Perkins, and in 1986 Vinod switched sides and joined Kleiner Perkins Caufield & Byers (KPCB). In 2004, driven by the need for flexibility and a desire to be more experimental, to fund sometimes imprudent “science experiments,” and to take on both “for profit” and “social impact” ventures, he formed Khosla Ventures. Khosla Ventures focuses on both traditional venture capital technology investments and clean technology ventures. Social ventures include affordable housing and microfinance, among others.

Link of the Week: The Limits of Statistics

When Nassim Taleb talks about the limits of statistics, he becomes outraged. “My outrage,” he says, “is aimed at the scientist-charlatan putting society at risk using statistical methods. This is similar to iatrogenics, the study of the doctor putting the patient at risk.” Taleb recently wrote an essay titled “The Fourth Quadrant: A Map of the Limits of Statistics” for the online journal Edge. The essay begins:

Statistical and applied probabilistic knowledge is the core of knowledge; statistics is what tells you if something is true, false, or merely anecdotal; it is the “logic of science”; it is the instrument of risk-taking; it is the applied tools of epistemology; you can’t be a modern intellectual and not think probabilistically—but... let’s not be suckers. The problem is much more complicated than it seems to the casual, mechanistic user who picked it up in graduate school. Statistics can fool you. In fact it is fooling your government right now. It can even bankrupt the system (let’s face it: use of probabilistic methods for the estimation of risks did just blow up the banking system).

But the good news is, “We can identify where the danger zone is located, which I call ‘the fourth quadrant’, and show it on a map with more or less clear boundaries.” Taleb is Distinguished Professor of Risk Engineering at New York University’s Polytechnic Institute and is the author of last year’s bestseller The Black Swan: The Impact of the Highly Improbable.

About Computing Sciences at Berkeley Lab

The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.

Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.