InTheLoop | 06.01.2010
June 1, 2010
Shalf and Strohmaier Chairing Sessions at ISC; TOP500 List Released
John Shalf of NERSC and Erich Strohmaier of CRD are chairing sessions at the International Supercomputing Conference (ISC), which is being held from May 30 to June 3 in Hamburg, Germany. Shalf will chair a session on “HPC: Future Technology Building Blocks,” and is featured in Monday’s ISC’10 Video Blog. Strohmaier will chair a session on “Focusing LINPACK: The TOP500 Yardstick” and will co-chair a “Hot Seat Session.”
Associate Lab Director Horst Simon is on the ISC Advisory Board and Award Committee; Shalf is on the Program Committee; Strohmaier is on the Scientific Program Committee; and Strohmaier and Simon are co-authors of the TOP500 List, along with Hans Meuer of the University of Mannheim and Jack Dongarra of the University of Tennessee.
The June TOP500 List was officially released at ISC yesterday, with two Chinese systems in the TOP10 and 24 in the TOP500; China is now tied with Germany in fourth place for the number of systems on the list, behind the USA, UK, and France. NERSC’s Franklin system is now ranked No. 17. The BBC website offers an interactive treemap visualization of the TOP500. The graphic allows you to visualize the list by the speed of each machine, the operating systems used, what it is used for, the country where it is based, the maker of the silicon chips used to build the machine, and the manufacturer of the supercomputer.
Wehner Named Lead Author of Chapter in Next IPCC Climate Report
Michael Wehner, a climate researcher in Berkeley Lab’s Computational Research Division who specializes in extreme weather changes, has been invited to serve as a lead author of Chapter 12, “Long-Term Climate Change: Projections, Commitments and Irreversibility,” in the Working Group I contribution to the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). The AR5 Synthesis Report is scheduled to be published in September 2014.
Wehner’s selection was approved during the May 19-20 meeting of the Bureau of the Intergovernmental Panel on Climate Change in Geneva. Wehner was a contributing author of the IPCC AR4 report, for which the IPCC was honored with the 2007 Nobel Peace Prize.
NERSC Authors Win Best Paper Award at Cray User Group Meeting
The Best Paper award at CUG2010, the Cray User Group meeting held May 24–27 at the University of Edinburgh, Scotland, went to “Application Acceleration on Current and Future Cray Platforms” written by Alice Koniges, Robert Preissl, and Jihan Kim of NERSC; David Eder, Aaron Fisher, Nathan Masters, and Velimir Mlaker of Lawrence Livermore National Laboratory; Stephane Ethier and Weixing Wang of Princeton Plasma Physics Laboratory; Martin Head-Gordon of UC Berkeley and LBNL’s Chemical Sciences Division; and Nathan Wichmann of Cray Inc. This paper examines three different applications and means for improving their performance with a particular emphasis on methods that are applicable for many/multicore and future architectural designs.
One of two runners-up for Best Paper was “Analyzing the Effect of Different Programming Models upon Performance and Memory Usage on Cray XT5 Platforms” by Hongzhang Shan of Berkeley Lab/CRD; Haoqiang Jin of NASA Ames; Karl Fuerlinger of UC Berkeley; and Alice Koniges and Nicholas J. Wright of NERSC. This paper looks at memory requirements and performance of the NAS parallel benchmarks in various languages including UPC and hybrid models. The performance of Jaguar and Hopper is also compared and discussed.
Microsoft Research Video Highlights ACS Work on Digital Watershed
Every year, Microsoft Research organizes the Silicon Valley TechFair to highlight the company’s work with the research community. At this year’s event on May 6, the Berkeley Water Center was highlighted both in the opening address and in a video produced for the event. The video focuses on the “digital watershed,” a project between Microsoft Research, Berkeley Lab’s Advanced Computing for Science (ACS) Department, and UC Berkeley to develop better tools for storing and accessing data on water resources. Watch the video, which features ACS Department Head Deb Agarwal.
CS Staff Contribute to ACM SIGMOD/PODS Conference
Several CRD and NERSC staff are coauthors of papers that will be presented at the 2010 ACM SIGMOD/PODS Conference in Indianapolis, Indiana, on June 6–11, 2010. The annual ACM SIGMOD/PODS conference (Special Interest Group on Management of Data and Principles of Database Systems) is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. CS contributors include:
- David Patterson: “PIQL: A Performance Insightful Query Language” (with Michael Armbrust, Stephen Tu, Armando Fox, Michael Franklin, Nick Lanham, Beth Trushkowsky, and Jesse Trutna, all of UC Berkeley)
- David Patterson: “Characterizing, Modeling, and Generating Workload Spikes for Stateful Services” (with Peter Bodik, Armando Fox, Michael Franklin, and Michael Jordan, all of UC Berkeley)
- Lavanya Ramakrishnan, Keith Jackson, Shane Canon, Shreyas Cholia, and John Shalf: “Defining Future Platform Requirements for e-Science Clouds” (position paper)
Algorithms for Massive Data Sets Will Be Workshop Topic
The Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) will be held June 15–18 at Stanford University. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas. Registration information is available online. Jim Demmel of UC Berkeley and CRD will be one of the speakers.
Course Offered on Algorithms for Many-Core Processors
The Virtual School of Computational Science and Engineering is offering a five-day course on using multicore devices for scientific computing, scaling parallel code to tens of thousands of CPU cores, handling large data volumes, and more. The course, “Proven Algorithmic Techniques for Many-Core Processors,” will be held simultaneously at multiple locations across the country using high-definition videoconferencing technology, on August 2–6, 2010. The Oakland Scientific Facility will be one of those locations, thanks to local organizers Hemant Shukla and John Shalf of NERSC. Register online or send an email with your name, affiliation, email and phone to Hemant Shukla.
This Week’s Computing Sciences Seminars
Wavelet Approximation of GRID Fields in Chemoinformatics
Wednesday, June 2, 10:00–11:00 am, 50B-2222
Richard Martin, University of Sheffield, UK
The interactions that a small molecule can make with a receptor can be modelled using three-dimensional molecular fields, such as GRID fields (Goodford, 1985); however, the cumbersome nature of these fields makes their storage and comparison computationally expensive. Wavelets are a family of multiresolution signal analysis functions which have become widely used for information compression (Sundling et al., 2006). We have applied the non-standard wavelet transform to generate low-resolution approximations (wavelet thumbnails) of finely sampled GRID fields. We show that wavelet thumbnails with significantly reduced storage requirements provide near-identical results to the original data in similarity searching and virtual screening experiments based on pre-aligned molecules. We have also applied the wavelet techniques to generate compressed GRIDs for 3D QSAR analysis, again based on pre-aligned molecules.
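As a rough illustration of the "wavelet thumbnail" idea, the sketch below coarsens a 3D grid by averaging each 2x2x2 block, which matches the Haar approximation coefficients up to a constant scale factor per level. The choice of the Haar wavelet, the function name `haar_thumbnail`, and the nested-list grid layout are assumptions made for this example; the abstract does not say which wavelet family or data layout the speaker used.

```python
def haar_thumbnail(grid, levels=1):
    """Return a low-resolution copy of a 3D grid stored as nested lists.

    Each level replaces every 2x2x2 block of samples by its mean. This is
    the Haar low-pass (approximation) band up to a constant scale factor.
    Assumption: every edge length is divisible by 2**levels.
    """
    for _ in range(levels):
        nx = len(grid) // 2
        ny = len(grid[0]) // 2
        nz = len(grid[0][0]) // 2
        grid = [
            [
                [
                    sum(
                        grid[2 * i + a][2 * j + b][2 * k + c]
                        for a in (0, 1)
                        for b in (0, 1)
                        for c in (0, 1)
                    ) / 8.0
                    for k in range(nz)
                ]
                for j in range(ny)
            ]
            for i in range(nx)
        ]
    return grid


# Hypothetical 4x4x4 field whose value equals the x index of each sample.
field = [[[float(i) for _ in range(4)] for _ in range(4)] for i in range(4)]
thumb = haar_thumbnail(field, levels=1)  # 2x2x2 grid: 64 samples become 8
```

Each level cuts the number of stored values by a factor of eight, which is where the storage saving comes from; the detail coefficients that a full transform would keep are simply discarded here.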
More recently, we have developed a method for aligning wavelet thumbnails to enable the comparison of GRID fields in arbitrary orientation. Clique detection is applied to extrema in the wavelet thumbnails and used to map one thumbnail onto another. The aligned thumbnails are then scored using the Tanimoto coefficient applied to the entire thumbnails. We have compared our method to ROCS ComboScore (Grant et al., 1996) which uses an atom-centric approach to representing molecules, rather than the receptor-centric approach in GRID. While ROCS remains by far the faster approach, our method demonstrates comparable virtual screening performance and we show that the different nature of the underlying representations leads to complementary behaviour in the retrieval of actives.
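The scoring step can be sketched with the continuous form of the Tanimoto coefficient, T = A·B / (|A|² + |B|² − A·B), applied to two already-aligned thumbnails flattened into lists of values. This is only an assumption about which variant is meant; the abstract does not give the formula, and the function name `tanimoto` and the zero-vector convention are choices made for this example.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient of two equal-length value lists.

    T = (a . b) / (|a|^2 + |b|^2 - a . b)
    Convention assumed here: two all-zero inputs count as identical (1.0).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 1.0  # only reachable when both inputs are all zeros
    return dot / denom


# Hypothetical flattened thumbnails of two aligned fields.
score = tanimoto([1.0, 2.0], [2.0, 1.0])  # 4 / (5 + 5 - 4)
```

Identical inputs score 1.0 and orthogonal inputs score 0.0, so the value can be used directly to rank candidate molecules against a query.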
Parallel de Novo Assembly of Short Read Metagenomic Data
Thursday, June 3, 10:00–11:00 am, 50B-2222
Kamesh Madduri, LBNL/CRD
In this talk, I will present a new parallel method for de novo assembly of large-scale genomic data on multicore clusters. This new method belongs to the family of Eulerian path-based approaches to genome assembly, and involves construction, traversal, and simplification of a large string graph. The building blocks of these string graphs (also known as de Bruijn graphs) are words with k nucleotides (or kmers). I will discuss parallelization of various steps of the assembly process, including ingestion of short read data, creating a kmer spectrum, initial graph construction, simplification, and error correction.
Analysis and assembly of metagenomic data are challenging because organisms are typically unevenly represented within the metagenome and related member organisms are polymorphic, both of which lead to uneven coverage. I will highlight metagenome-specific algorithmic changes to speed up the memory-intensive graph construction step, and present parallel performance results for assembling a 20 Gbp data set.
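The k-mer spectrum and initial graph construction steps named in the abstract can be sketched in a few lines of serial Python. This is not the speaker's parallel method: the function names, the `min_count` threshold used as a crude stand-in for error correction, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmer_spectrum(reads, k):
    """Count how often each length-k substring (k-mer) occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(spectrum, min_count=1):
    """Build a de Bruijn graph: each k-mer becomes an edge from its
    (k-1)-prefix to its (k-1)-suffix.

    Assumption: k-mers seen fewer than min_count times are treated as
    likely sequencing errors and dropped.
    """
    graph = defaultdict(set)
    for kmer, count in spectrum.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping short reads.
reads = ["ACGTA", "CGTAC"]
spectrum = kmer_spectrum(reads, k=3)
graph = de_bruijn_graph(spectrum)
```

An assembler would then traverse and simplify this graph to recover contigs. With uneven coverage, as in metagenomic data, a single global count threshold like the one above risks discarding genuine k-mers from rare organisms, which is one reason metagenome-specific changes are needed.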
Link of the Week: Why Computers Crash But We Don’t
Yale University researchers have described why computers tend to malfunction more often than living organisms by analyzing the control networks in both an E. coli bacterium and the Linux operating system. Both systems are arranged in hierarchies, but with some key differences in how they achieve operational efficiencies. The molecular networks in the bacterium are arranged in a pyramid, with a limited number of master regulator genes at the top that control a wide base of specialized functions. The Linux operating system is set up more like an inverted pyramid, with many different top-level routines controlling a few generic functions at the bottom. This organization arises because software engineers tend to save money and time by building on existing routines rather than starting systems from scratch, says Yale professor Mark Gerstein. “But it also means the operating system is more vulnerable to breakdowns because even simple updates to a generic routine can be very disruptive,” Gerstein says.
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.