
Berkeley Lab Researchers Receive $6.3 Million to Tackle Supercomputing Challenges

September 30, 2010

Contact: Linda Vu, lvu@lbl.gov, 510-495-2402

Leading-edge supercomputing systems play a vital role in pushing the boundaries of scientific discovery, from predicting trends in global climate to understanding how supernovae detonate. Although today's fastest systems perform at the "petascale," executing about a quadrillion calculations per second, even this is not fast enough to meet the needs of future mission-critical Department of Energy applications in climate and energy security. Over the last two years, the nation's leading computer scientists have held numerous workshops to determine what system requirements will be necessary to advance scientific research in the next decade, and there is nearly universal agreement that supercomputing systems will need to reach "exaflop" performance, a thousand times faster than today's petaflop systems, by 2018.

Over the last four decades, scientists sped up computing systems by doubling the number of transistors on a processor chip every two years. As the number of transistors on a chip climbed, so did the speed and complexity, but not the energy efficiency. Experts predict that an exascale system will use as many as a billion processors, which makes power the leading design constraint for the next generation of supercomputers. Many scientists agree that building an efficient exascale system will require fundamental breakthroughs in hardware technology, programming models, algorithms and software, at both the system and application levels. Because of power constraints, scientists will also have to develop effective designs for data movement and storage.

To tackle these exascale roadblocks, eight research groups at the Lawrence Berkeley National Laboratory (Berkeley Lab) will receive approximately $6.3 million in grants from the Department of Energy's (DOE's) Office of Advanced Scientific Computing Research (ASCR) to investigate critical technologies and architectures for exascale computing, extreme scale scientific data management and analysis, as well as X-Stack software. The X-Stack refers to the scientific software stack that supports extreme scale scientific computing, from operating systems to development environments.

The projects are:

CoDEx: A Hardware/Software Co-Design Environment for the Exascale Era

Research Area: Architectures and Critical Technologies for Exascale Computing
Principal Investigators: John Shalf (Berkeley Lab), Curtis Jansen (Sandia National Laboratory), Dan Quinlan (Lawrence Livermore National Laboratory), Sudhakar Yalamanchili (Georgia Institute of Technology) Contributors: David Donofrio, Alice Koniges, Lenoid Oliker and Samuel Williams of Berkeley Lab. Helgi Adalsteinsson, Damian Dechev and Ali Pinar of Sandia National Laboratories. Chunhua Liao and Thomas Panas of Lawrence Livermore National Laboratory.

Applications and algorithms will need to adapt as node architectures evolve towards exascale computing. The CoDEx (Co-Design for Exascale) project led by NERSC's John Shalf will bring together application and algorithm developers with system modeling and simulation experts to co-design applications, architectures and programming environments. Their goal is to ensure that exascale hardware performance is accompanied by programmability and high performance for scientific applications.

CoDEx is a tightly integrated set of tools and design methodologies that will provide an environment to prototype ideas for future programming models and software infrastructure for exascale machines. Berkeley Lab will provide a highly configurable, cycle-accurate simulation of node architectures, developed through the Green Flash project. Livermore Lab will use the ROSE compiler framework to process existing codes and reconstruct their communication patterns for any scale of parallelism. These extrapolated communication patterns can then be used to test the scalability of future interconnection networks using Sandia's SST/macro event simulator, which can simulate interconnect designs containing millions of endpoints. By designing both software and hardware collaboratively, researchers hope that the tools could be effectively applied to complex high performance computing applications.
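
As a flavor of what an interconnect event simulator does, the toy Python sketch below pushes a handful of messages through a simple event queue using made-up latency and bandwidth figures and reports when the last one arrives. It is a conceptual illustration only, not SST/macro or any real design.

    import heapq

    # Hypothetical link parameters, not taken from SST/macro or any real machine.
    LATENCY_S = 1e-6        # per-message latency: 1 microsecond
    BANDWIDTH_BPS = 1e10    # 10 gigabits per second per link

    def simulate(messages):
        """Toy event model: each message (send_time, src, dst, nbits) arrives
        after a fixed latency plus a bandwidth-limited transfer time."""
        events = []
        for send_time, src, dst, nbits in messages:
            arrival = send_time + LATENCY_S + nbits / BANDWIDTH_BPS
            heapq.heappush(events, (arrival, src, dst))
        last = 0.0
        while events:                      # drain events in time order
            last, _, _ = heapq.heappop(events)
        return last

    # A nearest-neighbor exchange among 8 endpoints, 1 megabit each.
    msgs = [(0.0, i, (i + 1) % 8, 1e6) for i in range(8)]
    print(f"exchange completes at {simulate(msgs) * 1e6:.1f} microseconds")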

Data Movement Dominates: Advanced Memory Technology to Address the Real Exascale Power Problem

Research Area: Architectures and Critical Technologies for Exascale Computing
Principal Investigators: Paul Hargrove and John Shalf of Berkeley Lab. Arun Rodrigues and Richard Murphy of Sandia National Laboratories. Keren Bergman of Columbia University, Bruce Jacob of the University of Maryland and Denis Resnick of Micron Technology.
Contributors: Leonid Oliker of Berkeley Lab

As supercomputers grow in size, they use more power moving data than performing calculations. This project will explore how 3D integrated memory systems and silicon photonic interconnects could make data movement more efficient on exascale platforms. The researchers note that these technologies could drive data access costs down from more than 100 picojoules (pJ) per bit to less than 10 pJ per bit. One watt of electricity is equivalent to one joule per second, and a picojoule is one trillionth (10⁻¹²) of a joule.
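
To put those per-bit figures in perspective, the rough calculation below translates them into power, assuming a hypothetical exascale machine that moves roughly one exabyte of data per second (an illustrative figure, not one from the article). It is a back-of-the-envelope sketch in Python, nothing more.

    # Back-of-the-envelope power estimate for data movement, using the per-bit
    # energies quoted above. The traffic figure is a hypothetical illustration.
    BITS_MOVED_PER_SECOND = 8e18   # assume ~1 exabyte of data moved per second

    for name, joules_per_bit in [("conventional (~100 pJ/bit)", 100e-12),
                                 ("3D memory + photonics (<10 pJ/bit)", 10e-12)]:
        watts = BITS_MOVED_PER_SECOND * joules_per_bit  # J/bit * bits/s = W
        print(f"{name}: {watts / 1e6:.0f} MW just for data movement")

Under these assumptions, conventional memory technology would burn hundreds of megawatts on data movement alone, which is why a tenfold reduction in per-bit energy matters so much.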

In the next few years, this collaboration will produce a set of key results to steer Department of Energy investments in technology and architecture for an exascale system before 2020. In addition to investigating advanced data movement capabilities, the team will explore support for PGAS (partitioned global address space) programming and message-driven computation from the perspective of enabling very large address spaces. The collaboration with Micron will help ensure that future memory subsystems provide building blocks for large shared-address-space systems in the exascale timeframe, along with better integration between local and global memory, a critical power and performance bottleneck both today and in the future.
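
As a minimal illustration of the PGAS idea referenced above, the Python sketch below partitions a global index space into equal blocks across ranks so that any global index can be translated into an owning rank and a local offset without communication. The layout and numbers are purely illustrative, not drawn from the project.

    # Minimal PGAS-style sketch: a global index space is split into blocks,
    # and any global index maps to (owning rank, local offset) with no
    # communication. The simple blocked layout is illustrative only.
    def owner_of(global_index, global_size, num_ranks):
        block = -(-global_size // num_ranks)   # ceiling division: elements per rank
        return global_index // block, global_index % block

    GLOBAL_SIZE, RANKS = 1_000_000, 4096
    rank, offset = owner_of(456_789, GLOBAL_SIZE, RANKS)
    print(f"global element 456789 lives on rank {rank} at local offset {offset}")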

Bringing Exascale I/O Within Science's Reach: Middleware for Enabling and Simplifying Scientific Access to Extreme Scale Parallel I/O Infrastructure

Research Area: Extreme Scale Scientific Data Management and Analysis
Principal Investigator: Prabhat of Berkeley Lab
Contributors: Wes Bethel and John Wu of Berkeley Lab

Topology-based Visualization and Analysis of Multi-dimensional Data and Time Varying Data at the Extreme Scale

Research Area: Extreme Scale Scientific Data Management and Analysis
Principal Investigator: Gunther H. Weber of Berkeley Lab

The deluge of data generated by petascale simulations of complex physical phenomena like climate and combustion is already beginning to exceed our capacity to analyze it effectively. Although exascale computers will allow scientists to simulate even more complex phenomena in unprecedented detail, researchers will not be able to gain insights from these results without aggressive improvements in data analysis technology.

This project will bridge the gap between advances in simulation complexity and current data analysis capabilities by developing effective topology-based methods for analyzing multi-dimensional and time-varying datasets. The team, led by Weber, will then integrate these algorithms into standard visualization and analysis tools like VisIt and R to make them widely accessible. These tools will be especially useful to researchers who need to extract multi-dimensional features that are not explicitly present in the data, such as burning regions in combustion simulations or storm systems in climate applications, and gather quantitative information about them.
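
As a much-simplified stand-in for that kind of feature extraction (not the project's actual topological algorithms), the Python sketch below segments connected regions above a threshold in a synthetic 2D field, a crude analogue of finding "burning regions," and reports how many it found.

    import numpy as np
    from scipy import ndimage

    # Synthetic 2D field standing in for one slice of a combustion simulation.
    rng = np.random.default_rng(0)
    field = ndimage.gaussian_filter(rng.random((256, 256)), sigma=8)

    # Label connected regions above a threshold, a crude analogue of
    # extracting "burning regions" from simulation output.
    threshold = field.mean() + field.std()
    labels, num_regions = ndimage.label(field > threshold)
    sizes = np.bincount(labels.ravel())[1:]          # cells per region
    print(f"found {num_regions} regions; largest covers {sizes.max()} cells")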

Runtime System for I/O Staging in Support of In-Situ Processing of Extreme Scale Data

Research Area: Extreme Scale Scientific Data Management and Analysis
Principal Investigator: Scott Klasky of Oak Ridge National Laboratory
Contributors: Arie Shoshani and John Wu of Berkeley Lab. Nagiza Samatova, Matthew Wolf and Norbert Podhorszki of Oak Ridge National Laboratory. Karsten Schwan and Greg Eisenhauer of Georgia Institute of Technology.

Scientists currently spend a tremendous amount of time managing the flow of data generated by supercomputers, from the time it is generated until it is analyzed and the results are published. As extreme-scale systems grow in computational power and produce ever-larger datasets, it is becoming apparent that the software infrastructure needed to manage these datasets efficiently and effectively is largely lacking. This, in turn, will limit scientific discovery.

This project, led by Scott Klasky, proposes to develop new solutions for managing the avalanche of data expected at the extreme scale by using tools that can reduce, analyze and index the data while it is still in memory (otherwise known as "in-situ" data processing). The team will partner with applications researchers and combine proven technologies, like ADIOS, FastBit and Parallel R, to create a system that runs in-situ on dedicated "staging" nodes, which are used not only to accelerate input/output to external disk but also to pre-analyze, index, visualize, and reduce the overall amount of information from these simulations. Because a system's input/output will not speed up significantly as computing power increases, this in-situ data processing solution will go a long way toward reducing overall power consumption in addition to speeding up the scientific workflow.
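
The Python sketch below conveys the shape of that staging step, summarizing and crudely indexing a chunk of data while it is still in memory, before anything touches disk. It deliberately does not use the ADIOS, FastBit or Parallel R interfaces; it is an illustration of the idea only.

    import numpy as np

    def stage(chunk, bin_edges):
        """Toy "staging node" step: summarize and index a data chunk in
        memory before it is written out. Not the ADIOS/FastBit APIs."""
        summary = {"min": float(chunk.min()),
                   "max": float(chunk.max()),
                   "mean": float(chunk.mean())}
        hist, _ = np.histogram(chunk, bins=bin_edges)
        bitmap = hist > 0                  # crude bitmap-style index of value bins
        return summary, bitmap

    edges = np.linspace(0.0, 1.0, 17)                   # 16 value bins
    chunk = np.random.default_rng(1).random(1_000_000)  # stand-in for simulation output
    summary, bitmap = stage(chunk, edges)
    print(summary, "bins present:", int(bitmap.sum()))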

Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems

Research Area: Extreme Scale Scientific Data Management and Analysis
Principal Investigator: Alok Choudhary of Northwestern University
Contributors: John Wu of Berkeley Lab

As scientific data grows in size and complexity, experts predict that current data analytics capabilities may soon become a bottleneck on the road to building an exascale system. As a result, this project proposes to develop a library of exascale functions and software to accelerate data analytics and knowledge discovery for large-scale scientific applications.

Because the execution of many data analysis algorithms is dominated by a small number of kernels, the researchers will develop a comprehensive library of exascale data analysis and mining kernels. To address the fact that HPC systems are becoming inherently heterogeneous, the team will also design algorithms for data analysis kernels accelerated on hybrid multi-node, multi-core HPC architectures composed of a mix of graphics processing units (GPUs), field-programmable gate arrays (FPGAs) and solid-state drives (SSDs), and develop scalable implementations of them. Finally, to address the energy challenge of exascale computing, the team will build on its performance-energy tradeoff analysis framework so that data analysis kernels, algorithms and software can be parameterized, allowing users to choose the right power-performance optimizations.
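
As a hedged illustration of that last point, the Python sketch below chooses among three hypothetical implementations of the same analysis kernel, using made-up time and energy measurements, according to a single user-set knob that trades speed against energy.

    # Hypothetical (time, energy) measurements for three variants of one kernel,
    # e.g. CPU, GPU and reduced-precision GPU versions. Numbers are made up.
    VARIANTS = {
        "cpu":         {"seconds": 12.0, "joules": 900.0},
        "gpu":         {"seconds":  3.0, "joules": 600.0},
        "gpu_reduced": {"seconds":  2.0, "joules": 750.0},
    }

    def choose_variant(alpha):
        """Pick the variant minimizing alpha*time + (1 - alpha)*energy after
        normalizing each metric; alpha=1 favors speed, alpha=0 favors energy."""
        t_max = max(v["seconds"] for v in VARIANTS.values())
        e_max = max(v["joules"] for v in VARIANTS.values())
        score = lambda v: alpha * v["seconds"] / t_max + (1 - alpha) * v["joules"] / e_max
        return min(VARIANTS, key=lambda name: score(VARIANTS[name]))

    for alpha in (0.0, 0.5, 1.0):
        print(f"alpha={alpha}: choose {choose_variant(alpha)}")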

An Open, Integrated Software Stack for Extreme-Scale Computing

Research Area: X-Stack
Principal Investigators: Pete Beckman of Argonne National Laboratory and Jack Dongarra of the University of Tennessee
Contributors: Pavan Balaji, Kamil Iskra, Ewing Lusk, Robert Ross, and Rajeev Thakur of Argonne National Lab. Paul Hargrove and Kathy Yelick of Lawrence Berkeley National Lab. Al Geist, Arthur Bernard Maccabe and Jeffrey S. Vetter of Oak Ridge National Lab. Kevin Pedretti, Ron Brightwell and Michael A. Heroux of Sandia National Labs. Franck Cappello, William Kramer and Marc Snir of the University of Illinois at Urbana-Champaign. Allen D. Malony and Sameer S. Shende of the University of Oregon. James Demmel of the University of California at Berkeley. Barbara Chapman of the University of Houston. George Bosilca of the University of Tennessee. John Mellor-Crummey of Rice University.

Experts widely agree that the high performance computing (HPC) software infrastructure on which all scientific applications are built must be revolutionized before breakthroughs can be achieved on future exascale systems. Developing a software stack that supports extreme-scale scientific computing, from operating systems to development environments, within a useful timeframe will require a level of organization that only a critical mass of scientific software researchers and developers, working in a coordinated way with HPC centers, vendors, and leading application groups, can accomplish.

To tackle this challenge, the project brings together a national team of software researchers from top universities, as well as the leading Department of Energy and National Science Foundation computing facilities, to develop an integrated software stack, or X-Stack, for exascale platforms. Members of the collaboration have extensive experience in translating research breakthroughs into useful scientific software that can be deployed at supercomputing centers across the nation, as well as on the smaller systems that are important in the application development chain.

Auto-Tuning for Performance and Productivity on Extreme-Scale Computations

Research Area: X-Stack
Principal Investigator: Sam Williams of Berkeley Lab
Contributors: Leonid Oliker of Berkeley Lab, John Gilbert of UC Santa Barbara, and Stéphane Ethier of Princeton Plasma Physics Laboratory.

The computing industry is embracing multicore technology as a means of providing ever-increasing peak performance. Because there is no consensus on multicore architecture, a plethora of efficient, high-performance multicore architectures has emerged. Unfortunately, the depth of architectural knowledge required to fully exploit these architectures is so prohibitive that novel programming and optimization tools are needed to ensure computational scientists reap the benefits of advances in commodity computing technology.

Research shows that automatic performance tuning, or auto-tuning, may be a solution to this problem because it provides performance portability for a few key computational kernels from one generation of architecture to the next. Building on advances in this area, this project will address auto-tuning's two principal limitations: an interface ill-suited to the forthcoming hybrid SPMD (single program, multiple data) programming model, and a scope limited to fixed-function numerical routines. To that end, the team will build a series of broadly applicable, auto-tuned efficiency-layer components. To address the first limitation, researchers will employ both auto-tuned SPMD computational collectives and concurrent runtimes in a hybrid programming model that provides communication via shared memory as well as message passing. This model will allow programmers to transition gracefully from the existing flat MPI programming model to a hybrid programming model capable of exploiting the full potential of multicore. To address the second limitation, the team will develop a runtime for concurrent operations on discrete data structures (deques, sets, and priority queues) and extend the sparse collectives and reduction runtimes to operate on non-numeric data via alternate semirings.
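
The essence of auto-tuning can be shown in a few lines: run the same kernel with several candidate tuning parameters on the target machine and keep the fastest. The Python sketch below does this for a toy blocked transpose; the kernel and parameter choices are illustrative only and are not components of this project.

    import time
    import numpy as np

    def blocked_transpose(a, block):
        """Transpose a square array block by block; every block size gives
        the same result, only the performance differs."""
        n = a.shape[0]
        out = np.empty_like(a)
        for i in range(0, n, block):
            for j in range(0, n, block):
                out[j:j + block, i:i + block] = a[i:i + block, j:j + block].T
        return out

    def autotune(n=1024, candidates=(16, 32, 64, 128, 256)):
        """Empirically pick the fastest block size on this machine:
        the core idea of auto-tuning, demonstrated on a toy kernel."""
        a = np.random.default_rng(2).random((n, n))
        timings = {}
        for block in candidates:
            start = time.perf_counter()
            blocked_transpose(a, block)
            timings[block] = time.perf_counter() - start
        return min(timings, key=timings.get), timings

    best, _ = autotune()
    print(f"best block size on this machine: {best}")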


About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.