Berkeley Lab Helps Develop Software for Exascale Supercomputers

September 28, 2012

Jon Bashor, Jbashor@lbl.gov, 510-486-5849

With over 20 petascale supercomputers – systems capable of performing quadrillions of operations per second – installed worldwide, scientists now have their eye on the next major milestone in machine performance. Expected to become available in the next decade, exascale supercomputers will be 1,000 times faster than today’s petascale machines.

But achieving this scale of computing capability will require a closer partnership between supercomputer industry leaders and research institutions to overcome a number of hurdles, starting with the processors. In the past, major performance gains in supercomputers came from a combination of faster processors and more of them, but designers are unable to make significantly faster processors today because they would be too hot and too power-hungry. This means exascale systems will need a lot more computations going on in parallel – roughly a billion at any given instant. Added to that mind-boggling amount of parallelism, programmers may also have to worry about a mix of processor types in a single “heterogeneous” system, more complicated memory organization, and an increasing rate of hardware failures.

To scientific programmers, the question is not whether an exascale system can be built, but whether it can be programmed. With complex applications involving millions of lines of software, the task of rewriting these to run on an exascale system is daunting. And they are not sure where to start, since there is no agreement on how exascale systems will be programmed.

The U.S. Department of Energy’s (DOE) Advanced Scientific Computing Research program has therefore funded a set of projects to explore alternatives in the layers of software that will be used on these systems. This exascale software stack (X-Stack) includes programming languages and libraries, compilers, runtime systems and tools to help programmers handle massive parallelism, data movement, heterogeneity, and failures. The goal is to make exascale programming convenient.

Computer scientists at Lawrence Berkeley National Laboratory will contribute their expertise to three X-Stack projects and are collaborating with UC Berkeley on a fourth.

DEGAS: Dynamic, Exascale Global Address Space Programming Environments. Led by LBNL Associate Lab Director for Computing Sciences Kathy Yelick, the DEGAS project will develop an integrated set of programming models, runtime systems and tools for exascale systems. The DEGAS team is a joint California/Texas effort, with partners at Rice University, the University of Texas at Austin, UC Berkeley and Lawrence Livermore National Laboratory. The project builds on the team’s work in programming languages like UPC and Co-Array Fortran, which use a global address space notion on both shared and distributed memory platforms – one can directly access remote data structures without involving the remote program. This “never having to say receive” model makes it convenient to build large, complex data structures and to optimize data movement. The new themes in DEGAS are hierarchical programming and dynamic control. Thinking hierarchically will make massive parallelism and data movement more manageable and will better reflect the reality of exascale systems. The project includes both programming features to control data movement and compiler transformations to optimize communication. While petascale systems are mostly programmed in a static manner, with work divided up among processors and only rebalanced when the computational costs diverge, DEGAS’s runtime system will be much more dynamic to allow for processors that run at different speeds. Integrated throughout the DEGAS software stack is support for fault resilience based on the notion of “containment domains” that allow programmers to dial in their desired level of fault resilience. To encourage application scientists to try out these ideas, DEGAS will be designed to interoperate with existing programming models, including MPI, and will also explore high-level programming models.

DEGAS is one of DOE’s major X-Stack centers, funded at just under $10 million spread over three years across the five institutions. In addition to Yelick, there are many Berkeley Lab contributors, including technical area leads Paul Hargrove, Steven Hofmeyr, Costin Iancu, Eric Roman, and Erich Strohmaier, all of the Future Technologies Group in the Computational Research Division. James Demmel and Krste Asanović (both with joint appointments at Berkeley Lab) lead the UC Berkeley team, and Dan Quinlan the Livermore team. In Texas, Vivek Sarkar and John Mellor-Crummey co-lead the Rice project, and Mattan Erez leads at UT Austin.

X-Tune: Autotuning for Exascale: Self-Tuning Software to Manage Heterogeneity. The X-Tune project focuses on the specific problem of generating highly tuned code for individual nodes (containing processor and memory) in an exascale system. This is where the most disruption will occur on the path to exascale. Historically, very important computational kernels were hand-written, sometimes in assembly language, and carefully tuned over a period of months or even years. Berkeley researchers therefore developed the idea of autotuning – generating many versions of each kernel, often based on a common pattern, and then having a computer try them out to see which is fastest. Autotuning replaces programmer time with computer time, and the approach has proven successful on petascale systems, optimizing the supercomputer’s performance by selecting the implementation that best meets the optimization criteria (performance, power, or some combination of these). The X-Tune project will take this idea one step further by integrating autotuning into a compiler, further automating the process of generating the code instances and making it easier for future exascale programmers to get highly tuned performance out of their codes.
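The autotuning idea described above can be illustrated with a toy sketch. It is not X-Tune code: the kernel (a blocked summation), the block sizes, and the names `make_blocked_sum` and `autotune` are all assumptions chosen for illustration. Several variants of one kernel are generated from a common pattern, each is timed on the same input, and the fastest is kept.

```python
import timeit


def make_blocked_sum(block):
    """Generate one variant of a summation kernel (hypothetical example kernel).

    Every variant computes the same result; only the block size differs.
    """
    def kernel(data):
        total = 0
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    return kernel


def autotune(variants, data, repeats=3):
    """Time each variant on `data` and return (name of fastest, all timings)."""
    timings = {}
    for name, kernel in variants.items():
        # Keep the best of several runs to reduce timing noise.
        runs = timeit.repeat(lambda: kernel(data), number=1, repeat=repeats)
        timings[name] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings


data = list(range(100_000))
# Variants generated from one common pattern, as in the text above.
variants = {f"block={b}": make_blocked_sum(b) for b in (16, 256, 4096)}
best, timings = autotune(variants, data)
print("fastest variant on this machine:", best)
```

Which variant wins depends on the machine it runs on, which is why the selection is left to measurement rather than fixed in advance.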

The project is a collaboration between the University of Utah, Berkeley Lab, the University of Southern California, and Argonne National Lab. Led by Mary Hall of the University of Utah, the team includes Berkeley Lab’s Sam Williams (LBNL institutional lead), Brian van Straalen and Lenny Oliker.

XPRESS: eXascale Programming Environment and System Software. XPRESS is developing an X-Stack system called OpenX, which is designed around the idea that exascale machines will be highly dynamic. It uses a global address space, like DEGAS, but emphasizes remote function invocation rather than reads and writes. As a result, the execution unfolds in dynamic and unpredictable ways as computations migrate toward the data. The project also includes a lightweight operating system that supports the necessary functions with very low overhead, so that dynamic parallelism can be unleashed. The goal is to achieve full adaptive management of computing resources and task scheduling through instrumented control of the runtime system. Berkeley Lab’s component of the XPRESS project will target plasma physics application needs for fusion energy. The ultimate goal is to demonstrate the impact on selected applications and illustrate a path forward for others. The focus will be on highly scalable codes, including code extracted from a variety of applications using the particle method.

XPRESS will be led by Ron Brightwell of Sandia National Laboratories. The local PI from Berkeley Lab is Alice Koniges of the National Energy Research Scientific Computing Division.

CORVETTE: Program Correctness, Verification, and Testing for Exascale. The CORVETTE project is located at UC Berkeley and addresses the problem of how to make sure exascale applications are working correctly. Not surprisingly, with a billion-way parallel program there are many opportunities for the parallel threads to interfere with one another and produce “data races” that may intermittently cause the program to crash or give incorrect results. CORVETTE is developing techniques to find parallelism bugs in the code by forcing particular executions on a small number of processors, so that one doesn’t need to use an exascale machine to debug an exascale application. The project will use some of the same tools to solve a set of problems that arise in floating-point programs. First, in order to save time and memory, the team would like each part of a code to use the lowest precision needed to get the right answer; one of their tools will automate this process. Second, dynamic scheduling and the nonassociativity of floating-point arithmetic can lead to different results each time a program is run, even on the same machine, so the team’s tools will identify sources of this nonreproducibility, and the team will design new algorithms that are reproducible.
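The nonreproducibility problem can be seen in a few lines. The sketch below is an illustration only, not a CORVETTE tool: the helper `naive_sum` and the sample values are assumptions, and Python's standard `math.fsum` stands in for a reproducible summation algorithm because it returns the correctly rounded sum regardless of input order.

```python
import math

# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False


def naive_sum(values):
    """Add values strictly left to right, as a simple loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers in two orders, as dynamic scheduling might deliver them.
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

print(naive_sum(order_a), naive_sum(order_b))  # 0.0 1.0 (order-dependent)
print(math.fsum(order_a), math.fsum(order_b))  # 1.0 1.0 (order-independent)
```

In the first ordering the 1.0 is lost when it is added to 1e16, because neighboring double-precision values at that magnitude are 2 apart; an order-independent algorithm avoids that loss at some extra cost.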

CORVETTE will be led by Koushik Sen at UC Berkeley, and involves joint faculty scientist Jim Demmel and former graduate student Chang-Seo Park, as well as collaborations with Berkeley Lab’s Costin Iancu.

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.