InTheLoop | 05.07.2012
May 7, 2012
A 100-Gigabit Highway for Science: Researchers Take a “Test Drive” on ANI Testbed
With $62 million in funding from the Recovery Act, ESnet built a 100 Gbps long-haul prototype network and a wide-area testbed. So far, more than 25 groups have used the testbed, which is open to researchers from government agencies and private industry, to test new, potentially disruptive network technologies without interfering with other network traffic. Here are some of their stories. Read more.
Visualizing How Space Weather Cracks Earth’s Cocoon
Earth is mostly protected from solar radiation by the magnetosphere. But sometimes the magnetosphere “cracks,” allowing radiation to seep in and wreak havoc on electronics. This phenomenon is not well understood, so scientists ran simulations to investigate what happens. In the process, they generated approximately 3 petabytes of data and reached out to Berkeley Lab’s Burlen Loring to develop customized techniques for analyzing it. Read more.
NERSC Announces Data Intensive Computing Pilot Program Awards
NERSC’s new data-intensive science pilot program is aimed at helping scientists capture, analyze, and store the increasing stream of scientific data coming out of experiments, simulations, and instruments. Those selected for the pilot program will get access to large data stores, priority access to a 6-terabyte flash-based file system, and priority access to Hadoop-style computing resources on NERSC’s Carver InfiniBand cluster. They may also use NERSC’s Science Gateways for web access.
The first awards in this pilot program were made to the following eight projects:
- High Throughput Computational Screening of Energy Materials
- Analysis and Serving of Data from Large-Scale Cosmological Simulations
- Interactive Real-Time Analysis of Hybrid Kinetic-MHD Simulations with NIMROD
- Next-Generation Genome-Scale In Silico Modeling: The Unification of Metabolism, Macromolecular Synthesis, and Gene Expression Regulation
- Data Processing for the Daya Bay Reactor Neutrino Experiment’s Search for Theta13
- Transforming X-Ray Science toward Data-Centrism
- Data Globe
- Integrating Compression with Parallel I/O for Ultra-Large Climate Data Sets
John Bell Is Elected to National Academy of Sciences
John Bell, an applied mathematician and computational scientist who leads the Center for Computational Sciences and Engineering and the Mathematics and Computational Science Department at Berkeley Lab, has been elected to the National Academy of Sciences. He is one of only two mathematicians on this year’s list. Read more.
After Five Years, NERSC’s Franklin Cray XT4 System Retires
Last week, NERSC retired one of its most scientifically prolific supercomputers to date: a Cray XT4 named Franklin, in honor of the pioneering American scientist Benjamin Franklin. Over its five-year lifetime, Franklin delivered 1.18 billion processor hours to scientific research. Read more.
UC Berkeley Gets $60 Million for Theoretical Computing Institute
A groundbreaking $60 million award to UC Berkeley from the Simons Foundation will establish the campus as the worldwide center for theoretical computer science. The gift funds the creation of a new institute where top computer theorists and researchers from around the globe will converge to explore the mathematical foundations of computer science and extend them to tackle challenges in fields as diverse as mathematics, health care, climate modeling, astrophysics, genetics, economics, and business.
The Simons Institute for the Theory of Computing, with Richard Karp as founding director, will begin operations in July 2012, and its first scientific programs will start in January 2013. Later that year, the institute will move into its home in Calvin Hall. Collaborators with the Simons Institute will include Berkeley Lab and NERSC. Read the UC Berkeley press release.
Bike to Work Day This Thursday, May 10
Bike to Work Day is coming up on Thursday, May 10, 2012. Cycling is a great way to stay healthy and reduce your energy consumption. For Bike to Work Day, energizer stations will be open throughout the East Bay, offering snacks and musette bags to cyclists.
If you find a traffic or road hazard, such as a pothole that needs repair, you can report it on the East Bay Bicycle Coalition (EBBC) website: http://www.ebbc.org/?q=hazards_map. The EBBC will report the hazard to the appropriate authorities and track it until it is corrected.
And more locally, the Lab’s own Bicycle Coalition web page offers tips for Lab cyclists, from locations of shower facilities (50B, 3rd floor) to air pumps (there is a bicycle pump at the lower level of B65).
Person or Computer: Could You Pass the Turing Test?
In this commentary, David Bailey, head of the Complex Systems Group in Berkeley Lab’s Computational Research Division, and University of Newcastle mathematics professor Jon Borwein discuss mathematician Alan Turing’s contributions to computing. Read more.
This Week’s Computing Sciences Seminars
Address Splitting Network Architectures
Monday, May 7, 2:00–3:00 pm, 521 Cory Hall, UC Berkeley
Jordi Domingo-Pascual, Universitat Politècnica de Catalunya/BarcelonaTech
The talk will introduce the main concepts of address splitting, which uses different address spaces for different functions. Currently, IP addresses are overloaded with several semantics: they serve as the network point of attachment, as the name of the device, as part of the connection identifier, and in some applications even as a user identifier. There is a common understanding among researchers that it would be desirable to separate the network-related functions from the user-related ones. Some research on future Internet architectures proposes using a Locator as the identifier for routing-related functions and an Endpoint Identifier (or device identifier) for user-related (or endpoint-related) functions. This means defining at least two different address spaces: one for the network and the other for the hosts.
Among the network architectures based on the address splitting paradigm, LISP (Locator/ID Separation Protocol) is designed as an evolutionary approach to the Internet network architecture. The talk will introduce some of the architectures based on address splitting and then present the LISP architecture in more detail, including its foreseen deployment scenarios and some aspects of the performance evaluation of the new entities needed to support it.
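To make the locator/identifier split more concrete, here is a minimal sketch of the lookup a LISP-style ingress router performs: the destination Endpoint Identifier (EID) is matched against EID-prefix-to-locator mappings by longest-prefix match, and the packet is wrapped in an outer header addressed to the chosen Routing Locator (RLOC). The mapping table, the addresses (taken from documentation ranges), and the function names are hypothetical illustrations, not LISP's wire format or any implementation's API. As in LISP, a lower priority value is treated as more preferred; real deployments also use weights and resolve cache misses through a mapping system.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Locator:
    rloc: ipaddress.IPv4Address  # routing locator: where the site attaches to the network
    priority: int                # lower value = more preferred

# Hypothetical map-cache: EID prefixes (host identity space) -> locators (network space).
MAP_CACHE = {
    ipaddress.ip_network("10.1.0.0/16"): [
        Locator(ipaddress.ip_address("192.0.2.1"), priority=1),
        Locator(ipaddress.ip_address("198.51.100.7"), priority=2),
    ],
    ipaddress.ip_network("10.1.5.0/24"): [
        Locator(ipaddress.ip_address("203.0.113.9"), priority=1),
    ],
}

def lookup_rloc(eid: str) -> Optional[ipaddress.IPv4Address]:
    """Longest-prefix match on the EID, then pick the most preferred locator."""
    addr = ipaddress.ip_address(eid)
    matches = [net for net in MAP_CACHE if addr in net]
    if not matches:
        return None  # cache miss: a real router would query the mapping system
    best = max(matches, key=lambda net: net.prefixlen)
    return min(MAP_CACHE[best], key=lambda loc: loc.priority).rloc

def encapsulate(src_eid: str, dst_eid: str, payload: bytes) -> dict:
    """Wrap an EID-addressed packet in an outer header addressed to an RLOC."""
    rloc = lookup_rloc(dst_eid)
    if rloc is None:
        raise LookupError(f"no mapping for {dst_eid}")
    inner = {"src": src_eid, "dst": dst_eid, "payload": payload}
    return {"outer_dst": str(rloc), "inner": inner}
```

For example, `lookup_rloc("10.1.5.3")` selects the more specific /24 mapping, while `10.1.9.9` falls back to the /16 and its priority-1 locator. The inner header keeps the endpoint addresses unchanged, so a host's identity does not depend on where its site attaches to the network.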
Dissertation Talk: Resource Allocation and Scheduling in Heterogeneous Cloud Environments
Monday, May 7, 3:00–4:00 pm, 465H Soda Hall (Rad Lab), UC Berkeley
Gunho Lee, UC Berkeley CS Division
Recently, there has been a dramatic increase in the popularity of cloud computing systems that rent computing resources on demand, bill on a pay-as-you-go basis, and multiplex many users on the same physical infrastructure. These cloud computing environments provide cloud users with the illusion of infinite computing resources, so that they can increase or decrease their resource consumption rate according to demand.
At the same time, the cloud environment poses a number of challenges. The two players in cloud computing environments, cloud providers and cloud users, pursue different goals: providers want to maximize revenue by achieving high resource utilization, while users want to minimize expenses while meeting their performance requirements. However, it is difficult to allocate resources in a mutually optimal way due to the lack of information sharing between them. Moreover, the ever-increasing heterogeneity and variability of the environment pose even harder challenges for both parties.
In this thesis, we address “the cloud resource management problem”: allocating and scheduling computing resources so that providers achieve high resource utilization and users meet their applications’ performance requirements with minimum expenditure.
We approach the problem from various angles, using MapReduce as our target application. From the provider’s perspective, we propose a topology-aware resource placement solution to overcome the lack of information sharing between providers and users. From the user’s perspective, we present a resource allocation scheme that maintains a pool of leased resources cost-effectively, and a progress share-based job scheduling algorithm that achieves high performance and fairness simultaneously in a heterogeneous cloud environment. To deal with variability in resource capacity and application performance in the cloud, we develop a method to predict the distribution of job completion times, which can be used to make sophisticated trade-off decisions in resource allocation and scheduling. Our evaluation shows that these methods can improve the efficiency and effectiveness of cloud computing systems.
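The abstract does not describe how the dissertation predicts completion-time distributions, so the following is only a generic illustration of why a distribution, rather than a single estimate, helps with trade-offs. It draws Monte Carlo samples of a MapReduce-style job's completion time, assuming a hypothetical log-normal task-duration model and greedy assignment of each task to whichever slot frees up first. The model, its parameters, and all function names are assumptions.

```python
import heapq
import math
import random
import statistics

def simulate_makespan(num_tasks, num_slots, rng, mu=0.0, sigma=0.5):
    """Draw one job completion time: each task goes to whichever slot
    frees up first (greedy list scheduling)."""
    free_at = [0.0] * num_slots  # time each slot becomes free; all zeros is a valid heap
    for _ in range(num_tasks):
        start = heapq.heappop(free_at)
        duration = rng.lognormvariate(mu, sigma)  # hypothetical task-duration model
        heapq.heappush(free_at, start + duration)
    return max(free_at)

def completion_time_samples(num_tasks, num_slots, trials=2000, seed=1):
    """Monte Carlo samples of the job completion time, sorted ascending."""
    rng = random.Random(seed)
    return sorted(simulate_makespan(num_tasks, num_slots, rng) for _ in range(trials))

def percentile(sorted_samples, p):
    """Nearest-rank percentile of pre-sorted samples, for 0 < p <= 100."""
    rank = math.ceil(p / 100 * len(sorted_samples))
    return sorted_samples[max(rank, 1) - 1]

if __name__ == "__main__":
    # Compare leasing 20 versus 40 slots for a hypothetical 200-task job.
    for slots in (20, 40):
        samples = completion_time_samples(num_tasks=200, num_slots=slots)
        print(f"{slots} slots: median {statistics.median(samples):.2f}, "
              f"95th percentile {percentile(samples, 95):.2f}")
```

A user facing a deadline could, for instance, lease the smallest number of slots whose 95th-percentile completion time still meets it; a single mean estimate cannot express that kind of risk trade-off.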
Dissertation Talk: Tuning Hardware and Software for Multiprocessors
Tuesday, May 8, 11:00 am–12:00 pm, 310 Soda Hall, UC Berkeley
Marghoob Mohiyuddin, UC Berkeley EECS
Technology scaling trends have enabled exponential growth in computing power. However, the performance of the memory and interconnect subsystems scales less aggressively. This means that unless the software stack avoids being bottlenecked by memory and interconnect performance, system performance will lag behind raw computing power. This problem can be alleviated if algorithms and software minimize or avoid communication. To this end, we describe algorithms as well as implementations of a communication-avoiding linear algebra kernel called “matrix powers.” Results show up to 2.3× speedups over the naive algorithms on modern architectures.
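The matrix powers kernel computes the basis [x, Ax, A²x, ..., Aᵏx]. A naive implementation performs k separate sparse matrix-vector products and must exchange boundary data (or stream A from memory) once per product. The sketch below illustrates the communication-avoiding idea on an assumed 1D three-point stencil matrix with rows (1, -2, 1): each block reads its own entries plus k "ghost" entries on each side once, then computes all k products locally, trading some redundant arithmetic for fewer communication rounds. The stencil, block layout, and function names are illustrative assumptions, not the dissertation's implementation, and the "communication" is only simulated in serial code.

```python
def apply_stencil(x):
    """y = A x for the tridiagonal A with rows (1, -2, 1) and zero values outside."""
    n = len(x)
    return [(x[i - 1] if i > 0 else 0.0) - 2.0 * x[i] + (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def naive_powers(x, k):
    """[x, Ax, ..., A^k x] via k global products (k rounds of communication)."""
    basis = [list(x)]
    for _ in range(k):
        basis.append(apply_stencil(basis[-1]))
    return basis

def ca_powers(x, k, num_blocks):
    """Same basis, but each block fetches its ghost zone once and then
    computes all k products locally (one round of communication)."""
    n = len(x)
    basis = [[0.0] * n for _ in range(k + 1)]
    bounds = [n * b // num_blocks for b in range(num_blocks + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        glo, ghi = max(0, lo - k), min(n, hi + k)  # owned range plus up to k ghost cells per side
        local = [x[glo:ghi]]                        # the single "message" this block receives
        for _ in range(k):
            # Values next to an artificial block edge become stale by one cell
            # per step, but k ghost cells keep the owned range [lo, hi) exact.
            local.append(apply_stencil(local[-1]))
        for j in range(k + 1):
            basis[j][lo:hi] = local[j][lo - glo:hi - glo]
    return basis
```

Because every owned entry is computed from the same values in the same order as in the naive version, `ca_powers(x, k, num_blocks)` returns exactly the same basis as `naive_powers(x, k)`. The redundant work in the ghost zones grows with k, which is one reason k is itself a natural tuning parameter.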
Another problem plaguing the supercomputer industry is the power bottleneck. Power has, in fact, become the preeminent design constraint for future HPC systems, so computational efficiency is being emphasized over peak performance. Static benchmark codes have traditionally been used to find architectures that are optimal with respect to specific metrics. Unfortunately, because compilers generate suboptimal code, benchmark performance can be a poor indicator of the performance potential of architecture design points. Therefore, we present hardware/software co-tuning as a novel approach to system design, in which traditional architecture space exploration is tightly coupled with software auto-tuning to deliver substantial improvements in area and power efficiency.
We demonstrate the proposed methodology by exploring the parameter space of a Tensilica-based multiprocessor running three of the most heavily used kernels in scientific computing, each with widely varying micro-architectural requirements: sparse matrix-vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning significantly improves hardware area and energy efficiency, a key driver for the next generation of HPC system design.
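The difference between conventional design-space exploration and co-tuning can be sketched as a nested search. Conventional exploration scores each hardware configuration with one fixed code version; co-tuning auto-tunes the software for every hardware point before scoring it. The cost model below is a made-up toy (not Tensilica data and not results from the talk), and the parameter names (`CACHE_KB`, `BLOCK_SIZES`, `UNTUNED_BLOCK`) are hypothetical; it only shows how the two searches can select different hardware.

```python
from itertools import product

CACHE_KB = (8, 32, 128)             # hypothetical hardware design points
BLOCK_SIZES = (8, 16, 32, 64, 128)  # hypothetical software tuning parameter
UNTUNED_BLOCK = 128                 # the single fixed "benchmark" code version

def area(cache_kb):
    """Toy area model: a fixed core plus area proportional to cache size."""
    return 1.0 + cache_kb / 32

def performance(cache_kb, block):
    """Toy model: data reuse grows with block size, but throughput degrades
    proportionally once the block's working set exceeds the cache."""
    working_set_kb = block * block * 8 / 1024  # one block x block tile of doubles
    return block * min(1.0, cache_kb / working_set_kb)

def efficiency(cache_kb, block):
    """Figure of merit: performance per unit area."""
    return performance(cache_kb, block) / area(cache_kb)

def conventional_search():
    """Score each hardware point with the fixed, untuned code."""
    return max(CACHE_KB, key=lambda c: efficiency(c, UNTUNED_BLOCK))

def co_tuned_search():
    """Auto-tune the software for every hardware point, then compare."""
    return max(product(CACHE_KB, BLOCK_SIZES), key=lambda cb: efficiency(*cb))

if __name__ == "__main__":
    print("conventional picks cache_kb =", conventional_search())
    print("co-tuning picks (cache_kb, block) =", co_tuned_search())
```

In this toy model, scoring hardware with the untuned code favors the largest cache, while co-tuning finds that a smaller cache paired with a retuned block size gives better performance per unit area. This mirrors, in miniature, the abstract's point that untuned benchmark performance can mislead design decisions.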
In this talk, we present an overview of the key results from the dissertation. We show how communication-avoiding algorithms and their tuning can improve performance significantly for sparse codes. We also show that hardware and software, when co-designed for specific scientific computations, can yield significant improvements in energy efficiency.
OSF Brown Bag: Introduction to the Genepool System for NERSC Staff
Tuesday, May 8, 12:00–1:30 pm, OSF 943-238
Katie Antypas, NERSC
The Genepool system is a cluster dedicated to the JGI’s computing needs. Phoebe is a smaller test system for Genepool that is primarily used by NERSC staff to test new system configurations and software. This introduction to the Genepool system for NERSC staff will discuss its configuration, usage, and support model.
Adaptive Time-Space Algorithms for the Simulation of Multi-Scale Reaction Waves
Thursday, May 10, 10:00–11:00 am, 50B-2222
Max Duarte, Ecole Centrale Paris and University of Nice
Numerical simulations of multi-scale phenomena are commonly used for modeling purposes in many applications, such as combustion, chemical vapor deposition, and air pollution modeling. In general, these models raise several difficulties: a large number of unknowns, a wide range of time scales due to large and detailed chemical kinetic mechanisms, and steep spatial gradients associated with very localized fronts of high chemical activity. Furthermore, a natural stumbling block to performing 3D simulations with all scales resolved is either the unreasonably small time step required for stability or the unreasonable memory requirements of implicit methods.
In this work, we introduce a new resolution strategy for multi-scale reaction waves, based mainly on time operator splitting and space-adaptive multiresolution, in the context of very localized and stiff reaction fronts. It uses high-order time integration methods for the reaction, diffusion, and convection problems to build a time operator splitting scheme that efficiently exploits the special features of each problem. Based on theoretical numerical analysis, this strategy leads to a splitting time step that is restricted neither by the fast scales in the source term nor by the restrictive stability limits of the diffusive or convective steps, but only by the physics of the phenomenon. Moreover, the splitting time step is adapted dynamically using a posteriori error estimates, which are carefully and inexpensively computed by a second, embedded splitting method.
The main goal, then, is to perform simulations of the complete dynamics of the multi-scale phenomena under study that are both computationally very efficient and accurate in time and space, on large simulation domains with conventional computing resources, and with splitting time steps dictated purely by the physics of the phenomenon rather than by stability constraints associated with mesh size or source time scales. Applications will be presented in the fields of combustion waves, biomedical engineering, and plasma discharge dynamics.
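As a simplified illustration of time operator splitting, the sketch below advances an assumed 1D Fisher-KPP reaction-diffusion equation, u_t = D u_xx + k u(1 - u), with Strang splitting: a half step of reaction, a full step of diffusion, then another half step of reaction. Each substep uses a solver suited to it: the logistic reaction term is integrated exactly, and diffusion uses unconditionally stable implicit Euler (solved with the Thomas algorithm), so neither substep imposes a stability limit on the splitting step. This is not the talk's scheme: it omits adaptive multiresolution and error-controlled time steps, and because implicit Euler is only first-order, the combined scheme is only first-order accurate in time, unlike the high-order integrators described above. The equation, parameters, and function names are illustrative assumptions.

```python
import math

def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm: a = sub-diagonal (a[0] unused), b = diagonal,
    c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def react(u, k, dt):
    """Exact solution of the logistic ODE du/dt = k u (1 - u) over dt."""
    g = math.exp(k * dt)
    return [ui * g / (1.0 - ui + ui * g) for ui in u]

def diffuse(u, D, dx, dt):
    """One implicit Euler step of u_t = D u_xx with zero-flux boundaries."""
    n = len(u)
    r = D * dt / dx**2
    a = [-r] * n
    c = [-r] * n
    b = [1.0 + 2.0 * r] * n
    b[0] = b[-1] = 1.0 + r  # ghost values mirror the boundary values (zero flux)
    return solve_tridiagonal(a, b, c, u)

def strang_step(u, D, k, dx, dt):
    """Reaction(dt/2) -> Diffusion(dt) -> Reaction(dt/2)."""
    u = react(u, k, dt / 2)
    u = diffuse(u, D, dx, dt)
    return react(u, k, dt / 2)

if __name__ == "__main__":
    n, dx = 200, 0.5
    u = [1.0 if i < 10 else 0.0 for i in range(n)]  # front starting near the left edge
    # dt = 0.5 is well beyond the explicit-diffusion limit dx**2 / (2 D) = 0.125.
    for _ in range(40):
        u = strang_step(u, D=1.0, k=1.0, dx=dx, dt=0.5)
    front = next(i for i, ui in enumerate(u) if ui < 0.5) * dx
    print(f"front position after t = 20: x = {front:.1f}")
```

Here the splitting step is chosen only for accuracy; in the talk's approach it would additionally be adapted using the embedded error estimate.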
Dissertation Talk: Selective Embedded Just-In-Time Specialization (SEJITS) for Productive High Performance Embedded Domain Specific Languages
Friday, May 11, 12:00–1:00 pm, 405 Soda Hall, UC Berkeley
Shoaib Kamil, UC Berkeley EECS and LBNL/CRD
Domain-expert “productivity programmers” desire scalable application performance, but they usually must rely on “efficiency programmers,” experts in explicit parallel programming, to achieve it. Since such efficiency programmers are rare, we wish to maximize reuse of their work by encapsulating their expertise in mini-compilers for domain-specific embedded languages (DSELs), glued together by a common high-level host language that is familiar to productivity programmers. The SEJITS (Selective Embedded Just-In-Time Specialization) methodology enables embedding these mini-compilers in widely used productivity languages such as Python, Ruby, and Lua by leveraging features of these languages (such as good foreign function interfaces, introspection, and metaprogramming) and external optimizing compiler toolchains. SEJITS combines DSELs and code generation with auto-tuning, enabling programmers to build high-performance, productive DSELs in modern productivity languages.
This talk outlines our proof of concept, called Asp (Asp is SEJITS for Python) which strives to make developing DSEL compilers easy. I will present results for a number of implemented DSELs and applications across a variety of domains, including machine learning, stencil computations, and graph algorithms. Results show these compilers can obtain up to 98% of peak performance, work well with existing software packages, and can be used to obtain high parallel performance across domains and architectures, all while programming in a productive high-level language.
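The SEJITS pattern can be illustrated in miniature. In the sketch below, an embedded DSEL built with operator overloading captures an elementwise kernel as an expression tree; at call time a specializer generates and compiles source code for that tree, caches the result per kernel and argument types, and falls back to plain interpretation when the tree contains a node it does not know how to translate (the "selective" part). This toy generates Python rather than C or other low-level code, performs no auto-tuning, and is not Asp's API; every class and function name is hypothetical.

```python
class Expr:
    """Node of a tiny elementwise-kernel DSEL embedded in Python."""
    def __add__(self, other): return BinOp("+", self, lift(other))
    def __mul__(self, other): return BinOp("*", self, lift(other))
    def __radd__(self, other): return BinOp("+", lift(other), self)
    def __rmul__(self, other): return BinOp("*", lift(other), self)

class Arg(Expr):
    def __init__(self, name):
        self.name = name

class Const(Expr):
    def __init__(self, value):
        self.value = value

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

class Call(Expr):
    """Opaque Python function: the specializer cannot translate it."""
    def __init__(self, fn, arg):
        self.fn, self.arg = fn, arg

def lift(x):
    return x if isinstance(x, Expr) else Const(x)

def to_source(e):
    """Translate a tree into Python source, or raise if it is unsupported."""
    if isinstance(e, Arg):
        return f"{e.name}[i]"
    if isinstance(e, Const):
        return repr(e.value)
    if isinstance(e, BinOp):
        return f"({to_source(e.left)} {e.op} {to_source(e.right)})"
    raise NotImplementedError(type(e).__name__)

def interpret(e, env, i):
    """Slow fallback: walk the tree once per element."""
    if isinstance(e, Arg):
        return env[e.name][i]
    if isinstance(e, Const):
        return e.value
    if isinstance(e, BinOp):
        left, right = interpret(e.left, env, i), interpret(e.right, env, i)
        return left + right if e.op == "+" else left * right
    if isinstance(e, Call):
        return e.fn(interpret(e.arg, env, i))
    raise TypeError(e)

class Kernel:
    def __init__(self, expr, params):
        self.expr, self.params = expr, params
        self.cache = {}          # argument-type signature -> compiled function (or None)
        self.specialized = None  # whether the most recent call ran generated code

    def __call__(self, *arrays):
        key = tuple(type(a).__name__ for a in arrays)
        if key not in self.cache:
            try:
                body = to_source(self.expr)
                src = (f"def _k({', '.join(self.params)}):\n"
                       f"    return [{body} for i in range(len({self.params[0]}))]\n")
                namespace = {}
                exec(compile(src, "<specialized>", "exec"), namespace)
                self.cache[key] = namespace["_k"]
            except NotImplementedError:
                self.cache[key] = None  # remember that this kernel cannot be specialized
        fn = self.cache[key]
        self.specialized = fn is not None
        if fn is not None:
            return fn(*arrays)
        env = dict(zip(self.params, arrays))
        return [interpret(self.expr, env, i) for i in range(len(arrays[0]))]
```

For example, with `a, b = Arg("a"), Arg("b")`, the kernel `Kernel(2.0 * a + b, ["a", "b"])` is compiled once into a list comprehension and reused on later calls, while `Kernel(Call(abs, a) + b, ["a", "b"])` still produces correct results through the interpreter even though no specializer handles `Call`.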
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab has had its scientific expertise recognized with 13 Nobel Prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.