IntheLoop | 06.21.2010
June 21, 2010
CRD Researchers Get $4 Million for Energy-Related Research
CRD researchers have received more than $4 million in funding for six projects that will help develop computational methods to answer some of the nation’s most pressing questions regarding energy efficiency, climate stabilization, and next-generation, carbon-neutral energy sources. Read more.
Astrophysics Code Scales to Over 200,000 Processors
Performing high-resolution, high-fidelity, three-dimensional simulations of Type Ia supernovae, the largest thermonuclear explosions in the universe, requires not only algorithms that accurately represent the correct physics, but also codes that effectively harness the resources of the next generation of the most powerful supercomputers. Berkeley Lab’s Center for Computational Sciences and Engineering (CCSE) has developed two codes that can do just that. Read more.
Cray CEO Peter Ungaro Outlines Future Directions in Talk at NERSC
As part of his June 10 visit to Berkeley Lab and NERSC, Pete Ungaro, the Chief Executive Officer of Cray Inc., gave a candid assessment of the company’s current role in the high performance computing market and described the company’s next-generation supercomputer architecture. Ungaro noted that the company has strong orders for its recently announced Cray XE6 supercomputer, one of which is slated to be delivered to NERSC later this year. Read more.
CS Staff Co-Author Posters at HotPar ’10
Last week, June 14–15, the Second USENIX Workshop on Hot Topics in Parallelism (HotPar ’10) was held at UC Berkeley. CRD and NERSC staff were co-authors of two posters:
- “A Principled Kernel Testbed for Hardware/Software Co-Design Research” by Alex Kaiser, Samuel Williams, Kamesh Madduri, Khaled Ibrahim, David Bailey, James Demmel, and Erich Strohmaier, all from CRD
- “Resource Management in the Tessellation Manycore OS” by Juan A. Colmenares, Sarah Bird, Henry Cook, Paul Pearce, and David Zhu, University of California, Berkeley; John Shalf of NERSC and Steven Hofmeyr of CRD; Krste Asanović and John Kubiatowicz, UC Berkeley
David Patterson of UC Berkeley and CRD was Program Co-Chair and is a HotPar Steering Committee member.
CS Staff Play Key Roles in VECPAR’10 and Workshops
VECPAR’10, the Ninth International Meeting on High Performance Computing for Computational Science, is being held this week (June 22–25) in Sutardja Dai Hall on the UC Berkeley campus. The Organizing Committee includes Osni Marques (chair), Tony Drummond, and Erich Strohmaier of CRD, Jonathan Carter of NERSC, and Masoud Nikravesh of CITRIS. Marques, Carter, and Drummond are also on the Scientific Committee, as is Sherry Li of CRD. Presentations include:
- “Petascale Parallelization of the Gyrokinetic Toroidal Code” by Stephane Ethier (Princeton Plasma Physics Laboratory), Mark Adams (Columbia University), Jonathan Carter (NERSC), and Leonid Oliker (CRD)
- “Multicore Research” by David Patterson (UC Berkeley and CRD)
- “On Techniques to Improve the Robustness and Scalability of the Schur Complement Method” by Ichitaro Yamazaki and Sherry Li (CRD)
- “VDBSCAN and α-Bisecting Spherical K-Means in Distributed Information Retrieval Systems” by Daniel Jiménez and Vicente Vidal (Polytechnic University of Valencia) and Tony Drummond (CRD)
Workshops held in conjunction with VECPAR are the Fifth International Workshop on Automatic Performance Tuning (iWAPT), the Workshop on Programming Environments for Emerging Parallel Systems (PEEPS), and the Tutorial on High Performance Tools for the Development of Scalable and Sustainable Applications (HPC Tools).
Osni Marques of CRD and John Shalf of NERSC are on the iWAPT Program Committee. Jonathan Carter of NERSC is general chair of the iWAPT Organizing Committee. Lenny Oliker of CRD and Nick Wright of NERSC are the organizers of PEEPS. Tony Drummond and Osni Marques of CRD are the organizers of HPC Tools, along with Sameer Shende of the University of Oregon and Jose E. Roman of the Technical University of Valencia. Presentations at PEEPS will include:
- “Lattice Boltzmann Hybrid Auto-Tuning on High-End Computational Platforms” by Sam Williams (CRD)
- “Experiences with UPC at Scale” by Costin Iancu (CRD)
CS Staff Contribute to HPDC, OGF29, and ScienceCloud in Chicago
The ACM International Symposium on High Performance Distributed Computing (HPDC), the Open Grid Forum (OGF29), and the first Workshop on Scientific Cloud Computing (ScienceCloud) are all being held this week in Chicago, and Berkeley Lab researchers are involved in all three.
At HPDC, a poster will be presented on “Lessons Learned from Moving Earth System Grid Data Sets over a 20 Gbps Wide-Area Network”; the first author is Raj Kettimuthu of Argonne National Lab, and the many co-authors include Alex Sim, Dan Gunter, and Vijaya Natarajan of CRD, Eli Dart of ESnet, and Jason Hick and Jason Lee of NERSC.
At OGF29, Inder Monga of ESnet is co-chair of the Network Service Interface Working Group, which will hold a working session. Evangelos Chaniotakis of ESnet will also participate in OGF29.
At ScienceCloud, Keith Jackson, Lavanya Ramakrishnan, and Rollin Thomas of CRD, with Karl Runge of the Berkeley Lab Physics Division, co-authored “Seeking Supernovae in the Clouds: A Performance Study,” which is a best paper award finalist. Ramakrishnan will also present “Comparison of Resource Platform Selection Approaches for Scientific Workflows.” Kathy Yelick and Jeff Broughton of NERSC are on the workshop’s Steering Committee, and Ramakrishnan is on the Technical Program Committee. Broughton is also a session chair.
Self-Assessment Forms Are Due This Friday
For non-represented CS employees, Self-Assessment Forms for the pilot Performance Management Program (PMP) are due to your manager on Friday, June 25. The submission of this information to your manager is the first step in engaging in a conversation that fosters quality feedback and alignment on goals and objectives. The form, including a guidance sheet, can also be found on the HR Performance Management web site.
For represented CS employees, Employee Worksheets are also due on Friday, June 25, and the usual Performance Review and Development (PRD) process will be followed.
If you have any questions pertaining to the Performance Management Pilot or the Performance Review and Development process, please contact Marcia Ocon Leimer at (510) 495-2727.
Safety Suggestions Lead to Pit Stairway Improvements
Computing Sciences staff recently reported that many of the lights on the pit stairs were not working, and that several of the steps were dangerously unstable. These issues were entered into the Corrective Action Tracking System, and after interim repairs to the stairway lights, an entirely new lighting system was installed. The steps have been made stable, and are awaiting more permanent repair. The three CRD employees who reported the pit stairway problems have been nominated for SPOT Awards. Read more.
Safety suggestions and near-miss events should be reported to Betsy MacGowan at firstname.lastname@example.org or 510-495-2826. For emergencies call x7911 or 911 from a cell phone.
NERSC Has Opening for Web Application Developer
NERSC has developed a web toolkit called NEWT, designed to make HPC resources accessible on the web through easy-to-write web applications. The purpose of this position is to use NEWT and other technologies to deploy specific web applications for NERSC users. Based on communication with NERSC user groups, the person in this position will develop portals and science gateways that bring NERSC resources to the web. These web-based scientific applications are tailored to the computing and data needs of specific science groups. See job details. The Lab’s Employee Referral Incentive Program (ERIP) awards $1,000 (net) to employees whose referral of an external candidate leads to a successful hire.
Wednesday Is Deadline to Apply for Grace Hopper Conference Funding
The Berkeley Lab Computing Sciences Diversity Working Group is able to pay the travel and registration costs for a few Lab staff to attend the Grace Hopper Celebration of Women in Computing conference in Atlanta, Georgia, from September 28 to October 2, 2010. If you would like to attend the conference with your travel costs sponsored by the Diversity Working Group, please contact Deb Agarwal. Send the following information:
- Paragraph giving the reason you would like to attend Grace Hopper
The deadline for submitting applications is June 23, 2010.
Aerobatic Cecilia Aragon Video Featured in the Other “In the Loop”
The June 2010 issue of the other “In the Loop” — the official e-newsletter of the International Aerobatic Club — features this item:
Last month, Reggie asked readers to identify who was flying what type of airplane in a photo posted in his editor’s note. The pilot was none other than Cecilia Aragon flying her Sabre 320. Cecilia won a bronze medal at both the 1993 U.S. National Aerobatic Championships and the 1994 World Aerobatic Championships, and flew air shows for a number of years before moving on to other adventures. We’ll be bugging her for a story in Sport Aerobatics Magazine in the coming months, so keep your eyes peeled.
Here’s a wonderful vintage video of Cecilia at the top of her game.
Around here Cecilia is better known as a computer scientist in CRD’s Advanced Computing for Science Department, winner of the Presidential Early Career Award, and one of Hispanic Business Magazine’s 2009 Women of Vision. But the video, composed of several clips made at different times during the 1990s, shows her when she was still doing air shows (and working at NASA). She’s not sure who put the video on YouTube or how it found its way to the newsletter, but she still gives flight instructions occasionally. In 2008 IEEE’s magazine The Institute featured a story about her hobby (scroll down to the second story).
This Week’s Computing Sciences Seminars
Evolving Time Surfaces in a Virtual Stirred Tank
Monday, June 21, 10:00–11:00 am, 50F-1647
Farid Harhad, Louisiana State University
The complexity of large-scale computational fluid dynamics simulations demands powerful tools to investigate the numerical results. Time surfaces are the natural higher-dimensional extension of time lines, which trace the evolution of a seed line of particles in the flow of a vector field. Adaptive refinement of the evolving surface is necessary to achieve high quality within reasonable computation times. In contrast to the lower-dimensional time line, a new set of criteria may trigger the refinement of a triangular initial surface, such as triangle degeneracy, triangle area, and surface curvature. In this talk we describe the computation of time surfaces for initially spherical surfaces. The evolution of such virtual “bubbles” supports analysis of the mixing quality in a stirred tank CFD simulation. We discuss the performance of various possible refinement algorithms, how to interface alternative software solutions, and how to effectively deliver the research to the end-users, involving specially designed hardware representing the algorithmic parameters.
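As a rough illustration of the refinement criteria mentioned in the abstract, the following sketch flags a surface triangle for subdivision when its area or its edge-length ratio grows too large. It is not taken from the speaker's software: the function names, the use of the longest-to-shortest edge ratio as a degeneracy measure, and both thresholds are assumptions chosen for the example, and surface curvature is not considered.

```python
import math


def triangle_area(a, b, c):
    """Area of the 3D triangle with vertices a, b, c given as (x, y, z) tuples."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def needs_refinement(a, b, c, max_area=0.5, max_aspect=4.0):
    """Return True if the triangle is too large or too stretched.

    max_area and max_aspect are hypothetical thresholds, not values
    from the talk.
    """
    edges = [math.dist(a, b), math.dist(b, c), math.dist(c, a)]
    shortest = min(edges)
    if shortest == 0.0:
        return True  # two vertices coincide: fully degenerate triangle
    aspect = max(edges) / shortest
    return triangle_area(a, b, c) > max_area or aspect > max_aspect
```

A well-shaped unit right triangle passes both checks, while a triangle stretched along one axis by the flow is flagged even if its area stays small.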
Server-Push Architecture for Data Access Performance Optimization
Monday, June 21, 11:00 am–12:00 pm, 50F-1647
Surendra Byna, NEC Laboratories America, Inc.
In the last few years, we have seen enormous changes in the façade of computing. Supercomputing research is targeting exascale computing. Almost every computing device is being equipped with multi-core and many-core processors. Parallel computing has moved from the edge to the center of both computer science research and the IT industry due to the emergence of multi-core processors. Despite all these advances, data access delay has been a major reason for poor sustained performance of systems. The key to achieving efficiency in computing is to improve data access performance. In this talk, I will discuss our efforts in improving data access performance. Our efforts can be divided into memory performance modeling and parallel I/O performance optimization.
We developed low-overhead memory performance prediction models based on data access patterns; the models are useful for choosing effective optimization and prefetching strategies. I will present our models for predicting memory access cost, which separate the memory cost from communication and middleware latency. We have used these models to improve the performance of message passing interface (MPI) derived data types.
Our approach in optimizing parallel I/O performance is server-based data pushing. In traditional data prefetching, the client has to predict what data an application will access in the future and issue prefetching requests. However, due to the prediction overhead, aggressive and accurate prediction methods have been given low priority. In our method, we separate the prediction overhead onto a dedicated server and let the server push data closer to applications. I will talk about applying this method in parallel I/O and file systems.
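The server-push idea in the abstract can be illustrated with a toy simulation. In this sketch, which is an assumption-laden example and not the speaker's system, a hypothetical `PushServer` watches the sequence of requested block numbers, detects a constant stride, and pushes the next few blocks into the client's cache. As a simplification, the server is assumed to see every access, including cache hits.

```python
class PushServer:
    """Toy server that detects a constant stride and pushes blocks ahead."""

    def __init__(self, depth=2):
        self.depth = depth   # how many blocks to push ahead (hypothetical)
        self.history = []    # block numbers seen so far

    def on_request(self, block):
        """Record an access and return the block numbers to push."""
        self.history.append(block)
        if len(self.history) < 3:
            return []
        a, b, c = self.history[-3:]
        stride = c - b
        # Push only when the last two gaps agree and are non-zero.
        if stride != 0 and b - a == stride:
            return [c + stride * k for k in range(1, self.depth + 1)]
        return []


def run(trace, depth=2):
    """Replay a trace of block numbers and return (hits, misses)."""
    server = PushServer(depth)
    cache = set()
    hits = misses = 0
    for block in trace:
        if block in cache:
            hits += 1
        else:
            misses += 1
        # The prediction work happens on the server, not the client.
        cache.update(server.on_request(block))
    return hits, misses
```

For a strided trace such as `[0, 4, 8, 12, 16, 20]`, the first three accesses miss while the server learns the pattern, and the remaining three are served from pushed data. An irregular trace gets no benefit, which is the case where more aggressive prediction on a dedicated server would matter.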
Technical Computing with MATLAB and Simulink (Ball Tracking Case Study)
Tuesday, June 22, 8:30 am–12:30 pm, Bldg. 50 Auditorium
Isaac Noh, MathWorks Application Engineer
In this session, we will demonstrate how to acquire, analyze, and visualize data in MATLAB and then design and model a system using Simulink. Our example involves building, modeling, and deploying a system that rotates a laser to track an object.
Developing an Algorithm in MATLAB
Case study: Image acquisition and analysis
- Accessing data (from files or hardware)
- Analyzing and visualizing data
- Sharing your work
Modeling and Controlling a System in Simulink
Case study: Object tracking mechanism
- Simulating a dynamic system in Simulink
- Designing event-driven logic using Stateflow
- Performing feedback control design
- Deploying to hardware
Handling Large Data Sets in MATLAB
Tuesday, June 22, 2:30–4:30 pm, Bldg. 50 Auditorium
Isaac Noh, MathWorks Application Engineer
This seminar will describe strategies for handling large amounts of data in MATLAB and avoiding “out-of-memory” errors. It will provide you with an understanding of the causes of memory limitations in MATLAB and a set of techniques to increase the available memory in MATLAB. It will also show techniques for minimizing memory usage in MATLAB while accessing, storing, processing, and plotting data.
- Understanding the maximum size of an array and the workspace in MATLAB
- Setting the 3GB switch under Windows XP to get 1GB more memory for MATLAB
- Using textscan to read large text files and memory mapping feature to read large binary files
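The seminar itself covers MATLAB features (textscan and memory mapping). For readers who want to see the two underlying ideas in code, here is a minimal sketch in Python, used only as an analogue and not as the seminar material: reading a large text file in fixed-size chunks of lines, and reading one value from a large binary file through a memory map instead of loading the whole file. The function names and the assumed file layouts (whitespace-separated numeric columns, little-endian 64-bit floats) are hypothetical.

```python
import mmap
import struct
from itertools import islice


def column_sum_in_chunks(path, chunk_lines=1000):
    """Sum the first whitespace-separated column, chunk_lines lines at a time.

    Only one chunk of lines is held in memory at once.
    """
    total = 0.0
    with open(path, "r") as f:
        while True:
            chunk = list(islice(f, chunk_lines))
            if not chunk:
                break
            total += sum(float(line.split()[0]) for line in chunk if line.strip())
    return total


def read_double_at(path, index):
    """Read the index-th little-endian float64 from a binary file via mmap.

    The operating system pages in only the part of the file that is touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("<d", m, index * 8)[0]
```

In both functions the peak memory use depends on the chunk size or the pages touched, not on the total file size.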
Large-Scale Data Management for the Sciences
Thursday, June 24, 10:00–11:00 am, 50F-1647
Tanu Malik, Purdue University
Modern scientific repositories are growing rapidly in size. Scientists are increasingly interested in viewing the latest data as part of query results. Current scientific middleware systems, however, assume repositories are static. Thus, they cannot answer scientific queries with the latest data. The queries, instead, are routed to the repository until data at the middleware system is refreshed. In data-intensive scientific disciplines, such as astronomy, indiscriminate query routing or data refreshing often results in runaway network costs. This severely affects the performance and scalability of the repositories and makes poor use of the middleware system.
In this talk, I will present Delta, a dynamic data middleware system for rapidly growing scientific repositories. Delta’s key component is a decision framework that adaptively decouples data objects, choosing to keep some data objects at the middleware, when they are heavily queried, and keeping some data objects at the repository, when they are heavily updated. Our algorithm profiles incoming workload to search for optimal data decoupling that reduces network costs. It leverages formal concepts from the network flow problem, and is robust to evolving scientific workloads.
Distributed applications such as the Delta framework often rely on a priori knowledge of query cardinalities to make optimization decisions. In this context, I will present a black-box approach to selectivity estimation that is suitable for distributed applications. We evaluate the efficacy of Delta through a prototype implementation by running query traces collected from a real astronomy survey.
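The placement trade-off described in the abstract (keep heavily queried objects at the middleware, leave heavily updated objects at the repository) can be sketched with a simple per-object cost comparison. This is not Delta's algorithm, which the abstract says draws on the network flow problem. It is a deliberately simplified greedy rule, and the `ObjectStats` fields and byte counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    name: str
    query_bytes: int   # network bytes if queries are routed to the repository
    update_bytes: int  # network bytes needed to keep a middleware copy fresh


def place_objects(stats):
    """Map each object name to 'middleware' or 'repository'.

    An object is kept at the middleware only when refreshing it is
    cheaper than routing its queries to the repository.
    """
    return {
        s.name: "middleware" if s.query_bytes > s.update_bytes else "repository"
        for s in stats
    }


def network_cost(stats, placement):
    """Total network bytes implied by a placement."""
    return sum(
        s.update_bytes if placement[s.name] == "middleware" else s.query_bytes
        for s in stats
    )
```

The rule treats objects independently, so it ignores queries that touch several objects at once. Handling that coupling is presumably where a flow-based formulation becomes useful.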
Resource Selection Models for Scientific Workflows in Hybrid Platforms
Thursday, June 24, 2:00–3:00 pm, 50F-1647
Emad Soroush, Microsoft Research
Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers the opportunity to scale out applications from desktops and local cluster resources. Each platform has different properties (e.g., queue wait times in high performance systems, virtual machine startup overhead in clouds) and characteristics (e.g., custom environments in cloud) that make choosing among these diverse resource platforms for a workflow execution a challenge for scientists. Scientists often have to weigh platform selection trade-offs with limited information on the actual workflows.
While many workflow planning methods have explored resource selection or task scheduling, these methods often require fine-grained characterization of the workflow that is onerous for a scientist. In this presentation, we describe our work in using blackbox characteristics for suitability of different resource platforms. In our blackbox method, we use only limited high-level information on the workflow length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism.
We have two sets of experiments. In the first experiment, we compare the effectiveness of the blackbox approach to other resource selection models using four exemplar scientific workflows on local cluster, HPC center, and cloud platforms. We accomplish that by implementing different resource selection models (whitebox and blackbox models) and simulating the total runtime of the scientific workflow given the platform characteristics and the resource model chosen. We also provide resource utilization feedback to the user. In the second experiment, we investigate the effectiveness of the blackbox model among different sets of synthetic workflows. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of a new resource platform, such as cloud resources, even before porting an existing workflow.
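To make the blackbox idea concrete, the sketch below estimates total runtime from only the high-level inputs the abstract names (workflow length, width, and data size) plus a few coarse platform properties. The cost formula, the field names, and all numbers are assumptions for illustration and are not the model evaluated in the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Workflow:
    length: int                 # number of sequential stages
    width: int                  # parallel tasks per stage
    data_gb: float              # data to move onto the platform
    task_minutes: float = 10.0  # hypothetical average task runtime


@dataclass
class Platform:
    name: str
    startup_minutes: float  # queue wait (HPC) or VM startup (cloud)
    slots: int              # tasks that can run at the same time
    gb_per_minute: float    # transfer rate into the platform


def estimate_minutes(wf, p):
    """Startup overhead + staged compute time + data transfer time."""
    waves = math.ceil(wf.width / p.slots)  # rounds needed per stage
    compute = wf.length * waves * wf.task_minutes
    transfer = wf.data_gb / p.gb_per_minute
    return p.startup_minutes + compute + transfer


def choose_platform(wf, platforms):
    """Name of the platform with the smallest estimated runtime."""
    return min(platforms, key=lambda p: estimate_minutes(wf, p)).name
```

With these made-up numbers, a wide workflow favors a platform with many slots despite a long queue wait, while a narrow workflow favors the platform with the short startup time.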
Link of the Week: The Real Science Gap
In “The Real Science Gap” in Miller-McCune magazine, Beryl Lieff Benderly challenges some common beliefs about science education in the U.S. and discusses why careers in science are appealing to fewer young Americans. Some excerpts:
The American research enterprise has become so severely dysfunctional that it actively prevents the great majority of the young Americans aspiring to do research from realizing their dreams.… [The problem] is not, as many believe, that the nation is producing too few scientists, but, paradoxically, just the opposite.
“There is no scientist shortage,” declares Harvard economics professor Richard Freeman, a pre-eminent authority on the scientific work force. Michael Teitelbaum of the Alfred P. Sloan Foundation, a leading demographer who is also a national authority on science training, cites the “profound irony” of crying shortage — as have many business leaders, including Microsoft founder Bill Gates — while scores of thousands of young Ph.D.s labor in the nation’s university labs as low-paid, temporary workers, ostensibly training for permanent faculty positions that will never exist….
If the nation truly wants its ablest students to become scientists, [Harold] Salzman says, it must undertake reforms — but not of the schools. Instead, it must reconstruct a career structure that will once again provide young Americans the reasonable hope that spending their youth preparing to do science will provide a satisfactory career.
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are Department of Energy Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.