InTheLoop | 01.21.2014
Are Earths Rare? Perhaps Not
A pipeline developed at NERSCfor finding Earth-like planets in our Milky Way Galaxy recently released some surprising findings. One out of every five sun-like stars in our Milky Way galaxy has an Earth-sized planet orbiting it in the Goldilocks zone—not too hot, not too cold—where surface temperatures should be compatible with liquid water, according to a statistical analysis of data from NASA's Kepler spacecraft by Erik Petigura, a graduate student at the University of California, Berkeley (UC Berkeley). Petigura and his colleague Andrew Howard, now at the University of Hawaii, Manoa, spent three years developing a transit search pipeline called TERRA that is optimized for finding small planets. When they used this tool on supercomputers at the National Energy Research Scientific Computing Center (NERSC) to analyze nearly four years of Kepler observations, the scientists determined that our galaxy could contain as many as 40 billion habitable Earth-sized planets. »Read more.
New Framework for Nanocrystallography Analysis
The emerging field of X-ray nanocrystallography offers great promise for studying macromolecules such as those found in the membranes of cell walls, but there are a number of challenges in using the technique. In a paper published Dec. 16, 2013 in the online edition of the Proceedings of the National Academy of Sciences, Berkeley applied mathematicians Jeffrey Donatelli and James Sethian describe their development of mathematical tools to address some of these issues. The work is part of a new project, led by Sethian, known as CAMERA (The Center for Applied Mathematics in Energy Research Applications) to design and apply mathematical solutions to data and imaging problems at Berkeley Lab facilities funded by the Department of Energy's Office of Basic Energy Sciences. »Read more.
NERSC Supercomputers Push Cosmological Limits
Julian Borrill, Borrill, co-lead of Lawrence Berkeley National Laboratory's Computational Cosmology Center, and one of the investigators for the Planck telescope project, explained his work in an HPCWire podcast last week. »Read more.
TOP500 List Founder and ISC General Chair Hans Meuer Dies at Age 77
Prof. Hans Meuer—who together with CRD's Erich Strohmaier created the widely watched biannual listing of the world's 500 most powerful supercomputers—died Monday at his home in Germany. He was 77. Meuer also was the guiding force behind the International Supercomputing Conference (ISC), which began in 1986 as a small workshop at the University of Mannheim, where Meuer was a professor of computer science. Today, ISC is the world's second largest high-performance computing conference, drawing about 2,500 attendees. Both Strohmaier and Berkeley Lab deputy director Horst Simon are popular presenters at ISC and also serve as editors of the TOP500 list. Meuer is survived by his wife, Ursula, and sons Andreas, Martin and Thomas.
This Week's Computing Sciences Seminars
Grappa: A Latency-Tolerant Runtime for Large-Scale Irregular Applications
Tuesday, January 21, 2014, 10:00am - 11:30am, Bldg. 50A, Room 5132
Simon Kahan, Jacob Nelson, Brandon Holt, Brandon Myers, University of Washington
This seminar will be delivered in three separate talks as outlined below:
Grappa: A Latency-Tolerant Runtime for Large-Scale Irregular Applications
Jacob Nelson, University of Washington
Grappa is a runtime system for commodity clusters that presents a multithreaded, single address space abstraction to applications. Grappa's purpose is to enable scalable performance of irregular parallel applications, such as graph processing. Poor data locality, imbalanced parallel work and complex communication patterns make scaling these applications difficult. Grappa serves both as a C++ user library and as a foundation for higher level languages. It tolerates delays to remote memory by multiplexing thousands of lightweight workers to each processor core, balances load via fine-grained distributed work-stealing, increases communication throughput by aggregating smaller data requests into large ones, and provides efficient synchronization and remote memory operations. Together, these features allow Grappa to provide performance on commodity hardware that approaches that of the Cray XMT, a custom system used to target the real time graph analytics market.
Turning Contention into Cooperation: Reducing the cost of synchronized global data structures in Grappa
Brandon Holt, University of Washington
In this age of "big data", analyzing and understanding the data pouring in from various sources is crucial. A large fraction of data analytics tasks, such as graph analytics, are characterized as irregular due to their fine-grained, data-dependent access patterns with scarce spatial locality. Grappa is a runtime system for commodity clusters and clouds that presents a massively parallel, single address space programming model to applications and delivers scalable performance on irregular highly-parallel workloads. Massive multithreading is used to tolerate the latency of aggregating communication and providing sequentially-consistent access to global memory. In the Grappa library of global data structures, maintaining a consistent view is costly. Borrowing a technique from physically-shared memory, flat combining, this work allows threads to cooperate locally to combine data structure operations and commit them in bulk. This greatly reduces serialization on the global data structure, and can even avoid instantiating some operations completely. Future work will formalize the semantics of these combining data structure operations and leverage opportunities to avoid communication even further.
Code generation for In-Memory Path-Counting Queries
Brandon Myers, University of Washington Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that the performance benefits amount to implementation details, not a fundamental limitation of the relational model. To evaluate this hypothesis, we are exploring code-generation to produce fast in-memory algorithms and data structures for graph patterns that are inaccessible to conventional relational optimizers. In this talk, we present preliminary results for this approach on path-counting queries, which includes triangle counting as a special case. We compile Datalog queries into main-memory pipelined hash-join plans in C++, and show that the resulting programs easily outperform PostgreSQL on real graphs with different degrees of skew. We then produce analogous parallel programs for Grappa, a runtime system for distributed memory architectures. Grappa is a good target for building a parallel query system as its shared memory programming model and communication mechanisms provide productivity and performance when building communication-intensive applications. Our experiments suggest that Grappa programs using hash joins have competitive performance with queries executed on Greenplum, a commercial parallel database. We find preliminary evidence that a code generation approach simplifies the design of a query engine for graph analysis and improves performance over conventional relational databases.
Measuring and Manipulating the Robustness of Large Networks
Wednesday, January 22, 2014, 11:00am - 12:00pm, Bldg. 50F, Room 1647
Leman Akoglu, Department of Computer Science, Stony Brook University
The function and performance of networks rely on their robustness, defined as their ability to continue functioning in the face of damage (targeted attacks or random failures) to parts of the network. Prior research has proposed profoundly diverse measures to quantify, and a long list of manipulation strategies to alter network robustness. In this talk, I will focus on these two aspects: robustness measures and manipulation techniques. First, I will discuss various robustness measures and identify their strengths and weaknesses. Analysis suggests natural connectivity, based on the weighted count of loops in a network, to be a reliable measure. Then, I will introduce a fast, principled manipulation algorithm that directly optimizes this robustness measure, which leads to significant performance improvement over up to 11 existing, ad-hoc heuristic solutions for node and edge removals. At the end, I will also highlight other research works, including fast vertex proximity processing in massive graphs and analysis of human brain networks.
The Parallel Nonsymmetric QR Algorithm with Aggressive Early Deflation
Thursday, January 23, 2014, 9:30am - 10:30am, Bldg. 50F, Room 1647
Meiyue Shao, École polytechnique fédérale de Lausanne (EPF Lausanne)
We present the new parallel nonsymmetric QR algorithm in ScaLAPACK v2.0. The multiwindow bulge chain chasing approach allows most computations in the bulge chasing stage to be performed in level 3 BLAS. Multilevel aggressive early deflation algorithms are developed to decrease the total amount of communications. These techniques make the new approach significantly outperform the pipelined QR algorithm in ScaLAPACK v1.8.0. Both performance models and numerical experiments demonstrate the efficiency of our new approach.
Hybrid Methods for Solving Large Sparse Systems
Thursday, January 23, 2014, 11:00am - 12:00pm , Bldg. 50B, Room 4205
Iain Duff, STFC-RAL, UK and CERFACS, France
Although current work on direct methods has enabled the solution of very large systems efficiently, there are still applications that can cause problems particularly because of memory requirements. On the other hand, iterative methods can often be very slow to converge even with a good preconditioner. We are thus drawn to examine techniques that combine the best features of both approaches. We call these hybrid methods. We will examine in particular a hybrid technique based on the Block Cimmino method. This is a somewhat elementary iterative technique but can be accelerated in a number of ways. We study a derivative of this method which is essentially a direct method and compare the performance of this against the MUMPS multifrontal solver and Block Cimmino accelerated by block conjugate gradients. A major strength of the block Cimmino method is that it is easily parallelizable. We also comment on this aspect of the approach.
This is joint work with Ronan Guivarch, Daniel Ruiz, and Mohamed Zenadi of ENSEEIHT-IRIT, Toulouse.