Performance Analysis

Performance analysis focuses on the research and development of technologies and algorithms that enhance the performance, scalability, and energy efficiency of applications running on the Department of Energy's multicore-, manycore-, and accelerator-based supercomputers.

At Berkeley Lab, our researchers develop performance models to understand the inherent bottlenecks in today's systems as well as predict the performance and bottlenecks of tomorrow's exascale systems. To that end, we have formed strong research collaborations with computer science, computer architecture, applied math, and application research teams. The performance insights gained help drive performance optimization, architectural evaluation and procurements, and new algorithm development.

Projects

Roofline Model

Roofline is a visually intuitive performance model used to bound the performance of numerical methods and operations running on a variety of architectures. Rather than relying on simple percent-of-peak estimates, the model assesses the quality of attained performance by combining locality, bandwidth, and different parallelization paradigms into a single performance figure. Examining the resulting Roofline figure reveals both implementation-related and inherent performance limitations. Contact: Samuel Williams (Williams on the Web)
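
At its core, the model is a single bound: attainable FLOP/s = min(peak FLOP/s, arithmetic intensity × peak bandwidth). The minimal Python sketch below illustrates this bound; the machine parameters are invented placeholders, not measurements of any real system.

    # Minimal Roofline sketch: attainable performance is capped either by
    # peak compute or by memory bandwidth times arithmetic intensity (flop/byte).
    # Both machine parameters below are illustrative placeholders.
    PEAK_GFLOPS = 3000.0  # assumed peak compute (GFLOP/s)
    PEAK_GBS = 200.0      # assumed peak DRAM bandwidth (GB/s)

    def roofline(arithmetic_intensity):
        """Attainable GFLOP/s for a kernel with the given flop/byte ratio."""
        return min(PEAK_GFLOPS, arithmetic_intensity * PEAK_GBS)

    # A kernel at 0.25 flop/byte is bandwidth-bound (0.25 * 200 = 50 GFLOP/s);
    # one at 100 flop/byte hits the compute ceiling instead.
    print(roofline(0.25))   # 50.0
    print(roofline(100.0))  # 3000.0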

TOP500 Supercomputing Sites

The TOP500 project was started in 1993 to provide a reliable basis for tracking and detecting trends in high-performance computing. Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released. The best performance on the Linpack benchmark is used as the performance measure for ranking the computer systems. The list contains a variety of information, including each system's specifications and its major application areas. Contact: Erich Strohmaier (Strohmaier on the Web)

Performance Analysis of AI Hardware and Software

The performance characteristics of AI training and inference can be quite distinct from those of HPC applications, despite relying on similar computational methods (large/small matrix multiplications, stencils, gather/scatter, etc.), albeit at reduced precision (single, half, BFLOAT16). Understanding the interplay between science, AI method, framework, and architecture is essential not only for quantifying the computational potential of current and future architectures running AI models, but also for identifying the bottlenecks and ultimate limits of today's models. Contact: Samuel Williams (Williams on the Web)
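
As a concrete, hedged illustration of how precision and problem shape interact, the sketch below estimates the arithmetic intensity of a matrix multiplication under a simple minimum-traffic model; the matrix sizes and the gemm_intensity helper are invented for illustration.

    # Sketch: arithmetic intensity (flop/byte) of an M x K by K x N matrix
    # multiply. GEMM performs 2*M*N*K flops and, at minimum, moves the three
    # matrices (M*K + K*N + M*N elements) through memory once.
    def gemm_intensity(M, N, K, bytes_per_elem):
        flops = 2.0 * M * N * K
        traffic = (M * K + K * N + M * N) * bytes_per_elem
        return flops / traffic

    for name, width in [("single (FP32)", 4), ("half/BFLOAT16", 2)]:
        square = gemm_intensity(4096, 4096, 4096, width)  # training-like GEMM
        skinny = gemm_intensity(4096, 1, 4096, width)     # inference-like GEMV
        print(name, round(square, 1), round(skinny, 2))

Halving the operand width doubles the intensity of both shapes, but the skinny, inference-like case stays far below typical machine balance and thus remains bandwidth-bound.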

Machine Learning for Performance Analytics

Performance analysis has become increasingly difficult as modern architectures have grown in complexity. We leverage machine learning techniques to analyze machine-level performance data and answer a wide range of performance questions, including identifying scaling bottlenecks and studying the similarity of computational patterns. We couple this analysis with intuitive visualization techniques for application developers, such as Dashing, and we are applying these methodologies in our recent SciDAC5 partnerships. Contact: Khaled Ibrahim (Ibrahim on the Web)
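
A minimal sketch of one such analysis follows: clustering normalized hardware-counter profiles to group code regions with similar computational patterns. The region names and counter values are invented, and k-means via scikit-learn is just one illustrative choice of technique.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Invented data: each row is a code region, each column a per-instruction
    # counter rate (cache-miss rate, branch rate, flop rate). Real profiles
    # would come from hardware performance counters.
    regions = ["stencil", "gemm", "spmv", "fft"]
    profiles = np.array([
        [0.020, 0.010, 0.50],   # streaming, moderate flops
        [0.001, 0.005, 0.95],   # compute-dominated
        [0.080, 0.020, 0.10],   # miss-heavy, few flops
        [0.010, 0.030, 0.60],
    ])

    # Normalize the features, then group regions with similar signatures.
    X = StandardScaler().fit_transform(profiles)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for region, label in zip(regions, labels):
        print(region, "-> cluster", label)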

Runtime Optimization for HPC Deep-learning Applications

HPC deep-learning applications have distinct computational attributes compared with commercial workloads in terms of computational intensity and data-streaming requirements. Optimizing processing efficiency requires working on a wide range of fronts, from data compressibility and multi-layer software-stack interactions to accelerated preprocessing. Our optimization techniques improve the efficiency of executing these workloads by up to an order of magnitude. Contact: Khaled Ibrahim (Ibrahim on the Web)
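
One of those fronts, hiding preprocessing latency behind computation, can be sketched as a small producer/consumer pipeline; the preprocess and train_step bodies below are placeholders standing in for real decode/augment and accelerator work.

    import queue
    import threading
    import time

    def preprocess(i):
        """Placeholder: decode/augment batch i on the CPU."""
        time.sleep(0.01)
        return i

    def train_step(batch):
        """Placeholder: compute on the accelerator."""
        time.sleep(0.01)

    # A background producer keeps a small buffer of ready batches, so each
    # batch's preprocessing overlaps with the previous training step.
    batches = queue.Queue(maxsize=4)

    def producer(n):
        for i in range(n):
            batches.put(preprocess(i))
        batches.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, args=(100,), daemon=True).start()
    while (batch := batches.get()) is not None:
        train_step(batch)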

Graph Operation Performance on Emerging HPC Architectures

Algorithms and operations on graphs are characterized by low computational intensity and large memory footprints. Guided by the standardization efforts in this field, we are assessing the performance implications of graph algorithms as HPC architectures shift toward heterogeneity and toward favoring computational patterns with high regularity and arithmetic intensity. We are developing a framework for graph computations that emphasizes algorithmic techniques exploiting heterogeneity, and we couple this framework with operation- and structure-aware optimization. Contact: Oguz Selvitopi (Selvitopi on the Web)
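
For instance, a graph traversal can be recast as sparse linear algebra, the style of formulation standardized by efforts such as GraphBLAS, trading irregular pointer chasing for the more regular sparse matrix-vector (SpMV) pattern. The sketch below runs breadth-first search this way on a small invented graph.

    import numpy as np
    import scipy.sparse as sp

    # BFS as repeated sparse matrix-vector products: one SpMV over a boolean
    # semiring expands an entire frontier level at once. The edge list is invented.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    n = 5
    rows, cols = zip(*edges)
    A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))

    frontier = np.zeros(n, dtype=bool)
    frontier[0] = True                 # start the search at vertex 0
    visited = frontier.copy()
    level = 0
    while frontier.any():
        print("level", level, ":", np.flatnonzero(frontier))
        # A.T @ frontier gathers all out-neighbors of the current frontier.
        frontier = (A.T @ frontier).astype(bool) & ~visited
        visited |= frontier
        level += 1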