Breakthrough:
Berkeley Lab researchers have developed a new, highly efficient method for building large mathematical structures known as H2 matrices. An H2 matrix is a type of hierarchical matrix (or H-matrix) that efficiently represents and computes with large, dense matrices. These matrices often arise in scientific applications such as solving partial differential equations, integral equations, and other computational problems. By designing and implementing this method on graphics processing units (GPUs), the team achieved up to 1,000 times faster performance compared to previous approaches, making it possible to handle much larger problems using less memory and time. This breakthrough, which greatly expands the capabilities of scientific computing on modern hardware, was recognized with the Best Paper award at the 26th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (IPDSEC 2025).
Background:
Fast and scalable construction of H2 matrices is crucial for modern scientific research because they are used to speed up advanced computational techniques, such as solving complex equations (PDE solvers), training machine learning models, and tackling challenging problems where scientists need to work backwards from data to discover causes (inverse problems). In areas like energy systems modeling and materials science, researchers often need to analyze very large datasets or run detailed simulations, such as designing new battery materials or optimizing power grids. The new sketching-based method—which uses compact, approximate representations (or “sketches”) of large datasets to accelerate calculations—for building H2 matrices allows these calculations to be done much more quickly and efficiently—even for very large problems—by taking advantage of parallel computing on both CPUs and GPUs, scaling with linear complexity as problem size increases, and offering dramatic speedups over previous methods.
Breakdown:
This novel method uses a bottom-up, sketching-based algorithm that generalizes and improves upon previous techniques developed for HSS (Hierarchically Semi-Separable) matrices, enabling the construction of the more efficient H2 matrix format. The bottom-up approach systematically builds up the matrix structure from smaller blocks, allowing for better scalability and flexibility compared to traditional top-down methods.
To achieve high performance, the algorithm is carefully implemented on GPUs. It optimizes data layouts to minimize memory transfer overhead and organizes computations into batches, allowing the GPU to process many tasks in parallel. The method specifically leverages batched dense linear algebra kernels and batched routines for extracting matrix entries, which maximizes GPU throughput and efficiency. This combination of algorithmic innovation and hardware-aware implementation enables the rapid construction of large H2 matrices, making advanced scientific simulations and data analyses much more practical.
Co-authors:
Wajih Halim Boukaram, Yang Liu, Pieter Ghysels, and Sherry Li.
Publication:
Wajih Halim Boukaram, Yang Liu, Pieter Ghysels, and Sherry Li, “Adaptive Sketching Based Construction of H2 Matrices on GPUs,” Proceedings of the 26th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (IPDSEC), 2025.
Funding:
SciDAC-5 FASTMath Institute.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab's Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.