During the symposium, postdoctoral researchers currently working at the Lab shared 10-minute slide presentations on their projects with an audience of peers, mentors, and coworkers, followed by interactive Q&As throughout the day.

The annual Berkeley Lab Computing Sciences’ Postdoc Symposium offers the area’s early-career researchers a unique opportunity to refine their technical presentation skills. Participants receive coaching and mentorship from senior leaders and communications professionals in the Computing Sciences Area, as well as the chance to present their research to the broader Lab community. Afterward, participants receive a recording of their talk to share with others and to continue honing their presentation skills.

“Real-Time Eigensolver Using Quantum Hardware”

Scalable Solvers Group
Morning Session, Group 1

Abstract: Recently, quantum algorithms leveraging dynamical evolution under a many-body Hamiltonian of interest have proven exceptionally effective in extracting important spectral information. In the first part of the presentation, we show how to pinpoint individual eigenvalues near the edge of the Hamiltonian spectrum, for example the ground-state energy, using real-time evolution on quantum hardware. In the second part, we delve into the capacity of real-time evolution for accessing the aggregate of eigenvalues across the entire spectrum, focusing on the estimation of spectral statistics.
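
For context, a standard relation (not necessarily the exact estimator used in this work) shows why real-time evolution carries spectral information: the time-domain signal measured on the quantum device can be written as

$$ C(t) \;=\; \langle \psi \,|\, e^{-iHt} \,|\, \psi \rangle \;=\; \sum_n |c_n|^2 \, e^{-iE_n t}, \qquad |\psi\rangle = \sum_n c_n |n\rangle, \quad H|n\rangle = E_n |n\rangle, $$

so the frequencies appearing in $C(t)$ are the eigenvalues $E_n$, weighted by the overlaps $|c_n|^2$; eigenvalues near the spectral edge, such as the ground-state energy, can then be extracted by classical post-processing of $C(t)$ sampled at a set of evolution times.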

“MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training”

Performance and Algorithms Research Group
Morning Session, Group 1

Abstract: In-memory distributed storage, which keeps the dataset in the local memory of each computing node, is widely adopted for DL training over file-based I/O because of its speed. Processes can then use either one-sided communication or collective communication to fetch data from remote processes. However, whether one-sided or collective communication performs better depends on a variety of factors. This research therefore proposes MDLoader, a hybrid in-memory data loader for distributed deep neural network training. MDLoader introduces a model-driven performance estimator to automatically switch between one-sided and collective communication at runtime.
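
As a rough illustration of switching communication strategies at runtime, here is a minimal sketch with a purely hypothetical cost model; it is not the actual MDLoader estimator, and all function names, formulas, and constants are invented for the example.

```python
# Minimal sketch of a model-driven communication switch (NOT the actual
# MDLoader implementation): all function names, cost formulas, and constants
# below are hypothetical and only illustrate the idea of choosing between
# one-sided and collective fetching at runtime.

def one_sided_cost(batch_size, item_bytes, latency_s=2e-6, bw_bytes_per_s=1.25e9):
    """Fetch every sample with its own one-sided get: pay latency per message."""
    return batch_size * (latency_s + item_bytes / bw_bytes_per_s)

def collective_cost(batch_size, item_bytes, n_procs, latency_s=2e-5, bw_bytes_per_s=1.25e9):
    """One collective exchange per batch: latency grows with the participant count."""
    return n_procs * latency_s + batch_size * item_bytes / bw_bytes_per_s

def choose_strategy(batch_size, item_bytes, n_procs):
    """Pick whichever strategy the (hypothetical) model predicts is cheaper."""
    if one_sided_cost(batch_size, item_bytes) < collective_cost(batch_size, item_bytes, n_procs):
        return "one-sided"
    return "collective"

if __name__ == "__main__":
    for bs in (8, 256, 8192):
        print(bs, choose_strategy(batch_size=bs, item_bytes=32_768, n_procs=64))
```

In practice the decision would be driven by measured or modeled parameters of the target system rather than the fixed constants used here.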

“Studying Rare Chemical Reactions via Deep Learning”

Scalable Solvers Group
Morning Session, Group 1

Abstract: The Variational Quantum Eigensolver (VQE) offers a powerful solution for quantum chemistry problems. However, its accuracy can be compromised by measurement errors, especially under a constrained measurement budget. To address this, we introduce an innovative reinforcement learning methodology to optimize shot assignment in VQE.
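
For context, the standard setting the abstract refers to (notation mine): the VQE energy is a weighted sum of Pauli expectation values, and the shot allocation controls the statistical error of its estimate,

$$ E(\theta) = \sum_i c_i \,\langle \psi(\theta) | P_i | \psi(\theta) \rangle, \qquad \operatorname{Var}\big[\hat{E}\big] \approx \sum_i \frac{c_i^2 \, \sigma_i^2}{s_i}, \quad \sum_i s_i = S, $$

where $s_i$ is the number of shots spent on term $P_i$ (measured independently), $\sigma_i^2$ its single-shot variance, and $S$ the total measurement budget; choosing the allocation $\{s_i\}$ well under a fixed $S$ is the optimization target of the proposed reinforcement-learning policy.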

“Superconducting Computing for the Future of HPC”

Computer Architecture Group
Morning Session, Group 1

Abstract: Superconducting digital circuits operate on information carried by picosecond-wide, microvolt-tall single flux quantum (SFQ) pulses. These circuits can operate at frequencies of hundreds of GHz with orders of magnitude lower switching energy than complementary metal-oxide-semiconductor (CMOS) circuits. However, under the stringent area constraints of modern superconductor technologies, fully fledged, CMOS-inspired superconducting architectures cannot be fabricated at large scales. Unary SFQ (U-SFQ) is an alternative computing paradigm that can address these area constraints. In U-SFQ, information is encoded in a combination of SFQ pulse streams and the temporal domain. In this work, we extend U-SFQ with novel building blocks such as a multiplier and an accumulator. These blocks reduce area and power consumption by 2x and 4x compared with previously proposed U-SFQ building blocks, and yield at least 97% area savings compared with binary approaches. Using this multiplier and accumulator, we propose a U-SFQ Convolutional Neural Network (CNN) hardware accelerator whose peak performance is comparable to a state-of-the-art superconducting binary approach (B-SFQ) in 32x less area. CNNs can operate with 5-8 bits of resolution with no significant degradation in classification accuracy. For 5 bits of resolution, our proposed accelerator yields 5x to 63x better performance than CMOS and 15x to 173x better area efficiency than B-SFQ.

“SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning”

Machine Learning and Analytics Group
Morning Session, Group 1

Abstract: Modeling high-fidelity spatiotemporal dynamics is central to modern science and engineering. However, it is computationally demanding and resource-intensive to acquire high-quality simulation or experimental data. Recently, machine learning has emerged as a promising tool for reconstructing high-resolution dynamics. In this presentation, I will introduce several recent works on learning spatiotemporal data from sparse measurements.

“Scaling Many-Body Quantum Materials Calculations to the Exascale”

Applied Computing for Scientific Discovery Group
Morning Session, Group 1

Abstract: The BerkeleyGW program presents a scalable and portable platform to achieve highly accurate quantum mechanical calculations based on many-body perturbation theory, across a wide range of material systems. Of particular interest is the calculation of the polarizability within the RPA formalism, which often presents a significant bottleneck when calculating important quantities such as the self-energy corrections to the ground or excited states. By developing an efficient GPU offloading method for this RPA calculation, the calculation of highly accurate quantum mechanical properties can be sped up drastically.

“Dynamic Graph Sketching: To Infinity And Beyond”

Performance and Algorithms Group
Morning Session, Group 2

Abstract: Existing graph stream processing systems must store the graph explicitly in RAM, which limits the scale of graphs they can process. The graph semi-streaming literature offers algorithms that avoid this limitation via linear sketching data structures that use small (sublinear) space, but these algorithms have not seen practical use to date. In this talk, I will explore what is needed to make graph sketching algorithms practically useful and, as a case study, present a sketching algorithm for connected components along with a corresponding high-performance implementation. Finally, I will discuss potential applications to large-scale science.

“Quantum Circuit Resizing for Resource Optimization”

Applied Computing for Scientific Discovery Group
Morning Session, Group 2

Abstract: Mid-circuit measurement and reset (MMR) technology, initially aimed at implementing quantum error correction, has enabled circuit optimization in the NISQ era. By reducing qubit requirements through a technique called circuit resizing, MMR allows larger programs to run on smaller quantum chips with fewer gates. However, not all circuits are resizable; traditional algorithms are limited to circuits with specific gate dependencies. This work introduces a novel numerical-instantiation-based resynthesis algorithm that transforms non-resizable circuits into resizable ones, expanding the range of programs that can benefit from this optimization.

“IDIOMS – Index-powered Distributed Object-centric Metadata Search for Scientific Data Management”

Scientific Data Management Group
Morning Session, Group 2

Abstract: Scientific applications continuously produce vast data, necessitating efficient metadata search tools in object-centric data management systems. Many existing solutions don’t optimize for HPC systems or detailed affix-oriented metadata searches. IDIOMS, a metadata search engine with a distributed index, addresses this by enhancing affix-based metadata search performance in parallel object-centric storage. Unique features include support for four metadata query types and two parallel application scenarios. The integration of a distributed adaptive radix tree with a local trie-based index offers superior scalability and performance. IDIOMS outperforms competitors like SoMeta, showcasing up to 50-fold performance boosts and minimal indexing overhead.
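
To make the affix-search idea concrete, here is a small illustrative sketch (not IDIOMS itself): a forward trie answers prefix queries, and a second trie built over reversed keys answers suffix queries.

```python
# Illustrative sketch only (not IDIOMS): affix-oriented lookup with a forward
# trie for prefix queries and a reversed-key trie for suffix queries.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.ids = set()          # object IDs whose key passes through this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, obj_id):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
            node.ids.add(obj_id)

    def affix_query(self, affix):
        node = self.root
        for ch in affix:
            if ch not in node.children:
                return set()
            node = node.children[ch]
        return node.ids

prefix_index = Trie()
suffix_index = Trie()
for obj_id, key in [(1, "run42/temperature"), (2, "run42/pressure"), (3, "run7/temperature")]:
    prefix_index.insert(key, obj_id)
    suffix_index.insert(key[::-1], obj_id)

print(prefix_index.affix_query("run42/"))                # keys starting with "run42/"
print(suffix_index.affix_query("temperature"[::-1]))     # keys ending with "temperature"
```

IDIOMS distributes this kind of index, combining a distributed adaptive radix tree with local trie-based indexes, across an object-centric storage system.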

“Elephants Sharing the Highway: Studying TCP Performance in Large Transfers for Scientific Networks”

Scientific Data Management Group
Morning Session, Group 2

Abstract: The increasing demands for bandwidth are challenging the capabilities of advanced data networks, potentially compromising their performance. The efficiency and reliability of these networks are often maintained using TCP congestion control algorithms, which are designed to manage bandwidth effectively. Crucial elements such as Active Queue Management (AQM) algorithms and the size of router buffers play a significant role in network throughput. This study provides a comprehensive analysis of TCP fairness and network performance by examining several TCP variants—CUBIC, Reno, Hamilton, and both versions of BBR—within high-capacity networks that can handle up to 25 Gbps. We investigate how these TCP versions interact with different AQM mechanisms including FIFO, FQ_CODEL, and RED, as well as with varying router buffer sizes. Our research uncovers that adjustments in buffer management and queuing strategies can lead to varied performance outcomes depending on the available bandwidth. Among the key findings, BBR version 2 is identified as an exceptionally equitable algorithm, especially crucial for fast data transfers in contexts like scientific data exchange. These insights are vital for informing the development of future network infrastructures that aim to optimize performance fairly and efficiently.

“Visualizing Local and Global Structure in Neural Network Loss Landscapes”

Machine Learning & Analytics Group
Morning Session, Group 2

Abstract: Characterizing the loss of a neural network can provide insights into local structure (e.g., smoothness of the so-called loss landscape) and global properties of the underlying model (e.g., generalization performance). Inspired by powerful tools from topological data analysis (TDA) for summarizing high-dimensional data, here we characterize the underlying shape (or topology) of loss landscapes. To demonstrate the utility and versatility of our approach, we study established models from image pattern recognition (e.g., ResNet) and scientific machine learning (e.g., physics-informed neural networks) and show how quantifying the shape of loss landscapes can provide new insights into model performance and learning dynamics.
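
A common starting point for this kind of visualization is to evaluate the loss on a two-dimensional slice through parameter space; the sketch below uses a toy loss function purely for illustration and omits the topological data analysis that is the focus of the talk.

```python
# Minimal sketch of a 2D loss-landscape slice (illustrative toy loss, not the
# TDA pipeline described in the talk): evaluate L(theta + a*d1 + b*d2) on a
# grid of perturbations around a "trained" parameter vector theta.
import numpy as np

def toy_loss(theta):
    # a simple nonconvex surrogate standing in for a network's training loss
    return np.sum(theta ** 2) + 0.5 * np.sin(3.0 * theta[0]) * np.cos(2.0 * theta[1])

rng = np.random.default_rng(0)
theta = rng.normal(size=10)                     # stand-in for trained weights
d1, d2 = rng.normal(size=10), rng.normal(size=10)
d1 /= np.linalg.norm(d1)
d2 /= np.linalg.norm(d2)

alphas = np.linspace(-1.0, 1.0, 51)
betas = np.linspace(-1.0, 1.0, 51)
landscape = np.array([[toy_loss(theta + a * d1 + b * d2) for b in betas] for a in alphas])
print(landscape.shape, landscape.min(), landscape.max())
```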

“Deep Learning Operator for Fusion Simulations”

Scalable Solvers Group
Morning Session, Group 3

Abstract: This presentation delves into the exploration of deep learning techniques, specifically the Fourier Neural Operator (FNO) type, for enhancing fusion simulation. We explore how FNO can transform our approach to fusion research by achieving accurate and efficient results. By applying FNO to the simulation of fusion processes, we aim to shed light on the promising opportunities for advancing computational methodologies in the field of fusion plasmas.
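
The core building block of an FNO is a spectral convolution: transform to Fourier space, act on a truncated set of modes with learned weights, and transform back. Below is a minimal one-dimensional NumPy sketch of that operation (illustrative only; a real FNO stacks several such layers with channel mixing and nonlinearities).

```python
# Minimal 1D spectral-convolution sketch in NumPy, the core operation behind
# Fourier Neural Operators. Weights here are random stand-ins for learned ones.
import numpy as np

def spectral_conv_1d(u, weights, n_modes):
    """Transform to Fourier space, act on the lowest n_modes, transform back."""
    u_hat = np.fft.rfft(u)                      # complex Fourier coefficients
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]
    return np.fft.irfft(out_hat, n=u.shape[0])

n, n_modes = 256, 16
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(3 * x) + 0.1 * rng.normal(size=n)                    # sample input field
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)
v = spectral_conv_1d(u, weights, n_modes)
print(v.shape)  # (256,)
```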

“Achieving Performance in the Exascale Era”

Performance and Algorithms Research Group
Morning Session, Group 3

Abstract: In-memory distributed storage, which keeps the dataset in the local memory of each computing node, is widely adopted for DL training over file-based I/O because of its speed. Processes can then use either one-sided communication or collective communication to fetch data from remote processes. However, whether one-sided or collective communication performs better depends on a variety of factors. This research therefore proposes MDLoader, a hybrid in-memory data loader for distributed deep neural network training. MDLoader introduces a model-driven performance estimator to automatically switch between one-sided and collective communication at runtime.

“Particle Track Reconstruction with a GNN-based Pipeline”

Scientific Data Management Group
Morning Session, Group 3

Abstract: In preparation for the upcoming HL-LHC era, ATLAS is pursuing several methods to reduce the resource consumption needed to reconstruct the trajectories of charged particles (tracks) in the new all-silicon Inner Tracker (ITk). This includes the development of new algorithms suitable for massively parallel computing architectures such as GPUs. Algorithms for track pattern recognition based on graph neural networks (GNNs) have emerged as a particularly promising approach. I will describe a first functional implementation of a GNN-based track reconstruction pipeline for ITk, achieving high track-reconstruction efficiency and a promising fake-track rate.

“Feature Extraction of Electroencephalogram Signals by Solving a Nonlinear Eigenvalue Problem”

Scalable Solvers Group
Morning Session, Group 3

Abstract: Common spatial pattern (CSP) is a widely used method for extracting discriminatory features from electroencephalogram (EEG) signals. The minmax CSP enhances the robustness of CSP by employing data-driven covariance matrices. We demonstrate that, by leveraging the optimality conditions, the minmax CSP can be recast as an eigenvector-dependent nonlinear eigenvalue problem (NEPv). We solve this NEPv using a self-consistent field (SCF) iteration with line search and demonstrate its local quadratic convergence. Furthermore, we highlight the improved motor-imagery classification rates and reduced running time of the proposed solver in comparison to the existing minmax CSP algorithm.
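
For context (standard CSP, stated schematically; notation here is mine, not necessarily the paper’s): classical CSP maximizes a generalized Rayleigh quotient,

$$ \max_{w}\; \frac{w^{\top} \Sigma_{1} w}{w^{\top} (\Sigma_{1}+\Sigma_{2}) w}, $$

which is equivalent to the generalized eigenvalue problem $\Sigma_{1} w = \lambda (\Sigma_{1}+\Sigma_{2}) w$. In the minmax variant the covariance matrices are chosen adversarially from data-driven sets and therefore depend on $w$, which is what turns the problem into an eigenvector-dependent nonlinear eigenvalue problem of the schematic form $H(w)\,w = \lambda\, B(w)\, w$, amenable to SCF iteration.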

“Optical Surfaces Inverse Design with Tandem Neural Networks”

Applied Computing for Scientific Discovery Group
Morning Session, Group 3

Abstract: We introduce a novel approach to optical metasurface design using tandem neural networks that bypasses the complex light-matter interaction modeling. Trained with over 35,000 laser-engineered samples, our model predicts optical properties from laser parameters with high accuracy (error < 2.5%). This significantly narrows the necessary parameter space for design, facilitating rapid innovation. Our validated method, applied to thermophotovoltaic emitters, promises to expedite advancements in energy-harvesting technologies and beyond.

“Parallel Tensor Decomposition Algorithms in the Tensor-Train Format”

Scalable Solvers Group
Afternoon Session, Group 4

Abstract: The tensor-train (TT) format is a low-rank tensor representation frequently used for high-order tensors. Traditionally, the TT format is computed directly from all the elements of the tensor. In this talk, we propose two algorithms, parallel-TT-sketching and parallel-TT-cross, that partition the tensor and perform the decomposition individually on each sub-tensor before merging the factors together. This type of factorization framework is ideal for distributed-memory parallelism. For both algorithms, we provide theoretical guarantees as well as scaling and communication analyses. For example, strong-scaling results on the Hilbert tensor suggest that both algorithms outperform their serial counterparts and scale well with the number of computing cores, with respect to both storage and timing.
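
For context, the standard serial TT-SVD baseline, sketched in NumPy below, computes the cores with a single sweep of SVDs over reshapings of the full tensor; the parallel-TT-sketching and parallel-TT-cross algorithms in the talk instead decompose sub-tensors independently and merge the factors, avoiding this global sweep.

```python
# Standard (serial) TT-SVD baseline, sketched for context. The parallel
# algorithms discussed in the talk avoid touching every element like this does.
import numpy as np

def tt_svd(tensor, rank):
    """Decompose `tensor` into TT cores with a fixed maximum rank."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor
    for n in dims[:-1]:
        mat = mat.reshape(r_prev * n, -1)           # unfold: (r_prev * n) x rest
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, n, r))  # next TT core
        mat = np.diag(s[:r]) @ vt[:r, :]              # carry the remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

t = np.random.default_rng(0).normal(size=(6, 7, 8, 9))
cores = tt_svd(t, rank=5)
print([c.shape for c in cores])   # [(1, 6, 5), (5, 7, 5), (5, 8, 5), (5, 9, 1)]
```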

“Structured Sparsity for Markov Cluster Algorithm”

Performance and Algorithms Research Group
Afternoon Session, Group 4

Abstract: The Markov cluster algorithm (MCL) performs flow-based clustering on a graph, often constructed to represent similarities among data points. MCL (and its distributed version, HipMCL) has a high memory footprint and high communication cost when running at large concurrencies, which makes it hard to apply to extremely large datasets. We are investigating how to exploit the structure of the similarity matrix to accelerate the computation and reduce the memory footprint.
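
For reference, the textbook MCL iteration alternates matrix expansion with elementwise inflation on a column-stochastic matrix, as in the dense NumPy sketch below; HipMCL and the structured-sparsity work target the same iteration in the distributed sparse setting.

```python
# Textbook Markov-cluster (MCL) iteration, sketched dense and serial for context.
import numpy as np

def mcl(adj, inflation=2.0, n_iter=50):
    m = adj + np.eye(adj.shape[0])          # add self-loops
    m = m / m.sum(axis=0)                   # make columns stochastic
    for _ in range(n_iter):
        m = m @ m                           # expansion
        m = m ** inflation                  # inflation
        m = m / m.sum(axis=0)               # re-normalize columns
    return m

# two obvious clusters: {0, 1, 2} and {3, 4}
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
print(np.round(mcl(adj), 2))   # clusters appear as the surviving nonzero rows
```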

“Reduced Descriptions of Chemical Rate Processes Using Column Selection”

Mathematics Group
Afternoon Session, Group 4

Abstract: Continuous-time Markov chains offer a simple and rigorous model for understanding the kinetics of classical chemical systems. However, even when discretized, the number of states to consider in such an approach commonly scales exponentially with the size of the chemical system. It is thus quite difficult to rigorously and efficiently compute a reduced (human-understandable) description of such systems. Here, I describe our recent work to automatically select configurations using a combination of efficient Laplacian solvers, randomized sampling, and repeated column selection in a matrix-free approach.
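
For context (standard setting, notation mine): the chain’s probability distribution over states evolves under the master equation

$$ \frac{d p_i}{dt} \;=\; \sum_{j \neq i} \big( q_{ji}\, p_j - q_{ij}\, p_i \big), $$

where $q_{ij} \ge 0$ is the rate of jumping from state $i$ to state $j$; the difficulty is that the number of states, and hence the dimension of the rate matrix, grows exponentially with the size of the chemical system, which is what the column-selection approach is designed to tame.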

“Fast Algorithms for Constructing Surrogate Models”

Scalable Solvers Group
Afternoon Session, Group 4

Abstract: I will discuss recently developed randomized algorithms which make it possible to solve certain Kronecker structured least squares problems very fast. I will discuss the utility of this approach when fitting surrogate models in the form of polynomial chaos expansions.
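
To see why Kronecker structure matters, here is a small NumPy illustration using the classical exact identities $(A \otimes B)^{+} = A^{+} \otimes B^{+}$ and, with row-major vectorization, $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{\top})$; this is not the randomized algorithm from the talk, only the structure it exploits.

```python
# Solve min ||(A kron B) x - y|| through the factors alone, without ever
# forming the Kronecker matrix. Illustrates the structure, not the talk's
# randomized algorithms.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))      # tall factors => overdetermined least squares
B = rng.normal(size=(40, 6))
y = rng.normal(size=30 * 40)

# Structured solve: x = (A kron B)^+ y = (A^+ kron B^+) y, applied via the
# vec identity so only the small factors are touched.
Y = y.reshape(30, 40)
X = np.linalg.pinv(A) @ Y @ np.linalg.pinv(B).T
x_fast = X.reshape(-1)

# Dense reference solve for comparison.
x_ref = np.linalg.lstsq(np.kron(A, B), y, rcond=None)[0]
print(np.allclose(x_fast, x_ref))   # True
```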

“High-Dimensional Integration with Low-Rank Tensors”

Scalable Solvers Group
Afternoon Session, Group 4

Abstract: Low-rank approximation to solutions of large-scale ODEs and high-dimensional PDEs has proven to be a useful technique across a wide range of scientific applications. We present a geometric perspective on numerical time integration of low-rank solutions to initial value problems which provides a framework for understanding stability and accuracy of low-rank integrators.
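
For context, the standard dynamical low-rank picture (schematic; notation mine) constrains the solution to the manifold $\mathcal{M}_r$ of rank-$r$ matrices and evolves it by projecting the right-hand side onto the tangent space,

$$ \dot{Y}(t) \;=\; P_{T_{Y(t)}\mathcal{M}_r}\big(F(Y(t))\big), \qquad Y(0) = Y_0 \in \mathcal{M}_r, $$

and the geometric questions of how this projection and the curvature of $\mathcal{M}_r$ interact with the time step are exactly where the stability and accuracy of low-rank integrators are decided.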

“Quantized Tensor Trains for Superfast Numerical Simulation”

Scalable Solvers Group
Afternoon Session, Group 4

Abstract: It has recently been shown that quantized tensor trains (QTTs) can be used to approximately solve time-dependent partial differential equations like the Navier-Stokes and the Vlasov-Maxwell equations with reduced cost and reasonable accuracy. Time evolution in the QTT format is particularly interesting, since it was observed that one may be able to take larger time steps than with traditional grid-based methods, which are limited by the CFL condition. Here, we investigate different QTT time evolution schemes, and compare their performance for simple test problems with respect to time step and grid resolution. In particular, we focus on quantifying errors in observables and the amount of numerical noise that is introduced.
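
For context, the standard QTT construction (stated schematically): a function sampled on $2^d$ grid points is reindexed by the binary digits of the grid index, $i = (i_1, \dots, i_d)$ with $i_k \in \{0,1\}$, and compressed as a tensor train,

$$ v(i_1, \dots, i_d) \;\approx\; G_1(i_1)\, G_2(i_2) \cdots G_d(i_d), $$

where each $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix; if the ranks stay small, storage and arithmetic scale like $\mathcal{O}(d\,r^2)$ rather than $2^d$, which is the source of the “superfast” scaling.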

“Enabling Fast Weather Simulations by Reducing Communication Costs”

Application Performance Group, NERSC
Afternoon Session, Group 5

Abstract: The Energy Research and Forecasting (ERF) project is a modern, GPU-enabled C++ code that bridges the gap between mesoscale weather simulations and microscale wind turbines through the adaptive mesh refinement capability provided by the underlying AMReX framework. Performance profiling of ERF on the Perlmutter compute system is presented. On CPUs, the code performs optimally with 4 to 8 OpenMP threads using hybrid MPI+OpenMP parallelism. On GPUs, the numerical computations demonstrate excellent weak scaling up to 32 nodes, while enabling GPU-aware MPI communication reduces wall times by 20%. Strided access due to stencil operations and dependency loops along the vertical coordinate arising from specific boundary conditions are identified as the primary cost factors for the GPU computations.


“Deep Learning for Bayesian Superresolution Electron Microscopy”

Computational Biosciences Group
Afternoon Session, Group 5

Abstract: In electron microscopy, the precision of individual electron detection and localization is limited by physical limitations of sensor fabrication, as well as sensor saturation in high-electron-dose regimes. In this work, we investigate the use of modern computer vision techniques in boosting localization accuracy beyond physical detector limits, as well as uncertainty-aware segmentation to unify low-dose and high-dose detection regimes. Our model is based on a distribution-matching approach that integrates particle physics knowledge in its estimates.


“Adaptive Sketching-Based Hierarchical Matrix Construction on GPUs”

Scalable Solvers Group
Afternoon Session, Group 5

Abstract: I will present my recent work on the construction of hierarchical matrices on GPUs in optimal time using sketching. The method requires only a fast matrix-vector product and entry generation for the operator being approximated, and it builds the approximation adaptively.
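
The basic sketching ingredient, shown below for a single numerically low-rank block in NumPy, needs only the ability to apply the operator to random vectors; this is only an illustration of the idea, since the talk’s method applies such sketches adaptively across the hierarchical block structure and additionally uses entry generation for the blocks kept dense.

```python
# Randomized range finder via sketching: only matrix-vector products are needed.
# Illustrative single-block example, not the full hierarchical construction.
import numpy as np

def randomized_range(matvec, n, sketch_size, rng=None):
    """Orthonormal basis for the approximate range of an n x n operator,
    obtained from a single block of matrix-vector products."""
    rng = rng or np.random.default_rng(0)
    omega = rng.normal(size=(n, sketch_size))   # random test matrix
    Y = matvec(omega)                           # the sketch
    Q, _ = np.linalg.qr(Y)                      # orthonormalize
    return Q

# toy smooth kernel matrix that is numerically low rank
x = np.linspace(0.0, 1.0, 400)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
Q = randomized_range(lambda v: K @ v, n=400, sketch_size=30)
approx = Q @ (Q.T @ K)
print(np.linalg.norm(K - approx) / np.linalg.norm(K))   # small relative error
```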


“The Challenges and Solutions of Generative AI for Image Watermarking Security”

Robust Deep Learning
Afternoon Session, Group 5

Abstract: This study delves into the impact of generative artificial intelligence on image production, where GenAI’s capabilities blur the lines between human and machine-generated content. Our study investigates the practical implications of adversarial attacks on watermarking systems, revealing vulnerabilities even in sophisticated watermarks like the Tree-Ring Watermark. This underscores the pressing need for robust and adaptive security measures. Insights from adversarial attacks not only deepen our understanding of watermark vulnerabilities but also drive the development of more resilient and dynamic watermarking techniques. Strategic watermark removal from watermarked images opens avenues for the creation of robust watermarking algorithms. These findings contribute to the ongoing dialogue on image generation, watermarking security, and the intricate relationship between adversarial attacks and watermark vulnerabilities.