Collage depicting 6 speakers at the 2023 Computing Sciences Postdoc Symposium.

On Tuesday, Feb. 7, 2023, the CSA held its fourth annual Postdoc Symposium at Berkeley Lab, where 18 postdoctoral speakers working at the Lab shared 10-minute slide presentations on their research with an audience of peers, mentors, and coworkers. View their individual presentations below.

“Exascale Simulations of Pulsar Magnetospheres”

Center for Computational Sciences and Engineering 
Morning Session, Group 1

Abstract: Pulsars are highly magnetized, rapidly rotating neutron stars that are the source of extreme particle acceleration that cannot be explained with conventional theory. Instead, explanations rely on microphysical plasma processes like magnetic reconnection that can only be simulated with a first-principles method like electromagnetic particle-in-cell (PIC). However, a simulation that captures both the small, centimeter-scale plasma phenomena and the large, ten-kilometer pulsar is challenging due to the factor-of-a-million scale difference. This talk will discuss our efforts to bridge the scale gap with the exascale PIC code WarpX and state-of-the-art computing resources. We take a two-pronged approach: 1) global simulations with scaled plasma parameters, and 2) zoomed-in simulations that resolve small-scale plasma processes. I will share results from these two campaigns, and discuss our next steps: developing algorithms that will allow us to couple the small and large scales and reduce the scale difference. These techniques, like adaptive mesh refinement in PIC and directly coupling PIC and magnetohydrodynamics, will be useful for other astrophysical and plasma applications.

“Interaction Energy Minimization, Symmetry, and Optimization Methods”

Math for Experimental Data Analysis Group
Morning Session, Group 1

Abstract: Structured geometric point sets play important roles in coding theory, mathematical biology, computational chemistry, wireless communications, compressed sensing, and ‘big data’ applications due to their often desirable statistical properties for measurement and transmission. In this talk, I’ll describe research that investigates numerical phenomena related to interaction energy minimization, detailing several results on continuous ‘probabilistic’ energies. In addition, I’ll talk about some experiments using parallelized computation and optimization methods like trust-region conjugate gradient to numerically minimize energy.

“Towards a Quantum Network Test-Bed Over Deployed Fiber with Trapped Ions”

Hartmut Häffner Group
Morning Session, Group 1

Abstract: Quantum information science and the potential advantages it promises over classical technologies continue to garner interest in the academic, industrial and governmental spheres. The ability to generate high-rate, high-fidelity entanglement between quantum nodes separated by large (beyond the laboratory scale) physical distances promises to enable applications in quantum computation and networking, including modular/distributed quantum computing, quantum repeaters, and ultra-secure communication protocols. Trapped ions represent a strong candidate for memory nodes of a quantum internet because of their long coherence times, and photons are a natural choice for a traveling qubit. Here, we propose to demonstrate high-rate, high-fidelity (HRHF) remote entanglement between two trapped-ion quantum nodes separated by a 10 km telecom band optical fiber. This link will serve as a testbed for protocols which require HRHF remote entanglement such as quantum teleportation and distributed quantum computing. In this talk, we describe the trapped-ion-based scheme in detail as well as provide an update on the current status of the research effort.

“A High-Order Embedded Boundary Method for Modeling Fluid Flows Around Complex Geometries”

Applied Numerical Algorithms Group
Morning Session, Group 1

Abstract: This work presents a high-order finite-volume embedded boundary method that operates on moving domains of arbitrarily complex geometries. Meshing for this algorithm is automated by embedding geometries in a Cartesian grid with cut-cells along the boundaries. Our approach for solutions on these grids achieves fourth-order accurate discretizations that are stable in the presence of small cut-cells, without approaches such as mesh modification, cell merging, or redistribution. This is accomplished using a weighted least-squares reconstruction to evaluate fluxes in the presence of irregular cells with non-trivial boundary conditions. We demonstrate applications with incompressible flow solutions inside expanding bubbles of fluid, show the validity of the method, and discuss future work to extend to more complex physics.

“Parallel H2-Matrix Factorization Based on Skeletonization”

Scalable Solvers Group
Morning Session, Group 1

Abstract: Hierarchical matrices allow for memory-efficient representation of the data-sparse matrices that often appear in scientific applications. The open-source H2Opus library provides distributed CPU and GPU implementations of several key operations using the H2-variant of hierarchical matrices, where nested row and column bases allow for asymptotically optimal memory storage requirements. In this talk, we introduce a new parallel factorization algorithm based on skeletonization and discuss its implementation on GPUs. The resulting factorization can be computed in optimal time and, with the aid of non-uniform batched linear algebra routines, can utilize the GPU efficiently.

“Achieving Performance in the Exascale Era”

Performance and Algorithms Research Group
Morning Session, Group 2

Abstract: Finding the optimal performance parameters for an application across different platforms or programming models is a challenging problem. Several Machine-Learning-based autotuners have been proposed in the past to this end, but have required sampling many configurations for training the model, which is expensive. This research proposes a search methodology based on Bayesian optimization and transfer learning that addresses performance portability issues. The proposed methodology prunes the search space to a reduced set of meaningful samples by sharing historical-searching data from other platforms. The experimental results show that our methodology outperforms other performance-portability search strategies by a significant margin, and also answers other hot-topic performance-portability questions (e.g. NVIDIA/CUDA vs AMD/HIP performance).

“DISCOS – A mesoscale method for polymers”

Applied Mathematics Group
Morning Session, Group 2

Abstract: Polymers are widely used in many industrial and scientific applications. Their versatility stems from complex constituents and structures that exhibit multiple length and time scales; thus, computer simulations serve as a powerful tool to understand polymers. Molecular Dynamics is among the most popular computational methods in this field. However, explicitly resolving solvent molecules is computationally expensive, making it challenging to scale the problem size to real applications, while implicit-solvent representations do not yet model full hydrodynamics. On the other hand, Brownian Dynamics can faithfully capture hydrodynamic interactions using Green’s functions, but with a tradeoff of having to invert a dense mobility matrix. We have recently developed a mesoscale fluid model, the Discrete-Ion Stochastic Continuum Overdamped Solvent (DISCOS) algorithm, and in this work, we extend it to simulate polymeric systems. In DISCOS, the solvent is modeled using fluctuating hydrodynamics, and polymers are simulated via a spring-bead model coupled to the solvent by the immersed boundary method. We validate DISCOS by first simulating a single-chain polymer and comparing to the Rouse and Zimm models, and then generalize to multi-chain systems such as membranes.

“Optimizing Search Layouts in Packed Memory Arrays”

Performance and Algorithms Research Group
Morning Session, Group 2

Abstract: This paper introduces Search-optimized Packed Memory Arrays (SPMAs), a collection of data structures based on Packed Memory Arrays (PMAs) that address suboptimal search via cache-optimized search layouts. Traditionally, PMAs and B-trees have tradeoffs between searches/inserts and scans: B-trees were faster for searches and inserts, while PMAs were faster for scans.

Our empirical evaluation shows that SPMAs overcome this tradeoff for unsorted input distributions: on average, SPMAs are faster than B+-trees (a variant of B-trees optimized for scans) on all major operations. We generated datasets and search/insert workloads from the Yahoo! Cloud Serving Benchmark (YCSB) and found that SPMAs are about 2x faster than B+-trees regardless of the ratio of searches to inserts. On uniform random inputs, SPMAs are on average between 1.3x-2.3x faster than B+-trees on all operations. Finally, we vary the amount of sortedness in the inputs to stress the worst-case insert distribution in the PMA. We find that the worst-case B+-tree insertion throughput is about 1.5x faster than the worst-case PMA insertion throughput. However, the worst-case input for the PMA is sorted and highly unlikely to appear naturally in practice. The SPMAs maintain higher insertion throughput than the B+-tree when the input is up to 25% sorted.

“Discovering Reaction Channels with Reinforcement Learning”

Scalable Solvers Group
Morning Session, Group 2

Abstract: Reactive trajectories between metastable states are important in studying reactions, yet they are rare. This proposal provides a new method that uses reinforcement learning (RL) to identify the reaction channels where reactive trajectories occur frequently. The action function in RL learns to seek the connective configurations of high reactive probability based on reward from simulation. Then, we characterize the reactive channels by data points sampled by shooting from the located connective configurations. These data points bridge stable states and cover most transition regions of interest, enabling us to study reaction mechanisms (e.g., computing the committor function and reaction rate) on narrowed regions rather than the entire configuration space.

“Searching for Spin Liquids: Classical and Quantum Approaches”

Applied Computing for Scientific Discovery
Afternoon Session, Group 1

Abstract: I discuss a highly frustrated spin-1 model on the triangular lattice, with nearest- and next-nearest-neighbor antiferromagnetic S·S interactions and nearest-neighbor (S·S)^2 interactions. Using DMRG, I find three magnetically ordered phases, namely 120° spiral order, stripe order, and tetrahedral order, as well as two spin nematic phases: ferroquadrupolar and antiferroquadrupolar. While the data could be consistent with a spin liquid phase between the 120° spiral and antiferroquadrupolar orders, the more likely scenario is a direct continuous transition between these two orders.

“Faster Simulation of Computer Architecture Using Optimistic Parallel Discrete Event Simulation”

Computer Architecture Group
Afternoon Session, Group 1

Abstract: The end of Moore’s law has placed a two-fold demand on hardware simulation. Firstly, efficient co-design requires fast simulation of hardware systems in order to vet proposed designs. Secondly, modern simulator platforms need to become increasingly concurrent as well. To address these challenges, we propose the development of an optimistic time warp-based parallel discrete event simulation (PDES) backend for the Structural Simulation Toolkit (SST). Optimistic PDES can hide synchronization costs by speculatively executing tasks. Historically, optimistic PDES has not been used in hardware simulation due to its complexity and additional overhead. In this paper, we call this conventional wisdom into question by demonstrating a 2.1x to 3.7x speed-up using our optimistic SST backend versus the current SST-PDES implementation for tiled mesh network-on-chip (NOC) architectures.

“FerroX: A GPU-accelerated, 3D Phase-Field Simulation Framework for Modeling Ferroelectric Devices”

Center for Computational Sciences and Engineering
Afternoon Session, Group 1

Abstract: Fundamental understanding and design of ferroelectric-material-based devices for logic-in-memory and transistor enhancement is essential to enable low switching energies and thereby revolutionary power reductions in computing. Efficient modeling and simulations can provide in-depth insights into the underlying physics, as well as pave the way to providing researchers with reliable design tools for new microelectronic devices. In this talk, I will present a performance-portable, 3D phase-field simulation framework, FerroX, for modeling and design of ferroelectric-based microelectronic devices. FerroX is open-source (https://github.com/AMReX-Microelectronics/FerroX) and demonstrates a significant (15x) speedup on GPU architectures compared to the CPU counterparts. I will demonstrate the application of the code with simulations of multi-domain negative capacitance effects in Metal-Ferroelectric-Insulator-Metal (MFIM) and Metal-Ferroelectric-Insulator-Semiconductor-Metal (MFISM) devices. I will also discuss our efforts towards simulations of ferroelectric field effect transistors.

“Multiscale Modeling of Carbon Nanotube Field Effect Transistors (CNTFETs) for Photodetection”

Center for Computational Sciences and Engineering
Afternoon Session, Group 1

Abstract: The growing need for the miniaturization of transistors has led to the development of efficient field-effect transistors based on highly conductive materials such as graphene, carbon nanotubes, and silicon nanowires. These advances have enabled technologies such as photodetectors and biosensors to have applications in astrophysics, biomedical engineering, and security. Computational modeling of these nanotransistors requires the development of a broad range of multi-scale tools, such as quantum or classical transport modules for quantifying the current and gain, depending on transistor channel length, and a scalable electrostatic module for self-consistent calculation of potential. High-frequency applications also need characterization of the surrounding circuitry, requiring a solver for the fully coupled Maxwell equations to model electromagnetic waves traversing the microscale transmission lines that connect these materials to IC inputs. Toward this goal, the microelectronics team at LBNL has been developing highly scalable, open-source software frameworks that are portable across platforms ranging from laptops to manycore/GPU architectures. We will discuss the recent advancements in these capabilities and how they will equip us to achieve DOE’s microelectronics goals.

“Tensor Equation Methods for Electron Correlation Energy Computation”

Scalable Solvers Group
Afternoon Session, Group 1

Abstract: In this talk, we present some tensor equation variants of the second-order Møller–Plesset method (MP2) in quantum chemistry to compute electron correlation energy. Specifically, using the structure of Kronecker products, we rewrite the target linear system into a Sylvester tensor equation, and develop various solvers based on the structures of different chemical formulations. In particular, we develop a factored alternating direction implicit (fADI) method that utilizes data sparsity in the canonical orbital representation, and a sparsity-enforcement Krylov subspace method that exploits sparsity in the localized orbital representation. We provide complexity analysis of our tensor equation solvers, and numerical results show that these methods are faster than traditional linear system solvers for realistic test cases from chemical experiments.
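The step from a Kronecker-structured linear system to a Sylvester equation can be illustrated in its simplest two-dimensional (matrix) form. The sketch below is not the speakers' solver: the matrices A, B, and C are hypothetical random data rather than MP2 quantities, and it only checks the standard identity vec(AX + XB) = (I ⊗ A + Bᵀ ⊗ I) vec(X) that makes the rewrite possible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Hypothetical symmetric positive definite operators (not chemistry data),
# chosen so that the Kronecker system below is guaranteed to be solvable.
G = rng.standard_normal((n, n))
H = rng.standard_normal((m, m))
A = G @ G.T + n * np.eye(n)
B = H @ H.T + m * np.eye(m)
C = rng.standard_normal((n, m))

# Large linear system in Kronecker form, using column-major vec():
# (I_m kron A + B^T kron I_n) vec(X) = vec(C), of size (n*m) x (n*m).
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.flatten(order="F"))
X = x.reshape((n, m), order="F")

# The same X solves the much smaller matrix (Sylvester) equation A X + X B = C.
residual = np.linalg.norm(A @ X + X @ B - C)
print(f"Kronecker system size: {K.shape}, Sylvester residual: {residual:.2e}")
```

The Kronecker matrix has (n·m)² entries while the Sylvester form only stores A, B, and C, which is one reason solvers that work directly on the structured equation can be cheaper than a generic linear solve.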

“Dynamic Mode Decomposition of One-Time Dynamics for Nonequilibrium Green’s Function”

Scalable Solvers Group
Afternoon Session, Group 2

Abstract: Simulating quantum many-body systems away from equilibrium is computationally challenging. A practical way to make it easier is to examine the Green’s function based on many-body perturbation theory. However, the Kadanoff-Baym equations (KBEs), which describe the dynamics of the two-time non-equilibrium Green’s function (NEGF), form a set of coupled nonlinear integro-differential equations that are difficult to solve. In fact, to propagate the system until time T, typical numerical methods will take O(T^3) computational time. To deal with this problem, I applied dynamic mode decomposition (DMD), a data-driven model order reduction technique, to simulate the long-time dynamics of the NEGF by using snapshots computed within a small time window. This technique was first applied to the time-diagonal of the two-time Green’s function, and then to the off-diagonal elements by decomposing the Green’s function into a number of one-time functions. The effectiveness of DMD is demonstrated on a two-band Hubbard model system. In the equilibrium limit, the DMD analysis yields results that are consistent with those produced from a linear response analysis. In the nonequilibrium case, the extrapolated dynamics produced by DMD are more accurate than those from a special Fourier extrapolation scheme. A potential pitfall of the standard DMD method comes from the insufficient spatial/momentum resolution of the discretization scheme. For the model system, this problem can be overcome by using a variant of DMD known as the higher order DMD (HODMD).
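The idea of extrapolating long-time dynamics from a short window of snapshots can be sketched with the standard "exact DMD" algorithm. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function names dmd_fit and dmd_predict are hypothetical, the data are a synthetic mixture of damped oscillations rather than a Green's function, and the higher order variant (HODMD) is not shown.

```python
import numpy as np

def dmd_fit(snapshots, rank):
    """Standard exact DMD on an array of shape (n_features, n_times)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator that best maps each snapshot to the next one.
    A_tilde = (U.conj().T @ Y @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ V) / s) @ W
    # Mode amplitudes from a least-squares fit to the first snapshot.
    first = snapshots[:, 0].astype(complex)
    amps = np.linalg.lstsq(modes, first, rcond=None)[0]
    return eigvals, modes, amps

def dmd_predict(eigvals, modes, amps, steps):
    """Evaluate sum_j amps_j * modes_j * eigvals_j**k for k = 0..steps-1."""
    k = np.arange(steps)
    return (modes * amps) @ (eigvals[:, None] ** k)

# Synthetic test signal (an assumption): two damped oscillations mixed
# into six observed channels.
rng = np.random.default_rng(0)
t = 0.1 * np.arange(200)
latent = np.vstack([
    np.exp(-0.05 * t) * np.cos(t),
    np.exp(-0.05 * t) * np.sin(t),
    np.exp(-0.10 * t) * np.cos(2.5 * t),
    np.exp(-0.10 * t) * np.sin(2.5 * t),
])
data = rng.standard_normal((6, 4)) @ latent

# Fit on the first 40 snapshots only, then extrapolate to all 200.
eigvals, modes, amps = dmd_fit(data[:, :40], rank=4)
prediction = dmd_predict(eigvals, modes, amps, steps=200).real
rel_error = np.linalg.norm(prediction - data) / np.linalg.norm(data)
print(f"relative extrapolation error: {rel_error:.2e}")
```

Because the synthetic signal is exactly a sum of four complex exponentials, a rank-4 fit on the short window should reproduce the full trajectory closely; real data would generally need a careful choice of rank and window length.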

“Optimizations for Distributed-Memory Parallel SGD”

Mathematics Group
Afternoon Session, Group 2

Abstract: Matrix factorization methods for collaborative filtering usually utilize parallel asynchronous stochastic gradient descent (PASGD) algorithms. The stale data usage problem of these algorithms can be avoided by introducing synchronizations, at the cost of reduced scalability. We propose a PASGD algorithm that efficiently handles synchronizations per epoch in a scalable fashion. In the proposed algorithm, we limit the number of synchronizations to K, where K denotes the number of processors, in such a way that stale data usage is completely avoided. The sparse nature of the input data used in collaborative methods has the potential to minimize stale data usage and communication volume through the use of intelligent partitioning models. We propose a hypergraph partitioning model that encapsulates reducing stale data use and communication volume while minimizing the number of synchronizations. We utilize a new recursive-bipartitioning algorithm to realize a novel cutsize metric for this encapsulation. Experiments conducted on up to 512 processors show that the proposed algorithm improves the parallel runtime and factorization error.
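For readers unfamiliar with the underlying update, the following is a minimal serial sketch of SGD for matrix factorization. It is not the PASGD algorithm or the partitioning model from the talk: the function names, learning rate, regularization value, and synthetic ratings are all assumptions chosen for illustration.

```python
import numpy as np

def sgd_epoch(ratings, P, Q, lr=0.02, reg=0.01):
    """One pass over (user, item, rating) triples, updating factors in place."""
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        p_old = P[u].copy()  # keep the pre-update row for the item update
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_old - reg * Q[i])

def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    return np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))

# Synthetic sparse ratings from a hidden rank-3 model (hypothetical data).
rng = np.random.default_rng(0)
true_P = rng.standard_normal((20, 3))
true_Q = rng.standard_normal((15, 3))
ratings = [(u, i, true_P[u] @ true_Q[i])
           for u in range(20) for i in range(15) if rng.random() < 0.5]

P = 0.1 * rng.standard_normal((20, 3))
Q = 0.1 * rng.standard_normal((15, 3))
rmse_before = rmse(ratings, P, Q)
for _ in range(100):
    sgd_epoch(ratings, P, Q)
rmse_after = rmse(ratings, P, Q)
print(f"RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

Each update reads and writes one row of P and one row of Q. When several processors run such updates concurrently without synchronization, one of them may read a row that another has already changed, which is the stale data issue the abstract refers to.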

“A New Jigsaw Puzzle Game for Quantum Many-Body Theory”

Scalable Solvers Group
Afternoon Session, Group 2

Abstract: We will present a new theoretical framework, called the combinatorial Mori-Zwanzig theory, for calculating the interactive Green’s function for quantum many-body systems. The diagrammatic method developed under this framework can be interpreted as a novel jigsaw puzzle game based on planar trees, which makes it generally different from the commonly adopted Feynman diagrams. Through the relevant discussion, we wish to bring interesting connections that bridge graphs, algebraic combinatorics, and quantum field theory.

“Applications of Online Surrogate Models to Scientific Computation”

Scalable Solvers Group
Afternoon Session, Group 2

Abstract: In Bayesian optimization of black-box functions via surrogate modeling, we face the challenge of high dimensionality, and selecting the model and variables is an important part of solving the problem. In this talk, we will discuss state-of-the-art techniques for model and variable selection in online surrogate models using Gaussian processes.