Analysis and Visualization

As computational scientists are confronted with increasingly massive datasets from supercomputing simulations and experiments, one of the biggest assets is having the right data analysis and visualization tools to gain scientific insight from the data. Toward this end, Berkeley Lab researchers are collaborating with application scientists to identify and address challenging, large-scale, data-rich problems emerging from simulations, experiments, and observations. Together they are using theoretical and applied research to conceive, design, and implement new methods in high-performance machine learning, data and image analytics, computational geometry and topology, and visualization technologies.

Projects

Topological Cacti

Current visualization techniques provide either structural or quantitative information based on topological analysis of a data set. Contours, the connected components of level sets, play an important role in understanding the global structure of a scalar field. In particular, their nesting behavior and topology, often represented in the form of a contour tree, have been used extensively for visualization and analysis. We introduced a new visual metaphor for contour trees, called topological cacti [1], that extends the traditional toporrery display of a contour tree to show additional quantitative information as the width of the cactus trunk and the length of its spikes. Contact: Gunther Weber

MetroMaps: Map-based Representations for Analyzing Optimization Solution Spaces

Understanding the solution space plays an important role in a wide range of optimization applications. For example, finding solutions to many urgent problems (e.g., pollution and global warming) requires novel insights into chemical systems and processes. It is possible to gain these insights through computational modeling and analysis of the system. However, there is a lack of effective visualization techniques for multi-dimensional functions, and chemists often study only one or two parameters of interest at a time. MetroMaps was developed to address these challenges by allowing chemists to gain insight into complex chemical systems and understand higher-dimensional energy functions. Contacts: Maciej Haranczyk, Gunther Weber

Dionysus

Dionysus is a library for computing persistent homology. It implements an extensive collection of algorithms used in topological data analysis. Contact: Dimitriy Morozov
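
To give a sense of what persistent homology computes, here is a minimal sketch of the simplest case: 0-dimensional persistence of the sublevel sets of a 1D sequence, computed with a union-find structure. It is a plain-Python illustration written for this page, not the Dionysus API; the function name `persistence_pairs_0d` is hypothetical.

```python
def persistence_pairs_0d(values):
    """Return sorted (birth, death) pairs for sublevel-set components.

    The component born at the global minimum never dies and is reported
    with death = float("inf"). Zero-persistence pairs are omitted.
    """
    n = len(values)
    parent = [None] * n  # None means "vertex not yet added"

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # Add vertices in order of increasing value (the filtration order).
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component with the higher minimum dies.
                young, old = (a, b) if values[a] > values[b] else (b, a)
                if values[young] < values[i]:
                    pairs.append((values[young], values[i]))
                parent[young] = old
    root = find(0)
    pairs.append((values[root], float("inf")))
    return sorted(pairs)


if __name__ == "__main__":
    print(persistence_pairs_0d([1, 3, 0, 2, 5, 4]))
```

Each pair records the value at which a component appears (a local minimum) and the value at which it merges into an older component. The same sweep, with the merge events kept as a tree, is the idea behind merge trees.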

DIY

DIY is a block-parallel library for implementing scalable algorithms that can execute both in-core and out-of-core. The same program can be executed with one or more threads per MPI process, seamlessly combining distributed-memory message passing with shared-memory thread parallelism. The abstraction enabling these capabilities is block parallelism; blocks and their message queues are mapped onto processing elements (MPI processes or threads) and are migrated between memory and storage by the DIY runtime. Complex communication patterns, including neighbor exchange, merge reduction, swap reduction, and all-to-all exchange, are possible in- and out-of-core in DIY. Contact: Dimitriy Morozov
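
As an illustration of one of the communication patterns named above, the following sketch simulates a k-ary merge reduction in a single process: in each round, blocks are grouped k at a time and one block per group combines the data of the others. It is a conceptual model only and assumes nothing about DIY's actual C++ interface; `merge_reduce` and its parameters are hypothetical names.

```python
def merge_reduce(block_values, k=2, combine=lambda a, b: a + b):
    """Simulate a k-ary merge reduction over per-block values.

    Returns (reduced_value, number_of_rounds).
    """
    if not block_values:
        raise ValueError("need at least one block")
    if k < 2:
        raise ValueError("k must be at least 2")
    active = list(block_values)  # one entry per block still holding data
    rounds = 0
    while len(active) > 1:
        merged = []
        # Group blocks k at a time; the first block of each group acts
        # as the root and combines what the others send to it.
        for start in range(0, len(active), k):
            group = active[start:start + k]
            result = group[0]
            for incoming in group[1:]:
                result = combine(result, incoming)
            merged.append(result)
        active = merged
        rounds += 1
    return active[0], rounds


if __name__ == "__main__":
    print(merge_reduce([1, 2, 3, 4, 5, 6, 7, 8], k=2))
```

With 8 blocks and k = 2 the reduction finishes in three rounds; a larger k trades fewer rounds for more data arriving at each root per round.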

Reeber

A library for shared- and distributed-memory parallel computation of merge trees. Contact: Dimitriy Morozov

Characterizing Possible Failure Modes in Physics-Informed Neural Networks

Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena for even slightly more complex problems. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves PDE-based differential operators, can introduce a number of subtle problems, including making the problem more ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training. Contact: Michael Mahoney
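
The soft-constraint idea can be sketched for a 1D convection problem, u_t + beta * u_x = 0 with u(x, 0) = sin(x). The sketch below assumes a candidate model given as a plain function `u(x, t)` and uses central finite differences in place of the automatic differentiation a real PINN would use. The names `pinn_loss` and `curriculum_betas`, the weight `lam`, and the linear schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np


def pinn_loss(u, x, t, beta, lam=1.0, h=1e-4):
    """Soft-constraint loss for u_t + beta * u_x = 0 with u(x, 0) = sin(x)."""
    # PDE residual at the collocation points (finite differences stand in
    # for automatic differentiation in this sketch).
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    residual = u_t + beta * u_x
    # Data term: mismatch with the initial condition at t = 0.
    ic_error = u(x, np.zeros_like(x)) - np.sin(x)
    return np.mean(ic_error ** 2) + lam * np.mean(residual ** 2)


def curriculum_betas(beta_target, n_stages):
    """Convection coefficients ordered from easy (small) to the target."""
    return np.linspace(beta_target / n_stages, beta_target, n_stages)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 2.0 * np.pi, 200)
    t = rng.uniform(0.0, 1.0, 200)
    exact = lambda x, t: np.sin(x - 5.0 * t)
    print(pinn_loss(exact, x, t, beta=5.0))
    print(curriculum_betas(30.0, 3))
```

One way to read "progressively more complex" is to train against the loss for each coefficient in the schedule in turn, reusing the weights from the previous stage, so the network meets the hard, large-coefficient problem only after fitting easier ones.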

ImageXD

Image processing across domains, or ImageXD, is an initiative centered on the essential image transformations, analysis, and measurements needed to understand images across scientific domains. Co-founded by Ushizima while a data science fellow at BIDS, UC Berkeley, it has been an important venue since 2016 for discussing new package developments and featuring innovative modules for image analysis. Contact: Dani Ushizima

Python vision for MicroCT

This project supports new and current users at LBNL's Advanced Light Source who need automated methods coded in Jupyter notebooks, run through specialized entry points with all required installations, and readily available at NERSC. In practice, this means a Docker, Shifter, and Python software stack that is ready to run analytics and visualization. The notebooks provide code that quickly explains how to handle 2D and 3D image representations using NumPy and how to perform key image transformations, such as multiscale pyramids with SciPy and skimage, as well as visualization schemes using matplotlib, plotly, itkwidgets, and other Python libraries used in the construction of scientific pipelines and workflows. Contact: Dani Ushizima
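
As a flavor of the kind of operation these notebooks cover, here is a small NumPy-only sketch of a multiscale pyramid built by 2x block averaging, which works on both 2D images and 3D volumes. It is an illustrative stand-in rather than code from the project notebooks; `mean_pyramid` is a hypothetical helper, and the synthetic volume stands in for real microCT data.

```python
import numpy as np


def mean_pyramid(image, levels=3):
    """Return [full-res, half-res, ...] using 2x block averaging.

    Works for 2D images and 3D volumes. Each axis is cropped to an even
    length before downsampling, and the pyramid stops early if any axis
    becomes shorter than 2.
    """
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        a = pyramid[-1]
        if min(a.shape) < 2:
            break
        ndim = a.ndim
        a = a[tuple(slice(0, (n // 2) * 2) for n in a.shape)]
        # Split every axis into (n // 2, 2), then average the length-2 axes.
        split_shape = []
        for n in a.shape:
            split_shape += [n // 2, 2]
        pair_axes = tuple(range(1, 2 * ndim, 2))
        pyramid.append(a.reshape(split_shape).mean(axis=pair_axes))
    return pyramid


if __name__ == "__main__":
    volume = np.arange(8 * 8 * 8).reshape(8, 8, 8)  # synthetic 3D stack
    middle_slice = volume[4]                        # one 2D slice of the stack
    for level in mean_pyramid(volume, levels=2):
        print(level.shape)
    print(middle_slice.shape)
```

For real data, library routines for Gaussian pyramids in skimage smooth before downsampling and are usually the better choice; block averaging is used here only to keep the example dependency-free beyond NumPy.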

Adaptive Self-Supervision Algorithms for Physics-informed Neural Networks

Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme that progressively allocates more collocation points (without increasing their number) to areas where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points in order to adjust the loss landscape), leads to better estimates for the prediction error. We present results for several problems, including 2D Poisson and diffusion-advection systems with different forcing functions. We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially when few collocation points are used. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with computational complexity similar to that of the baseline. Furthermore, we find that the adaptive methods consistently perform on par with or slightly better than a vanilla PINN method, even when many collocation points are used. The code for all the experiments has been open-sourced. Contact: Michael Mahoney
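
The core resampling step can be sketched as follows: given a pool of candidate locations and a per-candidate error proxy, draw a fixed number of collocation points with probability proportional to the proxy. This is a simplified illustration, not the released experiment code; `resample_collocation`, `error_proxy`, and `floor` are hypothetical names, and how the proxy is computed (for example, from the loss gradient) is left to the caller.

```python
import numpy as np


def resample_collocation(candidates, error_proxy, n_points, rng, floor=1e-12):
    """Draw n_points distinct candidates, favoring high-error locations."""
    candidates = np.asarray(candidates)
    # A small floor keeps every candidate selectable, so sampling without
    # replacement cannot run out of points with nonzero probability.
    weights = np.abs(np.asarray(error_proxy, dtype=float)) + floor
    probs = weights / weights.sum()
    idx = rng.choice(len(candidates), size=n_points, replace=False, p=probs)
    return candidates[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = np.linspace(0.0, 1.0, 1000)
    proxy = np.where(pool > 0.8, 100.0, 1.0)  # pretend errors are high near x = 1
    points = resample_collocation(pool, proxy, 100, rng)
    print(len(points), np.mean(points > 0.8))
```

The number of points stays fixed at `n_points`; only their distribution changes. Calling the function again with a fresh proxy, for example when training stalls, gives the restart-by-resampling behavior described above.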

Large-Scale, Self-driving WAN Network (also called DAPHNE)

Starting with the ESnet wide area network (WAN), we investigate controllers and WAN traffic statistics from science experiments to understand WAN traffic delivery challenges such as long-lived flows, TCP performance issues, and underutilized resources. This project uses artificial intelligence (AI) combined with network controllers to support complex end-to-end network connectivity within ESnet. Contact: Mariam Kiran (Kiran on the Web)

Large-scale, Self-driving 5G Network for Science

As science expands beyond the laboratory walls, we are investigating intelligent edge-to-core connectivity to connect wireless, 5G, and beyond to ESnet networks autonomously. This project uses artificial intelligence (AI) combined with network virtualization to support complex end-to-end network connectivity, from edge 5G sensors to supercomputing facilities like the National Energy Research Scientific Computing Center (NERSC). Contact: Mariam Kiran (Kiran on the Web)

Intelligent Automation of Network: Wireless, 5G, and Satellites

As science uses complex edge hardware, there is a need for end-to-end connectivity of edge devices to central ESnet controllers. In this project, we are exploring intent APIs that help connect Open RAN and 5G MEC to the SENSE controller, enabling end-to-end seamless connectivity to ESnet. Contact: Mariam Kiran (Kiran on the Web)

POSEIDON: Intelligent science workflows

PosEiDon aims to advance the knowledge of how simulation and machine learning (ML) methodologies can be harnessed and amplified to improve DOE’s computational and data science. PosEiDon will explore the use of simulation, ML, and hybrid methods to predict, understand, and optimize the behavior of complex DOE science workflows (simulation, instrument data analysis, ML, and superfacility) on production DOE computational and data infrastructure (CDI). The solutions will be developed based on data collected from DOE and NSF testbeds and then validated and refined in production CDI. Contact: Mariam Kiran (Kiran on the Web)

News

Neuroscience Simulations at NERSC Shed Light on Origins of Human Brain Recordings

July 11, 2022

Using simulations run at NERSC, a team of researchers at Berkeley Lab has found the origin of cortical surface electrical signals in the brain and discovered why the signals originate where they do.

SENSEI Showcased at SC18

November 7, 2018

Berkeley Lab scientists at SC18 are showcasing SENSEI, a lightweight software infrastructure that enables simulations to make use of a wide array of popular in situ analysis and visualization packages.

SRP Paves the Way for DOE Early Career Award

July 27, 2022

Professor Tanzima Islam credits her participation in the Sustainable Research Pathways at Berkeley Lab program with paving the way for her DOE Early Career Award.

Saye and Sethian’s Bubble Visualization Honored by 2013 Visualization Challenge

February 7, 2014

A visualization created by Berkeley Lab mathematicians Robert Saye and James Sethian of soap bubbles bursting and reforming has won honorable mention in the 2013 International Science and Engineering Visualization Challenge, sponsored by Science magazine and the National Science Foundation.

Supercomputers Capture Turbulence in the Solar Wind

December 16, 2013

With help from Berkeley Lab's visualization experts and NERSC supercomputers, astrophysicists can now study turbulence in unprecedented detail, and the results may hold clues about some of the processes that lead to destructive space weather events.