Building on Berkeley Lab’s rich tradition of team science, we combine deep expertise in mathematics, statistics, computing, and data sciences with a wide range of scientific disciplines to drive AI innovation. We collaborate with researchers across domains to generate, process, and curate vast datasets, which serve as critical resources for scientific discovery.

In parallel, we develop and deploy cutting-edge AI models and tools. By adapting existing machine learning methods and creating new ones, we address the diverse, evolving needs of scientific research. We apply these models to complex scientific problems, answering fundamental questions and accelerating discovery across fields.

Our efforts are supported by robust high-performance computing and networking infrastructure. We operate the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) and the Energy Sciences Network (ESnet), which provide the computational power, data storage, and remote access necessary for AI training, large-scale simulations, and collaborative research at distant experimental facilities.

Our Research Pillars:

  • New Models and Methods
  • Learning for Scientific Discovery
  • Supercomputing-Scale AI
  • Secure Machine Learning & ML for Security
  • Statistics and Applied Mathematics
  • Data and Science Informed Learning

An artistic illustration of a mixture of Gaussian processes and a light or particle beam passing through. The image represents the inner workings of the algorithm inside gpCAM, a software tool developed by researchers at Berkeley Lab's CAMERA facility to facilitate autonomous scientific discovery. The illustration features a glowing white beam intersecting a surface with multiple peaks and valleys, symbolizing the complex mathematical computations involved.

gpCAM is an open-source software tool that uses artificial intelligence to help scientists automatically plan, collect, and analyze data from experiments and simulations, making research faster and more efficient.
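As a rough illustration of how a Gaussian process can plan measurements, the sketch below fits a zero-mean GP to the data collected so far and proposes the candidate point with the largest posterior variance. It is a minimal NumPy sketch under stated assumptions, not gpCAM's API: the function names, the fixed kernel hyperparameters, and the `instrument` stand-in for a real experiment are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)
    prior_var = rbf_kernel(x_query, x_query).diagonal()
    var = prior_var - np.sum(k_qt * v.T, axis=1)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

def suggest_next(x_train, y_train, candidates):
    """Propose the candidate where the model is least certain."""
    _, var = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(var)]

def instrument(x):
    """Hypothetical stand-in for an experiment or simulation."""
    return np.sin(6.0 * x)

# Start from two measurements and let the model choose five more.
x_measured = np.array([0.0, 1.0])
y_measured = instrument(x_measured)
candidates = np.linspace(0.0, 1.0, 101)
for _ in range(5):
    x_next = suggest_next(x_measured, y_measured, candidates)
    x_measured = np.append(x_measured, x_next)
    y_measured = np.append(y_measured, instrument(x_next))
```

Maximum posterior variance is only one possible acquisition rule, chosen here for brevity. A production tool would typically also fit the kernel hyperparameters to the data rather than fixing them.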

A small green plant growing inside a transparent, rectangular plastic chamber with metal fasteners, placed on a black background.

A deep learning system that automates the analysis of plant root images, providing researchers with fast, precise insights to advance agricultural and environmental science.

Screenshot of the pyCBIR deep-learning tool interface displaying a grid of colorful X-ray scattering images from the Advanced Light Source. The left panel shows options for selecting image analysis and retrieval methods, while the main area displays multiple heatmap-like images matched from the database, helping researchers find similar patterns.

A deep learning-based image search tool that allows scientists to find and compare scientific images across large datasets, enhancing data exploration and discovery.
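The general idea behind content-based image retrieval can be sketched as follows: map every image to a feature vector, then rank database images by their similarity to the query's vector. This is an illustrative sketch only, not the tool's actual code. The `embed` function is a hypothetical stand-in (a normalized intensity histogram) for the learned deep-network features a real system would use, and the synthetic images are made up for the example.

```python
import numpy as np

def embed(image, bins=16):
    """Hypothetical stand-in for a learned feature extractor:
    a unit-norm intensity histogram over the range [0, 1]."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    vec = hist.astype(float)
    return vec / (np.linalg.norm(vec) + 1e-12)

def search(query, database, top_k=3):
    """Return (index, cosine similarity) pairs for the best matches."""
    q = embed(query)
    feats = np.stack([embed(img) for img in database])
    scores = feats @ q  # cosine similarity, since vectors are unit-norm
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Synthetic "images" with different mean intensities.
rng = np.random.default_rng(0)
database = [
    np.clip(rng.normal(loc=m, scale=0.05, size=(32, 32)), 0.0, 1.0)
    for m in (0.2, 0.5, 0.8)
]
query = np.clip(rng.normal(loc=0.5, scale=0.05, size=(32, 32)), 0.0, 1.0)
results = search(query, database, top_k=2)
```

Here the query is drawn from the same intensity distribution as the second database image, so that image ranks first. With real data, the quality of the ranking depends almost entirely on the feature extractor.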

CAMERA develops and maintains a suite of open-source AI and machine learning software—including pyMSDtorch, MSDNet, and other advanced tools—that enable autonomous experimentation, advanced data analysis, and image reconstruction across scientific domains.

A digital background featuring circular, futuristic blue interface elements and geometric patterns along the left and right edges, with a large solid blue rectangle in the center.

The ENDURABLE project develops advanced tools for robust data aggregation and deep learning model training, making it easier for scientists to build complex machine learning datasets and harness the full potential of AI for scientific discovery.

FunFact is a Python package that automates matrix and tensor factorization algorithms, supporting applications such as neural network compression and quantum circuit synthesis, and is built on modern machine learning frameworks for scalable, flexible scientific computing.
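To make the compression use case concrete, the sketch below factorizes a weight matrix into two thin factors with a truncated SVD and compares storage cost and reconstruction error. It uses plain NumPy rather than FunFact's API, and the SVD is a swapped-in technique chosen for simplicity. The function name, matrix sizes, and synthetic weights are assumptions made for illustration.

```python
import numpy as np

def low_rank_factors(weights, rank):
    """Factor weights (m x n) into left (m x rank) @ right (rank x n)
    using a truncated singular value decomposition."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    left = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    right = vt[:rank, :]
    return left, right

# Synthetic layer weights: approximately rank 4 plus a little noise.
rng = np.random.default_rng(0)
true_rank = 4
weights = rng.normal(size=(64, true_rank)) @ rng.normal(size=(true_rank, 48))
weights += 1e-3 * rng.normal(size=weights.shape)

left, right = low_rank_factors(weights, rank=true_rank)

# Relative reconstruction error and parameter-count reduction.
rel_error = np.linalg.norm(weights - left @ right) / np.linalg.norm(weights)
compression = weights.size / (left.size + right.size)
```

Storing the two factors takes 64×4 + 4×48 = 448 numbers instead of 64×48 = 3072, roughly a 6.9× reduction. The approximation is only accurate when the matrix is close to low rank, as this synthetic one is by construction.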

Visualization of ion collisions captured by BNL’s STAR detector. Multicolored lines trace particle tracks, created by quarks and gluons from the collision. GPTune-enhanced collisions produce more particles, offering scientists a deeper look into subatomic behavior. The image's radiant lines form a bullseye pattern against a black background, highlighting the intricate paths of the particles.

GPTune is an AI-driven autotuning tool that leverages Bayesian optimization and transfer learning to efficiently solve complex black-box optimization problems in scientific computing.

Close-up image of a GPU (graphics processing unit) chip mounted on a circuit board, showing the intricate metallic and silicon components and connections.

GraphDot is a GPU-accelerated library for machine learning on graphs, enabling fast and customizable graph similarity computations using advanced kernel methods and graph convolution techniques for scientific data analysis.

A stylized network diagram with interconnected circles of various sizes and colors on a blue background, representing nodes and connections in a graph or data network.

Scalable Graph Learning for Scientific Discovery develops efficient, distributed-memory algorithms and methods to enable large-scale, memory-efficient graph representation learning (GRL) and hypergraph learning for advancing research in fields such as structural biology, computational chemistry, and particle physics.

A close-up, stylized image of a glowing electronic circuit diagram, with bright yellow and orange lines and symbols representing interconnected components on a dark background, evoking the complexity and energy of advanced microchip or processor design.

This project investigates the unique performance characteristics of AI training and inference compared to traditional HPC applications, focusing on how computational methods, precision, and specialized hardware architectures impact efficiency and scalability. By analyzing the interactions between scientific workloads, AI frameworks, and hardware, the project aims to identify bottlenecks and guide the optimization of current and future AI systems.

3D molecular visualization showing a cluster of densely packed, interconnected molecules represented as green and yellow spheres, arranged in a complex, irregular pattern against a light blue background.

pyAMReX provides GPU-enabled Python bindings for the AMReX mesh-refinement framework, allowing seamless integration of AI and machine learning models with large-scale scientific simulations for in situ analysis and rapid prototyping.

Simulation image of low-Mach number hydrodynamics in stellar hydrostatic flows, showing turbulent, cloud-like structures in blue and orange hues against a black background, representing fluid dynamics and thermal diffusion modeled by MAESTROeX.

AI-Accelerated Astrophysical Reaction Networks use machine learning to replace computationally intensive reaction network solvers in astrophysical simulations, dramatically speeding up calculations while maintaining scientific accuracy.

Visualization of a computational simulation of compressible (radiation) hydrodynamics with self-gravity. The image shows a colorful heatmap with a bright yellow and green central region surrounded by blue and purple, overlaid with both coarse white gridlines and a finer, irregular black grid, illustrating adaptive mesh refinement used in advanced hydrodynamics modeling.

Data-Driven Modeling of Complex Fluids combines machine learning with multiscale modeling frameworks to predict the behavior of complex, microstructural fluids, enhancing the accuracy and efficiency of simulations.

Diagram showing three side-by-side chat panels illustrating MatterChat, an AI tool for predicting material properties. Each panel displays a material structure image at the top, followed by a series of user questions and MatterChat’s accurate responses about chemical formula, space group, stability, bandgap, magnetic order, and energy metrics. The panels demonstrate MatterChat’s ability to interpret and answer diverse materials science queries using both text and graph-based data.

MatterChat is a multimodal large language model for materials discovery. It is tailored for materials science and integrates diverse data types to accelerate materials analysis, prediction, and design.

Digital simulation generated with the WarpX code depicts blue and yellow shockwaves against a black background.

CCSE integrates cutting-edge AI with domain science, enhancing modeling, prediction, and data-driven exploration in complex scientific systems.

INDIE logo featuring modern, bold lettering with interconnected nodes and lines symbolizing wireless networks and data flow, using a blue and teal color palette to represent advanced technology and connectivity.

INDIE (Intelligent Distribution for Advanced Wireless Networks with Scientific Data Microservices) is a software platform that provides intelligent data management, computational task composition, and workflow coordination for scientific applications across advanced wireless networks like 5G and 6G.

VIAS project logo featuring bold, modern lettering and a graphic element combining stacked microchips and an AI neural network motif, symbolizing vertically integrated circuits and artificial intelligence for advanced sensing and high-performance computing, in blue and silver tones.

The Vertically Integrated Artificial Intelligence for Sensing and High-Performance Computing (VIAS) project develops next-generation detectors that use vertically integrated, chiplet-based circuits and advanced AI hardware to enable fast, efficient data analysis in challenging environments.

Berkeley Lab and Meta co-lead the development of Open Molecules 2025, an unprecedented dataset of over 100 million 3D molecular simulations. The dataset aims to revolutionize machine learning tools for accurately modeling complex chemical reactions, transforming research in materials science, biology, and energy technologies.

AMCR’s Silvia Crivelli serves as the principal investigator of a project that develops scalable AI methods in collaboration with the VA. The project leverages billions of clinical records to create advanced clinical risk models that improve precision medicine and public health outcomes, including suicide risk prediction and treatment efficacy in lung cancer. It integrates structured data from Electronic Health Records and uses large language models for comprehensive analysis.

Computer visualization of the 2020 hurricane season.

Visualization of a CAMERA autonomous experiment.

Photo of a researcher standing at the base of the AmeriFlux experiment tower, which stands alone in a field of green grass against a blue sky with clouds.

Last edited: December 22, 2025