The AI methods and applications driving transformative change in science require vast computing resources to train and optimize ever-larger models, and to couple those models with existing large-scale scientific simulations and data pipelines.
To do this, we can draw on supercomputing expertise and resources developed in the DOE open-science community over many years, but doing so also requires considerable new development in scalable algorithms, software, AI hardware design, system deployment, and benchmarking.
Berkeley Lab hosts NERSC, the mission supercomputing center for DOE open science. NERSC’s latest system, Perlmutter, is a world-leading AI supercomputer, with over 6,000 A100 GPUs, high-performance file systems and networking, and software optimized for AI in science. NERSC is also planning its next system, “NERSC-10”, and beyond, working with vendors to ensure these systems are optimized to build AI into complex scientific workflows.
Our researchers develop scalable AI libraries and software tooling to exploit these large-scale computing resources, and are heavily involved in industry-wide benchmarking. We also work to empower the science community for AI at supercomputing scale through outreach and training.
The Perlmutter system is a world-leading AI supercomputer consisting of over 6,000 Nvidia A100 GPUs, an all-flash filesystem, and a novel high-speed network. The National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab also works closely with vendors to ensure software is optimized for AI at large computing scale, and provides consulting, joint projects, and training to enable the community to exploit these resources. Contact: Steven Farrell
MLPerf HPC is a machine learning performance benchmark suite for scientific ML workloads on large supercomputers. It measures the time to train deep learning models on massive scientific datasets, as well as full-system throughput for training many models concurrently. MLPerf HPC has had two successful annual submission rounds featuring results on systems around the world, including the Perlmutter system at NERSC. Contact: Steven Farrell
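The core metric here, time-to-train, is the wall-clock time for a model to reach a target quality. A minimal sketch of that measurement loop is below; the `train_step` callback and decaying-loss stand-in are hypothetical, not part of the actual MLPerf HPC harness.

```python
import time

def time_to_train(train_step, target_loss, max_steps=1000):
    """Run training steps until the model reaches a target quality,
    returning (elapsed wall-clock seconds, steps taken) -- the kind of
    time-to-train metric MLPerf-style benchmarks report."""
    start = time.perf_counter()
    for step in range(max_steps):
        loss = train_step(step)
        if loss <= target_loss:
            return time.perf_counter() - start, step + 1
    return None, max_steps  # target quality never reached

# Hypothetical stand-in for a real training step: loss decays each step.
elapsed, steps = time_to_train(lambda s: 1.0 / (s + 1), target_loss=0.05)
```

In the real benchmark, the "step" is a distributed training epoch across thousands of GPUs, and quality is a task-specific validation metric rather than a raw loss value.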
The goal of this project is to investigate using machine learning techniques to generate automated metadata that will enable search on data. Contact: Lavanya Ramakrishnan
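To illustrate the idea of automated metadata for search, here is a toy keyword extractor based on simple term frequency. This is only a crude stand-in for the learned models the project investigates; the stopword list and example text are invented for illustration.

```python
import re
from collections import Counter

# A tiny stopword list -- real systems use much larger, curated lists.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "on", "for", "is", "with"}

def extract_keywords(text, k=5):
    """Rank words by frequency after dropping stopwords, yielding
    candidate metadata tags that could be indexed for search."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

keywords = extract_keywords(
    "Beam energy scan data from the 2021 run; beam current logs "
    "and detector calibration data for the energy calibration study."
)
```

A learned approach would go further, e.g. classifying file contents or inferring domain-specific fields, but the output is the same in spirit: tags attached to data that make it findable.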
FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at high resolution. FourCastNet accurately predicts high-resolution, fast-timescale variables such as surface wind speed, precipitation, and atmospheric water vapor. It can generate forecasts with extreme computational savings compared to standard numerical weather prediction models. It has important implications for planning wind energy resources and for predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. Contact: Shashank Subramanian
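The "Fourier" in the name refers to Fourier-layer architectures, which mix information globally by applying learned weights in frequency space. The 1-D NumPy sketch below shows that core idea only; the weights here are random placeholders, and FourCastNet itself operates on 2-D atmospheric fields with a far more elaborate architecture.

```python
import numpy as np

def spectral_layer(u, weights, modes):
    """Toy 1-D Fourier layer: transform the field to frequency space,
    apply learned weights to the lowest `modes` frequencies, and
    transform back to the grid."""
    u_hat = np.fft.rfft(u)                              # to frequency domain
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = u_hat[:modes] * weights[:modes]   # learned mode mixing
    return np.fft.irfft(out_hat, n=len(u))              # back to grid space

rng = np.random.default_rng(0)
u = rng.standard_normal(64)                  # a field sampled on a 1-D grid
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # placeholder "learned" weights
v = spectral_layer(u, w, modes=8)
```

Because each frequency mode couples every grid point, a single such layer captures global structure that a local convolution would need many layers to reach, which is part of why these models are fast at planetary scale.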
As supercomputers become ever more capable in their march toward exascale levels of performance, scientists can run increasingly detailed and accurate simulations to study problems ranging from cleaner combustion to the nature of the universe. The challenge is that these powerful simulations are “computationally expensive,” consuming 10 to 50 million CPU hours for a single simulation. The ExaLearn project aims to develop new tools to help scientists overcome this challenge by applying machine learning to very large experimental datasets and simulations. Contact: Peter Nugent
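One common way ML reduces this cost is a surrogate model: run the expensive simulation at a few design points, fit a cheap model to those results, and query the cheap model everywhere else. A minimal sketch under invented assumptions (a one-input "simulation" and a polynomial surrogate, neither of which is ExaLearn's actual method):

```python
import numpy as np

# Pretend this is an expensive simulation (millions of CPU hours in practice).
def expensive_simulation(x):
    return np.sin(3 * x) + 0.5 * x**2

# Run the "simulation" at a handful of design points...
x_train = np.linspace(-1, 1, 20)
y_train = expensive_simulation(x_train)

# ...and fit a cheap polynomial surrogate to stand in for it.
coeffs = np.polyfit(x_train, y_train, deg=6)
surrogate = np.poly1d(coeffs)

# The surrogate can now be evaluated millions of times at negligible cost.
approx = surrogate(0.3)
exact = expensive_simulation(0.3)
```

Real surrogates replace the polynomial with a deep network over high-dimensional inputs, but the economics are the same: pay the simulation cost once per training point, then amortize it over every later query.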
Researchers from Berkeley Lab, Caltech, and NVIDIA trained the Fourier Neural Operator deep learning model to emulate atmospheric dynamics and provide high-fidelity extreme weather predictions across the globe a full five days in advance.
NERSC today formally unveiled the first phase of its next-generation supercomputer, Perlmutter, at a virtual event that included government dignitaries, industry leaders, and Dr. Perlmutter himself.
ExaLearn is a machine learning project supported by DOE’s Exascale Computing Project that is developing new tools to help scientists use machine learning on massive experimental datasets and simulations.