HPC Innovation Excellence Award Showcases Physics-based Scientific Discovery in Large Data Sets
Project DisCo offers an alternative to deep learning for scientific discovery from data
August 16, 2019
A collaboration that includes researchers from Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center (NERSC) was recently honored with an HPC Innovation Excellence Award for its work on “Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems.” The award was presented in June by Hyperion Research during the ISC19 meeting in Frankfurt, Germany.
Adam Rupe, a PhD student at the University of California, Davis, who has been doing research at NERSC for the last three years, is applying his expertise in physics – specifically, physical principles associated with organized coherent structures – to enable unsupervised discovery of these structures in spatiotemporal (space-time) systems. Through a collaboration with UC Davis physics professor James Crutchfield, Karthik Kashinath and Prabhat of NERSC’s Data & Analytics Services group, and engineers from Intel, the team created the first distributed HPC implementation of a physics-based, data-driven technique known as local causal states. This collaboration, dubbed Project DisCo (Discovery of Coherent structures), led to the HPC Innovation Excellence Award.
With the growing data deluge in climate research, cosmology, materials science, and other science domains, new data-driven methods are required that discover and mathematically describe complex emergent phenomena, uncover the physical and causal mechanisms underlying these phenomena, and better predict these phenomena and how they evolve over time, Rupe explained.
“Local causal states, which uncover a system's spatiotemporal structure by tracking how information is processed locally through space and time, have the potential to do exactly this, and directly from unlabeled data,” Rupe said. “Computational barriers, however, have kept them from reaching this potential on real-world domain science problems.”
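As a rough illustration of the idea Rupe describes, the toy sketch below labels each point of a small 1D space-time field by grouping past lightcones that predict the same distribution over future lightcones. It is a single-process example under stated assumptions, not the DisCo code: the function names, the propagation speed of one cell per step, the periodic boundary, and the greedy merge by total-variation tolerance are all hypothetical simplifications chosen for brevity.

```python
from collections import Counter, defaultdict

import numpy as np


def lightcones(field, t, x, depth):
    """Past and future lightcones at (t, x), assuming speed 1 and periodic space."""
    width = field.shape[1]
    past = tuple(
        int(field[t - d, (x + dx) % width])
        for d in range(depth + 1)          # d = 0 includes the present site
        for dx in range(-d, d + 1)
    )
    future = tuple(
        int(field[t + d, (x + dx) % width])
        for d in range(1, depth + 1)
        for dx in range(-d, d + 1)
    )
    return past, future


def local_causal_states(field, depth=1, tol=0.1):
    """Return (labels, n_states); points too close to the time edges get label -1."""
    steps, width = field.shape
    times = range(depth, steps - depth)

    # 1. Count which future lightcones follow each past lightcone.
    counts = defaultdict(Counter)
    for t in times:
        for x in range(width):
            past, future = lightcones(field, t, x, depth)
            counts[past][future] += 1

    # 2. Turn counts into empirical conditional distributions P(future | past).
    futures = sorted({f for c in counts.values() for f in c})

    def distribution(counter):
        v = np.array([counter[f] for f in futures], dtype=float)
        return v / v.sum()

    # 3. Greedily merge pasts whose distributions agree within tol
    #    (total-variation distance); each group is one approximate state.
    representatives, label_of = [], {}
    for past, counter in counts.items():
        p = distribution(counter)
        for idx, rep in enumerate(representatives):
            if 0.5 * np.abs(p - rep).sum() <= tol:
                label_of[past] = idx
                break
        else:
            label_of[past] = len(representatives)
            representatives.append(p)

    # 4. Label every interior space-time point by the state of its past lightcone.
    labels = np.full(field.shape, -1, dtype=int)
    for t in times:
        for x in range(width):
            labels[t, x] = label_of[lightcones(field, t, x, depth)[0]]
    return labels, len(representatives)


# Usage: a checkerboard field has two kinds of sites, so two states are found.
t_idx = np.arange(6)[:, None]
x_idx = np.arange(8)[None, :]
checkerboard = (t_idx + x_idx) % 2
labels, n_states = local_causal_states(checkerboard, depth=1)
print(n_states)  # 2
```

Exact matching of lightcones only works for small discrete alphabets; real-valued fields such as climate data would first need the lightcones grouped by some clustering step, which this sketch leaves out.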
As with other data-driven methods, there has long been a disconnect between theoretical development and practical application. DisCo bridges this gap by developing an optimized, distributed HPC implementation of local causal state reconstruction written entirely in Python using standard libraries. “And because it is written in Python, the code can be easily updated as theoretical development continues, giving domain scientists an easy-to-use, high-level API interface to this tool,” Rupe said.
“The applications are wide-ranging,” Kashinath added. “From discovering vortices in turbulent flows to extreme weather events in large climate data sets, DisCo is designed to be a tool that scientists can use to understand the underlying physical processes that drive a system.” In addition, DisCo has predictive capability; the local causal states encode a minimal description of the dynamics of the system.
In the work that won the award, DisCo was applied to CAM5.1 climate simulation data on 1,024 Intel Haswell nodes on NERSC’s Cori supercomputer, processing 89.5 TB of data in 6.6 minutes end-to-end. The team also achieved 91% weak-scaling and 64% strong-scaling efficiency. The result was a state-of-the-art segmentation of coherent spatiotemporal structures in complex nonlinear turbulent flows from both observational and simulated data.
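To put these figures in context, the short sketch below works out the aggregate throughput implied by the reported run and spells out the conventional definitions of weak- and strong-scaling efficiency. The helper names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and the article does not report the baseline runtimes behind the 91% and 64% figures, so none are filled in here.

```python
def weak_scaling_efficiency(t_base, t_scaled):
    """Work per node is fixed, so the ideal runtime stays constant as nodes are added."""
    return t_base / t_scaled


def strong_scaling_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Total work is fixed, so the ideal runtime shrinks by n_base / n_scaled."""
    return (t_base * n_base) / (t_scaled * n_scaled)


# Figures reported for the award-winning run.
data_tb = 89.5
runtime_s = 6.6 * 60
nodes = 1024

# Assumes decimal units: 1 TB = 1000 GB, 1 GB = 1000 MB.
aggregate_gb_per_s = data_tb * 1000 / runtime_s
per_node_mb_per_s = aggregate_gb_per_s * 1000 / nodes

print(f"{aggregate_gb_per_s:.0f} GB/s aggregate")  # 226 GB/s aggregate
print(f"{per_node_mb_per_s:.0f} MB/s per node")    # 221 MB/s per node
```

Under these definitions, 91% weak-scaling efficiency means the runtime grew by about 10% as nodes and data were increased together, and 64% strong-scaling efficiency means the speedup on a fixed data set was 64% of the ideal proportional to the added nodes.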
“Supervised machine learning and deep learning work well only if you have labeled data and you are confident that the labels accurately represent some ground truth. But for these kinds of scientific data sets we essentially do not have ground truth,” Rupe said. “This is why there has been keen interest in unsupervised physics-informed discovery approaches like DisCo.”
Hyperion Research’s HPC Innovation Excellence Awards are presented twice a year, at the ISC conference in June and the Supercomputing Conference in November. The program’s main goals are to help other users understand the benefits of adopting HPC and justify HPC investments, to demonstrate the value of HPC to funding bodies, to expand public support for increased HPC investments, and to showcase return on investment and scientific success stories involving HPC.
NERSC is a DOE Office of Science User Facility.
To learn more about DisCo, visit the DisCo YouTube channel.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.