DL4Sci Puts New Data Analysis Tools into Researchers' Hands

July 29, 2019

Mustafa Mustafa, a machine learning engineer at NERSC, led the Berkeley Lab team that organized and participated in the DL4Sci summer school workshop.

Deep learning is enjoying unprecedented success in commercial applications and is finding growing appreciation in the scientific community as well, as machine-learning tools increasingly help scientists contend with some of the most challenging data analytics problems across multiple domains. For example, extreme weather events pose a significant risk to ecosystems, infrastructure, and human health, so analyzing extreme weather data from satellites and weather stations, and characterizing how extremes change in simulations, is increasingly important. Similarly, upcoming astronomical sky surveys will obtain measurements of tens of billions of galaxies, enabling precision measurements of the parameters that describe the nature of dark energy.

To address these challenges, a growing number of scientists are employing HPC systems and machine learning tools to analyze increasingly large data sets. To support their efforts, Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC) hosted its first Deep Learning for Science School (DL4Sci) July 15-19 at Lawrence Berkeley National Laboratory. More than 175 scientists and students from DOE labs and university research groups participated in the week-long workshop, which was inspired by the popularity of a number of related events that have taken place in the past year, including the ML4Sci Workshop, the Deep Learning Scaling tutorials at SC18, the ECP All Hands Meeting, and targeted sessions at CUG 19 and ISC 19.

“Deep learning provides powerful tools for scientific applications and interest is growing rapidly, but there remains a substantial knowledge gap in the community,” said Steven Farrell, a machine learning engineer at NERSC who helped organize the event. “Events like the DL4Sci school are needed to help educate scientists about the capabilities and practicalities of using these new tools.”

A range of topics was covered in depth over the school’s five days, including neural networks and neural network training, deep learning reproducibility, fairness and ethics in machine learning, sequential and generative models, deep learning for molecular engineering and quantum chemistry, hyperparameter optimization, feature-wise transformations, object detection and image segmentation, and geometric deep learning. (You can find slides from the talks and links to tutorials here; videos of the sessions are available here.) There was also a poster session where many of the attendees presented their work on applying machine learning to different scientific domains.

“We realized there is a critical need for a venue that focuses on the topics in deep learning that are most relevant to these domain sciences, and we hoped that such a venue would give an overview of these methods and the practical issues that arise when building and training these models, while helping domain scientists connect with machine learning experts,” said Mustafa Mustafa, a machine learning engineer at NERSC who led the team that organized and participated in the summer school.

In addition to Mustafa and Farrell, the team included NERSC’s Wahid Bhimji, Becci Totzke, Seleste Rodriguez, Mariam Kiran, Prabhat, and Katie Antypas; and Michela Paganini of Facebook AI Research.

“Many of the open challenges that domain scientists are currently facing will be tackled in the coming years using AI,” Paganini said. “To build successful, heterogeneous teams of scientists and machine learning practitioners, it is important to work towards closing the knowledge and jargon gap that separates the two communities. This school brought together some of the top machine learning researchers from both industry and academia and a new generation of scientists that will revolutionize the future of their fields with AI.”

“We are grateful for the generosity of the speakers who prepared highly refined lectures,” Mustafa added. “Every speaker was a top-notch expert in their field who elucidated cutting-edge research to an audience of diverse scientific backgrounds.”

About Computing Sciences at Berkeley Lab

High-performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.