NERSC Summer Student Puts MPI Under the Microscope
October 25, 2022
By Elizabeth Ball
The supercomputers at the National Energy Research Scientific Computing Center (NERSC) support all kinds of research across the scientific spectrum, but sometimes they’re also the subject of research in their own right, or part of the question to be answered. This summer, Muna Tageldin, a Ph.D. candidate in electrical and computer engineering at Marquette University, collaborated with NERSC staff as part of the Berkeley Lab Computing Sciences Summer Program, developing a microbenchmark to analyze variations in Message Passing Interface (MPI) performance on NERSC systems and looking for the best statistical methods to characterize the results.
MPI is a standardized interface that parallel applications commonly use to send and receive data over high-speed internal networks on supercomputers, and analyzing its performance is essential for understanding applications’ scalability and portability. Tageldin worked with NERSC application performance specialists Kevin Gott and Brandon Cook, and NERSC User Engagement Group lead Rebecca Hartman-Baker, to investigate some complexities of MPI performance.
“We find that the MPI all-to-all collective performance measurements form a multimodal distribution on a system running a production workload,” said Tageldin. “My project is understanding MPI performance variation and also finding statistical tests that can correctly describe this variation.”
According to Tageldin, frequently used summary statistics like minimum, maximum, median, and mean don’t always capture the intrinsic characteristics of MPI performance variations, which can take quite complex forms. To find statistical methods that more accurately characterize the distributions of run times, she spent her summer recording collective MPI_Alltoall measurements on both NERSC systems, Cori and Perlmutter, using the microbenchmark she developed. Her process included iteratively adjusting MPI parameters like the size of the message transmitted and the communication size (the number of processes involved in the communication). She found that multimodal distributions were common across many configurations, and she worked to identify statistical methods, such as time-domain features, that can describe the variance in MPI performance data. Though the summer has come to an end, her goal is to analyze multimodal distributions using statistics and associate those distributions with different MPI configurations, work that may continue at NERSC even after she has returned to Marquette.
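To illustrate why a mean or median can hide multimodality, here is a minimal Python sketch. It does not use Tageldin’s data or her methods: the timings are synthetic, the function names (summarize, histogram, count_modes) are hypothetical, and the threshold-based mode count is only one simple way to expose separate clusters in a histogram.

```python
import random
import statistics

def summarize(samples):
    """Return the common summary statistics named in the article."""
    return {
        "min": min(samples),
        "max": max(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
    }

def histogram(samples, n_bins, lo, hi):
    """Count samples in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        idx = int((x - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp out-of-range values
        counts[idx] += 1
    return counts

def count_modes(counts, min_fraction=0.05):
    """Count contiguous runs of bins holding at least min_fraction of all samples.

    This is an illustrative heuristic (an assumption of this sketch),
    not a formal statistical test for multimodality.
    """
    threshold = min_fraction * sum(counts)
    modes = 0
    in_run = False
    for c in counts:
        if c >= threshold and not in_run:
            modes += 1
        in_run = c >= threshold
    return modes

# Synthetic "run times" in milliseconds (made-up values, not measurements).
rng = random.Random(0)
unimodal = [rng.gauss(2.0, 0.1) for _ in range(2000)]
bimodal = ([rng.gauss(1.0, 0.1) for _ in range(1000)]
           + [rng.gauss(3.0, 0.1) for _ in range(1000)])

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    stats = summarize(data)
    modes = count_modes(histogram(data, n_bins=20, lo=0.0, hi=4.0))
    print(f"{name}: mean={stats['mean']:.2f} ms, modes={modes}")
```

Both synthetic samples have nearly the same mean, yet the histogram shows one cluster in the first and two in the second, which is the kind of structure a single summary number cannot convey.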
Tageldin says the Summer Student Program has been an opportunity to branch out, explore, and gain experience working on topics slightly outside of her primary area of research. “My dissertation is on analyzing high performance computing (HPC) systems performance using probabilistic models,” she said. “And in this internship, I’m tackling HPC performance from a statistics and coding perspective. It’s kind of interesting because MPI is an area I haven’t delved into in detail, and now we’re doing detailed performance analysis.”
Additionally, she notes that her probabilistic modeling and statistics background is turning out to be more connected to hands-on research than she previously expected, and she hopes to leverage that link in the future. With the summer over, she’ll finish her dissertation and consider how she wants to apply her skills and experience, possibly in a research career.
Tageldin’s experience in the Summer Student Program wasn’t just a benefit to her; according to NERSC staff, working with Tageldin and students like her is an infusion of energy and new ways of thinking about their work.
“The summer student program has been an amazing experience, providing me with fresh perspectives and new insights into the research being done at the Lab and its place in the broader scientific community,” said Gott, who served as Tageldin’s mentor this summer. “My summer interns have added a lot to my understanding of HPC today. I hope they’ve learned as much here as I have from them.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.