Superfacility Framework Advances Photosynthesis Research
Integrating experimental instruments with high-speed networking and computational resources yields real-time feedback
May 2, 2019
Keri Troutman, 510-486-5071
For more than a decade, an international team of researchers led by Berkeley Lab bioscientists has been studying Photosystem II (PSII), a protein complex in green plants, algae, and cyanobacteria that plays a crucial role in photosynthesis. They’re now moving more quickly toward an understanding of this three-billion-year-old biological system, thanks to a superfacility framework that integrates experimental instrumentation with computational and data facilities. PSII researchers working at the SLAC National Accelerator Laboratory’s Linac Coherent Light Source (LCLS) recently began using the Energy Sciences Network (ESnet) at Lawrence Berkeley National Laboratory (Berkeley Lab) to enable real-time processing of experimental data at the National Energy Research Scientific Computing Center (NERSC).
PSII is the only known biological system able to harness sunlight for the oxidation of water into molecular oxygen. Scientists have been seeking an atomic-scale understanding of how PSII splits a water molecule during photosynthesis for decades now. This key understanding would help advance the development of artificial photosynthesis, a promising source of abundant and clean energy.
To gain insight into how PSII works, the research team, led by Berkeley Lab bioscientists Vittal Yachandra, Junko Yano, and Jan Kern, uses X-ray free electron lasers (XFELs) at LCLS to capture images of PSII throughout the stages of its reaction cycle. At the core of PSII is an oxygen evolving complex (OEC; Mn4CaO5) that, when energized by solar photons, catalyzes a four photon-step cycle of oxidation states that ultimately yields molecular oxygen. Using XFELs to study the protein complex at specific time points between the steps of this cycle helps the researchers understand structural changes in PSII and, in turn, the mechanism of bond formation between two oxygen atoms.
Historically, the ability to capture these images has been hindered by the fact that most X-ray crystallography technology destroys the samples before meaningful data can be collected. Scientists need to observe X-ray diffraction of the intact Mn4CaO5 complex in action, but the molecule is highly sensitive to radiation. However, the advent of XFELs and sophisticated data processing methods has changed this; last year researchers were able to capture the most complete and highest-resolution picture to date of PSII (the results were published in Nature in 2018).
LCLS upgrades have led to faster and higher-resolution imaging results, which means the computational demands of data processing have also grown. Concurrent developments at NERSC and ESnet have moved this research to the next level. With ESnet in place between SLAC and NERSC, the PSII researchers are now running their experiments with live data analysis feedback, which allows them to use their LCLS shift time more effectively.
“The high performance, reliable data placement service over ESnet is a fundamental building block of the superfacility model,” said Eli Dart, a network engineer in the ESnet Science Engagement Group. “This architectural construct is about removing the constraints of geography from the scientific process.”
The data analysis team can now tell the researchers whether they’re getting statistically significant results from a certain sample batch almost immediately, which means more samples can be tested and the beamtime is utilized efficiently. With LCLS beamtime data collection rates now at 20-30 images/second, the researchers are collecting 60-100 GB in every 5-minute data run. Each of these runs is transferred via ESnet within one minute, so that the data analysis team can use NERSC to process it immediately and give feedback within 5-10 minutes.
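The figures quoted above imply a rough per-image size and a minimum sustained network rate. The following is a back-of-the-envelope sketch, not the team's actual tooling. It assumes decimal units (1 GB = 1000 MB = 8 gigabits), that the low image rate pairs with the low data volume and the high with the high, and that a transfer takes the full one minute, so the resulting throughput is a lower bound. The function name `run_summary` is hypothetical.

```python
def run_summary(images_per_s, run_gb, run_minutes=5.0, transfer_s=60.0):
    """Estimate image count, per-image size, and minimum transfer rate for one run.

    Assumptions (not from the article): decimal units, and a transfer
    that uses the entire `transfer_s` window.
    """
    images = images_per_s * run_minutes * 60      # images collected in the run
    mb_per_image = run_gb * 1000 / images         # average size of one image in MB
    gbit_per_s = run_gb * 8 / transfer_s          # sustained rate to finish in time
    return images, mb_per_image, gbit_per_s


# Low and high ends of the ranges quoted in the article.
low = run_summary(images_per_s=20, run_gb=60)
high = run_summary(images_per_s=30, run_gb=100)

for label, (images, mb, gbit) in (("low", low), ("high", high)):
    print(f"{label}: {images:.0f} images, {mb:.1f} MB/image, >= {gbit:.1f} Gbit/s")
```

Under these assumptions, each image is on the order of 10 MB, and moving a run within a minute needs a sustained rate of roughly 8 to 13 Gbit/s.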
“As computational staff, our responsibility is to give the PSII researchers feedback on how they are performing, because without our involvement they are just collecting data blindly,” says Asmit Bhowmick, a postdoctoral researcher in Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging (MBIB) Division. “This is critical when you are trying to push the resolution of these structures.”
The LCLS data that’s processed at NERSC is used to create electron density maps, which allow the researchers to evaluate structural differences in PSII between different time points in the reaction cycle. The higher the resolution of the electron density maps, the closer researchers get to being able to see the oxygen bonds clearly. The bond length between oxygen atoms in the relevant intermediate state is about 2 angstroms. In 2016, PSII researchers published electron density maps with 2.25 angstrom resolution, which were considered unprecedented. Now, with NERSC computing power, researchers have been able to push the resolution down to 2.05 angstroms.
“More high-quality data is what allows us to push the resolution to the next level,” says Bhowmick, who works in the laboratory of MBIB senior scientist Nicholas Sauter. “Having NERSC available to process live data is what is getting us there.”
While LCLS produces data rates of 30-120 images per second, that rate will increase 10-10,000 fold when LCLS-II—the next generation of the LCLS—comes online in 2020. ESnet and NERSC will be absolutely necessary for LCLS-II. “Even now, at LCLS, we cannot do the speed and resolution of analysis we’re able to do at NERSC; the LCLS computing infrastructure just cannot keep up,” says Bhowmick. “With ESnet and NERSC, we have hit some major milestones in the past few months.”
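To put the projected increase in concrete terms, the quoted ranges can be multiplied out. This is a rough sketch: it pairs the low rate with the low multiplier and the high rate with the high multiplier. The per-image size of 10 MB is an assumption, loosely derived from the article's figure of 60 GB per 5-minute run at 20 images per second; LCLS-II detectors and image sizes may differ substantially.

```python
# Ranges quoted in the article.
current_rate = (30, 120)       # images per second at LCLS today
fold_increase = (10, 10_000)   # projected multiplier for LCLS-II

# ASSUMPTION: about 10 MB per image (60 GB / 6000 images); not an LCLS-II spec.
ASSUMED_MB_PER_IMAGE = 10

# Pair low with low and high with high to bracket the projected image rate.
projected_rate = (
    current_rate[0] * fold_increase[0],
    current_rate[1] * fold_increase[1],
)

# Convert image rate to data rate in GB/s under the assumed image size.
projected_gb_per_s = tuple(r * ASSUMED_MB_PER_IMAGE / 1000 for r in projected_rate)

print(f"projected images/s: {projected_rate[0]:,} to {projected_rate[1]:,}")
print(f"projected GB/s (assumed image size): "
      f"{projected_gb_per_s[0]:,.0f} to {projected_gb_per_s[1]:,.0f}")
```

Even the low end of this bracket is an order of magnitude above current rates.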
NERSC, ESnet, and LCLS are DOE Office of Science User Facilities.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery, and researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.