Imagine a worldwide network of experimental facilities and computing centers, connected by a dedicated high-speed network specifically for science – an integrated and automated system for gathering scientific data, transporting it anywhere in the blink of an eye, and analyzing it in real time. Research teams could verify their data during experiments and make informed decisions in the moment. Analysis of massive datasets would take minutes, not days or weeks. The pace of scientific discovery would accelerate. This is the promise of the superfacility model, and it’s happening now, with Lawrence Berkeley National Laboratory (Berkeley Lab) leading the charge.
The superfacility is a conceptual model of seamless connection between experimental facilities and high performance computing resources. It takes physical form in light sources, telescopes, and microscopes; computing and data centers; and high-speed networks – but bringing this connected future into being depends above all on new workflows, technology tools, and ways of thinking about the ecosystem of science facilities. Berkeley Lab staff are working to standardize, automate, and scale up those processes at the Lab and, through collaboration, across the U.S. Department of Energy (DOE) and beyond.
Standing up the Superfacility
Famous for its history of innovation through collaboration, Berkeley Lab is a natural starting point for putting the superfacility model into practice. In addition to the Energy Sciences Network (ESnet), which transports data, and the analysis and simulation systems at the National Energy Research Scientific Computing Center (NERSC), it’s home to experimental facilities like the Advanced Light Source (ALS) and the Joint Genome Institute (JGI) – all the makings of cross-institution collaboration on a single site. Engineers at NERSC and ESnet connected experimental facilities to high performance computing for individual experiments long before the term “superfacility” was coined. More recently, they’ve begun to standardize and expand those connections.
In 2019, Berkeley Lab began the three-year Berkeley Lab Superfacility Project, an initiative to align Berkeley Lab efforts with DOE Office of Science research goals, identify needs going forward, enable new capabilities, and lay the groundwork for ongoing superfacility engagements. Team members identified projects that might benefit from superfacility concepts and tools, then worked with science teams to understand their needs and help with implementation. Facilities included in the project stretched geographically from South Korea to the Bay Area to South Dakota to Chile and included light sources, telescopes, microscopes, nuclear fusion reactors, and a genomics facility. By the end of the initial project in late 2021, five Superfacility Project science engagements could consistently use the superfacility setup in their work, transferring and analyzing large amounts of data without routine human intervention; others made measurable progress toward that goal. The results are documented in the Superfacility Project Report, released in 2022.
Along with experimental results, the Superfacility Project yielded another form of data: the understanding that comes with experience. Science teams figured out how to take advantage of the integration of systems that defines the superfacility, while project organizers learned to optimize those systems for day-to-day use, from the 30,000-foot view down to the granular details of user experience.
“I think the big success of this project is the mutual learning – taking the expertise of a compute facility and really getting engaged with all the expertise of the skilled researchers developing these scientific workflows,” said NERSC computer systems group lead and Superfacility Project deputy lead Cory Snavely. “We’ve been really talking at a deeper level and collaborating to come up with ideas and make sure that they’re practical and easy to use.”
Opening up the Landscape
For science teams, the superfacility model expands what’s possible, offering access to compute resources beyond their local systems and making space for collaboration.
One early superfacility partner with Berkeley Lab is the Linac Coherent Light Source (LCLS) across San Francisco Bay at SLAC National Accelerator Laboratory (SLAC). As far back as 2016, researchers working at LCLS were transporting large and complex datasets to NERSC and back via ESnet on an ad hoc basis. That partnership has only blossomed since.
“It’s really broadened our perspective quite a lot because it’s opened up the landscape,” said Jana Thayer, director of the data division at LCLS. “In the past, experiments have been this local thing, where all of the computing sits right next to the beam line and the data comes in, it gets analyzed, it gets churned out, and the data itself never really leaves. But with the superfacility, through ESnet, you can connect all of the light sources and other facilities, NERSC included. It enables a lot of new features that we wouldn’t have considered if we had stayed local.”
Those capabilities include automation and integration between systems. According to Thayer, the change has been transformative: automated workflows and the speed of ESnet cut the data-analysis turnaround from days, weeks, or months to seconds, minutes, or hours, allowing researchers to verify their data and make informed decisions midstream and drastically speeding up the pace of scientific discovery.
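As a rough illustration of what such an automated workflow can look like – a minimal sketch with hypothetical paths, hosts, and commands, not the actual LCLS-to-NERSC pipeline – a monitoring loop can detect newly written data files, hand each one to a transfer tool, and trigger remote analysis with no human in the loop:

```python
"""Minimal sketch of an automated detector-to-HPC workflow.

Hypothetical paths, hosts, and commands -- not the actual LCLS/NERSC
pipeline. It illustrates the pattern: watch for new data, transfer it,
then trigger remote analysis automatically.
"""
import subprocess
import time
from pathlib import Path

DATA_DIR = Path("/detector/output")            # hypothetical local data directory
REMOTE = "user@dtn.example.org:/project/raw"   # hypothetical data transfer node

def transfer(path: Path) -> None:
    """Ship one file toward the computing center (placeholder: rsync over the WAN)."""
    subprocess.run(["rsync", "-av", str(path), REMOTE], check=True)

def submit_analysis(path: Path) -> None:
    """Kick off remote analysis (placeholder: ssh plus a batch submission)."""
    subprocess.run(
        ["ssh", "dtn.example.org", "sbatch", "analyze.sh", path.name],
        check=True,
    )

def main() -> None:
    seen: set[Path] = set()
    while True:
        for f in DATA_DIR.glob("*.h5"):
            if f not in seen:
                transfer(f)
                submit_analysis(f)
                seen.add(f)
        time.sleep(5)  # polling keeps the sketch simple; real systems are event-driven

if __name__ == "__main__":
    main()
```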
And the transformation is only beginning: LCLS currently operates at 120 pulses per second, but coming upgrades will raise that rate to one million pulses per second, dramatically increasing the amount of data collected. Today, about 5% of LCLS user projects require more computing than LCLS can provide locally, making them good candidates for sending their data over ESnet to NERSC and potentially other computing centers for analysis. As more experiments capture data at these rates, demand for the superfacility model is sure to grow as well.
Connecting Through Federated Identity
As the Superfacility Project progressed and the needs of science teams became clear, NERSC staff developed and implemented specific pieces of software infrastructure to ensure that connected projects run smoothly. Among those innovations was a pilot federated identity program that allows NERSC users at peer DOE facilities to log in using their local institution login page, offering easier access to the compute resources they need and allowing automation across platforms.
Getting federated identity up and running with the proper balance of effectiveness and security presented both technical and policy challenges. “Building a federated identity system involves a network of trust where our systems honor another institution’s authentication process,” said Snavely, whose team implemented the underlying authentication systems for the pilot program. “Luckily, these trust networks and technologies exist, so much of the groundwork was already established.”
NERSC’s federated identity pilot leverages the InCommon Federation, a third-party organization that vouches for user and institutional identities for research and education, both cryptographically and through established communication and vetting processes. InCommon uses the Security Assertion Markup Language (SAML), a protocol that passes authentication information between an identity provider and a web application. Key to NERSC’s participation is a set of baseline security practices – for one, institutions connecting with NERSC through InCommon must use multi-factor authentication or be subject to NERSC’s own additional authentication factor. Authentication must also be accompanied by contact information for the institution’s security team, so that it can be reached if something is amiss.
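The multi-factor requirement above can be checked programmatically. As a minimal illustration – not NERSC’s actual implementation, and omitting the signature, audience, and timestamp validation a real SAML library performs – a service can inspect the authentication context reported in a SAML assertion and accept it only if it carries the REFEDS MFA profile, a standard way for an identity provider to signal that multi-factor authentication was used:

```python
"""Sketch: require multi-factor authentication in a SAML assertion.

Illustrative only -- real deployments validate signatures, audiences, and
timestamps with a full SAML library before trusting assertion content.
"""
import xml.etree.ElementTree as ET

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}
REFEDS_MFA = "https://refeds.org/profile/mfa"  # standard MFA authentication context

def assertion_used_mfa(assertion_xml: str) -> bool:
    """Return True if the identity provider reports the REFEDS MFA context."""
    root = ET.fromstring(assertion_xml)
    for ref in root.findall(".//saml:AuthnContextClassRef", SAML_NS):
        if ref.text and ref.text.strip() == REFEDS_MFA:
            return True
    return False

# A truncated, unsigned assertion fragment, for illustration only.
sample = """
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:AuthnStatement>
    <saml:AuthnContext>
      <saml:AuthnContextClassRef>https://refeds.org/profile/mfa</saml:AuthnContextClassRef>
    </saml:AuthnContext>
  </saml:AuthnStatement>
</saml:Assertion>
"""

if assertion_used_mfa(sample):
    print("MFA satisfied at the home institution")
else:
    print("No MFA reported -- apply the additional authentication factor")
```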
Overall, federated identity seems to be a win for facilities and for users as well: “It’s an increase in convenience, it’s a more standards-based approach to distributed workflows, and there’s greater security as well,” said Snavely.
Coordinating Through API
In addition to the federated identity pilot, NERSC also introduced a new application programming interface (API) to manage compute services, facilitate automation, and make project information accessible to users.
The API consolidates HPC services behind a single interface that users can reach as they would any other website: they can adjust experimental parameters, submit jobs, monitor a job’s status, and retrieve results, all in one place.
Engineers at NERSC built the Superfacility API’s front end on industry standards like the OAuth authorization framework and the REST architectural style, so it can be used with toolsets across contexts – a step toward use across institutions and workflows. The Superfacility API went into service in 2021 and has been adopted by users from more than 40 science teams, with more coming on board all the time; in 2022 it handled over 7 million requests.
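In practice, a workflow client authenticates once and then drives its computing through ordinary HTTP calls. The sketch below shows that general pattern – exchange a client credential for a short-lived access token, then submit and poll a batch job over REST. The URLs, paths, and field names are illustrative placeholders, not the documented Superfacility API endpoints:

```python
"""Sketch of an OAuth2 + REST workflow client.

URLs, paths, and field names are illustrative placeholders, not the
documented NERSC Superfacility API; they show the general pattern of
token-based, programmatic access to an HPC facility.
"""
import time
import requests

TOKEN_URL = "https://auth.example.org/oauth2/token"  # hypothetical token endpoint
API_BASE = "https://api.example.org/v1"              # hypothetical API base URL

def get_token(client_id: str, client_secret: str) -> str:
    """OAuth2 client-credentials grant: trade a client credential for a token."""
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

def submit_job(token: str, batch_script: str) -> str:
    """Submit a batch job through the REST interface and return its job ID."""
    headers = {"Authorization": f"Bearer {token}"}
    resp = requests.post(f"{API_BASE}/compute/jobs", headers=headers,
                         json={"script": batch_script})
    resp.raise_for_status()
    return resp.json()["job_id"]

def wait_for(token: str, job_id: str) -> str:
    """Poll job status until it leaves the queue and return the final state."""
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        state = requests.get(f"{API_BASE}/compute/jobs/{job_id}",
                             headers=headers).json()["state"]
        if state not in ("PENDING", "RUNNING"):
            return state
        time.sleep(30)

if __name__ == "__main__":
    token = get_token("my-client-id", "my-client-secret")  # placeholder credentials
    job_id = submit_job(token, "#!/bin/bash\nsrun ./analyze data.h5\n")
    print("final state:", wait_for(token, job_id))
```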
The current iteration of the NERSC API is just the beginning; NERSC staff continue to make it more powerful and more flexible. One upcoming change will tailor access levels to how the API is used, giving more users the opportunity to try it while increasing overall security at NERSC.
“We’re going to allow wider access so more users can use it in its current form,” said NERSC engineer Bjoern Enders, who helped develop the NERSC API and continues to refine it. “A security review will be available for a smaller subset of people who need one- to 30-day read-write-execute access, like those who manage workflows for large institutions and ongoing research projects.”
In addition to making those changes, NERSC staff will help teams still using the previous system switch to the new, more standardized API. They are also working with other institutions toward a common API that can be replicated elsewhere, so researchers can run their workflows more easily across facilities.
“The more people adopt a standard API, the more powerful the interface becomes,” said Enders. “Even if it’s not the same user group, just having something that’s the same always helps.”
The Future of Superfacility is the Future of Science
With the initial Superfacility Project now complete, many involved are considering where things go from here. It’s increasingly acknowledged that the superfacility model of interconnected science workflows is the future of data collection and analysis, but there is still work to be done.
“It’s not super easy yet,” said NERSC data science engagement group lead Debbie Bard, who spearheaded the Superfacility Project. “We’re not yet at a place where you push a button and it all just works. But we’ve made huge progress in making it even feasible to design and implement these automated workflows. And that was really only possible because we had this level of coordination between all the work that lots of individuals were doing.”
Superfacility Day Brings Stakeholders Together
On October 19, 2022, the Superfacility Working Group hosted users and stakeholders for a Superfacility Day gathering to celebrate progress and consider next steps, including more detailed service statistics, streamlined access to NERSC, and increased outreach and engagement.
At Berkeley Lab, superfacility work continues under the Superfacility Working Group, now focused on improving integration and automation for a seamless and more efficient user experience. Upgrades to the NERSC API and federated identity will come with time, and planning has already begun for NERSC-10, the supercomputer that will follow Perlmutter. Due to come online in 2025, it is being conceived and built with superfacility in mind.
The superfacility model will also become increasingly essential as two important trends in data-driven science coincide. The newest instruments at the ALS, JGI, LCLS, the Lux Zeplin Dark Matter Experiment (LZ), the Dark Energy Science Collaboration (DESC), and other facilities are steadily producing more data as telescopes, light sources, microscopes, and other massive detectors are upgraded for higher precision and resolution. Meanwhile, exascale computing – compute systems performing at least one quintillion (10¹⁸) operations per second – is becoming a reality. Science teams at these instrument facilities have conventionally performed computation on-site, but with greater data volumes they increasingly require seamless, performant integration with exascale-class computing facilities. Part of that seamlessness is made possible by ESnet – and with the unveiling of ESnet6 in 2022, which brings 400 Gbps to 11 Tbps of bandwidth and the capacity to move massive amounts of data from instruments to supercomputing sites, that future has come much closer.
“There’s one set of workflows where ESnet doesn’t need to change anything; all that needs to be done is for the edge systems to adopt current best practices such as the Science DMZ model – which many sites and facilities have already done,” said ESnet network engineer Eli Dart of the status of ESnet for use by science teams. “Many superfacility workflows in use today fit under this category, and the network is ready for them.”
The future, though, lies in the adaptability and closer integration made possible by ESnet6, says Dart – for example, making an API call to the network and getting behavior adapted to a specific workflow, a capability ESnet6 comes closer to providing.
“This second round has a lot of potential,” said Dart. “We’ve got this high-performance network and it has sufficient capacity to accommodate many very high-speed data flows. It also has advanced automation and provisioning capabilities. The goal now is to collaborate on the integration of our automation with the software stacks running at the scientific facilities, so that everything works well as an integrated whole. One example of this is the integration of ESnet’s SENSE network orchestration capability with the ExaFEL project funded by the Exascale Computing Project (ECP).”
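What “an API call to the network” might look like can only be sketched loosely here; the endpoint and request fields below are hypothetical placeholders, not ESnet’s actual SENSE interface. The idea is that a workflow requests specific network behavior – for example, a guaranteed-bandwidth path from a beamline to a compute facility for the duration of a scheduled transfer – programmatically:

```python
"""Sketch: a workflow asking the network for tailored behavior.

Hypothetical endpoint and request fields -- not ESnet's actual SENSE API --
illustrating the idea of provisioning a guaranteed-bandwidth path from an
instrument to a compute facility for the duration of a transfer.
"""
import requests

NETWORK_API = "https://network.example.org/provision"  # hypothetical orchestrator endpoint

request_body = {
    "source": "beamline-dtn.example.org",        # where the data leaves the instrument
    "destination": "hpc-dtn.example.org",        # where the data lands for analysis
    "bandwidth_gbps": 100,                       # requested guaranteed rate
    "start": "2022-10-19T08:00:00Z",
    "duration_minutes": 120,
}

resp = requests.post(NETWORK_API, json=request_body, timeout=30)
resp.raise_for_status()
print("provisioned path id:", resp.json().get("path_id"))
```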
As integration and automation become the name of the superfacility game, one next step seems clear: scale up. DOE is doing just that, exploring superfacility concepts and implementation across the national laboratory system through its ASCR Integrated Research Infrastructure (IRI) Architecture Blueprint. IRI will tie together facility resources across the national labs in a strategic effort to build integrated capability, extending what Berkeley Lab has done and improving the data capabilities of the Office of Science as a whole.
Overall, it’s clear that science is moving in the direction of greater connection, and the work that has already been done to implement the superfacility is a series of first steps toward those goals – but there’s more to be done, both at Berkeley Lab and across the entire Office of Science.
“There’s a recognition across the DOE that connecting facilities to the resources and infrastructure they need is going to be increasingly important in the future,” said Bard. “Superfacility is a model for how that could work.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.