Science Gateways Pave Way for ‘Team Science’
Computational scientists at NERSC work with researchers around the globe to develop online tools that are changing the way they compute and collaborate
March 12, 2014
Contact: Kathy Kincade, +1 510 495 2124, email@example.com
For nearly a decade, computational scientists at the Department of Energy's National Energy Research Scientific Computing Center (NERSC) have been working with researchers around the globe to develop online tools that are changing the way they compute and collaborate at NERSC.
The result is a growing body of “science gateways” -- custom interfaces and analytics tools that make it easier for NERSC users to access large-scale computing and to share and analyze data, simulation results and information regardless of where they are located. Using these portals, scientists are discovering new materials, gaining a better understanding of matter and unlocking secrets of our universe.
“The goal of these science gateway projects is to allow users to access their data, perform interesting computations and interact with the NERSC resources using common web-based interfaces and technologies,” said Shreyas Cholia, Acting Group Leader of NERSC’s Data and Analytics Services Group. “This makes it easier for scientists to use NERSC, while creating opportunities to build new collaborative tools to share this data with the rest of the scientific community.”
NERSC engineers help science teams design database models, build web-browser interfaces, develop analytic tools and deploy the gateway infrastructure, he noted. Because each project is unique, the teams have the option of creating a public portal, which allows anybody to access their data, or an authenticated portal, which restricts access to collaborators only.
While the concept of science gateways is not new, NERSC is at the forefront of making them as user-friendly as possible, noted David Skinner, Strategic Partnerships Lead for NERSC. And as one of the world’s largest supercomputing centers, NERSC is well positioned to make the data more widely accessible. The gateways also support the growing trend toward “open-source” data sharing and analysis, which will help ensure that each new project or user does not spend time and resources reinventing the same tools or working on redundant research, Skinner noted.
“Our goal is to get science teams to collaborate around the data,” Skinner said. “If we can provide useful computing resources and data to scientists in a way that they can simply click a button and get results right away, that opens up a whole new kind of utility. But the real impact is when it enlarges the research agenda. If you can shorten the time to discovery enough, people start asking different kinds of questions.”
Enabling Broad Collaborations
The first gateway developed at NERSC was QCD (quantum chromodynamics), designed to help scientists gain new insights into particle physics using QCD, the mathematical theory that describes the strong force that binds quarks together into protons, neutrons and other less-familiar subatomic particles. QCD gateway users have used numerical QCD calculations to study how particles such as protons and neutrons interact with each other and how complex structures such as nuclei emerge from these interactions.
QCD was followed in 2009 by DeepSky, an astronomical image database comprising images from the Palomar-Quest and Near-Earth Asteroid Tracking transient surveys and the Palomar Transient Factory (PTF). The PTF probes gaps in the transient phase space and searches for theoretically predicted, but not yet detected, phenomena such as fallback supernovae, macronovae, .Ia supernovae or the orphan afterglows of gamma-ray bursts. In fact, DeepSky was instrumental in the 2011 discovery of the closest supernova to Earth in 30 years.
“DeepSky was set up so that as images from the telescopes were being processed, multiple scientists from around the world could look at them and discuss them,” Cholia said. “We needed a common portal that would allow people across the collaboration to look at a common set of images to tag and classify them. The common platform available to everyone was the web browser.
“We are providing tools, expertise and infrastructure to enable the creation of new gateways, but the gateways themselves are almost always driven by the users,” Cholia said.
Parallels in Data
The Materials Project gateway, which is accessible to a spectrum of users, further encourages collaborative science. Launched in 2011, this gateway is designed to enable scientists to use supercomputers and quantum mechanical equations to design new materials atom by atom, before ever running an experiment.
Using conventional approaches, it takes about 18 years to conceptualize and commercialize a new material. The Materials Project is meant to address this bottleneck by using a genomics approach to materials science—it uses supercomputers to characterize the properties of inorganic materials and thus takes some of the guesswork out of materials design.
“Our vision is for this tool to become a dynamic ‘Google’ of material properties that continually grows and changes as more users come on board to analyze the results, verify against experiments and increase their knowledge,” said Kristin Persson, a Berkeley Lab chemist and one of the founding scientists behind the Materials Project.
OpenMSI is another NERSC gateway that is changing the way scientists think about data and data analysis. Mass spectrometry imaging (MSI) allows scientists to study tissues, cell cultures and bacterial colonies in unprecedented detail at the molecular level. That level of detail produces large files: a typical MSI file ranges from 10 to 50 gigabytes, equivalent to about 20,000 digital photos.
Using OpenMSI, researchers can interact with the MSI datasets over the Internet, in real time, without downloading anything. It offers a user-friendly graphical user interface so that even researchers without any programming skills can easily access and analyze the data; a standard system for organizing, tagging and storing raw and processed MSI data; a tool for retrieving this data over the web; and an interface that visualizes an MSI sample and its corresponding spectrum, side by side, in a single web browser. Researchers can also share this data with collaborators simply by sharing a link, so no files have to be downloaded; all of the computer processing and storage occurs on NERSC systems.
Two other portals deal with a different type of imaging data. SPOT Suite provides high-performance computing capabilities for analyzing and simulating large datasets generated at Lawrence Berkeley Lab’s Advanced Light Source, while the Coherent X-ray Imaging Data Bank (CXIDB) enables scientists from around the world to deposit and share images generated by coherent x-ray light sources.
As web-based gateways continue to develop and evolve, Skinner and Cholia see new opportunities emerging for scientists from different disciplines to share not only their data but also their data analysis and visualization tools. With this evolution come new possibilities as to what scientific computer applications look like and the questions they can help address.
“There are a lot of common patterns emerging in terms of the kinds of things people are trying to do with their data,” Cholia said. For example, OpenMSI and SPOT Suite both deal with images and spectrographic data associated with points on the image. “So I think there is an increasing convergence taking place. In the long term, we want to start to capture these common patterns and help people build the tools that transcend the different domains.”
Read more about these and other NERSC science gateways and the team science achievements they have facilitated:
20th Century Reanalysis Project
Coherent X-ray Imaging Data Bank
Researchers Discover a New Kind of Neutrino Transformation
A Massive Stellar Burst Before the Supernova
Berkeley Lab Researchers Prepare U.S. Climate Community for 100-Gigabit Data Transfer
Collaboration Shines in Materials Project Success
OpenMSI: A Science Gateway to Sort Through Bio-Imaging’s Big Datasets
Brace for Impact: Why Does Matter Dominate Our Universe?