Collaboration Shines in Materials Project Success
Many Hands at Lab Lift 'World-Changing Idea' to New Heights
December 14, 2013
Margie Wylie, firstname.lastname@example.org, +1 510.486.7421
Scientific American featured a Computing Sciences-powered project on its December cover as a top world-changing idea of 2013. The Materials Project, an open, web-hosted service, allows scientists to use supercomputers and quantum mechanical equations to design “new materials atom by atom, before ever running an experiment,” the magazine noted.
“Materials science is on the verge of a revolution,” wrote co-authors Kristin Persson of Lawrence Berkeley National Laboratory (Berkeley Lab) and Gerbrand Ceder of Massachusetts Institute of Technology (MIT). “We can now use a century of progress in physics and computing to move beyond the Edisonian process.”
The Materials Project aims to take the guesswork out of finding the best material for a job (be it a new battery electrode or a lightweight spacecraft body) by making the characteristics of every inorganic compound available to any interested scientist. With 35,000 materials and 5,000 users, the once-small, experimental project has grown into what is likely the largest and arguably the most sophisticated open materials database yet fielded. The ultimate goal is to halve the time it typically takes to bring a new material to market, which is currently about 18 years.
Persson, of Berkeley Lab’s Environmental Energy Technologies Division (EETD), brought the project with her from MIT, where, as a post-doctoral researcher, she cofounded it with Ceder. At Berkeley Lab, Computing Sciences’ world-class talent and resources combined to help grow the already promising project into the world-changing idea it is today.
First, the project had to build up a database of the results of thousands of quantum mechanical calculations. Scientists use these to screen compounds for specific characteristics, such as density, hardness, shininess or electronic conductivity.
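The screening step described above can be pictured as a simple filter over computed properties. The following is only a minimal sketch under stated assumptions: the `Compound` record, the `screen` function, the compound names and every numeric value are hypothetical placeholders invented for illustration, not Materials Project data or code.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str        # placeholder label, not a real material
    density: float   # illustrative value, arbitrary units
    hardness: float  # illustrative value, arbitrary units


def screen(compounds, max_density, min_hardness):
    """Return the compounds that satisfy both property thresholds."""
    return [
        c for c in compounds
        if c.density <= max_density and c.hardness >= min_hardness
    ]


# Hypothetical candidates; the numbers are made up for the example.
candidates = [
    Compound("compound_a", density=2.1, hardness=9.0),
    Compound("compound_b", density=7.8, hardness=4.0),
    Compound("compound_c", density=3.5, hardness=6.5),
]

# Keep only light, hard candidates.
light_and_hard = screen(candidates, max_density=4.0, min_hardness=6.0)
print([c.name for c in light_and_hard])
```

In practice the thresholds would come from the requirements of the application, and the property values from the database of calculation results.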
Anubhav Jain, who had previously worked with Persson on the project at MIT, was an Alvarez fellow with the Computational Research Division (CRD) when he built Fireworks. The workflow automation framework streamlines the process of running these calculations in a supercomputer environment.
Typical supercomputing calculations are big, requiring hundreds or thousands of processors running in parallel. The Materials Project, conversely, requires thousands of calculations, but each runs on only a handful of processors. (This is sometimes called “high-throughput” or “ensemble” computing.) Naturally, the system for submitting calculations in most supercomputing environments is geared towards fewer, larger jobs. Jain’s Fireworks adds intelligence and automation to the process of submitting large batches of smaller jobs. And because the framework was written to be general purpose, other projects are now considering using it for research that requires an ensemble-computing model.
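One way to picture the ensemble-computing problem is as packing many small calculations into submissions sized for a queue that prefers large jobs. The sketch below is an assumption-laden illustration of that idea only. It does not use or reproduce the Fireworks framework, and the function name `batch_jobs`, its parameters and the job labels are all hypothetical.

```python
from itertools import islice


def batch_jobs(jobs, cores_per_job, cores_per_submission):
    """Group small jobs so that each submission fills one large allocation."""
    if cores_per_job <= 0 or cores_per_job > cores_per_submission:
        raise ValueError("each job must fit inside one submission")
    jobs_per_batch = cores_per_submission // cores_per_job
    iterator = iter(jobs)
    while True:
        # Take the next group of jobs; stop once nothing is left.
        batch = list(islice(iterator, jobs_per_batch))
        if not batch:
            return
        yield batch


# Ten hypothetical 4-core calculations packed into 16-core submissions.
jobs = [f"calc_{i}" for i in range(10)]
batches = list(batch_jobs(jobs, cores_per_job=4, cores_per_submission=16))
print([len(b) for b in batches])
```

A real workflow framework would also have to track job state, retry failures and respect queue limits, none of which is modeled here.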
Today, Jain works with Persson in EETD, where he has prepared and executed all the density functional theory calculations for the project since mid-2012. He does so using the pymatgen library, developed for the Materials Project by Shyue Ping Ong of the University of California, San Diego, and with help from others on the project, including Wei Chen, a post-doctoral researcher with EETD.
As the Materials Project transitioned from its experimental roots to a full production service for thousands of users, its SQL database emerged as a key difficulty. Dan Gunter of CRD’s Advanced Computing for Science (ACS) Department led the transformation of the project’s data model from traditional SQL to MongoDB, with assistance from Shreyas Cholia, deputy leader of the outreach, programming and software group at the National Energy Research Scientific Computing Center (NERSC) and Monte Goode, also of ACS.
“Traditional SQL is focused on upfront design; first you come up with a schema, then you enter the data,” said Cholia. “In science, however, you rarely know everything you’re going to do with the data before you start.” The MongoDB model lets researchers add new data first, with the structure (or schema) emerging from the data itself. MongoDB is the workhorse of the Materials Project: it is used to schedule and track quantum mechanical calculations of materials properties on supercomputers, to store and search the results of these computations, and to perform advanced analytics on the computed materials properties.
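The contrast Cholia draws can be illustrated with plain Python dictionaries standing in for documents in a document store. This is a rough sketch, not the project's data model: it deliberately avoids a real MongoDB connection, and the field names, placeholder formulas, values and the `infer_fields` helper are all assumptions made up for the example.

```python
# Records are added as they arrive; no table layout is declared up front.
# Formulas and values are placeholders, not real computed results.
records = [
    {"formula": "X2Y", "energy": -1.5},
    {"formula": "XY3", "energy": -2.0, "band_gap": 1.1},
    {"formula": "Z", "elasticity": {"bulk_modulus": 100.0}},
]


def infer_fields(documents):
    """Derive the observed structure: each field and the value types seen."""
    fields = {}
    for doc in documents:
        for key, value in doc.items():
            fields.setdefault(key, set()).add(type(value).__name__)
    return fields


schema = infer_fields(records)
for name in sorted(schema):
    print(name, sorted(schema[name]))
```

Here the third record introduces a nested field that the first two lack, and nothing has to be migrated to accommodate it; the structure is read off the data after the fact.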
Working with Ong, the site’s primary designer, NERSC’s Cholia and David Skinner, with early assistance from Annette Greiner, built the NERSC Science Gateway that hosts the Materials Project. Using an interactive, web-based interface, researchers can peruse compounds, access applications to explore and visualize materials, and even submit new calculations to NERSC computers.
“When you search for a molecule, you can get a 3D visualization of its structure,” said Cholia. “With the crystal toolkit, you can tweak that molecule’s properties and then look for something like that in the database. There’s a lot of interactivity.”
The Lab’s Miriam Brafman of ACS is working on the next generation of innovative web interfaces for the project. The aim is to let researchers dynamically explore the data and launch new analyses. “Powerful interfaces are key to successful use of the diverse data needed for materials design: researchers need to search and draw correlations across multiple dimensions of materials properties,” said Gunter.
The Materials Project also provides an HTTP REST application programming interface (API)—developed by Ong with help from Cholia, Gunter and Jain. This web-based API makes it possible to programmatically interact with the project’s database directly. Using the API, researchers can run their own analyses directly on the data, an extremely powerful tool for collaborating with other projects that wish to consume Materials Project data.
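Programmatic access of this kind generally means building a request URL and decoding a structured response. The sketch below shows that pattern under loud assumptions: the base URL, the path layout, the `properties` query parameter, the response shape and the placeholder formula are all hypothetical and are not the Materials Project's actual API. No network request is made; a canned JSON string stands in for a server reply.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.org/rest/v1"


def build_query_url(formula, properties):
    """Build a GET URL asking for selected properties of one compound."""
    path = f"{BASE_URL}/materials/{quote(formula)}"
    query = urlencode({"properties": ",".join(properties)})
    return f"{path}?{query}"


def parse_response(body):
    """Decode a JSON reply and return its list of result records."""
    payload = json.loads(body)
    return payload["results"]


url = build_query_url("X2Y3", ["density", "energy"])
print(url)

# Canned reply with placeholder values, standing in for a server response.
canned_body = '{"results": [{"formula": "X2Y3", "density": 0.0}]}'
results = parse_response(canned_body)
print(results[0]["formula"])
```

Because the results arrive as ordinary data structures, a client can feed them straight into its own analysis code, which is what makes such an interface useful to other projects.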
In addition, Maciej Haranczyk of CRD’s Scientific Computing Group is extending the project to materials with defects and porous materials. Haranczyk and post-doctoral researcher Bharat Medasani are developing algorithms and tools to help scientists easily analyze how structural defects can affect a given material's properties. Haranczyk is working with the Nanoporous Materials Genome Center at the University of Minnesota to integrate their materials libraries and computational characterization tools into the Materials Project. Gunter and Haranczyk are building an interface for exploring porous materials' properties.
NERSC serves as the computing and data engine for the project. It provides the software and hardware infrastructure for the web gateway and databases that serve up the Materials Project data. In addition to supporting calculations on supercomputers, the NERSC division maintains cluster nodes purchased by and dedicated solely to the Materials Project. ESnet connectivity enables access to the NERSC science gateways, and serves as the platform to access data resources at NERSC.
Collaboration comes naturally at Berkeley Lab, the birthplace of team science. Collaborations between disciplines, across divisions, within Computing Sciences, and across institutions too numerous to mention here are key to the continuing success of the Materials Project, notes Persson: “The Materials Project is collaborative not just within LBNL, but also across institutions, and most obviously with its co-founder MIT. Although the core teams are very much centered at Berkeley Lab, we also work together with Universite de Louvain, University of Kentucky, Duke, UC Berkeley, and many more institutions.”
About Berkeley Lab Computing Sciences
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe. ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 5,500 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.