
New Storage 2020 Report Outlines Future of HPC Storage

NERSC Releases Detailed Roadmap to Help HPC Community Address Next-Generation Data Storage Challenges

November 6, 2017

A new report released by Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center (NERSC), “Storage 2020,” provides a detailed roadmap to help the high performance computing (HPC) community address potentially overwhelming data storage challenges over the next decade and beyond.

Storage systems play a critical role in supporting NERSC’s mission by enabling the retention and dissemination of science data used and produced at the center. Over the past 10 years, the total volume of data stored at NERSC has increased from 3.5 PiB to 146 PiB, a compound annual growth rate of roughly 45%, driven by a 1,000x increase in system performance and a 100x increase in system memory. In addition, there has been dramatic growth in experimental and observational data, and experimental facilities are increasingly turning to NERSC to meet their data analysis and storage requirements.
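As a quick arithmetic check, the compound annual growth rate implied by the two endpoints quoted above can be computed directly as (146 / 3.5)^(1/10) − 1. The sketch below assumes the growth compounded evenly over exactly 10 years; `cagr` is a hypothetical helper name used only for illustration, not anything from the report.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Return the constant yearly rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1.0 / years) - 1.0


# Endpoints quoted in the article: 3.5 PiB ten years ago, 146 PiB now.
rate = cagr(3.5, 146.0, 10)
print(f"implied annual growth: {rate:.1%}")  # prints "implied annual growth: 45.2%"
```

A 41.7x increase over a decade works out to about 45% per year; by comparison, a steady 30% per year would yield only about a 13.8x increase over the same period.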

At the same time, the technologies underpinning traditional storage in HPC are rapidly changing. Solid-state drives are being integrated into HPC systems as a new tier of high-performance storage, shifting the role of magnetic disk media away from performance, and tape revenues are slowly declining. Economic drivers coming from cloud and hyperscale data center providers are altering the mass storage ecosystem as well, rapidly advancing the state of the art in object-based storage systems over POSIX-based parallel file systems. In addition, non-volatile storage-class memory is emerging as a high-performance, low-latency storage medium. The combination of these factors broadens the design space of future storage systems, creating new opportunities for innovation but also introducing new uncertainties.

“The future of storage in HPC is getting complicated, and NERSC has published a vision for how all of the new and emerging elements can be most effectively utilized in the next 10 years,” said Damian Hazen, group lead of the Storage Systems Group at NERSC. “Our goal was to provide a roadmap for storage through 2025 that will ensure users can make optimal use of future storage technologies and that those storage technologies will continue to meet the needs of the DOE Office of Science user community.”

To support this effort, NERSC conducted a broad survey of scientific workflows and user requirements at the center, identifying four logical tiers of data storage: temporary, campaign, forever and community. The team then examined the outlook for many types of storage media, for workloads arising from exascale computing and experimental data, and for storage software and middleware, to determine how these four logical tiers can map to physical storage systems. The roadmap sets a target of implementing three tiers by 2020 and two tiers by 2025, ultimately combining different types of storage media to simplify data management for users, noted Glenn Lockwood, storage architect at NERSC and a contributing author of the report. The performance and scalability requirements of future systems will drive the industry toward object stores by 2025, and HPC centers such as NERSC will rely on middleware to provide familiar interfaces like POSIX and HDF5 for users who aren't ready to change the way they perform I/O.
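The idea of middleware presenting a familiar file interface on top of an object store can be illustrated with a toy example. The sketch below assumes an object store with a flat key namespace and whole-object, immutable writes; `ObjectStore` and `PosixLikeFacade` are hypothetical classes invented for illustration, not the report's design or any real product's API, and they cover only a tiny subset of file semantics (no permissions, locking, or in-place updates).

```python
from typing import Dict, List


class ObjectStore:
    """Hypothetical toy object store: flat namespace, each object written or read whole."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)  # replaces the entire object at once

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the object does not exist

    def list_keys(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class PosixLikeFacade:
    """Hypothetical middleware offering file-style calls on top of ObjectStore.

    Paths become object keys. Writes are buffered and published as one object
    on close, because objects in this toy store cannot be modified in place.
    """

    def __init__(self, store: ObjectStore) -> None:
        self._store = store
        self._open_for_write: Dict[str, bytearray] = {}

    @staticmethod
    def _key(path: str) -> str:
        return path.lstrip("/")  # "/a/b.dat" -> "a/b.dat"

    def open_write(self, path: str) -> None:
        self._open_for_write[path] = bytearray()

    def write(self, path: str, data: bytes) -> int:
        self._open_for_write[path].extend(data)
        return len(data)

    def close(self, path: str) -> None:
        buf = self._open_for_write.pop(path)
        self._store.put(self._key(path), bytes(buf))

    def read(self, path: str, offset: int = 0, size: int = -1) -> bytes:
        data = self._store.get(self._key(path))
        end = len(data) if size < 0 else offset + size
        return data[offset:end]

    def listdir(self, directory: str) -> List[str]:
        # A flat namespace has no real directories; emulate them with key prefixes.
        stripped = directory.strip("/")
        prefix = stripped + "/" if stripped else ""
        keys = self._store.list_keys(prefix)
        return sorted({k[len(prefix):].split("/", 1)[0] for k in keys})


if __name__ == "__main__":
    fs = PosixLikeFacade(ObjectStore())
    fs.open_write("/project/run1/output.dat")
    fs.write("/project/run1/output.dat", b"hello ")
    fs.write("/project/run1/output.dat", b"world")
    fs.close("/project/run1/output.dat")
    print(fs.read("/project/run1/output.dat", offset=6))  # b'world'
    print(fs.listdir("/project"))  # ['run1']
```

Buffering writes until `close` is one simple way to reconcile incremental POSIX-style writes with immutable objects; production middleware would also have to deal with concurrent writers, partial updates, and rich metadata, all of which this sketch ignores.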

“With this roadmap and long-term strategy, we identify areas where NERSC is positioned to provide leadership in storage in the coming decade to ensure our users are able to make the most productive use of all relevant storage technologies,” said NERSC Division Director Sudip Dosanjh.

Because of the diversity of NERSC user workloads across scientific domains, this analysis and the reference storage architecture should be relevant to HPC storage planning outside of NERSC and the DOE, the report notes.

NERSC is a U.S. Department of Energy Office of Science User Facility. As one of the largest facilities in the world devoted to providing computational resources and expertise for basic scientific research, NERSC is a leader in accelerating scientific discovery through high performance computing and data analysis. 

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.