Data Science

Transforming data-driven discovery and understanding by developing and applying novel data science methods, technologies, and infrastructures with scientific partners.

Data science is an interdisciplinary field that incorporates skills from computer science, statistics, information science, mathematics, visualization, data integration, and complex systems to extract insights from data, often very large datasets, and apply those insights to solve problems in various domains.

From climate modeling to genomics to nuclear physics, increasingly precise sensors and ever-more powerful supercomputers are delivering data to scientists at furious rates. These enormous data volumes and incredible speeds are allowing scientists to make progress on some of humanity’s greatest challenges—from investigating the origins of our universe to finding a cure for cancer. But to make those breakthrough discoveries, scientists must first manage and make sense of this data deluge.

Data Management

Data Processing

Math for Data

Analysis and Visualization

Cybersecurity, Differential Privacy

Data Repositories and Sharing

At Berkeley Lab, our researchers work with domain scientists to develop novel methods, technologies, and infrastructures to prepare raw data for analysis, define the data science problem (e.g., categorize data, identify patterns, identify anomalies, show correlations, or predict outcomes), analyze data, develop data-driven solutions, and present or visualize the findings to inform decision making. Our work in cybersecurity keeps our datasets secure from the moment they are generated until they are archived. And we enable broad scientific research through our work on making data Findable, Accessible, Interoperable, and Reusable (FAIR).

Berkeley Lab is also home to the Department of Energy's (DOE) National Energy Research Scientific Computing Center (NERSC) and Energy Sciences Network (ESnet). As the mission high-performance computing facility for the DOE Office of Science, NERSC provides computational and data resources and expertise to more than 8,000 scientists each year, who use NERSC to perform open scientific research across a wide range of disciplines.

ESnet is a high-performance, unclassified network built to support scientific research. The network provides services to over 50 DOE research sites, including the entire DOE National Laboratory system, supercomputing facilities, and major scientific instruments. ESnet also connects to 140 research and commercial networks, permitting DOE-funded scientists to collaborate productively with partners worldwide.