
Science Data Pilots: An Infrastructure to Harness Big Science Data

All talks are in booth #1939

Science breakthroughs in the 21st century will depend on how well researchers in a variety of disciplines, from biology to the physical sciences, harness the massive datasets that have been accumulated over decades of experiments, observations and simulations. But for many researchers, taking full advantage of these scientific troves requires technologies and a robust computational infrastructure that don't yet exist for their field.

Recognizing this need, the Department of Energy's Office of Science, the single largest supporter of basic research in the physical sciences in the United States, is bringing together researchers, network engineers and computer and computational scientists to build the tools and infrastructure for modern scientific discovery. The following projects, which are led by Lawrence Berkeley National Laboratory researchers, reflect some of that progress so far.

X-rays and Supercomputing - Opening New Frontiers in Photon Science 

Tuesday, Nov. 18, 10:30-11 a.m.

This data technology project demonstrated the ability to use a central scientific data facility to serve data from multiple experimental facilities. Data from experiments at the Advanced Light Source (ALS), the Advanced Photon Source, the Linac Coherent Light Source and the National Synchrotron Light Source was moved to the National Energy Research Scientific Computing Center (NERSC) via ESnet, DOE's Energy Sciences Network. To accurately reflect the variety of scientific data produced by user facilities operated by the Office of Basic Energy Sciences, the project used three distinct X-ray methods, four different light source facilities and four different beamlines.
Participants are Craig Tull, Eli Dart, Dilworth Parkinson, Nicholas Sauter and David Skinner, LBNL; Amber Boehnlein, SLAC; Francesco De Carlo and Ian Foster, ANL; and Dantong Yu, BNL.
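
As a rough illustration of the staging pattern such a pilot exercises, the sketch below watches a beamline's acquisition directory and ships each completed scan to a central compute facility. The directory layout, the "done" marker, the NERSC transfer target and the use of rsync are all hypothetical stand-ins; the actual pilot moved data over ESnet with facility-specific transfer tooling.

    """Minimal sketch: stage completed beamline scans to a central compute facility.

    Paths, the remote target and the rsync-based transfer are hypothetical stand-ins.
    """
    import subprocess
    import time
    from pathlib import Path

    ACQ_DIR = Path("/data/beamline/scans")            # hypothetical local acquisition directory
    REMOTE = "dtn01.nersc.gov:/project/als/incoming"  # hypothetical data transfer node target
    POLL_SECONDS = 30

    def is_complete(scan: Path) -> bool:
        """Treat a scan as complete once the beamline writes a 'done' marker file."""
        return (scan / "done").exists()

    def stage(scan: Path) -> None:
        """Copy one completed scan directory to the remote analysis facility."""
        subprocess.run(
            ["rsync", "-a", "--partial", str(scan) + "/", f"{REMOTE}/{scan.name}/"],
            check=True,
        )

    def main() -> None:
        shipped: set[str] = set()
        while True:
            for scan in sorted(ACQ_DIR.iterdir()):
                if scan.is_dir() and scan.name not in shipped and is_complete(scan):
                    stage(scan)
                    shipped.add(scan.name)
            time.sleep(POLL_SECONDS)

    if __name__ == "__main__":
        main()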

Investigating Organic Photovoltaics in Real-Time With an ASCR Super-Facility 

Tuesday, Nov. 18, 11-11:30 a.m.

This project conducted a multi-facility data technology demonstration illustrating a concept known as a "super facility," in which multiple, complementary DOE Office of Science user facilities are seamlessly integrated into a virtual facility that presents a fundamentally greater capability for users. The facilities involved are the ALS, NERSC, the Oak Ridge Leadership Computing Facility (OLCF) and ESnet. Enabled by the network connectivity that ESnet provides between the ALS, NERSC and OLCF, and using specialized software, the project demonstrated that researchers in organic photovoltaics could not only expose their samples at the ALS and see real-time feedback on all of their samples through the SPOT application running at NERSC, but also see near real-time analysis of their samples running at the largest scale on the Titan supercomputer at OLCF. This allowed researchers, for the first time, to understand their samples well enough during beamtime to adjust the experiment and maximize their scientific results.
Participants are Craig Tull, Shane Canon, Eli Dart, Alex Hexemer and James Sethian, LBNL; Ian Foster, ANL; and Galen Shipman, ORNL.
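
A minimal sketch of the two-tier flow that description implies is shown below, assuming hypothetical batch scripts and scheduler options; the real demonstration used the SPOT suite at NERSC and ran its large-scale analysis on Titan at OLCF.

    """Minimal sketch of the two-tier 'super facility' analysis flow.

    The scripts, queue options and function names are hypothetical stand-ins.
    """
    import subprocess

    def quick_feedback(sample_path: str) -> None:
        """Submit a small, fast reduction job so the experimenter sees feedback during beamtime."""
        subprocess.run(
            ["sbatch", "--time=00:10:00", "quick_reduce.sh", sample_path],  # hypothetical script
            check=True,
        )

    def deep_analysis(sample_path: str) -> None:
        """Submit a large-scale analysis of the same sample to a leadership-class system."""
        subprocess.run(
            ["qsub", "-l", "walltime=02:00:00", "-v", f"SAMPLE={sample_path}", "deep_fit.pbs"],  # hypothetical script
            check=True,
        )

    def on_new_sample(sample_path: str) -> None:
        """Kick off both tiers as soon as a sample's data lands at the compute facility."""
        quick_feedback(sample_path)
        deep_analysis(sample_path)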

EXDAC – EXtreme Data Analysis for Cosmology 

Wednesday, Nov. 19, 2:30-3:15 p.m.

In recent years astrophysics and cosmology have undergone a renaissance, transforming from data-starved to data-driven sciences. A new generation of ongoing and near-future survey experiments will gather massive datasets that will provide more than an order-of-magnitude improvement in our understanding of cosmology and the evolution of the universe. Analyzing these datasets requires leading-edge high performance computing resources and novel techniques to handle the many petabytes of data generated over the course of the surveys. Furthermore, interpreting the observations is impossible without a modeling and simulation effort that will generate orders of magnitude more simulation data, which will be used to directly understand and constrain systematic uncertainties in these experiments. This project developed an example of such a pipeline and showed what a future set of data facilities in the DOE complex could deliver in terms of significantly enhanced scientific reach and turnaround time.
Participants are Peter Nugent and Shane Canon, LBNL; Salman Habib, ANL; Michael Ernst and Anže Slosar, BNL; and Bronson Messer, ORNL.
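
As a toy illustration of how simulation output can be confronted with survey measurements, the sketch below picks the best-fitting model by a simple chi-square over a grid of simulations. The file names, the summary statistic and the brute-force grid search are hypothetical simplifications of what a pipeline like this does at petabyte scale.

    """Toy sketch: compare an observed summary statistic against a grid of simulations.

    File names and the statistic are hypothetical placeholders.
    """
    import numpy as np

    def chi_square(observed: np.ndarray, simulated: np.ndarray, sigma: np.ndarray) -> float:
        """Simple goodness-of-fit between one observation and one simulation."""
        return float(np.sum(((observed - simulated) / sigma) ** 2))

    def best_fit(observed: np.ndarray, sigma: np.ndarray, sims: dict) -> str:
        """Return the label of the simulation that best matches the observation."""
        return min(sims, key=lambda label: chi_square(observed, sims[label], sigma))

    if __name__ == "__main__":
        obs = np.load("observed_power_spectrum.npy")   # hypothetical survey measurement
        err = np.load("observed_errors.npy")           # hypothetical uncertainties
        sims = {f"model_{i}": np.load(f"sim_{i}.npy") for i in range(10)}  # hypothetical simulation grid
        print("best-fitting model:", best_fit(obs, err, sims))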

Granular Data Processing on HPCs Using an Event Service 

Wednesday, Nov. 19, 3-4 p.m.

HPC facilities present unique opportunities and challenges for high energy physics event processing. The massive scale of many HPC systems means that even fractionally small utilizations can yield large returns in throughput. Parallel applications that can dynamically and efficiently fill any scheduling opportunity the resource presents benefit both the facility (maximal utilization) and the compute-limited science. We will demonstrate an enabling framework for such applications: Yoda, a novel fine-grained data processing system for HEP-like event processing tailored to HPCs. Yoda is a specialization of the Event Service, a workflow engine designed for the efficient exploitation of distributed and architecturally diverse computing resources. It was developed within the ATLAS experiment, where the compute-limited physics program stands to benefit greatly from opportunistic computing resources, which can enable computationally intensive physics, such as rare searches, that would otherwise be impossible within the available resources. The Event Service is also designed for highly efficient data handling in data-intensive processing, using dynamic data movement across powerful networks to minimize expensive disk storage demands. The data-intensive, network-centric, platform-agnostic computing embodied by the Event Service and Yoda represents an increasingly important paradigm within the scientific computing community. We expect the system to mate well with emerging data-intensive platforms.
Participants are Torre Wenaus, BNL and Vakho Tsulaia, LBNL.
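
The sketch below illustrates the general idea of fine-grained event dispatch: a pool of workers drains small ranges of events so an allocation can be filled opportunistically and stragglers finish before the allocation ends. It is an illustration only, not the ATLAS Yoda or Event Service implementation; the chunk size and the stand-in per-event work are hypothetical.

    """Minimal sketch of fine-grained event dispatch in the spirit of an event service."""
    from multiprocessing import Pool

    CHUNK = 50             # events per work unit; small chunks make the workload easy to pack into gaps
    TOTAL_EVENTS = 10_000  # hypothetical total event count

    def process_range(first_event: int) -> int:
        """Process one small range of events and return how many were handled."""
        handled = 0
        for event_id in range(first_event, min(first_event + CHUNK, TOTAL_EVENTS)):
            _ = event_id * event_id   # stand-in for per-event physics processing
            handled += 1
        return handled

    if __name__ == "__main__":
        with Pool() as pool:
            done = sum(pool.map(process_range, range(0, TOTAL_EVENTS, CHUNK)))
        print(f"processed {done} events")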

Virtual Data Facility Service Infrastructure Demonstration 

Wednesday, Nov. 19, 4-4:30 p.m.

Major facilities and science teams across the DOE laboratory system are increasingly dependent on the ability to efficiently capture, integrate and steward large volumes of diverse data. These data-intensive workloads are often composed as complex scientific workflows that require computational and data services across multiple facilities. ASCR's current computational environment will need to be expanded with new services to enable these workflows. This project demonstrated a few core services that illustrate how a Virtual Data Facility could build upon ASCR's computational infrastructure to better meet the needs of DOE experimental and observational facilities and research teams.
Participants are Shane Canon and Brian Tierney, LBNL; Dan Olson, ANL; Michael Ernst, BNL; Kerstin Kleese van Dam, PNNL; and Galen Shipman, ORNL.
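
As an illustration of one kind of core service such a facility could expose, the sketch below implements a tiny in-memory catalog that records where a dataset's replicas live across facilities. The class, its fields and the example dataset names are hypothetical and are not part of the demonstrated services.

    """Minimal sketch: a cross-facility dataset catalog as one possible core service.

    Names and fields are hypothetical illustrations.
    """
    from dataclasses import dataclass, field

    @dataclass
    class DatasetRecord:
        name: str
        owner: str
        replicas: dict = field(default_factory=dict)  # maps facility name -> path at that facility

    class DatasetCatalog:
        """In-memory stand-in for a cross-facility dataset catalog service."""

        def __init__(self) -> None:
            self._records = {}

        def register(self, name: str, owner: str) -> DatasetRecord:
            """Create a catalog entry for a new dataset."""
            record = DatasetRecord(name=name, owner=owner)
            self._records[name] = record
            return record

        def add_replica(self, name: str, facility: str, path: str) -> None:
            """Record that a copy of the dataset exists at a given facility."""
            self._records[name].replicas[facility] = path

        def locate(self, name: str) -> dict:
            """Return every known replica so a workflow can pick the most convenient copy."""
            return dict(self._records[name].replicas)

    if __name__ == "__main__":
        catalog = DatasetCatalog()
        catalog.register("als-2014-run42", owner="beamline-team")      # hypothetical dataset
        catalog.add_replica("als-2014-run42", "NERSC", "/project/als/run42")
        catalog.add_replica("als-2014-run42", "OLCF", "/lustre/als/run42")
        print(catalog.locate("als-2014-run42"))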


About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.