CRD Receives Funding to Develop Tools for Science Research
August 31, 2009
Contact: Linda Vu, CSnews@lbl.gov
In the next few months, researchers from Lawrence Berkeley National Laboratory's Computational Research Division (CRD) will receive funding to develop tools that facilitate scientific breakthroughs in disciplines ranging from astrophysics to geology.
Monitoring Workflows on Distributed Systems
Dan Gunter of CRD's Advanced Computing for Science Department will receive $250,000 in funding from the National Science Foundation's (NSF) Strategic Technologies for Cyberinfrastructure program to develop tools that will monitor workflows on distributed systems. The NSF funding comes from the American Recovery and Reinvestment Act and is part of a nationwide, multi-institutional collaboration led by Ewa Deelman of the University of Southern California's Information Sciences Institute.
"The idea of this project is to develop an online monitoring system that will alert researchers and the workflow manager if a large workflow is running slower than it should. By giving scientists and other cyberinfrastructure access to useful information in real time, they can react intelligently to the problem and use their valuable supercomputing resources more efficiently," says Gunter.
He notes that the project will build on protocols developed for the perfSONAR network monitoring system and tools developed by the Center for Enabling Distributed Petascale Science (CEDPS), which is supported by the Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program.
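The kind of online check Gunter describes can be illustrated with a short Python sketch that reads log lines and flags a workflow whose job-completion rate falls below what was expected. The `key=value` line layout, the `job.end` event name, the function names and the 50 percent tolerance are all assumptions made for this illustration; none of them is taken from the project's actual tools or log format.

```python
from datetime import datetime

def parse_line(line):
    """Split a 'key=value key=value' log line into a dict (assumed layout)."""
    return dict(field.split("=", 1) for field in line.split())

def jobs_per_minute(lines, event="job.end"):
    """Average completion rate between the first and last matching event."""
    records = (parse_line(line) for line in lines)
    times = [datetime.fromisoformat(r["ts"]) for r in records
             if r.get("event") == event]
    if len(times) < 2:
        return 0.0
    elapsed_min = (max(times) - min(times)).total_seconds() / 60
    # n events span n - 1 intervals
    return (len(times) - 1) / elapsed_min if elapsed_min > 0 else 0.0

def is_slow(lines, expected_per_min, tolerance=0.5):
    """True if the observed rate is below tolerance * expected rate."""
    return jobs_per_minute(lines) < expected_per_min * tolerance

# Hypothetical log excerpt: one job finishes every ten minutes.
LOG = [
    "ts=2009-08-31T12:00:00 event=job.end job=a",
    "ts=2009-08-31T12:10:00 event=job.end job=b",
    "ts=2009-08-31T12:20:00 event=job.end job=c",
]

if is_slow(LOG, expected_per_min=1.0):
    print("ALERT: workflow is running slower than expected")
```

A real monitor would consume events as they arrive rather than from a fixed list, and would notify the workflow manager as well as the researcher.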
"Scientific communities in astronomy, biology, earthquake sciences, physics and others will immediately benefit from the proposed system," adds Gunter. "Because our approach relies on standard logging formats, it is applicable to a range of workflow management systems as well as subcomponents of those systems such as job managers and data transfer tools."
End-to-End Resource Provisioning and Management System for High Performance Data Transfers
Arie Shoshani of CRD at Lawrence Berkeley National Laboratory (Berkeley Lab) and Dantong Yu of Brookhaven National Laboratory will collaborate to develop an end-to-end storage and network resource provisioning and management system for high-performance computing (HPC) data transfers, with funding from the Department of Energy's (DOE) Office of Advanced Scientific Computing Research (ASCR).
The new system will enable the Storage Resource Manager (SRM), a framework that is widely used to manage shared storage systems, to interact with DOE's TeraPaths system. TeraPaths reserves bandwidth on the local networks connecting to the DOE's wide area Energy Sciences Network (ESnet), and also interacts with OSCARS (On-Demand Secure Circuit and Advance Reservation System) to dynamically reserve bandwidth on ESnet. The SRMs on the source and target sites will provide storage reservations as well as reserve bandwidth from the storage systems into the network.
ESnet connects thousands of researchers at DOE Laboratories and universities across the country with their collaborators worldwide. ESnet comprises two networks—an IP network to carry day-to-day traffic, including e-mails, video conferencing, etc., and a circuit-oriented Science Data Network (SDN) to haul massive scientific datasets. OSCARS allows researchers to reserve bandwidth on the SDN.
"Currently, when researchers send terabytes of data from one storage system to another across a wide area, there are many uncontrolled factors that determine the speed of transfer. Consequently, the researcher does not know how fast the systems on each end of the transfer send and receive data, or if they will get consistent transfer rate on the local-area and wide-area networks," says Shoshani. "This uncertainty means that a collaborator on the receiving end could be waiting anywhere from two hours to several days for one terabyte of data to arrive."
Shoshani notes that the ultimate goal of this ASCR project is to ensure that DOE's networking resources are being used efficiently so that researchers on the receiving end of a transfer can rely on end-to-end bandwidth reservations with guaranteed time of delivery. Furthermore, this system will guarantee high bandwidth for users transferring time-sensitive data. With support from ASCR, Shoshani and Yu will primarily focus on getting Berkeley Lab's implementation of the SRM, called BeStMan (Berkeley Storage Manager), to interact with TeraPaths.
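To put these figures in perspective, the Python sketch below works out the sustained rate needed to deliver one terabyte in a given time, and shows why an end-to-end delivery estimate depends on the slowest stage of the path. The function names, the stage labels and the example rates are hypothetical; this is not the BeStMan, TeraPaths or OSCARS interface. One terabyte is taken as 10^12 bytes and "several days" as three days.

```python
TERABYTE = 10**12  # bytes (decimal terabyte, an assumption)

def required_rate_mbps(size_bytes, seconds):
    """Sustained rate (megabits per second) needed to finish in `seconds`."""
    return size_bytes * 8 / seconds / 1e6

def plan_transfer(size_bytes, stage_rates_mbps):
    """Return (bottleneck stage, end-to-end rate in Mbps, duration in seconds).

    The end-to-end rate cannot exceed the slowest stage on the path.
    """
    bottleneck = min(stage_rates_mbps, key=stage_rates_mbps.get)
    rate = stage_rates_mbps[bottleneck]
    if rate <= 0:
        raise ValueError("every stage needs a positive rate")
    return bottleneck, rate, size_bytes * 8 / (rate * 1e6)

# One terabyte in two hours versus three days.
fast = required_rate_mbps(TERABYTE, 2 * 3600)        # about 1,111 Mbps
slow = required_rate_mbps(TERABYTE, 3 * 24 * 3600)   # about 31 Mbps

# Hypothetical rates for each stage of a transfer.
stages = {
    "source storage": 4000,
    "local network": 2000,
    "wide-area circuit": 1000,
    "destination storage": 3000,
}
bottleneck, rate, seconds = plan_transfer(TERABYTE, stages)
print(f"{bottleneck} limits the transfer to {rate} Mbps, "
      f"about {seconds / 3600:.1f} hours per terabyte")
```

In this example the wide-area circuit is the limiting stage, so faster storage at either end would not shorten the delivery. That is why a predictable delivery time requires coordinating storage and network reservations together.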
"This system will provide a true end-to-end service—it will dynamically determine how fast the storage systems at the beginning and end of the transfer can send and receive data, negotiate with TeraPaths to efficiently navigate the data set through the local area networks, and reserve bandwidth on OSCARS so that researchers are guaranteed a certain amount of bandwidth through the duration of their transfer and a certain time for completing the delivery," says Shoshani. ASCR funded the development of BeStMan, TeraPaths and OSCARS. The SRM concept was initiated by the CRD's Scientific Data Management Group, which Shoshani leads.