Python/Globus Tools Speed Up Development of Data Grid for LIGO
July 1, 2004
BERKELEY, Calif. Programming tools developed at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory by Keith Jackson and his colleagues in the Computational Research Division’s Secure Grid Technologies Group have been used to set up an efficient system to distribute new data that will put the predictions of Einstein’s General Theory of Relativity to the test. To date, more than 50 TB of data from LIGO has been replicated to nine sites on two continents, quickly and robustly.
LIGO, the Laser Interferometer Gravitational-Wave Observatory, is a facility dedicated to detecting cosmic gravitational waves — ripples in the fabric of space and time — and interpreting these waves to provide a more complete picture of the universe. Funded by the National Science Foundation, LIGO consists of two widely separated installations — one in Hanford, Washington and the other in Livingston, Louisiana — operated in unison as a single observatory. Data from LIGO will be used to test the predictions of General Relativity — for example, whether gravitational waves propagate at the same speed as light, and whether the graviton particle has zero rest mass.
Because gravitational waves have never been directly detected (although their influence on distant objects has been measured), LIGO is conducting blind searches of large sections of the sky and producing an enormous quantity of data, almost 1 TB a day, which requires large-scale computational resources for analysis.
LIGO Scientific Collaboration (LSC) scientists at 41 institutions worldwide need fast, reliable, and secure access to the data. To optimize access, the data sets are replicated to computer and data storage hardware at nine sites: the two observatory sites plus Caltech, MIT, Penn State, the University of Wisconsin at Milwaukee (UWM), the Max Planck Institute for Gravitational Physics/Albert Einstein Institute in Potsdam, Germany, and Cardiff University and the University of Birmingham in the UK. The LSC DataGrid uses the DOEGrids Certificate Authority operated by ESnet to issue identity certificates and service certificates.
The data distribution tool used by the LSC DataGrid is the Lightweight Data Replicator (LDR), which was developed at UWM as part of the Grid Physics Network (GriPhyN) project. LDR is built on a foundation that includes the Globus Toolkit®, Python, and pyGlobus, an interface that enables Python access to the entire Globus Toolkit. LSC DataGrid engineer Scott Koranda describes Python as the “glue to hold it all together and make it robust.”
pyGlobus is one of two Python tools developed by Jackson’s group for the Globus Toolkit, the basic software used to create computational and data grids. The pyGlobus interface or “wrapper” allows the use of the entire Globus Toolkit from Python, a high-level, interpreted programming language that is widely used in the scientific and Web communities. pyGlobus is included in the current Globus Toolkit 3.2 release.
“What’s great about using pyGlobus and Python is the speed and ease of development for setting up a new production grid application,” Jackson said. “The scientists spend less time programming and move on to their real work — analyzing data — faster.”
Another Python tool just released in the Globus Toolkit 3.9.2 Development Release (alpha test version for next year’s GT 4.0) is the Python WS Core, a Python implementation of the Web Services Resource Framework (WS-RF) specifications. When GT 4.0 is released, the grid community will be moving from homegrown protocols and specifications to industry-standard Web Service protocols for client and server support and secure messaging. Moving to the new standards will simplify the creation of Web services that can interface efficiently with many resources.
Jackson’s development team for pyGlobus and the Python WS Core includes Joshua Boverhof, Noah Edelson, Monte Goode, David Konerding, David Robertson, and Matt Rodriguez. For more information about the Secure Grid Technologies Group, see <http://www-itg.lbl.gov/SGT/>, or contact Keith Jackson at <firstname.lastname@example.org>.
About Computing Sciences at Berkeley Lab
The Computing Sciences Area at Lawrence Berkeley National Laboratory (Berkeley Lab) provides the computing and networking resources and expertise critical to advancing Department of Energy Office of Science (DOE-SC) research missions: developing new energy sources, improving energy efficiency, developing new materials, and increasing our understanding of ourselves, our world, and our universe. ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities. NERSC and ESnet are both Department of Energy Office of Science National User Facilities. The Computational Research Division (CRD) conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.
Berkeley Lab addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab has seen its scientific expertise recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science. The DOE Office of Science is the United States' single largest supporter of basic research in the physical sciences and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.