
Python/Globus Tools Speed Up Development of Data Grid for LIGO

July 1, 2004

BERKELEY, Calif. Programming tools developed by Keith Jackson and his colleagues in the Secure Grid Technologies Group of the Computational Research Division at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory have been used to build an efficient system for distributing new data that will put the predictions of Einstein’s General Theory of Relativity to the test. To date, more than 50 TB of LIGO data has been replicated, quickly and robustly, to nine sites on two continents.

LIGO, the Laser Interferometer Gravitational-Wave Observatory, is a facility dedicated to detecting cosmic gravitational waves — ripples in the fabric of space and time — and interpreting these waves to provide a more complete picture of the universe. Funded by the National Science Foundation, LIGO consists of two widely separated installations — one in Hanford, Washington and the other in Livingston, Louisiana — operated in unison as a single observatory. Data from LIGO will be used to test the predictions of General Relativity — for example, whether gravitational waves propagate at the same speed as light, and whether the graviton particle has zero rest mass.

Because gravitational waves have never been directly detected (although their influence on distant objects has been measured), LIGO is conducting blind searches of large sections of the sky and producing an enormous quantity of data, almost 1 TB a day, which requires large-scale computational resources for analysis.

Scientists in the LIGO Scientific Collaboration (LSC), who work at 41 institutions worldwide, need fast, reliable, and secure access to the data. To optimize access, the data sets are replicated to computing and data storage hardware at nine sites: the two observatory sites plus Caltech, MIT, Penn State, the University of Wisconsin at Milwaukee (UWM), the Max Planck Institute for Gravitational Physics/Albert Einstein Institute in Potsdam, Germany, and Cardiff University and the University of Birmingham in the UK. The LSC DataGrid uses the DOEGrids Certificate Authority operated by ESnet to issue both identity certificates and service certificates.

The data distribution tool used by the LSC DataGrid is the Lightweight Data Replicator (LDR), which was developed at UWM as part of the Grid Physics Network (GriPhyN) project. LDR is built on a foundation that includes the Globus Toolkit®, Python, and pyGlobus, an interface that enables Python access to the entire Globus Toolkit. LSC DataGrid engineer Scott Koranda describes Python as the “glue to hold it all together and make it robust.”

pyGlobus is one of two Python tools developed by Jackson’s group for the Globus Toolkit, the basic software used to create computational and data grids. The pyGlobus interface or “wrapper” allows the use of the entire Globus Toolkit from Python, a high-level, interpreted programming language that is widely used in the scientific and Web communities. pyGlobus is included in the current Globus Toolkit 3.2 release.

“What’s great about using pyGlobus and Python is the speed and ease of development for setting up a new production grid application,” Jackson said. “The scientists spend less time programming and move on to their real work — analyzing data — faster.”

Another Python tool just released in the Globus Toolkit 3.9.2 Development Release (alpha test version for next year’s GT 4.0) is the Python WS Core, a Python implementation of the Web Services Resource Framework (WS-RF) specifications. When GT 4.0 is released, the grid community will be moving from homegrown protocols and specifications to industry standard Web Service protocols for client and server support and secure messaging. Moving to the new standards will simplify the creation of Web services that can interface efficiently with many resources.

Jackson’s development team for pyGlobus and the Python WS Core includes Joshua Boverhof, Noah Edelson, Monte Goode, David Konerding, David Robertson, and Matt Rodriguez. For more information about the Secure Grid Technologies Group, see <http://www-itg.lbl.gov/SGT/>, or contact Keith Jackson at <krjackson@lbl.gov>.

