StarGate Demo at SC09 Shows How to Keep Astrophysics Data Out of Archival "Black Holes"
Reserved Bandwidth on ESnet Makes Possible Mulit-Gigabit Streaming Between Argonne, SC Conference in Portland
December 20, 2009
Berkeley Lab Contact: Jon Bashor, email@example.com, 510-486-5849
PORTLAND, Ore.—As both an astrophysicist and director of the San Diego Supercomputer Center (SDSC), Mike Norman understands two common perspectives on archiving massive scientific datasets. During a live demonstration at the SC09 conference of streaming data simulating cosmic structures of the early universe, Norman said that some center directors view their data archives as "black holes," where a wealth of data accumulates and needs to be protected.
But as a leading expert in the field of astrophysics, he sees data as intellectual property that belongs to the researcher and his or her home institution — not the center where the data was computed. Some people, Norman says, claim that it's impossible to move those terabytes of data between computing centers and where the researcher sits. But in a live demo in which data was streamed over a reserved 10-gigabits-per-second provided by the Department of Energy's ESnet (Energy Sciences Network), Norman and his graduate assistant Rick Wagner showed it can be done.
While the scientific results of the project are important, the success in building reliable high-bandwidth connections linking key research facilities and institutions addresses a problem facing many science communities.
“A lot of researchers stand to benefit from this successful demonstration,” said Eli Dart, an ESnet engineer who helped the team achieve the necessary network performance. “While the science itself is very important in its own right, the ability to link multiple institutions in this way really paves the way for other scientists to use these tools more easily in the future.”
"This couldn't have been done without ESnet," Wagner said. Two aspects of the network came into play. First, ESnet operates the circuit-oriented Science Data Network, which provides dedicated bandwidth for moving large datasets. However, with numerous projects filling the network much of the time for other demos and competitions at SC09, Norman and Wagner took advantage of OSCARS, ESnet's On-Demand Secure Circuit and Advance Reservation System.
"We gave them the bandwidth they needed, when they needed it," said ESnet engineer Evangelos Chaniotakis. The San Diego team was given two two-hour bandwidth reservations on both Tuesday, Nov. 17, and Thursday, Nov. 19. Chaniotakis set up the reservations, then the network automatically reconfigured itself once the window closed.
At the SDSC booth, the live streaming of the data drew a standing-room-only crowd as the data was first shown as a 4,0963 cube containing 64 billion particles and cells. But Norman pointed out that the milky white cube was far too complex to absorb, then added that it was only one of numerous time-steps. In all, the data required for rendering came to about 150 terabytes of data.
In real time, the data was rendered on the Eureka Linux cluster at the Argonne Leadership Computing Facility and reduced to one-sixty-fourth of the original size for a 1,0243 representation, making it more manageable and able to be explored interactively. The milky mesh was shown to contain galaxies and clusters linked by sheets and filaments of cosmic gases.
The project, Norman explained, is aimed at determining whether the signal of faint ripples in the universe known as baryon acoustic oscillations, or BAO, can actually be observed in the absorption of light by the intergalactic gas. It can, according to research led by Norman, who said they were the first to determine this. Such a finding is critical to the success of a dark energy survey known as BOSS, the Baryon Oscillation Spectroscopic Survey. The results of his proof-of-concept project, Norman said, "ensure that BOSS is not a waste of time."
Creating a simulation of this size, even using the petaflops Cray XT5 Kraken system at the University of Tennessee can take three months to complete as it is run in batches as time is allocated, Norman said. The data could then be moved in three nights to Argonne for rendering. The images were then streamed to the SDSC OptiPortal for display. Norman said the next step is to close the loop between the client side and the server side to allow interactive use. But the hard work — connecting the resources with adequate bandwidth — has been done, as evidenced by the demo, he noted.
But it wasn't just an issue of bandwidth, according to ESnet's Dart. "We did a lot of testing and tuning," said Dart.
ESnet is managed by Lawrence Berkeley National Laboratory (LBNL) Other contributors to the demo were Joe Insley of Argonne National Laboratory (ANL), who generated the images from the data, and Eric Olson, also of Argonne, who was responsible for the composition and imaging software. Network engineers Linda Winkler and Loren Wilson of ANL and Thomas Hutton of SDSC worked to set up and tune the network and servers before moving the demonstration to SC09. The project was a collaboration between ANL, CalIT2, ESnet at the Lawrence Berkeley National Laboratory, the National Institute for Computational Science, Oak Ridge National Laboratory and SDSC.
About Computing Sciences at Berkeley Lab
The Computing Sciences Area at Lawrence Berkeley National Laboratory(Berkeley Lab) provides the computing and networking resources and expertise critical to advancing Department of Energy Office of Science (DOE-SC) research missions: developing new energy sources, improving energy efficiency, developing new materials, and increasing our understanding of ourselves, our world, and our universe. ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities. NERSC and ESnet are both Department of Energy Office of Science National User Facilities. The Computational Research Division (CRD) conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.
Berkeley Lab addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science. The DOE Office of Science is the United States' single largest supporter of basic research in the physical sciences and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.