Science DMZ Opens Doors to More Science, More Collaboration
From genomics to astronomy, University of Utah's Center for High Performance Computing speeds data, cooperation
September 17, 2015
Contact: Jon Bashor, email@example.com, +1 510 486 5849
At the University of Utah in Salt Lake City, the campus’s Center for High Performance Computing (CHPC) and the campus network staff have deployed a Science DMZ that has turned the facility into a virtual beehive of collaborative research. Several years ago, university IT staff began to see problems with large data flows and developed their own workaround: a fiber connection, a VLAN and MPLS (Multiprotocol Label Switching) that together steered large data flows onto a detour around the campus firewall.
“That was around the time when the first notions of the Science DMZ were coming together,” said Joe Breen, assistant director for networking at the CHPC. “I spoke with Eli Dart at ESnet and we leveraged his work with our own efforts to deploy a new Science DMZ.” Since then, CHPC and the university network staff have expanded the original Science DMZ into a dedicated 100 gigabit per second (Gbps) infrastructure with multiple 10 Gbps and 40 Gbps data transfer nodes (DTNs). CHPC has become an important data hub for a number of research projects, which makes the unimpeded flow of large datasets even more critical.
“We moved 100 terabytes last week over 2-3 days,” Breen said, “and move terabytes of data every day, often 15 to 70 terabytes at a time. We have about six petabytes of active data storage.”
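To put Breen’s figure in perspective, the sketch below converts “100 terabytes over 2-3 days” into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and a transfer spread evenly across the whole period; the function name is purely illustrative.

```python
def average_gbps(terabytes: float, days: float) -> float:
    """Average throughput in Gbps, assuming decimal units (1 TB = 10**12 bytes)
    and data moved evenly over the whole period."""
    bits = terabytes * 10**12 * 8      # bytes -> bits
    seconds = days * 24 * 3600         # days -> seconds
    return bits / seconds / 10**9      # bits/s -> Gbps

# Breen's example: 100 TB moved over two to three days.
for days in (2, 3):
    print(f"100 TB over {days} days ~ {average_gbps(100, days):.2f} Gbps")
```

Under these assumptions the week’s transfer averaged roughly 3 to 4.6 Gbps around the clock, well within what a 10 Gbps or 40 Gbps DTN can sustain.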
One of the biggest sources of data is astronomy. Prof. Adam Bolton is currently the Principal Data Scientist for the Sloan Digital Sky Survey IV (SDSS-IV), a multi-institution program that consolidates raw data from a number of telescopes and then distributes processed data to other universities. The data are also made available to the research community at large through periodic public data releases. “All of the data is stored here and we have dedicated SDSS DTNs for doing the big pushes to other institutions,” Breen said.
DTNs play a large part in moving large datasets over the University of Utah’s Science DMZ, and CHPC system administrators Sam Liston, Brian Haymore and Wayne Bradford work closely with faculty to facilitate their transfers.
The university is also home to multiple genomics research groups and a cancer institute that regularly move large datasets in and out of CHPC. Large-scale research projects in carbon sequestration, weather and medicinal chemistry also rely on CHPC’s data storage. As a member of the NSF-sponsored XSEDE, or Extreme Science and Engineering Discovery Environment, Utah often exchanges large datasets with Blue Waters at the National Center for Supercomputing Applications in Illinois and with the Texas Advanced Computing Center. Utah Prof. Carleton DeTar, chair of the Department of Physics and Astronomy, has projects that often require moving up to 15 TB of data to and from the Department of Energy’s Fermilab in Illinois in a short time.
“The Science DMZ makes our job easier because we know for the most part that we can push data at up to 40 Gbps to any site we need to,” Breen said. “Our researchers do the computation or analysis here, and send the data to other universities and labs, but there can still be bottlenecks.”
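As a rough illustration of what “up to 40 Gbps” means for a transfer like DeTar’s 15 TB, the sketch below computes an idealized transfer time at the 10 Gbps and 40 Gbps DTN rates mentioned above. It assumes decimal units and a perfectly sustained line rate, ignoring protocol overhead, storage limits and competing traffic, so real transfers will take longer; the function name is illustrative only.

```python
def transfer_minutes(terabytes: float, gbps: float) -> float:
    """Idealized transfer time in minutes at a sustained line rate.

    Assumes decimal units (1 TB = 10**12 bytes) and ignores protocol
    overhead, disk I/O limits and competing traffic.
    """
    bits = terabytes * 10**12 * 8          # bytes -> bits
    return bits / (gbps * 10**9) / 60      # seconds -> minutes

# A 15 TB dataset at the two DTN speeds cited in the article.
for rate in (10, 40):
    print(f"15 TB at {rate} Gbps: about {transfer_minutes(15, rate):.0f} minutes")
```

Even in this best case, the fourfold jump from 10 Gbps to 40 Gbps shrinks a 15 TB move from a little over three hours to under an hour, which is why a bottleneck anywhere along the path matters so much.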
Such bottlenecks at the far end of a transfer are common enough that Breen has started helping other institutions deploy their own Science DMZs. He has co-taught a workshop tutorial, developed a wiki and forged connections with several labs. One resource he regularly points newcomers to is fasterdata.es.net, a site developed by ESnet’s Brian Tierney for the networking community. The perfSONAR network measurement toolkit is another valuable tool for tracking down bottlenecks.
Breen’s knowledge and willingness to help others also enabled Utah to join a collaboration between Clemson University in South Carolina and the National Institutes of Health near Washington, D.C. The collaboration, led by Clemson Prof. Alex Feltus, is investigating how to use new technologies to optimize the migration of plant genomic datasets. Working with Chris Konger at Clemson and Don Preuss at NIH, the team aims, among other goals, to provide a network “slice” that secures a specific amount of bandwidth at any point on the network to support data transfers. The three sites form a national experimental triangle, Breen said, that allows bandwidth and latency testing as well as troubleshooting of bottlenecks. The group is now working with network researchers led by Prof. K.-C. Wang at Clemson to experiment with software-defined networking to optimize data transfers.
The University of Utah Science DMZ has also played a strong role in enabling the CloudLab project, a multi-site cloud development testbed. By providing an open, programmable infrastructure shared by network researchers and other domain scientists, the Utah Science DMZ has enabled collaborators from the University of Utah Flux group, the University of Wisconsin, Clemson University, Raytheon/BBN, the University of Massachusetts Amherst and US Ignite to “stitch” to and from the other geographically dispersed sites that make up the testbed, and across Internet2’s Advanced Layer 2 Services to other resources.
“And it’s all possible because of the Science DMZ,” Breen said. “It’s a huge enabler of science and it has spurred a lot of collaboration, both within the University of Utah and with external collaborators.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.