Berkeley Lab Team Achieves 10.6 Gigabits/second Data Throughput in 10-Gigabit Ethernet Test

July 3, 2002

Contact: Jon Bashor, 510-486-5849, jbashor@lbl.gov

Members of the team demonstrating 10-gig throughput were (from left) Wes Bethel, John Christman, John Shalf, Chip Smith and Michael Bennett.

BERKELEY, CA – Although there has been a lot of discussion recently about 10-Gigabit Ethernet capability, actually achieving that level of performance in the real world has been difficult. Until now.

Last week, a team from Lawrence Berkeley National Laboratory, which operates some of the world’s most powerful computing, data storage and networking resources for the U.S. Department of Energy, teamed with Force10 Networks (switches), SysKonnect (network interfaces), FineTec Computers (clusters), Quartet Network Storage (on-line storage) and Ixia (line rate monitors) to assemble a demonstration system that runs a true scientific application to produce data on one 11-processor cluster, then sends the resulting data across a 10-Gigabit Ethernet connection to another cluster, where it is rendered for visualization.

Though held in the US, the demo included contributions from Europe – the network interface cards were made by SysKonnect, headquartered in Ettlingen, Germany, and the scientific application used for the demonstration was the Cactus simulation code developed by the Numerical Relativity group led by Ed Seidel at the Albert Einstein Institute/Max Planck Institute in Potsdam, Germany.

The result? The team was able to sustain 10.6 gigabits per second aggregated across two 10-gigabit interfaces: 9.8 gigabits per second on one interface and 960 megabits per second on the other. The measurements were taken from fiber-optic taps using Ixia 400 performance analyzers with 10-Gigabit Ethernet interfaces. A total of 58 terabytes of data was transferred over 12 hours of pre-demonstration testing and the demo itself.
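
As a rough cross-check of the figures above, the sketch below estimates how much data a link carries at a given sustained rate. It assumes decimal units (1 terabyte = 10^12 bytes) and that the aggregate rate held for the full 12 hours; neither assumption is stated in the release, and the function name is illustrative only.

```python
def terabytes_transferred(rate_gbps: float, hours: float) -> float:
    """Estimate data volume in decimal terabytes for a sustained line rate."""
    bits = rate_gbps * 1e9 * hours * 3600   # gigabits/s -> total bits
    return bits / 8 / 1e12                  # bits -> bytes -> terabytes

# Assumed inputs: the reported aggregate rate and test duration.
total_tb = terabytes_transferred(10.6, 12)
print(f"{total_tb:.1f} TB")  # prints "57.2 TB"
```

Under these assumptions the estimate comes to about 57 terabytes, which is in line with the reported total of 58 terabytes.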

With the IEEE’s adoption of Standard 802.3ae for 10-Gigabit Ethernet equipment in June, the speed of Ethernet operations has increased by an order of magnitude – at least on paper. But achieving that 10-fold increase in actual Ethernet performance remains a challenge that can be met only with leading-edge equipment and expertise.

The system was built as a prelude to Berkeley Lab’s entry into the High-Performance Bandwidth Challenge at SC2002, the conference on high-performance computing and networking, to be held in November in Baltimore, Maryland. Berkeley Lab teams have won the High-Performance Bandwidth Challenge for two consecutive years. At the SC2001 conference held last November, the LBNL team took top honors, moving data across the network at a sustained rate of 3.3 gigabits per second in a live computational steering/visualization demonstration involving the Albert Einstein Institute’s “Cactus” simulation code (www.cactuscode.org) and Berkeley Lab’s Visapult parallel visualization system (vis.lbl.gov/RDProjects/visapult/index.html).

The system was originally assembled to show real-world applications of 10-Gig E capability at a conference scheduled for June. However, the conference was delayed, and the Berkeley Lab team decided to put on a public demonstration before taking the system apart and returning the loaned equipment to the vendors.

“The demo turned out to be really successful. Force10 loaned us the switches, FineTec donated enough computers to make it interesting and we worked with SysKonnect to get very high performance from their network interfaces,” said network engineer Mike Bennett. “Quartet provided the network storage for storing the data to be visualized and Ixia supplied the monitoring equipment. The result is we proved that 10-Gig E is a reality, not just a bunch of back-of-the-envelope calculations.”

According to Bennett, most demonstrations of 10-Gig E to date have been done to showcase interoperability of components made by different vendors, which is the aim of the IEEE standard. That standard doesn’t mean, however, that a system will achieve peak performance.

“What we are demonstrating is that it does work in the real world,” Bennett said.

John Shalf, a member of the Berkeley Lab Visualization Group, said that 10-Gig E capability is important for scientific applications.

“Codes like Cactus can easily consume an entire supercomputer, like the 3,328-processor IBM SP at our National Energy Research Scientific Computing Center, or NERSC. The Cactus team ran the code at NERSC for 1 million CPU-hours, or 114 CPU-years, performing the first-ever simulations of the inspiraling coalescence of two black holes,” Shalf said.

A high-bandwidth connection allows users to keep up with the huge data production rates of such simulations – about a terabyte per time step – and ensure that the code is running properly. Otherwise, mistakes may not be detected until the run is finished, after many computer cycles have been wasted generating bad data.

Remote monitoring and visualization require a system that can provide visualization capability over wide area network connections without compromising interactivity or the simulation performance. The team used Visapult, developed by Wes Bethel of LBNL’s Visualization Group for DOE’s Next Generation Internet/Combustion Corridor project several years ago. Visapult allows users to use a desktop workstation to perform interactive volume visualization of remotely computed datasets without downsampling of the original data. It does so by employing the same massively parallel distributed memory computational model employed by the simulation code in order to keep up with the data production rate of the simulation. It also uses high performance networking in order to distribute its computational pipeline across a WAN so as to provide a remote visualization capability that is decoupled from the cycle time of the simulation code itself.

To achieve the 10.6 gigabits per second performance, George “Chip” Smith of the team had to work with SysKonnect to overcome a problem resulting from running Linux on the clusters. “When you run Linux with the SysKonnect card, the libraries in the kernel for the SysKonnect cards have a default behavior and run with an average line rate of 600-700 megabits per second,” Smith said. “Working with SysKonnect, I was able to change one of the libraries in the kernel and, using a recent virtual Ethernet interface module, I was able to get 950 to 1000 megabits off the single interfaces. This enabled us to run this demonstration with one-third fewer machines than it would have without the work on the kernel.”
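
The “one-third fewer machines” figure can be illustrated with a simple sizing estimate. The sketch below is not the team's method; it assumes one network interface per machine, an aggregate target of 10 gigabits per second, and per-machine rates taken as the midpoints of the ranges Smith quotes. All names and constants are hypothetical.

```python
import math

def machines_needed(target_mbps: float, per_machine_mbps: float) -> int:
    """Smallest whole number of machines whose combined rate reaches the target."""
    return math.ceil(target_mbps / per_machine_mbps)

TARGET_MBPS = 10_000   # assumed aggregate target: 10 gigabits per second
DEFAULT_MBPS = 650     # assumed midpoint of the quoted 600-700 megabits per second
TUNED_MBPS = 975       # assumed midpoint of the quoted 950-1000 megabits per second

before = machines_needed(TARGET_MBPS, DEFAULT_MBPS)
after = machines_needed(TARGET_MBPS, TUNED_MBPS)
saving = 1 - after / before   # fraction of machines no longer required

print(before, after, f"{saving:.0%}")  # prints "16 11 31%"
```

With these assumed inputs the tuned driver needs 11 machines instead of 16, a reduction of roughly one-third.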

Bennett said the main obstacle to achieving even better performance wasn’t the lack of bandwidth, but rather the lack of resources, including the number of machines in each cluster.

"One of the most exciting things is that it scales. If we would have had 50 boxes in the cluster, we could have delivered 50 gigabits," Bennett said. "Now that we've done 10 Gig, it's time to start looking at 100."

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.