Erich Strohmaier and the TOP500: A list that’s taken on a life of its own
June 14, 2012
Jon Bashor, Jbashor@lbl.gov, 510-486-5849
After completing his Ph.D. in physics at the University of Heidelberg, Erich Strohmaier didn’t have much time to ponder his next career move – he had exactly one day off before starting a new job at the University of Mannheim. So, he used his free time to purchase a suit. On his first day at the new job, he attended a small university conference focused on supercomputing, the “Mannheimer Supercomputer Seminar” organized by Prof. Hans Meuer and Dr. Hans-Martin Wacker.
One of his duties while working for Prof. Meuer was to assemble statistics on supercomputers worldwide in preparation for the conference’s annual meeting. In 1993 the idea was born to abandon a fixed definition of what counted as a “supercomputer” and to use an adaptive measure instead. Meuer and Strohmaier knew there were at least several hundred vector supercomputers around the globe, but were fairly sure there weren’t a thousand. So they decided to list the top 500 systems, ranked by an actual benchmark result to weed out non-functional systems. Thinking it would be a one-time exercise, Strohmaier created a database on his computer for just that purpose.
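The adaptive measure amounts to a simple ranking rule: keep only systems with a measured benchmark result, sort them by performance, and treat whatever the last-ranked system achieved as the current bar for entry. The Python sketch below illustrates that idea under stated assumptions; the `System` record, the `build_list` and `entry_threshold` helpers, and all performance figures are hypothetical and are not taken from the actual TOP500 database or its tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    """Hypothetical record for one submitted system."""
    name: str
    rmax_gflops: Optional[float]  # measured benchmark result; None if never run


def build_list(systems: list[System], size: int = 500) -> list[System]:
    """Rank systems by measured performance and keep the top `size`.

    Systems without a benchmark result are excluded, so only machines
    that actually ran the benchmark can appear on the list.
    """
    measured = [s for s in systems if s.rmax_gflops is not None]
    # Fastest first; ties broken by name so the ordering is deterministic.
    measured.sort(key=lambda s: (-s.rmax_gflops, s.name))
    return measured[:size]


def entry_threshold(ranked: list[System]) -> float:
    """Performance of the last-ranked system: the adaptive cutoff for entry."""
    return ranked[-1].rmax_gflops if ranked else 0.0


# Made-up example data, using a list size of 3 instead of 500 for brevity.
systems = [
    System("Alpha", 59.7),
    System("Beta", 124.5),
    System("Gamma", None),  # no benchmark result, so it cannot be ranked
    System("Delta", 13.7),
    System("Epsilon", 21.0),
]
top = build_list(systems, size=3)
print([s.name for s in top])  # ['Beta', 'Alpha', 'Epsilon']
print(entry_threshold(top))   # 21.0
```

Because the cutoff is simply the performance of the last system that makes the list, it rises automatically as faster machines are installed, which is what lets the definition of “supercomputer” adapt over time instead of being fixed in advance.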
But then Meuer and Strohmaier decided to see how much the list would change in five months and recalculated the list in time to present the results at the 1993 Supercomputing conference held in November in Portland, Oregon. This time, however, Strohmaier decided to create a new database with the ability to track systems over time—just in case they wanted to keep the list going.
The TOP500 list of the world’s top supercomputers was born and Strohmaier’s career trajectory was launched in a new direction. In June 2012, he will present the 39th edition of the list during the opening session of the International Supercomputing Conference in Hamburg, Germany – the international meeting that grew out of the Mannheim conference and now draws more than 2,000 attendees annually.
“When we started this, it was to gather statistics for a small conference. We never expected the scope and popularity to grow as it did,” said Strohmaier, who heads the Future Technologies Group at Lawrence Berkeley National Laboratory in California. “And we never guessed it would come to be viewed as a service to the broader HPC community.”
Strohmaier said it took two or three years for the list to find its footing. Initially, a number of manufacturers were reluctant to provide the necessary data.
“At first, only those who were sure they would have a good showing submitted their data to us,” he said.
But as the list gained visibility, another challenge arose. “Some vendors called to say, ‘We want to be on the list – what size system do we have to build to make it?’” Strohmaier recalled. “By that point, things were getting a little out of hand.”
With the release of each list, many of the manufacturers and host institutions of the systems issue press releases proclaiming their positions and claiming bragging rights. But Strohmaier acknowledges that the list isn’t comprehensive. In the early days, some centers didn’t want to be identified as having a large system, as they thought it might endanger funding for future machines.
“Some companies don’t want to be listed because they see their systems as giving them a competitive advantage and don’t want their competitors to know either the size or type of their machines,” Strohmaier said. “And some centers are conducting classified research and say, ‘Thou shall not publish our system.’”
As systems get larger, some institutions are reluctant to devote their entire supercomputer to running the LINPACK benchmark, by which all systems on the list are measured. One of the criticisms of the methodology is that LINPACK, developed by Jack Dongarra of the University of Tennessee, does not represent a real workload and therefore gives a skewed picture of the performance levels real applications achieve. But by using the same benchmark to characterize every one of the thousands of systems ever to make the list, the TOP500 archive provides a unique resource for tracking the development of HPC performance and architectures since 1993.
Looking at the lists over the years has also turned up several surprises, Strohmaier said.
“The biggest surprise is the continual steady growth and development—it’s a very consistent trend,” he said. “Originally we expected much less turnover, but change has been faster than we anticipated. The changes in performance level have been pretty steady.”
Another surprise has been the number of systems to claim the top spot. In the 38 lists to date, 14 systems have been ranked number one, though some of them remained on top through upgrades and buildouts. Five machines have taken the top slot only once, while two others have each held the title for seven lists.
From Heidelberg to Tennessee to California
Strohmaier earned his Ph.D. in theoretical physics in 1990, but the job market for physicists was so bleak that he didn’t even consider looking for a post-doc position. However, his thesis focused on numerical methods in elementary particle physics, for which he used the largest supercomputers available. So when he learned about a research position in HPC at the University of Mannheim for a new project comparing the performance of a number of physics applications on a Fujitsu VP2600 supercomputer, he took the job.
In 1995, as that research funding was ending, he decided to look for a position in the U.S. and ended up working with LINPACK author and fellow TOP500 editor Jack Dongarra at the University of Tennessee. After 15 years in Heidelberg, he was looking for an “American experience” and planned to spend two, maybe three years in the States. By 1998, though, he’d decided to stay in the U.S. for a while, cementing his decision by getting a personalized “TOP500” license plate for his car.
But he kept looking westward, especially toward California. He’d known Horst Simon through his work on the TOP500 list and had visited Berkeley Lab several times. On his third try, Strohmaier landed a job at Lawrence Berkeley National Laboratory in 2001. In addition to his current role as head of the Future Technologies Group, he is also the principal investigator of the Department of Energy-funded CACHE Institute, a joint mathematics and computer science institute focused on Communication Avoiding and Communication Hiding at Extreme Scales.
But twice a year, Strohmaier still turns his attention to the TOP500 list, sending out a call for participation eight weeks before the publication date. The data for the lists comes from two sources: the individual sites where the systems are housed and the companies that build the computers. Any discrepancies between the two sets of figures are resolved before publication.
The serious work for Strohmaier and TOP500 colleague Anas Nashif begins about four weeks out when the list starts to take shape. In the final two weeks, there is a two-stage review of the list by the people who submitted the data and a group of colleagues who provide a final sanity check. Then the server is updated with the list, certificates are printed, news releases written and an overview presentation prepared.
And once the list is officially public, many institutions make their own announcements. Even small universities with systems well down the list have been known to proclaim that they host “one of the world’s top supercomputers.”
“That’s all in the spirit of the game—we have a number of big players, but also many of the smaller players are very proud, and that shows how important HPC has become to the research community,” Strohmaier said. “And I feel bad if a small institution puts a lot of effort into its submission but doesn’t make the list, but it is all part of the game.”
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Since its founding in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.