Berkeley Lab Researchers Analyze Performance, Potential of Cell Processor
March 15, 2007
Though it was designed as the heart of the upcoming Sony PlayStation3 game console, the STI Cell processor has created quite a stir in the computational science community, where the processor’s potential as a building block for high performance computers has been widely discussed and speculated upon.
To assess Cell’s potential, computer scientists at the U.S. Department of Energy’s Lawrence Berkeley National Laboratory evaluated the processor’s performance on several scientific application kernels, then compared the results against other processor architectures. The group presented its findings in a paper at the ACM International Conference on Computing Frontiers, held May 2-6, 2006, in Ischia, Italy.
The paper, “The Potential of the Cell Processor for Scientific Computing,” was written by Samuel Williams, Leonid Oliker, Parry Husbands, Shoaib Kamil and Katherine Yelick, of Berkeley Lab’s Future Technologies Group and by John Shalf from NERSC.
“Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency,” the authors wrote in their paper. “We also conclude that Cell’s heterogeneous multi-core implementation is inherently better suited to the HPC environment than homogeneous commodity multicore processors.”
Cell, designed by a partnership of Sony, Toshiba, and IBM, is a high performance implementation of a software-controlled memory hierarchy, combined with the considerable floating point resources that demanding numerical algorithms require. Cell departs radically from conventional multiprocessor or multi-core architectures. Instead of using identical cooperating commodity processors, it uses a conventional high performance PowerPC core that controls eight simple SIMD (single instruction, multiple data) cores, called synergistic processing elements (SPEs). Each SPE contains a synergistic processing unit (SPU), a local memory, and a memory flow controller.
Despite its radical departure from mainstream general-purpose processor design, Cell is particularly compelling because it will be produced at such high volumes that it will be cost-competitive with commodity CPUs. At the same time, the slowing growth of commodity microprocessor clock rates and increasing chip power demands have become a concern to computational scientists, encouraging the community to consider alternatives like STI Cell. The authors examined the potential of using the forthcoming STI Cell processor as a building block for future high-end parallel systems by investigating performance across several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations on regular grids, and 1D and 2D fast Fourier transforms.
According to the authors, the current implementation of Cell is most often noted for its extremely high single-precision (32-bit) floating point performance, but the majority of scientific applications require double precision (64-bit). Although Cell’s peak double precision performance is still impressive relative to its commodity peers (eight SPEs at 3.2GHz = 14.6 Gflop/s), the group quantified how modest hardware changes, which they named Cell+, could improve double precision performance.
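The 14.6 Gflop/s figure can be reproduced with a short back-of-the-envelope calculation. The sketch below takes the clock rate and SPE count from the article; the per-instruction flop counts and issue intervals are assumptions not stated here (a two-wide double precision fused multiply-add issued once every seven cycles, and a four-wide single precision fused multiply-add issued every cycle), so treat it as an illustration rather than the authors' own derivation.

```python
CLOCK_GHZ = 3.2  # SPE clock rate, from the article
NUM_SPES = 8     # number of SPEs, from the article

# Assumptions (not stated in the article):
# a SIMD fused multiply-add counts as 2 flops per lane.
DP_FLOPS_PER_INSTR = 4   # 2 double precision lanes x 2 flops
DP_CYCLES_PER_INSTR = 7  # one double precision instruction every 7 cycles
SP_FLOPS_PER_INSTR = 8   # 4 single precision lanes x 2 flops
SP_CYCLES_PER_INSTR = 1  # one single precision instruction every cycle


def peak_gflops(flops_per_instr, cycles_per_instr,
                clock_ghz=CLOCK_GHZ, num_spes=NUM_SPES):
    """Aggregate peak rate in Gflop/s across all SPEs."""
    return num_spes * clock_ghz * flops_per_instr / cycles_per_instr


dp_peak = peak_gflops(DP_FLOPS_PER_INSTR, DP_CYCLES_PER_INSTR)
sp_peak = peak_gflops(SP_FLOPS_PER_INSTR, SP_CYCLES_PER_INSTR)

print(f"double precision peak: {dp_peak:.1f} Gflop/s")
print(f"single precision peak: {sp_peak:.1f} Gflop/s")
print(f"single/double ratio:   {sp_peak / dp_peak:.1f}")
```

Under these assumptions the double precision peak rounds to the 14.6 Gflop/s quoted above, and the single-to-double ratio comes out at about fourteen.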
The authors developed a performance model for Cell and used it to make direct comparisons of Cell against the AMD Opteron, Intel Itanium2 and Cray X1 architectures. The model then guided the development of implementations that were run on IBM’s Full System Simulator to provide even more accurate performance estimates.
The authors argue that Cell’s three-level memory architecture, which decouples main memory accesses from computation and is explicitly managed by the software, provides several advantages over mainstream cache-based architectures. First, performance is more predictable, because the load time from an SPE’s local store is constant. Second, long block transfers from off-chip DRAM can achieve a much higher percentage of memory bandwidth than individual cache-line loads. Finally, for predictable memory access patterns, communication and computation can be effectively overlapped by careful scheduling in software.
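The overlap of communication and computation described above is commonly achieved with double buffering: while the kernel works on one block, the next block is already being fetched. The following is a minimal sketch of that idea in ordinary Python, not Cell code. The names `fetch_block`, `compute` and `double_buffered_sum_of_squares` are hypothetical, a worker thread stands in for the memory flow controller, and a list slice stands in for a block transfer from DRAM into an SPE's local store.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(data, start, size):
    """Hypothetical stand-in for a block transfer into local store."""
    return list(data[start:start + size])


def compute(block):
    """Hypothetical stand-in for the kernel run on a resident block."""
    return sum(x * x for x in block)


def double_buffered_sum_of_squares(data, block_size):
    """Sum of squares, fetching block i+1 while computing on block i."""
    total = 0
    # A single worker thread plays the role of the transfer engine.
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        pending = None  # transfer whose block has not been consumed yet
        for start in range(0, len(data), block_size):
            # Start the next transfer before touching the previous block.
            upcoming = transfer_engine.submit(
                fetch_block, data, start, block_size)
            if pending is not None:
                # Compute on the previous block while `upcoming` is in flight.
                total += compute(pending.result())
            pending = upcoming
        if pending is not None:
            # Drain the last block once no further transfers remain.
            total += compute(pending.result())
    return total


result = double_buffered_sum_of_squares(list(range(10)), block_size=3)
print(result)
```

Because the access pattern is known in advance, the transfer for the next block can always be issued early, which is the property that lets careful software scheduling hide memory latency.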
While the authors’ current analysis uses hand-optimized code on a set of small scientific kernels, the results are striking. On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell’s peak double precision performance is fourteen times lower than its peak single precision performance. If Cell were to include at least one fully utilizable pipelined double precision floating point unit, as proposed in the Cell+ implementation, these speedups would easily double.