Analysis: Lifting the Hood on Sun’s New UltraSPARC T2 Chip
Sun’s next-gen multicore design boasts a number of enhancements -- including dedicated FPUs for each core -- which make it a better overall processor.
Sun Microsystems Inc. launched its first aggressively multicore chip design—the UltraSPARC T1 (formerly code-named "Niagara")—nearly two years ago (in October 2005). That chip, which boasts eight discrete cores—each of which is capable of simultaneously processing up to four threads (for a total of 32 threads)—outstrips the multicore strategies of Advanced Micro Devices (AMD) Inc., IBM Corp., Intel Corp., and other Sun competitors.
UltraSPARC T1 was certainly ambitious, but Sun is now rolling out its successor, the aptly named UltraSPARC T2. Sun’s next-gen multicore design retains the same number of cores but ups the thread-processing ante, in this case to 64 simultaneous threads (eight per core).
Moreover, analysts say, it boasts a number of non-multicore enhancements, including a dedicated floating-point unit (FPU) for each core, which make it a better overall processor and not just a multithreaded beast.
"The T2 remains extremely multithreaded by conventional standards," writes Gordon Haff, a principal IT advisor with consultancy Illuminata. "However, … the T2’s additional hardware represents a significant evolution from a design that was solely focused on [thread-level parallelism] to one that’s well-equipped to handle a considerable range of thread-rich workloads."
Sun likes to tout the T1 as an eco-friendly, or "green," chip, thanks to its relatively low power draw of 72 watts. While 72 watts is low (or "green") compared to other, heftier server chips, Haff concedes, it isn’t all that green compared to smaller, lower-powered versions of competing designs.
"[T]he T1’s strength is in the performance it delivers on its target workloads relative to the power it consumes. It was that performance that was the real design target for the T1 rather than its power draw," he says.
"Multi-threaded performance at the system level ameliorated the effects of delays associated with getting instructions and data from memory. That the T1 could be recast in terms of power efficiency was something of a felicitous side effect—albeit one that Sun marketing exploited to highly effective—and amusing—effect."
Beyond its eco-friendliness, there’s another sense in which Sun’s aggressively multithreaded design is ahead of the curve, Haff argues. "The fundamental issue is that while memory capacity has roughly kept pace with processor performance … [,] memory performance has not. Bandwidth … has more or less kept up, but latency—the time that it takes to return a result from memory—has improved much more slowly," he explains.
"Memory latencies over the past decade have lagged processor performance by at least an order of magnitude. As a result, even using techniques such as on-processor caches and out-of-order execution, processors have tended to spend more and more time spinning idly, waiting for memory to give them data to process."
That’s where Sun’s Niagara design has an advantage, Haff says.
"Thread-level parallelism … is one approach to deal with this disparity between processor speed and memory speed. With TLP, the chip handles several chains of instructions at once, efficiently switching away from tasks that are waiting for data to arrive, working instead on tasks that already have their data ready for processing," he observes. "TLP can’t speed up memory, but it can help optimize for a world in which memory is slower than the processors."
In this respect, Haff continues, the UltraSPARC T2 is even better optimized than its predecessor. "T2 doubles the number of threads that it can handle relative to its predecessor, while keeping the core count the same. This provides twice the number of 'slots' that the chip can keep in play while waiting for data to arrive from memory," he notes.
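Haff's latency-hiding argument can be made concrete with a back-of-the-envelope model. The minimal Python sketch below rests on deliberately simplified assumptions: each hardware thread alternates between a short burst of computation and a long wait on memory, and the core switches to another ready thread at no cost whenever the current one stalls. The cycle counts (2 compute cycles, 18 stall cycles) and the function names are hypothetical, and nothing here reflects Sun's actual pipeline or thread-selection logic. The closed-form estimate is min(1, threads × compute / (compute + stall)); the cycle-by-cycle simulation checks it.

```python
"""Toy model of how thread-level parallelism hides memory latency.

Assumptions (illustrative only, not Sun's actual design): each hardware
thread alternates between a fixed burst of compute cycles and a fixed
memory stall, and the core switches threads at zero cost.
"""


def ideal_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Closed-form fraction of cycles the core spends doing useful work."""
    period = compute_cycles + stall_cycles  # one thread's compute + wait cycle
    return min(1.0, threads * compute_cycles / period)


def simulate_utilization(threads: int, compute_cycles: int,
                         stall_cycles: int, total_cycles: int) -> float:
    """Cycle-by-cycle simulation of a core that runs one thread until that
    thread stalls on memory, then switches to the next ready thread
    (a simplified switch-on-stall policy, assumed for illustration)."""
    remaining = [compute_cycles] * threads  # work left before each thread's next miss
    ready_at = [0] * threads                # cycle when each thread's data arrives
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        if ready_at[current] > cycle:
            # Current thread is waiting on memory: look for another ready thread.
            for k in range(1, threads):
                candidate = (current + k) % threads
                if ready_at[candidate] <= cycle:
                    current = candidate
                    break
            else:
                continue  # every thread is stalled, so this cycle is wasted
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # Burst finished: the thread issues a memory request and stalls.
            remaining[current] = compute_cycles
            ready_at[current] = cycle + 1 + stall_cycles
    return busy / total_cycles


if __name__ == "__main__":
    COMPUTE, STALL = 2, 18  # hypothetical cycle counts, chosen for round numbers
    for n in (1, 4, 8):     # one thread, T1-style 4 per core, T2-style 8 per core
        model = ideal_utilization(n, COMPUTE, STALL)
        sim = simulate_utilization(n, COMPUTE, STALL, total_cycles=2000)
        print(f"threads per core = {n}: model {model:.2f}, simulated {sim:.2f}")
```

With those made-up numbers, both the formula and the simulation report utilization of 0.10 for a single thread, 0.40 for four threads per core (the T1's count), and 0.80 for eight (the T2's). In this toy model, doubling the threads doubles the fraction of cycles doing useful work until the core saturates at 1.0. Real hardware adds switching overhead, shared-cache contention, and variable memory latencies, so real-world gains would likely differ and depend on the workload.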
"The T2 is still fairly aggressive relative to mainstream competition in terms of the number of cores that it contains—but it’s in the number of threads that each core can switch among where it truly puts the hammer down. The T2 also adds specialist co-processors for floating point, security, and networking acceleration. In short, the T2 is all about boosting performance in a multi-threaded environment."