POWER6: Last of the Big-Time RISC Chips
With months to go before its launch, the industry’s already buzzing about POWER6, and with good reason
IBM Corp.’s POWER5 processor was a significant improvement over its predecessor, POWER4+, boasting an integrated memory controller and improved threading capabilities—Big Blue’s much-ballyhooed Simultaneous Multithreading Technology.
Viewed in this light, POWER6, which IBM expects to ship next year, should be more an evolutionary than a revolutionary upgrade over POWER5.
This doesn’t mean Big Blue’s next-generation POWER CMOS is a stopgap revision. For one thing, IBM expects that POWER6 will reach impressive clock speeds—possibly maxing out at 5 GHz. POWER6 will also ship with an improved memory controller, improved SMT capabilities, a new floating-point engine, and other performance tweaks.
With months to go before its launch, the industry’s already buzzing about POWER6, which, with PA-RISC being phased out and Sun pursuing an aggressive multicore strategy with its Niagara-class chips, is the last of the big-time RISC chips. Count veteran industry watcher and hardware guru Gordon Haff among the curious. Haff, a senior analyst with consultancy Illuminata Inc., says POWER6 will buck at least one prevailing trend, and should do so in style, if IBM can execute on its technology roadmap.
“For a post-frequency world in which improved performance is supposed to come from adding cores, POWER6’s approximate doubling of the frequency—depending upon where exactly its clock rate ends up—is a major uptick,” Haff indicates. “The clock boost comes without radically increasing pipeline length—a common trick to get higher frequency. In fact, the POWER6’s approximately 16-stage pipeline is essentially the same length as for both POWER5 and the Core microarchitecture that’s the basis for all of Intel’s new x86 processors.”
This is especially important in the performance-intensive market segments in which POWER has traditionally been an important player.
“Why this matters is that longer pipelines are far more affected by events that cause them to have to empty their contents and restart an operation,” Haff explains. “[I]n practice, this means that applications often don’t run as much faster as the frequency increase would suggest. Therefore, a similar pipeline length and design gives confidence that real performance should scale fairly well with frequency. Furthermore, in IBM’s processor design, most of the interconnects between chips and modules run at a multiple of the processor clock; therefore, these should scale up with the CPU frequency increase as well.”
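Haff’s pipeline argument can be made concrete with a back-of-the-envelope model. Everything below is an illustrative assumption, not an IBM figure: the premise is simply that a flush (say, a branch mispredict) costs roughly one pipeline refill, so its penalty grows with stage count.

```python
# Toy model: why real speedup tracks frequency only when pipeline
# depth stays put. All numbers are illustrative assumptions.

def ns_per_instruction(freq_ghz, pipeline_stages,
                       base_cpi=1.0, flush_rate=0.02):
    """Average time per instruction, in nanoseconds.

    A pipeline flush costs roughly one refill, so its penalty
    scales with the number of stages.
    """
    cpi = base_cpi + flush_rate * pipeline_stages
    return cpi / freq_ghz

# Doubling the clock at a constant 16-stage depth...
same_depth = ns_per_instruction(2.3, 16) / ns_per_instruction(4.6, 16)
# ...versus doubling it by stretching the pipeline to 32 stages.
deeper = ns_per_instruction(2.3, 16) / ns_per_instruction(4.6, 32)

print(f"speedup, same depth:  {same_depth:.2f}x")   # 2.00x
print(f"speedup, deeper pipe: {deeper:.2f}x")       # 1.61x
```

With these assumed numbers, a constant-depth clock doubling delivers the full 2x, while buying the same clock with a twice-as-deep pipeline delivers only about 1.61x, the effect Haff describes.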
Other industry watchers are similarly intrigued. “[T]he new [POWER6] processors promise to deliver twice the performance of previous-generation Power chips within the same power envelope as POWER5+,” says industry veteran Charles King, a principal with consultancy Pund-IT. “[Customers] will enjoy double the system throughput they have today with no discernible increase in power consumption—cool stuff for businesses increasingly nervous about ballooning data center electrical consumption and unstable power prices.”
King says POWER6 will be an especially important deliverable for IBM’s thriving System p Unix business. System p shares the POWER CMOS with Big Blue’s System i group, but, even though System i was first to market with POWER5 in May of 2004, the Unix market is far more performance-hungry than the comparatively staid System i segment, King says.
“The System p Group inhabits a UNIX market addicted to maximum speed but squeezed by the same power-consumption woes as any business unit,” he wrote. “Given Power’s ongoing hold on the lead in UNIX performance metrics, we expect POWER6 will not be happy news for IBM competitors who are dependent on Itanium and UltraSPARC platforms. Both trail far behind Power’s performance with little promise of catching up anytime soon.”
POWER5 introduced improved virtualization and power management features, but—in the area of power management, especially—POWER6 will up the ante significantly. For example, notes Illuminata’s Haff, IBM is grooming its PowerExecutive features—which it first announced in tandem with its major BladeCenter revision in February—to micromanage server power usage.
“For now, it’s mostly a monitoring and profiling tool—although IBM is planning to add more dynamic control capabilities, such as capping the power consumption of a given server, over time,” Haff comments. “IBM’s ultimate direction is to optimize overall compute environments, which requires this sort of detailed system energy and temperature profile data, which can then be used to create and execute energy management policies based on many factors.”
There are many common scenarios in which organizations can significantly reduce power consumption. Consider off-peak consumption, where a drop in voltage typically results in improved power efficiency. In this case, the downside to doing so—an attendant drop in performance—is mitigated by relatively low demand. In the case of POWER6, Haff indicates, a five percent drop in performance could translate into power savings of about 20 percent.
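A rough sense of that trade-off comes from the standard dynamic-power relation, P ≈ C·V²·f, together with the rule of thumb that achievable clock frequency scales roughly with supply voltage. The model and numbers below are a sketch under those assumptions, not IBM measurements:

```python
# Sketch of voltage/frequency scaling: dynamic power ~ C * V^2 * f,
# with achievable frequency assumed roughly proportional to voltage.
# Illustrative model only, not measured POWER6 behavior.

def dynamic_power_fraction(perf_fraction):
    """Dynamic power relative to nominal when the clock (and, with
    it, the supply voltage) is scaled to perf_fraction of nominal."""
    v = perf_fraction   # voltage tracks frequency (assumption)
    f = perf_fraction
    return v * v * f    # capacitance C cancels out of the ratio

savings = 1 - dynamic_power_fraction(0.95)
print(f"~{savings:.0%} dynamic power saved for a 5% performance drop")
```

This cubic rule alone yields roughly 14 percent; leakage power, which also falls steeply with supply voltage, plausibly accounts for the rest of the roughly 20 percent figure Haff cites.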
Subtle Improvements in SMT and Virtualization
POWER5 was the first revision of IBM’s flagship RISC chip to support aggressive multithreading in the form of SMT, which (in both POWER5 and POWER6) presents each physical processor to the operating system as two logical CPUs. Multithreading is now the norm, Haff says, and some vendors (such as Sun) are pursuing ambitious multithreading efforts. POWER6 subtly improves upon this capability, he indicates.
“IBM’s implementation differs from the norm because it devotes considerable chip real estate to features that keep resource usage balanced among multiple threads. Others devote far less space, but get correspondingly modest performance gains from SMT in return,” Haff writes. “IBM … is estimating about a 40 percent performance gain with an integer application, and up to 55 percent improvement on an OLTP workload.”
There are other similarities, too. For example, both POWER5 and POWER6 track how threads use shared resources (e.g., cache slots and Global Completion Table entries), and are able to make on-the-fly adjustments to their allocations. While IBM has brought most of its POWER5 SMT implementation forward unchanged, it’s also introduced an enhancement or two.
“The POWER6 doesn’t make any radical changes in this area. However, the larger L2 cache and increase in cache associativity improve performance a bit. IBM has also increased the chip’s dispatch bandwidth, thereby increasing the potential number of instructions available to execute during a given cycle,” Haff observes.
Ditto for virtualization, where both System p and System i are already strong players. “POWER6 likewise doesn’t radically reinvent what was already a strong set of technologies. But, in conjunction with AIX updates, it will increase the number of micropartitions that can be configured, and introduce virtually partitioned memory—which is analogous to the shared processor partitions that share the processors in a pool,” Haff indicates.
Even More Mainframe Technology Trickle-Down
Mainframe hands are doubtless familiar with “instruction retry,” an availability feature that ensures the system can recover from serious errors. If an error occurs, execution is restarted using the machine state from a previously saved checkpoint. This is a complicated process, of course: the system has to be able to checkpoint fast enough that nothing gets out of the core to pollute memory or change anything that can’t be backed up, Haff indicates. Nevertheless, it’s a commonplace in mainframe environments and even has a precedent in the RISC/Unix world (Fujitsu’s SPARC-based PRIMEPOWER Unix servers).
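The checkpoint-and-retry idea can be sketched in software terms. This is a toy illustration only; real hardware checkpoints architectural state continuously and retries transparently, with no visible loop like this:

```python
import copy

class TransientError(Exception):
    """Stand-in for a detected soft error (e.g., a stray bit flip)."""

def run_with_retry(step, state, max_retries=3):
    """Run one unit of work; on a transient error, restore the
    checkpoint taken just before it and try again from there."""
    for _ in range(max_retries + 1):
        checkpoint = copy.deepcopy(state)  # nothing escapes the core
        try:
            step(state)
            return state
        except TransientError:
            state = checkpoint             # roll back to known-good state
    raise RuntimeError("retries exhausted: treat as a hard error")

# Demo: a step hit by one transient fault before succeeding.
faults = iter([True, False])
def step(state):
    state["counter"] = state.get("counter", 0) + 1
    if next(faults):
        raise TransientError

print(run_with_retry(step, {}))  # prints {'counter': 1}
```

The key property, mirrored in the demo, is that the failed attempt leaves no trace: work restarts from the checkpoint as if the error never happened.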
It will soon be commonplace on AIX, too; POWER6 will support a new feature called “processor recovery.” It’s been a long time coming, says Haff, which raises a compelling question. “Why [is IBM doing this] now, after holding off on going down this path for so long? In part, it’s a consequence of shrinking process technology. With each shrink, susceptibility to soft errors increases; 65 nm is therefore just that much more likely to benefit from this type of mechanism than 90 nm or 130 nm,” he explains.
“However, that’s only part of the story; IBM doesn’t believe that soft-error rates are going to particularly skyrocket at 65 nm. It’s more a response to customer demand … as well as a reflection of the increasing mainframishness of large Unix systems. As more and more workloads get consolidated on fewer and fewer processors—through virtualization and other means—there’s increasing pressure to eliminate or reduce single points of failure throughout the stack from top to bottom.”