
Studies in Scale: Updated Processor Can Take Analytics to New Heights

The next generation of Intel-based systems has the potential to transform how organizations manage, process, and consume data.

Business intelligence (BI) watchers don't typically pay much attention to what's happening in the commodity hardware market. They should. The commodity servers of today are considerably more powerful than their predecessors of just two or three years ago.

In fact, when Intel Corp. releases its next-gen Nehalem-EX processor this month, non-clustered commodity systems will start scaling into RISC (e.g., UltraSPARC) or EPIC (i.e., Itanium) territory, with support for system memory configurations of up to 1.5 TB. That's roughly 500 times the ceiling of just half a decade ago, when x86 systems running 32-bit Windows maxed out at 3 GB of system memory.

The size and scale of commodity x86 systems -- deployed both in the enterprise back end and on the client desktop -- have the potential to transform how organizations manage, process, and (perhaps most importantly) consume data.

A Case Study in Scale

The difference in scale is staggering. Consider the case of the IBM Cognos TM1 OLAP engine. TM1's biggest differentiator is its ability to run entirely in memory. On plain-vanilla 32-bit platforms, that means an architectural limit of 4 GB of memory. Practically speaking, the memory available to TM1 is considerably less than that. On 32-bit Windows systems, for example, Windows reserves 2 GB of the address space for itself by default, leaving a maximum of 2 GB for applications; even with the appropriate boot option, TM1 can access at most 3 GB. (Microsoft first introduced support for giving applications up to 3 GB of memory with its Windows NT Server 4.0 Enterprise Edition in late 1997.)
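
The arithmetic behind those limits is easy to sketch. The snippet below is purely illustrative -- it assumes nothing about TM1 itself, just the textbook 32-bit address-space figures:

```python
# Back-of-the-envelope arithmetic for the 32-bit limits described above
# (illustrative only; not specific to TM1 or any particular Windows release).
GB = 2**30                             # bytes per gigabyte

total_32bit   = 2**32                  # 4 GB: all a 32-bit pointer can address
default_user  = total_32bit - 2 * GB   # 2 GB left once Windows reserves 2 GB
with_3gb_boot = total_32bit - 1 * GB   # 3 GB when the kernel is squeezed to 1 GB

print(total_32bit // GB, default_user // GB, with_3gb_boot // GB)  # -> 4 2 3
```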

Although 3 GB might have seemed like an inexhaustible configuration in 1997 or 1998, by 2003 and 2004 it had become considerably more mundane. The former Applix Inc. did market a 64-bit version of TM1 for RISC-Unix platforms, and in mid-2003 introduced a variant designed to run on 64-bit Windows Server 2003 (which at that time was an Itanium-only product). Customers that preferred to run TM1 on commodity x86 hardware were increasingly bumping up against the 4 GB limit, however.

Today, the addressable limit for TM1 and other in-memory analytic applications is basically non-existent, chiefly because 64-bit systems are constrained less by architectural limitations than by economic ones. The effective architectural limit of a 64-bit Nehalem-EX CPU is 256 TB (which translates into a 48-bit address space); practically speaking, however, the size of a 64-bit system is more likely to be limited by a combination of factors, including OS-specific limitations and manufacturing capacity, to say nothing of physical considerations -- such as the number of memory slots OEMs can squeeze onto system boards.
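
For the record, the 256 TB ceiling falls straight out of that 48-bit figure. Here's a minimal sketch (nothing in it is specific to Nehalem-EX beyond the 48-bit address width cited above):

```python
TB = 2**40                      # bytes per terabyte
EB = 2**60                      # bytes per exabyte

full_64bit = 2**64              # what a 64-bit pointer could address in theory
implemented_48bit = 2**48       # the 48-bit space cited above

print(full_64bit // EB)         # -> 16 (exabytes, the theoretical 64-bit maximum)
print(implemented_48bit // TB)  # -> 256 (the 256 TB architectural ceiling)
```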

Hungry Analytic Apps

There's another reason the next generation of Intel processors is poised to scale where no x86 architecture has scaled before: the 64-bit address space is no longer a technology proposition in search of a market. These days, industry watchers say, there's plenty of appetite for 64-bit performance, in both very large data warehouse (VLDW) and advanced analytic configurations.

Michael Corcoran, chief marketing officer with Information Builders Inc. (IBI), sees a growing demand for advanced analytic technologies. He cites the popularity of the open source R programming language, which IBI -- along with JasperSoft Inc., Netezza Inc., and several other vendors -- taps as a predictive analytic engine for its bread-and-butter BI and DW products.

R is an in-memory analytic environment, says Corcoran -- one of several analytic tools that use an in-memory architecture.

In-memory is garnering more attention. In a research bulletin published last month, for example, IDC analyst Carl Olofson argued that existing DBMSes -- DB2, SQL Server, Oracle, and Sybase ASE among them -- will either be shifted to run entirely in memory or will otherwise be "augmented" by an in-memory facility.

Corcoran says the industry has come full circle on the question of in-memory's advantages. Half a decade ago, he concedes, an in-memory topology might've been viewed as a drawback because most commodity servers topped out at 4 GB of system memory. Thanks to cheap memory, increases in processor density -- Intel's Nehalem-EX processors pack as many as eight cores into a single socket -- and the ubiquity of 64-bit hardware and software, the reverse is now the case, Corcoran argues.

The salient point, he continues, is that a technology such as R drastically lowers the barrier -- in terms of both cost and expertise -- to deploying predictive analytic technology. Because it's an open source project, R is nominally free: IBI, for example, charges for the R front-end GUI it markets, but the base R environment costs nothing. What's more, Corcoran observes, the R community has developed thousands of application- and industry-specific models.

Finally, Corcoran says, advanced analytic technologies such as R benefit greatly from a 64-bit address space. "The R technology is so well adopted, it's almost a no-brainer for customers," he observes. "If you look at it, SAS now integrates with R … because they have to. … so it's not just universities anymore that are using [R], it's financial services, it's retail. Where a few years ago [R] might have been at a disadvantage [relative] to SAS because it's in memory, now that's an advantage because you have these systems with gigabytes of memory."

A 64-Bit Behemoth on Every Desktop

Ian Fyfe, senior director of products with open source software (OSS) BI specialist JasperSoft, agrees. JasperSoft is the commercialized offshoot of the OSS JasperReports project. It, too, touts an in-memory analytic implementation based on R.

"We call it in-memory analysis, but we've actually made it configurable, so you can actually choose whether it's done in memory [on the desktop] or sent back to the data source," Fyfe explains. Some customers or individual business units actually prefer to host analytic workloads on the desktop, he says.

This is an altogether new wrinkle, auguring what some have described (or decried) as a return to "Workgroup BI."

In many cases, today's desktop computer is the equivalent of a back-office server of half a decade ago. It bristles with processor cores (dual- and quad-core chips are common) and is often equipped with 4 GB or more of RAM. That's helped fuel the rise of end-user analytic offerings such as QlikView (which touts an in-memory analytic capability), Lyza (which marries an in-memory analytic engine with a columnar data store), and Microsoft's Project Gemini effort, among others.

Fyfe sees the advent of in-memory desktop analytics as a case of vendors such as JasperSoft giving users what they want. "We certainly value our partnerships with vendors like Vertica and Infobright, so we'll continue to support that [data warehouse-based analytic] experience, but we have to be Switzerland in some ways. We don't take a stand [as to] which is the best or the only way [to do reporting or analytics]. We just want to help [customers] solve their problems."

Peta-scale Data Warehousing

Not surprisingly, analytic database players are very excited about Intel's forthcoming Nehalem-EX update. Consider data warehouse appliance specialist Netezza Inc., which -- with its TwinFin makeover late last year -- committed itself to a mostly commodity hardware strategy. (Netezza, like analytic database upstart KickFire Inc., uses a proprietary FPGA add-in card to help accelerate performance.)

Phil Francisco, vice president of product management and product performance with Netezza, describes a commodity hardware strategy based on Intel's Nehalem chips as a "no-brainer." While Francisco -- who sat down with BI This Week at last month's TDWI Winter World Conference in Las Vegas -- didn't specifically address the forthcoming Nehalem-EX chip, he did discuss the evolution and maturation of Intel's x86 server-class silicon.

"These [Intel-based] systems are just going to get bigger and faster. Almost every time [Intel] announce[s] a new [microprocessor], they're basically doubling the number of cores they can fit [on to a single chip], so you have double the [processor] cores in the same physical space. That means we're able to almost double our performance just by staying current with what [Intel is] doing. That alone is letting us scale into the petabytes," said Francisco.

Thanks to the commoditization of 8 GB and 16 GB memory DIMMs, as well as the revamped Nehalem-EX's base support for up to 1 TB of system memory, some analytic database players are enthusiastic about the mainstreaming of petabyte-scale data warehouse configurations.
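
That 1 TB figure is straightforward DIMM-and-slot arithmetic. In the sketch below, the 64-slot count is an illustrative assumption, not any particular OEM's board layout:

```python
GB = 2**30
TB = 2**40

# Hypothetical fully populated board: 64 DIMM slots filled with 16 GB modules.
slots, dimm = 64, 16 * GB
print((slots * dimm) // TB)    # -> 1, i.e., the 1 TB of system memory cited above
```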

Barry Zane, CTO with analytic database specialist ParAccel Inc., is particularly excited about the scalability potential of Intel's Nehalem chips.

Zane knows about hardware: he spent time with in-memory OLAP specialist Applix during the 1980s and 1990s, prior to landing with Netezza in the early 2000s. "Our perspective has changed so much over just the last few years," he comments. "We used to think of [warehouses of] 10 or 20 terabytes as [defining] the high-end [of the DW market], … but we [ParAccel] have customers with hundreds of terabytes.

"That just wasn't possible with the technology you had five years ago," he continues, arguing that "it's partly because of Intel, which keeps putting more and more [processor] cores on their chips, and partly because of our database technology. You have to have software that's able to effectively use all of that [processing] hardware."

John Thompson, CEO of U.S. operations for analytic database veteran Kognitio, likes the commoditization of hardware for several reasons. For one thing, he argues, if everybody’s using the same hardware -- i.e., if each and every vendor lines up on the same playing field -- individual players have an opportunity to differentiate on the basis of features, functionality, or database-specific performance.

“The hardware has become faster [and] the data loads have become bigger than ever before, so in-memory analytics makes more sense than ever before,” comments Thompson. “There’s going to be this point, and we’re probably already there, where the drop [in query response times] becomes rather insignificant, maybe from half a second to one-third of a second. That’s really rather insignificant, isn’t it?”

In such an environment, Thompson likes Kognitio’s chances with its WX2 database. “One of the things that’s different about ... WX2, [is that] it was designed as an in-memory database. That was its original design remit, so you have an in-memory database with unlimited in-memory aggregation that you can scale [in clustered configurations] into the tens or even hundreds of terabytes,” he says.

Such examples usually assume a demand for triple-digit-terabyte or even peta-scale data warehousing, but the virtues of 64-bit processing power and a 64-bit address space extend into other areas, too.

That's where KickFire comes into the picture. Its sweet spot is in the sub- and single-digit-terabyte market, says vice president of sales and marketing Richard Nieset.

At this point, KickFire scales its rack-mounted appliances out to 5 TB. Nevertheless, Nieset, too, talks up the scalability and performance advantages of Intel's 64-bit chips. With the previous generation of Nehalem chips, KickFire could pack 16 cores into a single 5U system. Although the base KickFire appliance uses up to 32 GB of RAM, its FPGA-powered Query Processing Modules (QPMs) -- which plug into the appliance via a PCI-X expansion module -- can be populated with up to 256 GB. To a degree -- and quite aside from what it achieves with its proprietary FPGA silicon -- part of KickFire's scalability strategy involves staying current with Intel's Xeon improvements.

"Every time they come out with a new [Xeon-class processor], we increase our capacity," says Nieset, who notes that Nehalem-EX is an octa-core beast. "In the same [2U rack] enclosure, we'll be able to fit twice as many [processor cores]," he points out.

On the other hand, KickFire's pitch is less dependent than its competitors' on triple-digit-terabyte or even peta-scale data warehousing, Nieset says. "Our sweet spot is really those small [configurations], where a customer has maybe half a terabyte in an Oracle or SQL Server data warehouse and they have just unsatisfactory [query] performance," he explains.

That said, Nieset stresses, KickFire's analytic database appliances certainly benefit from both a 64-bit address space and manufacturing economies of scale. (Its systems use both an enormous PCI-X RAM cache -- from 32 to 256 GB -- and serial-attached SCSI drives.)

Data warehousing vendor Teradata Inc. is perhaps uniquely sensitive to both the promise and the pitfalls of a commodity Intel strategy. The company has had an all-Intel policy for two decades now. Over the same period, it's established a claim to leadership in the high-end data warehousing segment. Teradata's competitors -- including not just Netezza, ParAccel, and KickFire, but also Aster Data Systems Inc., Dataupia Inc., Greenplum Inc., Kognitio, Vertica Inc., and others -- now field systems based on the same Intel parts. (KickFire uses an FPGA to accelerate its queries; Netezza not only uses an FPGA, but also resells systems based on IBM and NEC hardware. Both OEMs market their own value-added Nehalem chipsets.)

Is Teradata concerned that the rising scalability tide will lift all competitive boats? Not according to vice president of product and services marketing Randy Lea.

"No, I don't think so. We have seven … customers in our petabyte club, and they're all Intel servers," Lea comments, arguing that Teradata -- a decade before Netezza, Dataupia Inc., Kickfire Inc., and ParAccel Inc. -- first demonstrated the scalability potential of industry-standard Intel processors. "We broke the mold and proved that Intel servers can work [and] we're riding the wave [of improving scalability] with Nehalem and other chips," he continues, conceding that Teradata has plenty of competition in this regard.

"Does that mean [Teradata's competitors will] be able to do great things with this [Intel] technology? It still comes down to the database," Lea argues. "We have the industry's best workload management [capability]. No one even comes close to us there. We're really the only ones that have a parallel-everything architecture … so I think you'll see potentially large [data warehouse] configurations, but Teradata is still best when it comes to [managing these configurations in an] enterprise data warehouse."
