Reading the IBM/Netezza Tea Leaves

Few industry watchers foresee much in the way of serious overlap between Big Blue's existing data warehousing assets and Netezza's. However, IBM will probably have to make some tough choices, particularly on the software side.

IBM Corp.'s $1.7 billion acquisition of Netezza Inc. raised many questions, starting with the ultimate disposition of Netezza under Big Blue's stewardship. BI pros may also wonder how IBM plans to position Netezza relative to its existing Smart Analytics offerings.

Few analysts foresee much in the way of serious overlap between Big Blue's existing data warehousing (DW) assets and Netezza. Overlap, yes -- but overlap that is mostly benign.

Industry veteran Wayne Eckerson, director of education for TDWI Research, weighed in on just this issue shortly after Big Blue pulled the trigger on Netezza.

"[T]here's a lot of overlap between [Smart Analytics and Netezza]," he conceded, noting, however, that IBM "did position Netezza as a data mart for analytics and for [small- and medium-sized businesses]."

In this and a few other respects, he argued, Netezza and Smart Analytics are highly differentiated propositions.

"[Y]ou get everything including Cognos software for one SKU [with Smart Analytics] and have it shipped to you in two weeks, but it was still [based on] DB2 and it still required [some] configuration. As much as they would pre-configure it, it still took some time to get that up and running."

Although Netezza isn't quite plug, play, load, and go, Eckerson acknowledged, it does come close -- or closer than IBM's Smart Analytics bundles, at any rate.

Eckerson's TDWI colleague, Philip Russom, says the two technologies -- Netezza's "TwinFin" DW appliances and IBM's Smart Analytics system bundles -- are designed to address two very different analytic segments.

"Netezza overlaps with IBM's Smart Analytics Systems and numerous other hardware/software bundles," noted Russom.

Compared to Smart Analytics -- or to any of IBM's other Big Data offerings, for that matter -- "Netezza is more for really big data analytics, as opposed to IBM's general-purpose analytic products," Russom continued.

He likewise distinguished between Netezza's analytic data warehousing chops and the analytic assets that IBM gained by way of its acquisition of the former SPSS Inc. "Compared to SPSS … Netezza is solidly for analytics based on what I call 'extreme SQL' -- [this] involves full-blown SQL applications, with thousands of lines of code and very, very complex queries against gigantic labyrinths of data."

Analytic Apples and Oranges

Industry veteran Merv Adrian, a principal with IT Market Strategy, offers a typically trenchant take. First of all, Adrian notes, IBM's Smart Analytics line covers a good bit of ground: it consists of the Smart Analytics System 9600 (based on mainframe hardware), Smart Analytics System 7600 (based on POWER hardware), and Smart Analytics System 5600 (based on x64 hardware).

Netezza says that data warehouse configurations of between 10 and 20 TB comprise its "sweet spot." IBM nominally targets this same 10-20 TB segment with its x64-based Smart Analytics 5600 system.

That's not all. A sweet spot is (by definition) far from bleeding-edge, which means that Netezza's most ambitious customers are scaling into triple-digit-terabyte configurations. The rub is that Big Blue pushes two of its Smart Analytics bundles -- the POWER-based 7600 and the mainframe-powered 9600 -- for such enterprises.

"Sampling and aggregation take too much detail away, [Netezza] says: to do analytics around the 'long tail' of distribution in your data sets, you need more than just a sample. Collecting 10s or 100s of terabytes and running predictive analytics and optimization techniques provides more insight than conventional BI and dashboard reports," Adrian writes. "But those high-end volume numbers are where IBM positions the 7600 and 9600 -- so even those platforms are potentially challenged by the deal."

IBM could have quite a messy overlap on its hands, assuming, again, that we're comparing Netezza's apples with Smart Analytics' apples.

The good news, if you can call it that, is that we aren't, Adrian concludes.

The approach championed by Netezza -- which prescribes both an analytic database and (more recently) an ecosystem of supporting analytic technologies -- differs drastically from IBM's model with Smart Analytics, which is predicated upon a tweaked and hotrodded DB2.

Netezza likewise relies on features in its hardware to a greater degree than does IBM -- particularly in its storage tier.

Adrian cites Netezza's in-database support for several key analytic technologies (the open source Hadoop implementation of MapReduce and the open source R language foremost among them), as well as a unique storage tier that arguably served as a point of departure for Oracle Corp.'s own Exadata storage topology.

"Netezza's storage innovations are based on 'Zone Maps,' which keep track of key statistics, such as the minimum and maximum value of columns in each storage extent," he explains. Because the Zone Map can quickly identify which data falls in a specified data range, it helps avoid "general table scans and the associated enormous I/O overhead they create," Adrian continues.

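The idea Adrian describes can be illustrated with a small sketch. This is not Netezza's implementation; the names (Extent, build_extents, range_scan) and the in-memory layout are hypothetical, chosen only to show how per-extent minimum and maximum values let a range query skip extents that cannot contain matching rows.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Extent:
    """A block of stored values plus its zone-map statistics (hypothetical layout)."""
    rows: List[int]
    lo: int  # minimum value stored in this extent
    hi: int  # maximum value stored in this extent


def build_extents(values: Sequence[int], extent_size: int) -> List[Extent]:
    """Split values into fixed-size extents and record min/max for each one."""
    extents = []
    for start in range(0, len(values), extent_size):
        chunk = list(values[start:start + extent_size])
        extents.append(Extent(rows=chunk, lo=min(chunk), hi=max(chunk)))
    return extents


def range_scan(extents: Sequence[Extent], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and the number of extents actually read."""
    matches: List[int] = []
    extents_read = 0
    for ext in extents:
        # Zone-map check: skip the extent if its range cannot overlap the query.
        if ext.hi < lo or ext.lo > hi:
            continue
        extents_read += 1
        matches.extend(v for v in ext.rows if lo <= v <= hi)
    return matches, extents_read


if __name__ == "__main__":
    extents = build_extents(list(range(100)), extent_size=10)
    found, read = range_scan(extents, 25, 34)
    print(f"read {read} of {len(extents)} extents, found {len(found)} rows")
```

In this toy case the data happens to be stored in sorted order, so only the extents whose ranges overlap the query are read. The benefit shrinks when values are scattered across extents, because more min/max ranges then overlap any given query.
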
With the introduction of its TwinFin architecture last August (TwinFin runs on S Series blades from IBM), Netezza switched to a mostly commodity solution. TwinFin eschewed the PowerPC chips that Netezza had long used to drive its Snippet Processing Units (SPUs) in favor of field programmable gate array (FPGA) daughtercards. Here, too, Adrian distinguishes between Netezza's apples and Smart Analytics' oranges.

"FPGAs make further 'smart' decisions: they PROJECT only the columns in the SELECT statement and RESTRICT to retrieve only the rows in the WHERE clause," he writes.

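In software terms, the PROJECT and RESTRICT steps Adrian mentions amount to filtering rows and trimming columns as data streams off storage, before anything reaches the rest of the query engine. The sketch below is only an analogy: the function name project_restrict and the dictionary-based rows are hypothetical, and real FPGA logic works on raw disk blocks rather than Python objects.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def project_restrict(
    rows: Iterable[Row],
    columns: List[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Stream rows, keeping only those that pass the predicate (RESTRICT)
    and only the requested columns of each kept row (PROJECT)."""
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


if __name__ == "__main__":
    # Hypothetical table; roughly: SELECT id, amount FROM sales WHERE region = 'east'
    sales = [
        {"id": 1, "region": "east", "amount": 50, "note": "first"},
        {"id": 2, "region": "west", "amount": 75, "note": "second"},
        {"id": 3, "region": "east", "amount": 20, "note": "third"},
    ]
    for out in project_restrict(sales, ["id", "amount"], lambda r: r["region"] == "east"):
        print(out)
```

The point of doing this early is volume: rows that fail the WHERE clause and columns that are not in the SELECT list never travel further up the stack.
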
On the software side, the issue of overlap becomes muddier.

In addition to Hadoop and R, Netezza offers wrappers that support a handful of popular programming languages (Java, Python, Fortran, C, and C++).

It has also developed wizards to accelerate the creation of UDFs, Adrian writes. In addition, Netezza's new i-Class library "upped the analytic ante by adding … a library of functions that scale to use available memory to become maximally parallelized and callable from any language Netezza supports," he explains.

Elsewhere, Netezza offers support for both the GNU Scientific Library (which consists of some 2,000 functions) and the R-oriented CRAN repository.

All of this brings us back to a $1.7 billion question: how will Netezza fare under IBM's stewardship? Will IBM opt to effectively transplant Netezza as it exists today into its teeming information management universe, or will Big Blue -- especially on the software side -- choose not to bring forward some of the things that gave Netezza its renown? It's a near-run thing, according to Adrian.

"Will IBM port all these functions to DB2? Essentially, they are simply algorithms implemented close to the data in various UDF-like forms -- though 'simple' does a disservice to the quality and power of the work involved," he concludes. "Or will the existing Smart Analytics Systems continue to be marketed primarily on the basis of their convenient pre-integrated setup and library of industry models?"