Big Blue Flexes Its Data Integration Muscles

IBM now fields the industry’s strongest overall data integration stack, industry watchers say, but Big Blue isn’t resting on its laurels.

It’s hard to believe, but more than two and a half years have passed since IBM Corp. acquired the former Ascential Software Corp. That’s equivalent to an eon in technology time.

Over that same period, Big Blue has stitched together the makings of a respectable (perhaps even a market-leading) data integration platform, headed by its IBM Information Server product. That offering, which supports data cleansing, extraction, and transformation, draws on a passel of products and related technologies, including WebSphere DataStage (the former Ascential ETL tool), WebSphere QualityStage (née Ascential), and others.

As a result, according to market research giant Gartner Inc., IBM now fields the industry’s strongest overall data integration stack: Gartner’s latest "Magic Quadrant" survey puts Big Blue out in front of longtime champ Informatica Corp. in the data integration "Leaders" category—ahead of traditional players such as SAS Institute Inc. and iWay Software (both "Visionaries"), and out in front of Business Objects SA, Oracle Corp., and Microsoft Corp.

At this year’s Information On Demand conference held last week, Big Blue didn’t look to be resting on its laurels. Officials touted a number of new data-integration-oriented news items, including the availability of version 9.5 of its DB2 Warehouse, and also trumpeted IBM’s progress in executing on the "dynamic warehousing" strategy it announced earlier this year.

Dynamic warehousing is based on a number of tenets, all of which play to Big Blue’s strengths (or perceived strengths) in the broader information integration market. According to Marc Andrews, program director for data warehousing with IBM, dynamic warehousing addresses, first, the need for real-time, operational access to information stored in the data warehouse.

Elsewhere, he continues, dynamic warehousing emphasizes the importance of analytics—and particularly analytics embedded as part of the business process. Other key tenets include the need to get at and extract knowledge from unstructured information, and the creation of a data integration infrastructure—comprising not just information integration, per se, but MDM, data quality, and other technologies—that is tightly integrated with the warehouse itself.

The DB2 Dimension

That’s where the revamped DB2 Warehouse 9.5 enters the picture, according to Andrews. It boasts several real-time (or near-real-time) performance improvements, including a new "extreme" workload management facility. This goes beyond existing techniques (such as priority-based processing) because it involves managing DB2 Warehouse’s use of system resources so organizations can ensure that operational applications get the real-time response times they need, says Andrews.

Elsewhere, he continues, DB2 Warehouse now boasts an embedded OLAP facility, too. "[This enables] more complex analysis of multiple business variables and dimensions, without the need to extract data to specialty OLAP engines," he argues.

Unlike rivals Microsoft and Oracle, DB2 doesn’t ship with a native, IBM-built OLAP engine; until two years ago, Big Blue licensed (and substantially tweaked) OLAP technology from the former Hyperion Solutions Corp., which it resold as DB2 OLAP Server. About a year after it purchased analytic development specialist AlphaBlox Inc., IBM ended its OEM relationship with Hyperion, which encompassed DB2 OLAP Server for distributed platforms as well as OLAP implementations for its mainframe (System z) and minicomputer (System i) platforms.

The reintroduction of a native OLAP facility is, of course, a key deliverable for Big Blue. (DB2 Warehouse's new OLAP capabilities are, in fact, based on the AlphaBlox assets Big Blue acquired three years ago. According to Andrews, IBM has made a "significant investment to scale out those capabilities and integrate them directly into the underlying warehouse.") On top of this, however, DB2 Warehouse 9.5 also boasts support for analysis of unstructured data.

Estimates vary as to the amount of unstructured information in the typical enterprise—data integration advocates like to cite particularly chilling figures (70 percent or more), with the obvious implication being that organizations are ignoring an enormous chunk of mission-critical data—but no one disputes that the ability to get at and analyze unstructured information will likely grow in importance.

According to a recent study conducted by TDWI Research, for example, unstructured data today accounts for less than one-fifth of the data consumed by enterprise data warehouses. In other words, confirms senior manager Philip Russom, the bulk of the information consumed by data warehouses is structured or semi-structured.

On the other hand, Russom indicates, unstructured data accounts for a much larger percentage of all enterprise data—perhaps as much as one-third. As enterprises work to federate and un-silo this data—using enterprise information integration (EII), BI search, text analytics, and other technologies—they’ll almost certainly work to pipe unstructured information into the warehouse.

Not surprisingly, IBM and other vendors—including arch-rival Informatica (which offers both EII—via an OEM relationship with specialist Composite Software—and unstructured information access capabilities) and Business Objects SA—are already outfitting their information integration platforms to do just that. "[This enables] organizations to extract knowledge from unstructured information, which can be combined with structured data for increased insight into customer and product issues," Andrews confirms.

Information on Demand

IBM also announced several new Information On Demand "Industry Frameworks." These combine DB2 Warehouse 9.5, IBM Information Server, and Big Blue’s new MDM Server with industry-specific data models, process models, business solution templates, and other industry-specific solution assets and services.

The idea, Andrews says, is to let organizations exploit IBM’s Information On Demand capabilities in specific verticals, as well as to address key business initiatives—including multi-channel marketing, customer insight, financial risk management, consumer-driven merchandising, and supply chain management. "These frameworks are focused on enabling organizations to deliver trusted information in real-time to every person across the organization as part of every transaction," he stresses.

Thirty months after its acquisition of Ascential, IBM now pushes a one-stop shop for any conceivable data integration scenario in its aptly branded Information Server product. That offering draws on both Ascential and non-Ascential assets, and for this reason IBM officials also point to several related acquisitions, most significantly the purchase of the former FileNet Inc. last year and that of the former DataMirror several months ago.

Sean Crowley, program manager for Information Server with IBM, says that Information Server as a whole (along with its associated universe of related products, including DB2 Warehouse 9.5 and IBM MDM Server) exceeds the sum of its once-disparate parts.

"[I]nformation integration is more than an ETL-only proposition and was a driving factor in the creation of IBM Information Server … [which] combines the capabilities of data profiling, data quality, ETL, federation, replication, and change data capture on a single platform, supported by [a] series of shared services for parallel processing, connectivity, common administration, metadata management and service oriented deployment," he says.

That’s not all, according to Crowley, who praises Information Server’s ability to quickly integrate with other information integration technologies, such as Big Blue’s MDM, content management, and data warehouse assets.

"With the capability to integrate with IBM's MDM, content management, data warehousing, and other [Information on Demand] technologies, the benefits of having a common administration platform upon which to build integration patterns and deploy in a variety of usage scenarios—[including] batch, real-time/event-driven, [and] SOA—provides an effective way to accelerate the delivery of trusted business information while maintaining a consistent approach to information integration."
