IBM’s Data-Integration Strategy Takes Shape

With canned ETL already on tap from Microsoft and Oracle, what—if anything—does IBM have planned for DB2?

Last week, IBM Corp.’s information integration strategy came a bit more clearly into focus with announcements of beta programs for the next versions of its WebSphere Information Integrator product (code-named “Serrano”), as well as the former Ascential Software Corp.’s data integration suite, code-named “Hawk.”

Less clear, of course, is just what IBM has in store for its DB2 database on the information integration front. Even though extraction, transformation, and loading (ETL) capabilities are shipping by default in competitive offerings from Microsoft Corp. and Oracle Corp., DB2’s own ETL story remains obscure.

WebSphere Information Integrator is a federated data-access tool designed to facilitate access to heterogeneous data sources, comprising both structured (e.g., SQL) and unstructured (e.g., e-mail) data. In the business intelligence (BI) world, this is known as enterprise information integration, or EII for short.

Ascential’s data integration stack, on the other hand, is anchored by an ETL tool called DataStage, which is complemented by additional data-integration-oriented features, such as data quality (QualityStage) and metadata management. It features connectivity with a range of structured and semi-structured data sources, along with powerful transformational (i.e., reformatting) and cleansing capabilities.

They might sound similar, but EII is a decidedly different animal from ETL: EII describes a means of connecting to data residing in heterogeneous host systems or applications, while ETL typically describes the process of extracting data from those sources, transforming it, and loading it into a data warehouse (for batch-oriented operations) or an operational data store (a less common, but emerging, scenario for real-time or near-real-time analysis).
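
For readers who think in code, the distinction can be sketched roughly as follows. This is a minimal illustration only, not how WebSphere Information Integrator or DataStage actually work: the two in-memory SQLite databases stand in for hypothetical source systems, and the names `make_source`, `federated_query`, `etl_load`, and `dim_customer` are invented for the example.

```python
import sqlite3


def make_source(rows):
    """Create a hypothetical in-memory source system holding customer rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn


def federated_query(sources):
    """EII-style access: read each source at query time; nothing is copied."""
    results = []
    for conn in sources:
        results.extend(conn.execute("SELECT id, name FROM customers").fetchall())
    return sorted(results)


def etl_load(sources, warehouse):
    """ETL-style batch: extract rows, transform them, load them into a warehouse."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER, name TEXT)"
    )
    for conn in sources:
        extracted = conn.execute("SELECT id, name FROM customers").fetchall()
        # Transform: trim stray whitespace and normalize capitalization.
        transformed = [(cid, name.strip().title()) for cid, name in extracted]
        warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?)", transformed)
    warehouse.commit()


# Two assumed source systems with inconsistently formatted names.
crm = make_source([(1, " alice smith ")])
billing = make_source([(2, "BOB JONES")])
warehouse = sqlite3.connect(":memory:")

# EII: the data stays where it lives and is returned as found.
live_view = federated_query([crm, billing])

# ETL: a cleaned copy of the data now lives in the warehouse.
etl_load([crm, billing], warehouse)
warehouse_rows = warehouse.execute(
    "SELECT id, name FROM dim_customer ORDER BY id"
).fetchall()
```

In this sketch the federated view returns the names exactly as each source stores them, while the warehouse holds a cleaned, separately stored copy. That is the practical trade-off: EII favors freshness and leaves data in place, whereas ETL favors consistency at the cost of a batch copy.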

DB2 and ETL

DB2 today does have a limited, SQL-based data-integration capability, but this is a far cry from the aggressive ETL make-over Microsoft is prepping in SQL Server 2005—and which Oracle is also rumored to have on tap for the next version of its Warehouse Builder component.

Mark Register, the vice-president of marketing for Big Blue’s information integration program, demurs when asked about the possibility of some of IBM’s information integration expertise (from WebSphere Information Integrator or Ascential’s data integration stack) finding its way into DB2.

In fact, Register frames it as a question of how tightly IBM should couple the (once-autonomous) Ascential technologies with DB2.

“Specifically on the DB2 question, there’s a whole lot of architectural teams working on the ideal strategy going forward, but we’re not optimizing our product for DB2. That’s one of the things I want to emphasize,” he comments. “The strength of Information Integrator is the heterogeneity of the product set, the fact that we can work with DB2, Microsoft, Oracle, Sybase, and Teradata.”

A WebSphere Facelift for Former Ascential Techs

For the record, Ascential’s DataStage ETL product is now known as WebSphere DataStage. Likewise, Ascential’s data-quality offering, QualityStage, is now WebSphere QualityStage.

Ascential itself used the code name “Hawk” to describe its long-incubating, next-generation data integration suite, a convention that IBM has retained.

IBM last week announced two new products: IBM Rational Data Architect and WebSphere Information Analyzer. The former comprises several productivity-enhancing tools, scripts, and services, mostly exposed via wizard-driven interfaces, while the latter is positioned as an “end-to-end” data-profiling tool.

Rational Data Architect, for starters, is based on the open source Eclipse platform. It’s a tool that helps data architects model, discover, map, and analyze data across multiple information sources.

“When you define data-transformation rules, these can be published using a common service deployment model, [so] you can deploy those things now into an SOA,” Register comments. “It’s done very simply without having to have J2EE or Web services skills. Instead, it’s all done through a wizard.”

Like the other Hawk technologies, WebSphere Information Analyzer (which went by the internal code name “Sorcerer”) shares a central repository with IBM WebSphere DataStage and IBM WebSphere QualityStage. “It’s a data auditing, free-form analysis product [that helps business users] understand data and source systems, be able to actually see the content and quality of that data, and build definitions from that,” Register explains.

It ships with what Register says is a substantially overhauled UI, designed to make it intelligible to both data integration professionals and business users. “We really started with a blank slate, built this entirely new type of interface, and threw all of the old rules out the window. So we’re not only simplifying information integration, but putting it closer to the hands of the business analyst,” he comments.

The next versions of WebSphere Information Integrator and the Hawk technologies are due out by the end of the year, IBM promises.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.
