In-Depth

Informatica Unveils Next-Gen ETL Suite

Officials call it the most important release in company history. But don’t look for PowerCenter 8 this week.

Informatica Corp. this week announced its long-awaited PowerCenter 8 release. PowerCenter 8 (née “Zeus”) adds federated data access capabilities, support for unstructured data, grid computing features, developer-friendly enhancements, and availability improvements.

Informatica officials call the revamped PowerCenter 8 the most important release in company history. And to the extent that the rules of the data integration arena have changed drastically over the last seven months, that’s true. But don’t look for PowerCenter 8 this week. Informatica expects to ship its next-generation data integration suite by the end of the year.

“Corporations are performing data integration on an unprecedented scale, as they migrate legacy systems, consolidate multiple instances, synchronize customer and product data across systems, and continue to build up data warehouses,” says Philip Russom, senior manager of research and services at TDWI. “In other words, data integration has moved into the realm of enterprise IT infrastructure because it now supports a wide range of mission-critical and time-sensitive applications. Informatica’s PowerCenter 8 addresses data integration’s newly intensified infrastructure requirements by providing high availability and scalability through grid capabilities, plus right-time information delivery options through federated queries and Web services.”

As in previous versions, PowerCenter 8 is a hybrid offering of sorts. It taps Informatica’s bread-and-butter ETL and metadata management capabilities, for starters, but it also leverages enterprise information integration (EII) technology developed by Composite Software. And although PowerCenter 8 does include a data quality component, this feature, too, is provided by third parties, namely Firstlogic (now a subsidiary of Pitney Bowes Inc.) and Harte-Hanks Trillium. ETL competitors IBM Corp. and SAS Institute Inc., on the other hand, offer in-house data quality solutions (QualityStage and DataFlux, respectively) and, in IBM’s case, homegrown EII capabilities.

But Ivan Chong, vice president of business development with Informatica, says his company’s partner-centric strategy is an advantage, not a market-determined compromise. IBM’s all-in-one approach is a monolithic offering that actually turns a lot of customers off, Chong says, and SAS is primarily a player in its own accounts. And other independent ETL purveyors (e.g., Ab Initio and Sunopsis) don’t have homegrown data quality or EII solutions of their own, either. “We really think it’s about giving [customers] the flexibility to do this on their own terms, without dictating to them what [technologies] they can and can’t use,” Chong says, noting that customers can also opt to plug other data quality technologies into PowerCenter 8’s service-oriented underpinnings. “So in [PowerCenter 8], if they want to use Firstlogic [for data quality], we already have that [integration] in there, so as far as the user is concerned, it looks just like the rest of the PowerCenter interface.”

The strength of this approach, Chong maintains, is that it lets Informatica offer customers the data quality technology that best suits their needs. Between Firstlogic and Trillium, he says, Informatica’s data quality capability supports 195 different countries, giving it a breadth of language and dictionary coverage that IBM, SAS, and other competitors will be hard-pressed to match.

The ETL’s the Thing

The ETL market has changed remarkably over the last few months, thanks in large part to the acquisition of the former Ascential Software Corp. by IBM. Ascential’s ETL, data quality, and data profiling technologies helped to round out Big Blue’s federated information access stack and arguably made Armonk the player to beat in the data integration space.

Not from Informatica’s perspective, of course. For the last 24 months, that company has backtracked from its ill-advised forays into analytic applications—Informatica formally abandoned its line of canned analytic apps in August of 2003—and ad hoc query and analysis tooling. (Even now, Chong and other company officials are strangely silent on the question of Informatica’s PowerAnalyzer tool, for example.) Instead, Informatica has played to its biggest perceived strength—that it’s one of the few remaining neutral parties in an increasingly fractious data integration landscape. It’s a strategy that has a lot of merit, says Chong, especially now that the three largest relational database purveyors are also marketing (increasingly ambitious) ETL technologies of their own.

“We’re the Switzerland” of data integration, Chong claims, repeating Informatica’s most field-tested talking point. “But now [with PowerCenter 8], we have the first data integration technology that enables universal access to data, whether it’s federated access [to data in existing repositories], whether it’s moving that data into an enterprise data warehouse, or whether it’s the data integration component of an enterprise SOA [service-oriented architecture] effort.”

With all due respect to Chong’s enthusiasm, IBM can credibly make the same claim. Its DB2 Information Integrator tool was a seminal EII offering, for example, and Ascential’s DataStage ETL technology was the most popular independent alternative to Informatica’s PowerCenter. But Chong and other Informatica officials qualify their claim by pointing out that Ascential’s ETL and data quality stack (DataStage, QualityStage, and ProfileStage) wasn’t a completely integrated offering when that company was purchased by IBM. Also worthy of note is that IBM itself hasn’t completely integrated Ascential’s technology assets with its own WebSphere Information Integrator federation technology.

At the same time, Informatica hasn’t yet shipped PowerCenter 8. Chong promises that the reloaded PowerCenter will ship by the end of the year, but IBM’s Eric Sall promised earlier this year that Big Blue would ship a substantially overhauled version of Ascential’s technology stack (code-named “Hawk”) by the end of this year, too. IBM has already shipped WebSphere DataStage TX, an EDI-focused version of Ascential’s flagship DataStage ETL tool that’s based in part on Hawk technologies.

So What’s New?

For starters, there’s federated data access. It’s an admitted departure for Informatica, which has long championed ETL (and related services, such as its SuperGlue metadata management offering) as the be-all and end-all of data integration. Not that company officials don’t continue to do so. “We’re hearing from our customers that [federation] is a stop-gap measure,” says Chong. “It’s a way they can temporarily expose data that’s siloed in resources; or if there’s an acquisition, it’s a way they can quickly expose data in [the acquired company’s] new systems. It’s for situations like that.” In the long run, Chong maintains, organizations will embrace full-blown data integration, with ETL as its centerpiece, to more robustly connect their enterprises.

“If you define EII as mere distributed or federated queries, then a long list of IT products can do EII, including just about every ETL tool, report server, and database management system out there,” says TDWI’s Russom. “True EII runs these queries in real time, not the overnight batches typical of ETL, and also provides a development environment for designing the virtual views of disparate data that the federated queries run against. The Zeus release introduces some of these capabilities to PowerCenter by embedding the EII kernel from Composite Software. More EII features will come in later releases.”

For TDWI’s definition of EII, see http://www.tdwi.org/News/display.aspx?id=7692

Composite Software’s data federation technology also lets PowerCenter get at semi-structured and unstructured data, such as spreadsheets, email, word processing documents, presentations, and PDF documents. As a result, organizations can use PowerCenter to construct composite or virtual views of structured, semi-structured, or unstructured data. They can mix traditional batch ETL data, trickle-feed or real-time ETL data, and live views of application-specific data via data federation technology.
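For readers who want to see the distinction in miniature, the following Python sketch contrasts a federated (virtual) view with a batch ETL load. It is purely illustrative and has nothing to do with how PowerCenter 8 or Composite Software's kernel is actually built: the two in-memory SQLite databases stand in for siloed source systems, and every table, column, and function name here is a hypothetical chosen for the example.

```python
import sqlite3


def make_source(table, columns, rows):
    """Create an in-memory SQLite database standing in for one data silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return conn


def federated_customer_totals(crm, billing):
    """Virtual view: query both sources at request time and join in memory."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in billing.execute(
        "SELECT customer_id, amount FROM invoices"
    ):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return sorted(
        (names[cid], total) for cid, total in totals.items() if cid in names
    )


def etl_load_customer_totals(crm, billing):
    """Batch ETL: copy a snapshot of the joined data into a warehouse table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customer_totals (name TEXT, total REAL)")
    warehouse.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)",
        federated_customer_totals(crm, billing),
    )
    warehouse.commit()
    return warehouse


def demo():
    """Show that the federated view sees new data while the ETL snapshot does not."""
    crm = make_source("customers", ["id", "name"], [(1, "Acme"), (2, "Globex")])
    billing = make_source(
        "invoices",
        ["customer_id", "amount"],
        [(1, 250.0), (1, 100.0), (2, 75.0)],
    )
    warehouse = etl_load_customer_totals(crm, billing)  # the "overnight batch"
    billing.execute("INSERT INTO invoices VALUES (2, 25.0)")  # arrives afterward
    live = federated_customer_totals(crm, billing)
    snapshot = warehouse.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()
    return live, snapshot


if __name__ == "__main__":
    live, snapshot = demo()
    print("federated view:", live)
    print("warehouse snapshot:", snapshot)
```

In this toy setup, the invoice that arrives after the batch load shows up in the federated result but not in the warehouse snapshot, which is the trade-off between live views and batch ETL that the vendors are describing.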

Elsewhere, Chong says, PowerCenter 8 ships with improved availability features, such as failover capabilities. Like SAS’ Enterprise ETL tool, PowerCenter 8 will introduce improved support for grid computing, which can be used as a complement to traditional parallel processing. After all, ETL jobs that involve complex workflows or sophisticated transforms can almost certainly benefit from grid’s extra processing muscle. Other improvements include support for custom Java transformations, which lets organizations leverage existing Java libraries from within PowerCenter 8 while still exploiting Informatica’s metadata and data security capabilities.

Finally, PowerCenter 8 will support mapping templates that help increase ETL developer productivity by reducing repetitive tasks. They help automate the generation of process flow maps for common tasks, and they also support tools such as Microsoft Excel and Visio to help improve collaboration between ETL developers and business analysts.
