
ETL: Tools Get Closer to Real Time

New tools emphasize drive for near-real-time enterprise data access, include data quality components

Two enterprise giants last week announced new versions of software that facilitates access to information stored in a variety of data sources.

Ascential Software Corp. and SAS Institute Inc. each unveiled updates to their extraction, transformation and loading (ETL) tools. Not coincidentally, both vendors positioned their ETL tools as key pieces of their real-time enterprise (RTE) software suites.

RTE (http://www.tdwi.org/research/display.asp?id=6634&t=y) describes an architecture in which heterogeneous systems are linked together to provide business decision makers with access to critical information in real time or near-real time. The emergence of real time as a focal point of activity for many business intelligence (BI) vendors, among them Business Objects SA, Cognos Inc., Informatica Corp., and SAS, has had repercussions for ETL, an established technology traditionally used to support batch processing in many enterprise environments.

Increasingly, ETL vendors—led by Informatica, which announced its PowerCenter 6.0 last year—have revamped their offerings to support real-time or near-real-time access to enterprise data. As a result, ETL has once again become a hot property.

Ascential's Enterprise Integration Suite

Ascential Software, among the last of the pure-play ETL vendors, is poised to capitalize on the reemergence of that technology. The company last week announced Version 7.0 of its Enterprise Integration Suite (EIS), enhancing it with a separate offering, dubbed Real Time Integration Services (RTI), which exposes EIS 7.0’s data quality, data transformation, parallel processing, and meta data capabilities in real time in the context of a service-oriented architecture (SOA).

Ascential representatives stress that in spite of the company’s roots, EIS 7.0 and the new RTI are about a lot more than ETL. “It’s going beyond ETL and starting to redefine what data integration means. We’re responding to a very broad set of business demands for managing business integration,” comments Steve Brown, executive director of product marketing for Ascential.

Brown acknowledges that several ETL specialists have been acquired by BI players (Acta and Sagent) or have themselves become BI competitors (Informatica), but says that Ascential will complement established enterprise application integration (EAI) and BI vendors by focusing solely on data integration. “You can get your different types of technologies, buy the best-of-breed applications from where you need them, but when it comes to the requirement of complex data integration, you can get the Ascential Suite.”

The Ascential EIS 7.0 suite comprises Ascential’s DataStage ETL tool, along with a data quality tool (QualityStage, formerly Integrity) and a data-profiling tool (ProfileStage, formerly MetaRecon). New features include complete National Language Support, along with an enhanced GUI. EIS 7.0 features new “Intelligent Assistants”—wizards and pre-packaged tasks that help to automate many tasks in the Ascential environment.

The new RTI services comprise what Brown describes as a separate subsystem for EIS 7.0. “This allows the entire set of benefits of this integration suite to be made available in an encapsulated form across an enterprise through the service-oriented strategies of Web services, Java services, or Enterprise Java Beans,” he comments. “RTI is a separate kind of a broker, an integration broker, that manages the communication and dispatching of complex data integration jobs, making them available to other applications.”

SAS's Enterprise ETL Server

Although SAS is best known for its analytic software, the Cary, North Carolina-based BI powerhouse has marketed its own ETL tool since 1996. In a BI market in which many of SAS’s competitors, such as Business Objects and Cognos, have OEM-ed ETL tools, SAS’s decision to go it alone on the ETL front is noteworthy, even if its ETL tool has been deployed largely among its own installed base, admits Frank Nauta, director of product management for SAS’s warehousing, data integration, and data quality initiatives.

Last week, SAS announced Enterprise ETL Server, a combination of its ETL solution with three new products—SAS ETL Studio, SAS Metadata Server, and SAS Data Quality Solution.

Nauta says that ETL Studio has been designed to automate many tasks in the Enterprise ETL Server environment. To that end, he explains, it features a GUI and a variety of wizards. “Consultants can use the GUI, for example, to join five Oracle tables with an SAP table, and we’ve created a new tool, a completely wizard-driven interface, so that no training is required to build a job joining, splitting, and moving data around,” he says.

The new ETL Studio—which replaces SAS’s existing Warehouse Administrator tool—has also been designed to support group or team development, with new version control features.

Nauta says his company is counting on Enterprise ETL Server’s new GUI and wizard functionality, along with its data quality component, to expand the product’s presence beyond SAS’s traditional installed base. “The only thing I’m going to offer to make non-SAS customers buy this stuff is the message about data quality—especially in a real-time scenario. You can’t make the mistake of pushing junk data to your decision makers and have them screaming at you that they can’t trust the data because the data is questionable.”

ETL Evolution

Judith Hurwitz, a principal with Hurwitz and Associates, believes that the emphasis on real-time integration will reinvigorate ETL. “When ETL came about, it was because you had to move these massive amounts of data in the mainframe world, and in a specialized sense, it’s now become moving any data from any place,” she observes. “When you need to bring pieces together in real time based on a business scenario, you need to be able to transform, move, and load data. So you’re back to ETL.”

One indication of the way in which ETL solutions have matured is that both the Ascential and SAS ETL suites incorporate data quality components. Some analysts—such as Mike Schiff, principal of data warehousing consultancy MAS Strategies—have pointed out that as companies connect systems in real-time, data quality becomes increasingly suspect.

“There’s really no process there that ensures that the data is cleansed or integrated, so they often have to decide between sacrificing the integrity of the data or processing the data so fast,” Schiff has noted.

SAS’s Nauta agrees: “Obviously, with real-time, whenever somebody is doing integration in real-time, data quality is more important than when you’re doing it in batch, where you can clean it better and easier.”
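To make the trade-off concrete, here is a minimal Python sketch of what applying data quality checks record by record might look like in a near-real-time flow, as opposed to cleaning a whole batch after the fact. It is an illustration only and does not represent either vendor's product: the `Customer` record, the `transform` and `run_pipeline` functions, the field names, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Customer:
    # Hypothetical target record; field names are illustrative only.
    customer_id: int
    email: str
    country: str


def transform(raw: dict) -> Optional[Customer]:
    """Cleanse one raw record; return None if it fails the quality checks."""
    try:
        customer_id = int(raw["id"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric id
    email = str(raw.get("email", "")).strip().lower()
    country = str(raw.get("country", "")).strip().upper()
    # Deliberately simple rules: an "@" in the email, a two-letter country code.
    if "@" not in email or len(country) != 2:
        return None
    return Customer(customer_id, email, country)


def run_pipeline(source: Iterable[dict], target: list, rejects: list) -> None:
    """Extract each record as it arrives, transform it, then load or reject it."""
    for raw in source:
        record = transform(raw)
        if record is None:
            rejects.append(raw)    # held back for later review
        else:
            target.append(record)  # stands in for the load step


incoming = [
    {"id": "1", "email": " Ann@Example.com ", "country": "us"},
    {"id": "x", "email": "bad", "country": "USA"},
]
loaded, rejected = [], []
run_pipeline(incoming, loaded, rejected)
```

In this sketch, bad records are diverted to a reject list instead of being pushed on to decision makers, at the cost of doing the checks inline on every record. A real deployment would replace the lists with actual source and target systems.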

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.
