In-Depth
Technology Preview: Oracle’s Next-Gen ETL Tool Comes into Focus
In the next-generation release of Oracle Warehouse Builder, due in January, Oracle officials promise a significant improvement
As just about everyone knows, Microsoft Corp. has promised to deliver substantially revamped ETL capabilities in its upcoming SQL Server 2005 database.
Microsoft isn’t the only relational database player with a retooled ETL solution on tap. That’s because Oracle Corp. is currently prepping a new release of its Oracle Warehouse Builder (OWB) ETL tool, which has quietly emerged as a full-blown competitor to enterprise ETL solutions from IBM Corp., Informatica Corp., and SAS Institute Corp. In the next-generation release of OWB (code-named “Paris”), Oracle officials—much like their counterparts in Redmond—are promising a significant improvement over the status quo.
First, the backstory. OWB turns five this year. Ironically, Oracle—which prides itself on its tradition of technology innovation—was actually second to market with integrated ETL: Microsoft kicked off the trend when it incorporated both ETL and OLAP features into its SQL Server 7.0 database, which shipped in 1998. Prior to delivering the first version of OWB in 2000, Oracle OEM-ed a series of data-mart suites powered by ETL technology from the former Sagent.
By all accounts, the first release of OWB was underwhelming. That’s a characterization even Oracle officials are comfortable with. “Early releases of the product, to be honest, probably weren’t what a lot of customers were hoping for,” says Paul Narth, a senior group manager with Oracle who heads up the OWB 10g R2 (“Paris”) product effort. “So they might have had exposure to it in the beginning, and many of them might have written it off. We’re finding that with recent releases, [customers] have started to come around. If they were exposed to it in the early days but haven’t checked it out since then, they owe it to themselves to give it another look.”
Narth isn’t just telling tales out of school, either. Last September, Gartner Inc. placed OWB in the Leader quadrant of its Magic Quadrant metric. Oracle is the only relational database vendor in the Leader field, where it’s rubbing elbows with established heavyweights such as Informatica and IBM/Ascential.
Priced to Move with Plenty of Connectivity
OWB’s biggest selling point—aside from its unparalleled performance on the Oracle 10g database, of course—is probably its price: OWB ships with Oracle’s Developer Suite, which retails for a flat $5,000 per named developer. (A version of OWB is also bundled with Oracle Business Intelligence, a flavor of the Oracle Application Server that’s optimized for BI.) Oracle doesn’t layer any processor or user-licensing restrictions on top of this flat fee, Narth claims—although there is an extra charge for connectivity into non-Oracle data sources, which include IBM DB2 UDB (as well as DB2 on the mainframe), Microsoft SQL Server, Sybase, Oracle’s ERP stack, PeopleSoft’s ERP stack, and—of course—SAP. Even with connectivity into non-Oracle data sources, Narth claims, OWB is a much more affordable option than the best-of-breed ETL tools.
It’s in the area of heterogeneous connectivity that the OWB “Paris” release—which is slated to ship in January of 2006—has a much better story to tell than its predecessors. Oracle facilitates connectivity into non-native data sources by means of its Oracle Transparent Gateways (OTG), a series of data source-specific adapters that are resold separately.
The OTGs in every flavor (e.g., for DB2, Sybase, PeopleSoft) have long supported bi-directional ETL—e.g., organizations can pull data from DB2 into Oracle, and can (theoretically) push data out from Oracle into DB2. Nevertheless, OWB itself doesn’t currently support bi-directional ETL. That is, in the current 10g R1 release of OWB, Oracle doesn’t let customers push data out from the Oracle database into non-Oracle data sources. When Paris ships in January, however, Oracle’s ETL tool will become a full-fledged ETL player—with a catch, of course.
“The reality is that we live in a heterogeneous world,” says Narth. “We could try to ignore that completely, but we would rather embrace that. We would rather that you work on Oracle, no apologies there, but if you have targets in other environments, we will push data to those.”
In this respect, OWB Paris will support bi-directional ETL into a bevy of different target platforms (including all major relational databases)—with the rather show-stopping exception of SAP. The technology to do so is probably in the OTG adapter for SAP, Narth concedes, but for obvious reasons, Oracle isn’t enabling it in OWB. Just to clarify: In the OWB Paris release, Oracle will support ETL from SAP applications into the Oracle database, but not vice-versa.
In Praise of Drag-and-Drop ETL
ETL is a complicated technology practice that requires domain- and tool-specific expertise. For some time now, best-of-breed ETL players such as the former Ascential Software Corp. (now IBM) and Informatica have struggled, in effect, to graft an ease-of-use feature set on top of an inherently complex ETL programming paradigm. To some extent, they’ve succeeded, delivering tools with object-oriented ETL modeling capabilities and drag-and-drop workflow features.
In the OWB Paris release, Oracle aims to deliver the same kind of user experience in its GUI-based ETL modeling tool, Design Time Client. “We also try not to force you to have [Oracle] PL SQL programming knowledge. At the end of the day, our goal is to have a declarative programming model where you can model the data flow and then we’ll generate the best PL SQL and SQL to get the job done,” Narth explains. “Nevertheless, there is a learning curve in terms of understanding how the tool works. In Paris, it’s something we’ve tried to address even more in terms of making these things as easy to use as possible. Generally, the product tries to guide you through as best it can.”
In the revamped Design Time Client, users can drag-and-drop both canned and custom-built objects to create ETL process flows. (Creating custom objects is largely a point-and-click exercise, to boot.) If the data is being extracted from an Oracle target data source, OWB Paris will generate PL SQL that is optimized for the specific version of the source database (e.g., PL SQL optimized for Oracle 9i, or PL SQL optimized for 10g). OWB can also do the same thing with straight-up SQL for non-Oracle relational data stores, says Narth.
Drag-and-drop ETL isn’t everything, of course. In this respect, the OWB Paris release also boasts support for tool command language (TCL) scripting, which brings a pervasive programming paradigm (i.e., TCL) to the often arcane world of ETL programming.
Integrated Data Profiling
Also new in OWB Paris are integrated data-profiling and data-cleansing capabilities. These features, alone, could merit consideration from many Oracle shops. For some time now, after all, industry watchers have talked up the increasing commoditization of data quality and data profiling technologies. The upshot, analysts say, is that data quality and data profiling are fast becoming core BI requirements.
What’s more, few data integration players are sitting this one out. IBM’s acquisition of Ascential brought the latter company’s QualityStage (data quality) and ProfileStage (data profiling) solutions into the IBM fold, and Informatica, for its part, has a strategic partnership with data-quality specialist Trillium. SAS --always ahead of the curve—acquired its own best-of-breed data quality and data profiling software developer (DataFlux) four years ago, and Microsoft has built some limited support for data cleansing into the next version of SQL Server IS.
Add Oracle to that list, too. OWB Paris’ data cleansing should be more robust than what Microsoft plans to deliver in SQL Server 2005 (i.e., mostly data-quality transformations in the context of ETL). Paris will ship with several predefined matching algorithms—e.g., identify and/or reformat U.S. and international telephone numbers, Social Security numbers, etc.—and users can also derive business rules from data once it’s been profiled.
There’s data auditing, too, so that once you’ve done your profiling at design time and generated your business rules, you can also audit your data, such that if it fails one or more of your rules, then you can instigate what Oracle calls “correction maps”—basically, canned logic that decides what to do with the offending data based on a predefined set of conditions.
Oracle is pushing another selling point in Paris, too: It’s an actionable data-quality tool. “The big differentiator is that most of these other technologies tell you that you have problems. They analyze the data, but out of it comes a bunch of reports. You’ve got this problem, this problem, and this problem, but if a customer has moved up from eyeballing the data, we’re taking it to the next level, which is actually helping you do something about these problems.”
Paris ships with a GUI-based profiling tool, called Data Profile Manager. Based on the same point-and-click and drag-and-drop modeling paradigm as the Design Time Client, Data Profile Manager looks to be a powerful and fairly straightforward tool.
More to Come
There’s a lot more to look forward to in the OWB Paris release. In a future issue we’ll discuss Paris’ improved support for scheduling, its revamped security model, and its so-called “Experts” expertise encapsulation feature.