In-Depth
Analyst: Few Products Meet Rigorous Enterprise ETL Requirements
Enterprise ETL tools must address scalability, developmental, connectivity, and availability requirements
Although the requirements for an enterprise extraction, transformation, and loading (ETL) solution are hardly surprising, the list of products able to address them is growing smaller.
That’s one upshot of a recent research bulletin from consultancy Forrester Research Inc., which concludes that an enterprise-class ETL tool worth its salt must address significant scalability, developmental, back-end connectivity, and high-availability requirements.
Not surprisingly, writes Forrester analyst Philip Russom, many ETL tools nominally slated for enterprise-scale usage scenarios fail these important tests. An enterprise-class ETL tool must be an effective data integration Swiss Army knife, brought in to replace dedicated (and, in some cases, widely disseminated) point solutions.
“The practice of enterprise-scale ETL is challenging in a large enterprise, because it heaps upon an ETL tool very large data volumes, complex data transformation processing, long lists of data sources and targets, and a great number of large projects,” Russom writes, citing the emergence of real-time—as opposed to conventional batch—ETL, as well. “Already carrying a heavy workload from data warehousing, the growing diversity of ETL applications beyond warehousing tends to multiply these challenges.”
As far as scalability is concerned, Russom argues, organizations should concern themselves with two characteristics: massive data throughput and fast processing. “Corroborate that an ETL tool is scalable by load testing the tool and by interviewing the vendor’s customer references,” he advises. “Look for parallel processing capabilities in every component of the tool, as well as clustering and load balancing functions for deployments of multiple server instances.”
Similarly, an enterprise-class ETL tool must provide a development environment that can accommodate multiple developers—or multiple teams of developers—at the same time. “Enterprise scale use assumes that multiple developers will share the design environment, creating and updating ETL objects and projects simultaneously,” Russom writes. “Look for check-in/check-out and versioning for both objects and projects. To enable collaboration and project management, look for a central repository containing ETL objects and projects (not just metadata). Geographically dispersed teams need Internet-based access to the repository.”
Connectivity to heterogeneous data sources has always been ETL’s strong suit, but enterprise-class ETL tools must take this essential feature to a whole new level, says Russom. “An ETL tool should be able to access all brands of relational and legacy databases, including those on a mainframe,” he writes. Lest there be any confusion, Russom confirms that ODBC and other connectivity standards won’t pass muster for enterprise-class ETL: “For best performance and functionality, database access should be through native gateways … and should take full advantage of query optimizers and bulk loaders.”
Because ETL is increasingly used in real-time scenarios, high availability is a key requirement for any enterprise-class tool. There are a couple of concerns here: Business processes depend on a steady flow of mission-critical or time-sensitive data, and, especially in real-time ETL environments, data integration windows are so tight that a failed job often can’t be restarted and completed before the window closes. In this respect, says Russom, ETL “checkpoint” (which enables a tool to recover from a failure without restarting the job from scratch) and fail-over clustering are must-haves for any enterprise ETL tool worth its salt.
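To make the checkpoint idea concrete, here is a minimal sketch in Python of a load job that records its progress and resumes from the last recorded position after a failure. It is an illustration only, not how any product named in this article implements the feature. The names `run_job`, `load_row`, and `checkpoint_path` are hypothetical, and the one-checkpoint-per-row granularity is an assumption chosen for brevity.

```python
import json
import os


def run_job(rows, load_row, checkpoint_path):
    """Load rows in order, saving progress so a rerun resumes after a failure.

    Returns the number of rows loaded during this run.
    Hypothetical sketch: a real tool would checkpoint per batch and
    coordinate the checkpoint with the target's transaction.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # A checkpoint from an earlier, interrupted run exists: resume there.
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_index"]

    for index in range(start, len(rows)):
        load_row(rows[index])  # may raise; the checkpoint then stays at `index`
        # Write to a temporary file and rename it, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"next_index": index + 1}, fh)
        os.replace(tmp_path, checkpoint_path)

    # The job completed, so clear the checkpoint for the next scheduled run.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return len(rows) - start
```

If `load_row` fails partway through, calling `run_job` again with the same `checkpoint_path` skips the rows already loaded instead of starting over, which is what makes a tight integration window recoverable. A failure between the load and the checkpoint write would reload one row, so the target load in this sketch would need to be idempotent.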
Other concerns, says Russom, are platform support—some integration scenarios may require that organizations deploy ETL tools across a variety of different operating environments—along with vendor viability. “Before committing numerous valuable business processes to a vendor’s ETL tool, check vendor viability, as seen in stable finances, steady revenue growth, and high ranking in terms of market share,” he cautions.
So which ETL tools make the enterprise-class cut? Not surprisingly, says Russom, products from market leaders Ascential Software Corp. and Informatica Corp. “ably satisfy the requirements” of enterprise-class ETL. “[T]hey are also the most common corporate-standard ETL tools, since a standard tool is usually forced to address enterprise-scale requirements,” he says.
Business Objects SA has made great strides of late with its Data Integrator tool, and released version 6.5 of that product in April. Russom acknowledges that Data Integrator 6.5 scores well in terms of connectivity and developer-friendliness, but says that it’s not yet up to snuff in terms of scalability.
Elsewhere, Warehouse Builder from Oracle Corp. easily passes the scalability test (as a result of its Oracle Data Server underpinnings), and offers robust multi-developer support, but fails the connectivity requirement, says Russom, because it’s designed largely for Oracle-only environments.
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.