Traverse metaController: When ETL Is Overkill
Forget ETL: metaController enables a new kind of integration, which Traverse officials describe as “process orchestration”
In spite of the best efforts of Informatica Corp. and the former Ascential Software Corp., ad hoc ETL solutions are still the norm in many organizations.
There’s a good reason for this: many companies simply can’t afford the upfront investments—in terms of software licensing costs, maintenance, and in-house technical expertise—required to implement and manage Informatica’s PowerCenter, Ascential’s DataStage, and other packaged ETL solutions.
This is changing, of course—research conducted by The Data Warehousing Institute (TDWI) suggests that organizations that develop ad hoc ETL usually transition over to packaged ETL products when budget dollars become available. Still, for a significant number of companies, full-blown ETL solutions are overkill.
As a result, says Rohit Amarnath, CEO of data integration start-up Traverse Systems LLC, there’s an opportunity for companies like his to tap into a largely neglected market. Traverse’s flagship tool, metaController, doesn’t do ETL in the traditional sense of the term, nor is it a scheduling (i.e., batch scripting) tool. Instead, it’s a hybrid of traditional ETL (with Web services-based connectivity into relational, OLAP, and other sources) and batch scheduling, a technology commingling he calls “orchestration.”
It’s an approach Amarnath developed and honed in the trenches, so to speak, during a consulting engagement with Deutsche Bank. “We started building out [Deutsche Bank’s] global BI platform using Oracle and Hyperion Essbase, and this was right after they acquired Bankers Trust here in N.Y. Basically, the impetus here at the time was that they had a small BI group that was servicing the North American region. With the Bankers Trust acquisition, they were expecting that to grow significantly,” he explains.
One upshot, he says, was that Deutsche Bank’s existing BI infrastructure simply wouldn’t be able to scale to support projected demand. “I realized that what we were trying to do was beyond what we could get out of the scheduling systems, but not so much that we needed an Informatica or an Ascential. They were expensive and they were much more than we needed. Basically what we ended up doing was we built out an orchestration platform to manage all of these processes within the data warehouse and building out the cubes, etc.,” he says.
Amarnath’s work at Deutsche Bank was mostly enabled by means of Unix scripts and Oracle stored procedures—in other words, a far-from-portable (much less commoditizable) solution. It occurred to him, however, that other companies with similar pain points might be receptive to a shrink-wrapped version of his hybrid orchestration technology. “I thought there was a real need for this outside Deutsche Bank. So in 2001 I hired a couple of guys and basically took that operational idea and built it from the ground up in Java and using Web services.”
Enter metaController, Traverse Systems’ first official software deliverable. Unlike dedicated ETL and EAI solutions, which Amarnath says often have complex configuration and programming requirements, metaController more or less drops into a customer’s existing environment. It can kick off scripts, à la a conventional scheduler, but it also gives data architects a drag-and-drop means to design and manage process flows between systems (including scripts, along with existing ETL or EAI tools) in the context of upstream and downstream process steps. Amarnath likes to compare the role of metaController to that of a conductor in a symphony orchestra: each of the orchestra’s musicians has a specific role, and the conductor manages them in concert with one another.
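To make the idea of upstream and downstream process steps concrete, here is a minimal sketch in Python. It is not metaController's actual interface, which the article does not describe; the step names, the `process_map` structure, and the `run_flow` helper are all assumptions made for illustration. Each step declares the steps it depends on, and the runner starts a step only after everything upstream of it has finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical process map: each step maps to the set of upstream steps
# that must finish before it may start.
process_map = {
    "extract": set(),
    "aggregate": {"extract"},
    "load": {"aggregate"},
    "build_cube": {"load"},
    "report": {"build_cube"},
}


def run_flow(process_map, actions):
    """Run every step after all of its upstream steps have completed."""
    completed = []
    for step in TopologicalSorter(process_map).static_order():
        actions[step]()  # in practice: call a script, an ETL job, a cube build
        completed.append(step)
    return completed


# Stand-in actions that only record that they ran.
log = []
actions = {name: (lambda n=name: log.append(n)) for name in process_map}
order = run_flow(process_map, actions)
```

The point of the sketch is that the ordering logic lives in one declared map rather than in layers of wrapper scripts, which is the distinction Amarnath draws between orchestration and plain scheduling.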
If it looks like a scheduler and smells like a scheduler, it’s a scheduler, right? Not necessarily, says Amarnath. “Right now what most people do is they have these scripts and they try to use the enterprise schedulers to try to coordinate between all of these pieces. The problem is these schedulers; they kick off the script, but the other automation logic is still buried.”
“Another problem is that you have so many different stages in data warehousing that you have to manage for—the data collection, the extraction, the storing, and delivering the data—and at the same time, this is across all different platforms. It’s not one script you’re going to write in [Unix] kshell to make it work. You’ll have to write application-specific scripts, and then you have to write wrapper scripts to coordinate these scripts, and even more wrapper scripts to coordinate your wrapper scripts.”
Amarnath’s background is as a programmer and business process consultant, so metaController is designed to address many common business concerns. It boasts canned support for business rules—Traverse Systems even offers an agent for the popular business rules management system (BRMS) from ILOG Inc.—and includes an integrated workflow engine that supports interactive (i.e., human oversight) workflows.
For example, says Amarnath, suppose a business requires sign-off before data publication. If that’s the case, data architects can use metaController to build a process map that includes the data aggregation, load, and cube-building steps. A data architect could then invoke a stageDirector process, which can be dragged and dropped onto the existing process map (prior to the point in the flow at which a report is generated). The idea is that when the flow reaches the stageDirector node, an e-mail is sent to the business analyst to request sign-off. “This lets the business control the processes themselves in terms of kicking them off, approving at checkpoints, stream validation, etc.,” he says. “[In] the traditional way, you have an enterprise scheduler, it goes and kicks off a script, it’s fairly linear, and the logic is buried at multiple levels.”
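The sign-off scenario can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the stageDirector implementation: the function `run_with_checkpoint`, the `notify` and `is_approved` callbacks, and the step names are hypothetical. In the sketch, `notify` simply collects messages where a real system would send e-mail.

```python
def run_with_checkpoint(steps, checkpoint_before, notify, is_approved):
    """Run steps in order, pausing for sign-off before the named step."""
    executed = []
    for name, action in steps:
        if name == checkpoint_before:
            # Hypothetical checkpoint node: ask the analyst, then wait.
            notify(f"Sign-off requested before step '{name}'")
            if not is_approved():
                return executed, "waiting_for_signoff"
        action()
        executed.append(name)
    return executed, "complete"


step_names = ["aggregate", "load", "build_cube", "report"]
steps = [(name, lambda: None) for name in step_names]  # no-op stand-ins

messages = []  # stands in for the analyst's inbox

# First run: the analyst has not approved, so the flow stops before "report".
held, held_status = run_with_checkpoint(
    steps, "report", messages.append, lambda: False
)

# Second run: approval is in place, so the flow reaches the report step.
done, done_status = run_with_checkpoint(
    steps, "report", messages.append, lambda: True
)
```

For brevity the second call re-runs the whole flow; a production orchestrator would more likely persist its state and resume from the checkpoint once approval arrives.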
Of course, metaController is still a work in progress. One limitation is agent support. At the moment, for example, Traverse Systems is providing metaController agents on an as-needed basis. Currently, the company ships agents for Informatica and Ascential ETL tools, Oracle relational databases, Hyperion Essbase, HyperRoll’s OLAP accelerator, and the ILOG BRMS. But customers who need adapters for other BI platforms, relational databases, or enterprise applications shouldn’t be deterred, Amarnath says. “Right now we’re saying if you need Cognos, for example, give us a week and we’ll build that in. Eventually, the goal is to let customers do this themselves. The only requirement is that there be some kind of API or command-line interface that we can wrap and put some Java code around.”
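Amarnath's stated requirement for an agent, some API or command-line interface that can be wrapped in code, suggests a simple adapter pattern. Traverse builds its agents in Java; the sketch below uses Python for brevity, and the `CommandLineAgent` class, its `run` method, and the result fields are hypothetical rather than anything the company has published. The Python interpreter itself stands in for a real tool's command line.

```python
import subprocess
import sys


class CommandLineAgent:
    """Hypothetical agent: wraps a command-line tool behind one run() call."""

    def __init__(self, name, command):
        self.name = name
        self.command = list(command)  # base command, e.g. path to a tool's CLI

    def run(self, *args):
        # Invoke the wrapped tool and report its outcome in a uniform shape,
        # so an orchestrator can treat every tool the same way.
        result = subprocess.run(
            [*self.command, *args], capture_output=True, text=True
        )
        return {
            "agent": self.name,
            "ok": result.returncode == 0,
            "output": result.stdout.strip(),
        }


# Stand-in for a real BI tool's CLI: a one-line Python command.
demo = CommandLineAgent("demo", [sys.executable, "-c", "print('cube built')"])
outcome = demo.run()
```

Because every agent returns the same kind of result, the orchestration layer needs no tool-specific logic, which is presumably what would let customers eventually write agents themselves.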
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.