In-Depth
Third-Party DI Tools Make Mapping Easier
At the recent Professional Association of SQL Server (PASS) user conference, Microsoft Corp. wasn’t the only vendor with something to talk about.
One unusual sponsor of this year's PASS event is data integration (DI) upstart Expressor Software Inc., which announced version 3.0 of its DI platform. That a third-party DI player such as Expressor is sponsoring a SQL Server-centric conference isn't completely surprising, however: DI rival WhereScape Inc. was at PASS, too, albeit as an exhibitor.
Both companies also exhibited at Teradata Corp.'s PARTNERS conference in October.
CEO Bob Potter says that Expressor addresses integration problems that SQL Server's built-in DI technology isn't designed to tackle. Press Potter a bit more and he'll concede that WhereScape's RED aims -- albeit in a different way -- to do the same thing.
"The thing about how the [data integration] business has changed, first and foremost, [is that] there's more data, so the data processing engine has to be more scalable," he explains, noting that most traditional ETL tools are handicapped by how they manage data mappings, an issue that Expressor claims to ameliorate. It's a problem that isn't specific to SQL Server.
SQL Server users tend to struggle with other issues, too.
"Now we're also dealing more and more with the physical layout of the data," Potter points out. "If you're a SQL Server Integration Services user, you're writing tons of code to do all of those alignments; with us, you'd write nothing. The business value is faster time to deploy, and the [time that it takes to deploy the] second project and the third project and the nth project is dramatically less."
(Not So) New Kid in Town
Compared to DI stalwarts such as Ab Initio Inc., IBM Corp., Informatica Corp., or even Microsoft Corp. -- which has been shipping its SQL Server Integration Services (SSIS, nee Data Transformation Services) for more than a decade now -- Expressor is a new kid in town.
It isn't alone, however. Expressor and a pair of its unconventional competitors -- namely, Illuminate Inc. and WhereScape -- first appeared on the scene three years ago. Expressor has tried to make a virtue out of its biggest perceived vulnerability: its maturity, or, as competitors might argue, its comparative lack thereof.
Unlike so-called "mature" offerings, Potter claims, Expressor isn't burdened with legacy "baggage." It wasn't designed to address one kind of problem (e.g., batch-centric ETL) and subsequently updated or retrofitted to address new problem types. "It was new technology," he says, referring to Expressor's first release.
In Expressor 3.0, there's new new technology, he maintains.
"About 18 months ago, we decided we really had a good vision, but we didn't have quite the right product architecture to execute that vision," Potter explains. "Our vision was [of a] ubiquitous, cost-effective, easy-to-use, fast data-processing engine, with what we call smart semantics, [or] the ability to rationalize physical constructs to a business term like 'account number.'"
In practice, Expressor was able to achieve most of those goals. At the same time, Potter concedes, "we still required people to go onsite and install the software." The goal with Expressor 3.0 was to develop an offering that could be deployed by in-house IT personnel and largely supported over the phone.
In other words, no costly onsite presence.
"We hired a usability architect out of SAS [Institute Inc.], hired some very experienced data integration engineers that had built first-generation systems, went back to the drawing board, and came up with … Expressor 3.0," Potter says.
The newest of the new in Expressor 3.0 is Studio, a GUI-based design environment that doubles as a standalone DI tool. (It includes an embedded run-time engine.) Expressor positions Studio as the go-to design environment for each of its three product offerings: a free "Community Edition" offering (which consists of Studio itself), Standard Edition (which Expressor positions as a per-project-oriented DI tool), and Enterprise Edition, a version 3.1 release that Expressor plans to introduce next May.
Pervasive DI?
Studio, Potter says, is designed with the technically oriented business user in mind. It's meant, he notes, to help crack the usability nut that has long bedeviled DI. There's a reason, Potter contends, that ETL programmers can earn upwards of $100 an hour -- and that (to use just one example) Informatica architects routinely command six-figure salaries: DI is complicated.
"The tools in the market today are used by ETL developers -- very technical people that have no problem coding up solutions," he explains. With Expressor 3.0, Potter claims, "you can have somebody build a data flow and set up a model and [then] through this nice [Studio] user interface … you can have non-ETL developers start building data flows, data integrations."
Expressor is far from the first vendor to take up the mantle of (what might be called) pervasive DI. ETL mainstays Informatica and SAS, among others, have been touting "collaborative" DI -- a data integration practice in which business users closely collaborate with technical personnel both to design and (mostly in a limited sense) manage integration projects -- for years. Informatica made it an important part of its Informatica 9 platform launch earlier this year.
Elsewhere, data warehousing specialist Kalido says that its Business Information Modeler lets business users participate (in an ongoing fashion) in the DI process.
DI players routinely pay lip service to the idea that business users can participate meaningfully in the development and/or management of data integration projects. Expressor has at least one technology that suggests it is doing more than that: its special sauce is what it calls a "smart semantic" capability that's built right into the Expressor engine.
In most cases, Potter maintains, data integration is facilitated by point-to-point mappings between a target platform and disparate data sources; Expressor's smart semantic technology makes the task of managing these mappings irrelevant -- in its case, by abstracting data types from data sources.
"These [mappings] can get very complex, can get very cumbersome to maintain; if you add a new source, for example, you have to remap from the new source to all of these target fields," he says, conceding that most enterprise DI tools will do this automatically.
"Still, in many cases, every new project is a brand new project: there's no reusability. What we allow users to do is to build a canonical model or an abstracted model based on types, which means that you map only once, then you reuse over and over and over again."
It isn't a turnkey proposition, of course; IT and the line of business first have to collaborate to build a semantic model.
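The map-once idea can be illustrated with a minimal Python sketch. It is not Expressor's implementation or API: the source names, field names, and functions below (SOURCE_TO_SEMANTIC, integrate, mapping_counts, and so on) are hypothetical, chosen only to show how routing every field through a shared semantic type such as "account number" replaces per-pair mappings.

```python
# Hypothetical illustration of a canonical (semantic) model.
# None of these names come from Expressor's product.
from typing import Any, Dict

# Each source maps its physical field names to shared semantic types, once.
SOURCE_TO_SEMANTIC: Dict[str, Dict[str, str]] = {
    "crm": {"acct_no": "account_number", "cust_nm": "customer_name"},
    "billing": {"ACCOUNT_ID": "account_number", "NAME": "customer_name"},
}

# Each target maps semantic types to its own physical column names, once.
SEMANTIC_TO_TARGET: Dict[str, Dict[str, str]] = {
    "warehouse": {
        "account_number": "AccountNumber",
        "customer_name": "CustomerName",
    },
}


def to_canonical(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a source record's fields to semantic types."""
    mapping = SOURCE_TO_SEMANTIC[source]
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def to_target(target: str, canonical: Dict[str, Any]) -> Dict[str, Any]:
    """Rename semantic types to the target's physical column names."""
    mapping = SEMANTIC_TO_TARGET[target]
    return {mapping[s]: v for s, v in canonical.items() if s in mapping}


def integrate(source: str, target: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Move a record from any source to any target via the canonical model."""
    return to_target(target, to_canonical(source, record))


def mapping_counts(n_sources: int, n_targets: int) -> Dict[str, int]:
    """Compare how many mapping sets each approach has to maintain."""
    return {
        "point_to_point": n_sources * n_targets,
        "canonical": n_sources + n_targets,
    }
```

In this sketch, adding a new source means writing one new entry in SOURCE_TO_SEMANTIC; no target mapping changes. With five sources and four targets, that is nine mapping sets to maintain rather than twenty.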
"We have an internal type that any company can override and use their own nomenclature … and then we have an internal type layout … and we can automatically do all of those mappings, transformations if you will. We can read the schema on the source system [so that we] know what we're writing to; all of this tedious mapping stuff gets done automatically by the system," he explains.
"You apply those rules to the semantic type, not just to the specific mapping, which is what every other ETL system does. You have to have built this architecture from the ground-up, and it has to be a metadata-driven system to be smart enough to allow these users to build this semantic model," he argues.
Looking Forward
The version 3.0 release that Expressor is touting at PASS is available in both its Community and Standard Edition flavors.
The Enterprise Edition, slated to include a metadata reporting capability, as-yet-unspecified "low-latency" features, and a role-based user model, is expected to ship in April. "We have some low-latency capability in Standard Edition [i.e., version 3.0] and in Expressor 2.4 [the previous version] -- and the engine is fast enough to do real-time -- but we're building out more functionality to support more sophisticated [real-time access]," Potter explains.