In-Depth

Expressor Touts Smarter, Leaner Data Integration

Just when you thought it was safe to close the book on your enterprise ETL options, along comes data integration upstart Expressor Software Corp. The company bills itself as a full-fledged competitor to established data integration players such as IBM Corp., Informatica Corp., and SAS Institute Inc.

As IT companies go, Expressor has a rock-star pedigree: its chief scientist (John Russell) is a Yahoo veteran, its CEO (Bob Potter) is a veteran of Kalido (where he helped develop that company's Adaptive Data Warehousing strategy), Iona (the one-time object request broker-ing champ), and Object Design. Michael Walclawiczek, a veteran of Kalido, Iona, and the former Streambase, heads up Expressor's marketing efforts.

What, exactly, are Walclawiczek and Expressor marketing? Nothing less than a top-to-bottom data integration (DI) stack -- with a twist.

Although IBM, Informatica, SAS, and a bevy of other competitors (including Oracle Corp., SAP AG, and Microsoft Corp.) market top-to-bottom data integration stacks of their own, Expressor touts something that none of these vendors offers, according to Russell: freshness.

All of these platforms were built a decade or more ago to solve a then-pressing data integration problem (typically extraction, transformation, and loading) using then-current technology, Russell argues. If today's "legacy" DI platforms scale to address today's complex DI scenarios, Russell claims, they're able to do so mostly because of improvements in processing power, as well as enhancements to storage and I/O throughput. Expressor, he insists, evolved over the last half-decade. It touts a smarter approach to DI, starting with its semantic metadata repository.

"What we provide is a semantic metadata repository. We built a set of tooling around the repository. It's a role-based, collaborative system, [the] aim [of which] is to build a central [collection of] metadata by rationalizing physical metadata from different source systems and maintaining that relationship in the semantic dictionary," Walclawiczek explains. "Everything that we do in the system is based on those common rationalized business terms."

One of the things that Expressor has in common with Kalido (former home of at least two of its principals) is its emphasis on business-friendly data warehousing. The fruit of Expressor's semantic abstraction, Walclawiczek indicates, is an intuitive data integration scheme that lets IT have its cake while simultaneously catering to business users. In other words, says chief scientist Russell, Expressor speaks both geek and business.

"You work on transactions and reporting, [or] you work with common business terms. You don't work with [values such as] 'account balance' or 'acc bal' -- I don't know what those things are, and frequently neither does the business," he indicates. "We also have reusable business rules. Our rules are defined at the syntactic level. We have this [wizard that] guides you through a process where it's going to learn what your technical metadata looks like.

"It can also learn across your different subject areas and even across your siloed organizational structures," Russell continues, noting that the out-of-the-box efficiency of Expressor's metadata discovery wizard is between "50 and 94 percent." The chief takeaway, he says, is that Expressor abstracts or rationalizes from the particular to the semantic universal.

"There's no data type. There's a part number. If it's going from a numeric type in a relational system to a comma-separated flat file, the engine makes the right choices. The engine automatically fixes things for you," he stresses. "What you end up with [is this] dynamic grid [where] all [of your data is] semantically … catalogued in the metadata system," he told Enterprise Systems.

It's in this sense, Russell argues, that Expressor's approach appeals to business users, analysts, or domain experts. "We handle all of the technical peculiarities of dealing with your environment [inside Expressor itself] and we let you focus on the business rules."

There's another way in which Expressor's approach lets both business and IT stakeholders have their respective pieces of cake and eat them, too: it accommodates siloing. What's interesting, says marketing chief Walclawiczek, is that siloing isn't always a bad thing. It's a fact of life in most organizations, for one thing, and (more often than not) it comes down to an issue of self-preservation: call it inevitable petty-fiefdom-ing.

"People tend to think of their departments as standalone business units. 'If we didn't exist,' they say, 'the rest of the company wouldn't operate.' What this leads to -- in manufacturing [for example], I'm going to create an item, and that's what I refer to that element as. In logistics, which also thinks of itself as standalone, they call that product something else," he points out.

With this in mind, Expressor's semantic abstraction lets individual business units preserve the illusion of autonomy -- i.e., manufacturing gets to call its widget an "element" while logistics gets to retain its "type" designation -- even as it reconciles metadata definitions across an organization. "In those two areas [i.e., manufacturing and logistics], you're able to set your own context, so they can say, 'We still have our own way of storing that,'" he observes.
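
The idea of one shared business term with department-specific names can be pictured in a few lines of Python. This is only an illustrative sketch, not Expressor's repository or API: the dictionary layout, the shared term "product," and the function names are all assumptions made for the example. Only the "element" and "type" labels come from the description above.

```python
# Hypothetical semantic dictionary: one shared business term mapped to the
# name each department uses locally. Nothing here is Expressor's actual API.
SEMANTIC_DICTIONARY = {
    "product": {
        "manufacturing": "element",
        "logistics": "type",
    },
}


def to_semantic(department, local_name):
    """Resolve a department's local field name to the shared business term."""
    for term, aliases in SEMANTIC_DICTIONARY.items():
        if aliases.get(department) == local_name:
            return term
    raise KeyError(f"{department!r} has no field named {local_name!r}")


def to_local(department, term):
    """Resolve a shared business term to a department's local field name."""
    return SEMANTIC_DICTIONARY[term][department]


def translate(record, source_dept, target_dept):
    """Rename a record's fields from one department's vocabulary to another's,
    passing through the shared term so neither side has to change."""
    return {
        to_local(target_dept, to_semantic(source_dept, name)): value
        for name, value in record.items()
    }


# Manufacturing's record, expressed in logistics' vocabulary.
print(translate({"element": "W-100"}, "manufacturing", "logistics"))
```

Each department keeps its own label, while every mapping is defined once against the shared term rather than pairwise between departments.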

Another benefit is what CEO Potter describes as Expressor's ace-in-the-hole: its competitive total cost of ownership (TCO), which is a function of both its ease of use and its unique pricing model.

"We believe that TCO really does matter," he says. "Data integration is just way too expensive. We talked about semantic rationalization. What that does is reduce [the] labor [costs] on a project. It's not just about the number of people, it's also about the skill level[s of those people]. You can basically reduce the [required] skill level by having less costly labor on these projects because of our semantic rationalization, and because [of] our parallel processing [engine], you don't have to incur the hardware costs, the systems costs that you had to incur [with these other offerings]."

Expressor's pricing model is bound up -- to a degree -- with its performance model. It claims a significant performance boost, relative to competitive ETL solutions, at a fraction of the price, thanks to its "channel" licensing scheme.

"The way we're going to utilize your servers is based on where do you want to run our software. If you're only going to run it on a few CPUs [inside of a] 64-way [system], we're going to charge you for what you're actually using, not charge you for the total capacity [of that system]," he explains. "We price on a channel basis. We charge a fixed licensing fee per channel. We're not charging at all for the tools, we're not charging at all for the connectors."

The devil frequently lurks in the details, and -- in this case -- Expressor's concept of a "channel" just about screams to be explicated. Potter is suitably obliging.

"A channel is a unit of parallelism. If you need to process [X amount of] data, we would calculate how many channels you need. For example, if you have a 64-CPU machine and you have a lot of data, you buy eight channels. What that means is that on that machine, you could run thousands of applications up to eight-ways parallel," he explains. "The throughput per channel is anywhere from 20 MB per second to 200 MB per second depending on machine type."
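
Potter's figures lend themselves to a back-of-the-envelope calculation. The sketch below is not Expressor's sizing method: it assumes aggregate throughput scales linearly with the number of channels, which the company does not state, and the function names are hypothetical. Only the 20 MB to 200 MB per second range and the eight-channel example come from Potter.

```python
import math

# Per-channel throughput range quoted by Potter (MB per second).
MIN_MB_PER_SEC = 20
MAX_MB_PER_SEC = 200


def aggregate_throughput(channels, per_channel_mb_s):
    """Aggregate MB/s, ASSUMING throughput scales linearly with channels."""
    return channels * per_channel_mb_s


def channels_needed(data_mb, window_seconds, per_channel_mb_s):
    """Smallest channel count that moves data_mb within window_seconds,
    under the same linear-scaling assumption."""
    required_rate = data_mb / window_seconds
    return max(1, math.ceil(required_rate / per_channel_mb_s))


low = aggregate_throughput(8, MIN_MB_PER_SEC)
high = aggregate_throughput(8, MAX_MB_PER_SEC)
print(f"8 channels: {low} to {high} MB/s")  # prints "8 channels: 160 to 1600 MB/s"
```

Under that assumption, the eight-channel configuration in Potter's example would land somewhere between 160 MB and 1,600 MB per second, a tenfold spread that depends entirely on the machine type.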

There are a few extra goodies in the Expressor approach, starting with what Russell describes as a data-integration economy of scale: while a first-run Expressor DI project should take about as long as any other DI project, subsequent projects -- which benefit from semantic abstractions, captured business rules, and other products of that first project -- should come together much more quickly.

"The first project should take about the same amount of time as any other project. Every project after that which reuses any aspect of what you've already done, that timeline starts to fall off. If you're loading to a data warehouse and you start to add multiple sources in, your semantic and business rules as you transform data to the data warehouse [will already be there]," he indicates.
