In-Depth

Can Data Quality Elude Commoditization?

It’s tempting to think of data quality as a soon-and-inevitably-to-be-commoditized technology segment. But think again.

Now that IBM Corp., Microsoft Corp., and Oracle Corp. have built (or are in the process of building) robust ETL facilities into their flagship relational databases, the commoditization of ETL itself is all but a fait accompli.

And with Microsoft and Oracle also starting to incorporate data quality features into their database products, it’s tempting to think of data quality as a soon-and-inevitably-to-be-commoditized technology segment, too.

Few experts think that’s going to happen anytime soon, however. Some argue that commoditization of this kind faces substantial impediments, first and foremost the difficulty of developing and (more importantly) maintaining name, non-name, address, and other kinds of data quality-specific databases. In this respect, some industry watchers say, it might make more sense for data integration and other vendors to continue their existing partnerships with data quality providers.

“Data quality is still a relatively new area in tools,” says Philip Russom, senior manager of research and services with TDWI. Unlike ETL, Russom says, the data quality space hasn’t yet experienced the “erosion” of pure-play market share by encroaching commodity vendors. He points to ETL powerhouse Informatica Corp.’s own data quality strategy, which, like those of BI giants Business Objects SA and Cognos Inc., taps data quality technology from third-party partner Firstlogic Corp. and (as of July 2005) from Harte-Hanks subsidiary Trillium.

Firstlogic vice president Frank Dravis isn’t exactly a dispassionate observer of the data quality marketscape. For one thing, his company is one of the leading data quality pure plays; for another, in the wake of an abortive merger with direct mail and campaign management specialist Pitney Bowes Inc., it has been the subject of takeover rumors. Nevertheless, Dravis argues that the lot of the data quality pure play will improve over time.

“I would even position [being a pure play] as a place of strength. Microsoft in their new [SQL Server 2005] Integration Services ... is indeed coming out with some basic data quality functionality. Okay, that’s fine. But that only works on SQL Server 2005,” he comments. “You bet there are people out there who are a pure Microsoft shop who are going to want to take advantage of the intrinsic functionality that comes with SQL Server 2005. There is some pressure for data quality functions to be absorbed by some applications, too. But, by and large, a lot of the data cleansing is happening in the gaps between those applications in that heterogeneous environment space, and [customers] just want the specialization that they can get from a pure player.”

There’s another reason why data quality could remain a tough nut to crack for all but the most ardent (i.e., non-commoditized) practitioners: it’s a highly esoteric space. Consider the case of Language Analysis Systems Inc. (LAS), a specialty provider of multi-cultural name identification, profiling, and cleansing software. LAS has been in business for 20 years and has a host of customers in the government sector. In the post-9/11 climate, LAS has also expanded its marketing efforts to the commercial sector. But LAS still has the multi-cultural name identification and cleansing market mostly to itself, says CEO Jack Hermansen. There are a few reasons for that, Hermansen claims. “You’ve got to have a good search algorithm; you’ve got to know how smart the user is; but the third component is the name database,” he says.

That’s where LAS has invested the bulk of its time and research, he maintains. The company recently released a new name profiling tool, dubbed Name Inspector, that’s designed to buttress its other offerings. “Names of people, places, and businesses—there are no dictionaries for them, there’s no way to look up [a name] and say this is wrong. We run into very, very intractable problems [in transliteration from] other writing systems, so if somebody’s looking for a name coming from the Korean culture, or the Cyrillic—for example, Tchaikovsky with a ‘T’ in front of it, because that’s the French transcription of his name—we have to be able to match that.”

The point isn’t that LAS will weather commoditization (if it comes to that) in the data quality space. Rather, in today’s regulatory climate, there are an awful lot of esoteric (or “boutique”) data quality requirements. As for commodity players weighing whether to buy or build, Hermansen says, they need to keep in mind the amount of upkeep that goes into maintaining truly top-flight data quality technology. “We have almost a billion names now [in LAS’s name database] from every country in the world that we’ve been using as our statistical cauldron, that we’re able to profile and determine name characteristics,” he comments.

Nevertheless, some industry watchers believe mainstream commoditization in the data quality segment is inevitable, even if boutique players continue to thrive.

“I think there’s an ongoing trend to recognize the importance of data quality upfront,” says Mike Schiff, a principal with data warehousing and business intelligence consultancy MAS Strategies. “I think [data quality pure plays are] going to be hot commodities, and I think a lot of vendors are going to move to own the technology rather than relying on a partnership.”

As a case in point, Schiff cites Firstlogic, which has OEM relationships with Business Objects, Cognos, and SAS (among others) for its data quality technology. Pitney Bowes’ near-acquisition of Firstlogic, Schiff says, was a wake-up call to these vendors and others.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.
