Enterprise-wide Data Quality: A Work in Progress

In addition to customer data, companies want to bring errant product and financial data to heel, too. Call it a data-quality-work-in-progress.

Enterprise data quality (DQ) initiatives are something of a moving target. No sooner does an organization finish getting its customer data ducks in a row than it’s ready to move on to bigger (and possibly more complex) challenges. Increasingly, for example, organizations have been trying to bring errant product and financial data to heel, too. Call it a data-quality-work-in-progress.

That’s one upshot of Business Objects SA’s release last month of Universal Data Cleanse (UDC), a new option for Release 2 of its Data Quality XI platform. Business Objects first ventured into the ETL marketplace in 2002 with its acquisition of the former Acta Technology, and it became a data quality player last year, when it acquired Firstlogic Corp. Firstlogic gave Business Objects best-of-breed data quality technology and a huge customer base, one that (thanks in part to Firstlogic’s ubiquitous licensing arrangements with industry heavyweights) included Business Objects itself.

Firstlogic’s DQ assets formed the core of the original (customer data-ready) Data Quality XI release. Business Objects, however, honed its product and financial data chops internally, developing both capabilities after the Firstlogic acquisition and largely on the basis of customer feedback, officials say.

"Customer data has been very problematic … for many years, and we have addressed that need with Data Quality XI, but we also know that more and more product and financial data is becoming problematic, too," says Kristin McMahon, enterprise information management (EIM) product marketing manager with Business Objects. "Companies and organizations are ready to tackle that challenge and are saying, ‘We’ve got great data quality processes around our customer data, so let’s move on to our product and financial data.’"

It’s a timely technology deliverable, says Philip Russom, a senior manager with TDWI Research, who notes that companies are already looking beyond customer data. "Although customer data tasks are still the bread-and-butter of data quality, the need for quality product data continues to grow," he comments. "Many organizations are at a point where they’re moving from name/address cleansing and house-holding—the most common starting points, which focus on customer data—to product data tasks, and procurement/supplier data is a common place to start in this new area."

That’s just what Business Objects claims to be hearing from its customers, according to McMahon. "From our customer base, and from the market research we’ve been doing over the last few years, we understand that projects like master data management (MDM) are coming up more often when we talk with customers. We also understand that a lot of organizations have more and more [mergers-and-acquisition] activity," she indicates. "So they need to be able to combine that information and see it in one standardized view. With [Universal Data Cleanse], we really do feel that this is a gigantic step forward in terms of what we can offer our customers."

For example, McMahon explains, consider an electronics retailer that sells iPod-related accessories. Perhaps its suppliers use different character strings—viz., "blue," "bl" or even "ble"—to represent the color blue. Data Quality XI’s UDC option can parse and appropriately reconcile such disparate representations, says McMahon.
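To make the idea concrete, here is a minimal Python sketch of dictionary-based standardization. It is not the UDC API: the `COLOR_VARIANTS` table and the `standardize_color` function are hypothetical names chosen for illustration, and the sketch assumes only that each known supplier spelling maps to one canonical value.

```python
# Hypothetical standardization dictionary (not the UDC API): each known
# supplier spelling of a value maps to one canonical form.
COLOR_VARIANTS = {
    "blue": "Blue",
    "bl": "Blue",
    "ble": "Blue",
}


def standardize_color(raw: str) -> str:
    """Return the canonical color for a raw supplier value.

    Matching is case-insensitive and ignores surrounding whitespace.
    Unknown values are returned trimmed but otherwise unchanged, so an
    analyst can review them instead of the code guessing.
    """
    key = raw.strip().lower()
    return COLOR_VARIANTS.get(key, raw.strip())


supplier_values = ["blue", "BL", " ble ", "Blue", "teal"]
print([standardize_color(v) for v in supplier_values])
# ['Blue', 'Blue', 'Blue', 'Blue', 'teal']
```

Passing unknown values through untouched, rather than forcing a match, is a deliberate choice: a value like "teal" becomes a candidate for a new dictionary entry instead of being silently miscoded.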

"It can today take a string of texts and is able to parse, standardize and cleanse that. So if you’ve got [a string that says] ‘Apple iPod Mini Blue 2 GB,’ UDC can take that and parse it and separate it into its relevant categories," McMahon explains. "And then if we dive down a little deeper, what it also is able to do is standardize, so maybe the color blue is represented in different ways … and we take those [and standardize them] so that we have only one way to represent that [value] throughout the organization."
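The parse-then-standardize flow McMahon describes can be sketched as token classification against small lookup dictionaries. This is an illustrative assumption, not how UDC is implemented: the dictionaries, the capacity regular expression, and the output field names (`brand`, `product_line`, `model`, `color`, `capacity`) are all hypothetical.

```python
import re

# Hypothetical lookup dictionaries for illustration only (not UDC's own).
BRANDS = {"apple": "Apple"}
PRODUCT_LINES = {"ipod": "iPod"}
MODELS = {"mini": "Mini", "nano": "Nano"}
COLORS = {"blue": "Blue", "bl": "Blue", "ble": "Blue"}
# Matches capacities such as "2 GB" or "2gb"; the space is optional.
CAPACITY = re.compile(r"(\d+(?:\.\d+)?)\s*(gb|mb)\b", re.IGNORECASE)


def parse_product(text: str) -> dict:
    """Split a free-text product description into named, standardized attributes."""
    record = {}
    # Extract capacity first, because "2 GB" spans two whitespace tokens.
    match = CAPACITY.search(text)
    if match:
        record["capacity"] = f"{match.group(1)} {match.group(2).upper()}"
        text = text[: match.start()] + text[match.end():]
    unparsed = []
    # Classify each remaining token by dictionary membership; the dictionary
    # value is the standardized spelling.
    for token in text.split():
        key = token.lower()
        if key in BRANDS:
            record["brand"] = BRANDS[key]
        elif key in PRODUCT_LINES:
            record["product_line"] = PRODUCT_LINES[key]
        elif key in MODELS:
            record["model"] = MODELS[key]
        elif key in COLORS:
            record["color"] = COLORS[key]
        else:
            unparsed.append(token)
    # Keep unrecognized tokens for review rather than dropping them.
    if unparsed:
        record["unparsed"] = " ".join(unparsed)
    return record


print(parse_product("Apple iPod Mini Blue 2 GB"))
# {'capacity': '2 GB', 'brand': 'Apple', 'product_line': 'iPod', 'model': 'Mini', 'color': 'Blue'}
print(parse_product("apple ipod mini ble 2gb"))
# {'capacity': '2 GB', 'brand': 'Apple', 'product_line': 'iPod', 'model': 'Mini', 'color': 'Blue'}
```

Both inputs reduce to the same record, which is the "one way to represent that value" outcome McMahon describes.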

Elsewhere, she continues, the UDC option can parse semi-structured data strings of up to 8,000 characters. It can also process rules associated with data dictionaries, such as truth tables. There’s also built-in integration with Business Objects Data Insight, the data profiling tool the company inherited from Firstlogic. "It can directly incorporate the product specifications from the business analyst, it can do word frequency distribution. [These specifications] can be bulk loaded into the UDC option, where the IT analyst takes that and runs with it, so no longer does the IT analyst have to guess about what a specific text means," McMahon indicates. "We also offer some pre-loaded dictionaries for our customers when they buy UDC, and these include colors and sizes. They come in various languages. We’ve also preloaded the dictionaries for weight."
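Word frequency distribution is a common profiling technique for deciding what belongs in such dictionaries. The sketch below is a hypothetical stand-in, not Data Insight's or UDC's interface: it counts lower-cased, whitespace-separated tokens across a few made-up product descriptions. Rare tokens such as "bl" and "ble" are the kind of variants an analyst would then map to a canonical value.

```python
from collections import Counter


def word_frequencies(descriptions: list[str]) -> Counter:
    """Count how often each lower-cased, whitespace-separated token appears."""
    counts = Counter()
    for text in descriptions:
        counts.update(text.lower().split())
    return counts


# Made-up sample descriptions for illustration.
descriptions = [
    "Apple iPod Mini Blue 2 GB",
    "Apple iPod Mini Bl 4 GB",
    "Apple iPod Nano ble 2 GB",
]
counts = word_frequencies(descriptions)

# Frequent tokens suggest core vocabulary; singletons are often variants or typos.
print(counts.most_common(3))
print(sorted(token for token, n in counts.items() if n == 1))
```

In practice the low-frequency list is the more useful output, since that is where inconsistent spellings tend to surface.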

Besides reaching beyond customer data, Data Quality XI R2 and its UDC option address other salient DQ issues, too, says Russom.

"One of the great struggles data quality tool users are experiencing now is to connect the dots. Most of their data quality implementations focus on a specific information system, department, or initiative, and these are essentially silos," he explains. "The goal is to gain greater enterprise scope for data quality solutions, by consolidating these silos into fewer but larger data quality solutions or by leaving them in place but integrating them better. A related goal is to centralize data quality solutions for greater project reuse and more consistent data."

Business Objects’ McMahon, for her part, says siloing is still a problem in many organizations—in spite of enterprise data warehousing (EDW), MDM and other unification or reconciliation pushes. "It all depends on the maturity of the organization. It really depends on the sophistication where they’re at in data quality. A lot of larger organizations are really charging forward with combining and having that one view of data—but that’s also extremely difficult to achieve when you’re talking about larger organizations," she indicates.

The revamped Data Quality XI also does more to internationalize the product, Russom says. He identifies internationalization as another traditional "ghetto" for data quality solutions, most of which are designed primarily for North American (or, at any rate, Western) markets. "As data quality spreads, so does its geographic coverage. From a technology viewpoint, this requires greater internationalization in the form of support for more natural languages and national address standards," he comments.

McMahon describes a not uncommon scenario that involves a customer input string with a Polish name. "[M]ost likely the tool would take the first text and make that the first name. It might take the second and make that the middle name, and the last and make it last. And we probably wouldn’t be able to assign a gender, either," she says. "With the UDC, it takes everything in context. If it wasn’t a Polish name, we would be able to tell if it was a strong male or a strong female or not. We also would be able to tell that a middle name probably wasn’t identified in that middle string. The net of it is that today, with UDC, the customers themselves can name their own fields, so it doesn’t have to be in English [i.e., first, middle, last]—it can be whatever it is within the country …. It brings them way up to speed in terms of regionalizing their customer."
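The contrast McMahon draws between positional and contextual parsing can be illustrated with a short Python sketch. Both functions are hypothetical; the contextual version substitutes simple dictionary lookup for whatever UDC actually does, and its tiny name lists and the Polish field labels ("imię" for given name, "nazwisko" for surname) are assumptions chosen for illustration.

```python
# Hypothetical name dictionaries; a real product would ship far larger ones.
GIVEN_NAMES = {"jan", "anna", "piotr"}
SURNAMES = {"kowalski", "kowalska", "nowak"}


def naive_parse(name: str) -> dict:
    """Assign tokens purely by position: first, middle(s), last."""
    tokens = name.split()
    return {
        "first": tokens[0] if tokens else "",
        "middle": " ".join(tokens[1:-1]),
        "last": tokens[-1] if len(tokens) > 1 else "",
    }


def contextual_parse(name: str, labels: dict) -> dict:
    """Assign tokens by dictionary lookup, using customer-chosen field names.

    `labels` maps the internal roles "given" and "surname" to whatever
    field names the customer wants to see in the output.
    """
    given, surname = labels["given"], labels["surname"]
    record = {given: "", surname: ""}
    unresolved = []
    for token in name.split():
        key = token.lower()
        if key in GIVEN_NAMES and not record[given]:
            record[given] = token
        elif key in SURNAMES and not record[surname]:
            record[surname] = token
        else:
            # Tokens that match neither list are kept, not forced into a field.
            unresolved.append(token)
    record["unresolved"] = " ".join(unresolved)
    return record


polish_labels = {"given": "imię", "surname": "nazwisko"}
print(naive_parse("Kowalski Jan"))
# {'first': 'Kowalski', 'middle': '', 'last': 'Jan'}
print(contextual_parse("Kowalski Jan", polish_labels))
# {'imię': 'Jan', 'nazwisko': 'Kowalski', 'unresolved': ''}
```

With a surname-first input, the positional parser swaps the fields, while the lookup-based parser places each token correctly under the customer's own field names and reports anything it cannot classify under `unresolved`.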

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.