
BI Experts: Real-Time Quality for Real-Time Data

Don't just deliver bad data faster. Improve data in real time.

One of the most influential trends in information technology today is the incremental movement toward real-time operation. For data-driven IT and business practices -- such as business intelligence (BI), data warehousing (DW), and data management -- real-time operation means moving data from point A to point B almost instantaneously, whenever it's needed.

The challenge is to do more than simply move data within seconds. In most cases, data must also be repurposed and improved. Delivering time-sensitive data within seconds certainly enables fast-paced business processes, such as operational BI, just-in-time inventory, and purchase recommendations in e-commerce. Yet the data itself has little value, and earns little trust, unless it has been cleansed, enhanced, and transformed to fit its target purpose.

Hence, real-time (RT) data quality (DQ) functions become more important as more data is collected and delivered in real time. Rather than delivering bad data faster, user organizations should apply DQ functions that improve data while it is in transit. Although users most often configure DQ solutions to run in batch mode, most DQ functions also work well in real time, as the following examples illustrate:
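The idea of improving data while it is in transit can be sketched as a streaming pipeline in which each record passes through a chain of DQ functions before it reaches the target. This is a minimal Python sketch under stated assumptions, not any vendor's API; the function names (`dq_in_transit`, `trim_whitespace`, `uppercase_state`) and the record layout are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

Record = Dict[str, str]
DQStep = Callable[[Record], Record]


def dq_in_transit(records: Iterable[Record], steps: Sequence[DQStep]) -> Iterator[Record]:
    """Apply every DQ step to each record as it flows from source to target.

    Because this is a generator, each record is cleaned the moment it is
    consumed, rather than waiting for a batch to accumulate.
    """
    for record in records:
        for step in steps:
            record = step(record)
        yield record


def trim_whitespace(record: Record) -> Record:
    # Hypothetical step: strip stray spaces from every field.
    return {key: value.strip() for key, value in record.items()}


def uppercase_state(record: Record) -> Record:
    # Hypothetical step: normalize the state code to upper case.
    fixed = dict(record)
    if "state" in fixed:
        fixed["state"] = fixed["state"].upper()
    return fixed


if __name__ == "__main__":
    incoming = [{"name": "  Ada Lovelace ", "state": "wa "}]
    for clean in dq_in_transit(incoming, [trim_whitespace, uppercase_state]):
        print(clean)  # {'name': 'Ada Lovelace', 'state': 'WA'}
```

The same step functions could just as well run inside a nightly batch job; only the delivery mechanism changes, which is why most DQ logic can be reused in real time.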

Standardization. According to TDWI surveys, data standardization is the most commonly used DQ function. For example, name-and-address cleansing is mostly about standardization. Firms that depend on high-quality customer data typically standardize each customer record in real time as it's created or updated.
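A minimal sketch of record-level standardization, applied once per record as it is created or updated, might look like the following. The suffix table, the North American phone rule, and all function names are hypothetical simplifications; commercial name-and-address tools rely on postal-authority reference data and far richer parsing.

```python
import re

# Hypothetical lookup table; production tools rely on postal-authority reference data.
STREET_SUFFIXES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave", "road": "Rd", "rd": "Rd"}


def standardize_name(name: str) -> str:
    # Collapse repeated spaces and capitalize each token ("  ada   LOVELACE " -> "Ada Lovelace").
    return " ".join(part.capitalize() for part in name.split())


def standardize_street(street: str) -> str:
    # Drop periods, capitalize tokens, and abbreviate a trailing street suffix.
    tokens = [token.capitalize() for token in street.replace(".", "").split()]
    if tokens:
        tokens[-1] = STREET_SUFFIXES.get(tokens[-1].lower(), tokens[-1])
    return " ".join(tokens)


def standardize_phone(phone: str) -> str:
    # Reduce a North American number to digits, then format it one canonical way.
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return phone  # leave unrecognized input untouched for later review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def standardize_customer(record: dict) -> dict:
    # Called once per record, at the moment it is created or updated.
    return {
        "name": standardize_name(record["name"]),
        "street": standardize_street(record["street"]),
        "phone": standardize_phone(record["phone"]),
    }
```

Simple capitalization rules break on names such as "McDonald", which is one reason real deployments use curated name and address dictionaries.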

Validation. As an example, name-and-address cleansing should go beyond data standardization and also validate that the person, postal address, and phone number all exist. Validation of this sort is regularly applied in real time as information is entered by, say, a telemarketer or a customer filling out a form on an e-commerce Web site.
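Proving that an address or phone number really exists requires reference data or an external verification service, so the sketch below substitutes a hypothetical in-memory ZIP-to-city table. The `Contact` type, the table contents, and the rules are assumptions for illustration; the phone check only confirms the format, not that the number is in service.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for postal reference data; a real system would call an
# address-verification service instead of consulting a hard-coded table.
KNOWN_ZIP_TO_CITY = {"98101": "SEATTLE", "10001": "NEW YORK"}


@dataclass
class Contact:
    name: str
    city: str
    zip_code: str
    phone: str


def validate(contact: Contact) -> List[str]:
    """Return a list of problems; an empty list means the entry passed."""
    errors: List[str] = []
    if not contact.name.strip():
        errors.append("name is empty")
    expected_city = KNOWN_ZIP_TO_CITY.get(contact.zip_code)
    if expected_city is None:
        errors.append(f"unknown ZIP code {contact.zip_code}")
    elif expected_city != contact.city.strip().upper():
        errors.append(f"ZIP code {contact.zip_code} does not match city {contact.city}")
    # Format check only: it cannot prove that the number is actually in service.
    if len(re.sub(r"\D", "", contact.phone)) != 10:
        errors.append("phone number must have 10 digits")
    return errors


if __name__ == "__main__":
    # Simulates a form submission: problems are reported while the user is still on the page.
    print(validate(Contact("Ada Lovelace", "Boston", "98101", "206-555-01")))
```

Returning every problem at once, rather than stopping at the first, lets a telemarketer or Web form show the user all corrections in a single round trip.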

Identification. Ideally, a customer is identified accurately, in real time, at every touch-point. Otherwise, redundant records may be created, or information about the customer may be logged in the wrong record. This function is sometimes called entity resolution or identity resolution.
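As a rough illustration, identification at a touch-point can be reduced to deriving a match key from incoming attributes and looking it up in a shared index. This sketch uses an exact, hypothetical key rule (normalized email, else normalized name) and an in-memory `IdentityIndex` class standing in for a master data hub; real entity resolution weighs several attributes and tolerates spelling variation.

```python
import re
from typing import Dict


def identity_key(name: str, email: str) -> str:
    # Hypothetical rule: a normalized email identifies the person if present;
    # otherwise fall back to the name with case, spaces, and punctuation removed.
    email_norm = email.strip().lower()
    if email_norm:
        return "email:" + email_norm
    return "name:" + re.sub(r"[^a-z]", "", name.lower())


class IdentityIndex:
    """In-memory map from identity keys to customer IDs (a stand-in for a master data hub)."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}
        self._next_number = 1

    def resolve(self, name: str, email: str) -> str:
        """Return the existing ID for this person, minting a new one only if none exists."""
        key = identity_key(name, email)
        if key not in self._ids:
            self._ids[key] = f"C{self._next_number:05d}"
            self._next_number += 1
        return self._ids[key]
```

If a call center and a Web site both call `resolve` with the same email address, they receive the same customer ID, so activity from either channel lands in one record instead of two.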

Matching. Accurate and timely identification is a foundation for other RT DQ functions, such as matching, merging, and householding. For example, before a new record is committed to a sales contact database, matching it in real time with an existing record can prevent the insertion of a redundant record.
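The check-before-insert pattern described above can be sketched with Python's standard `difflib.SequenceMatcher`, used here as a generic string-similarity measure in place of the phonetic and probabilistic matchers found in commercial DQ tools. The 0.85 threshold, the ZIP-code blocking rule, and the function names are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

Contact = Dict[str, str]


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the strings are identical after lower-casing.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_match(candidate: Contact, existing: List[Contact], threshold: float = 0.85) -> Optional[Contact]:
    """Return the most similar existing contact at or above the threshold, if any."""
    best: Optional[Contact] = None
    best_score = threshold
    for record in existing:
        if record["zip"] != candidate["zip"]:
            continue  # blocking: only compare contacts that share a ZIP code
        score = similarity(record["name"], candidate["name"])
        if score >= best_score:
            best, best_score = record, score
    return best


def insert_if_new(candidate: Contact, existing: List[Contact]) -> bool:
    """Commit the candidate only when no likely duplicate is already stored."""
    if find_match(candidate, existing) is not None:
        return False
    existing.append(candidate)
    return True
```

With these settings, "Jonathon Smith" scores about 0.93 against an existing "Jonathan Smith" in the same ZIP code and is rejected as a probable duplicate. Blocking on ZIP code keeps the number of comparisons small enough for a real-time path.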

Detection. Some RT DQ functions can check whether a person is on a watchlist of suspected criminals or terrorists. These detection capabilities (coupled with validation) enable a system to spot problematic people and potentially fraudulent behavior -- in real time, while there's still time to stop them.
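A minimal screening sketch might normalize the incoming name and compare it against a watchlist with a fuzzy cutoff, holding the transaction on any hit. The watchlist entries, the 0.85 cutoff, and the function names are hypothetical; `unicodedata.normalize` and `difflib.get_close_matches` are standard-library calls.

```python
import unicodedata
from difflib import get_close_matches
from typing import List

# Hypothetical entries; real screening loads official sanctions and watchlists.
WATCHLIST = ["john q doe", "jane roe"]


def normalize(name: str) -> str:
    # Strip accents and punctuation so "Jöhn Q. Doe" compares as "john q doe".
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in ascii_only.lower())
    return " ".join(cleaned.split())


def screen(name: str, cutoff: float = 0.85) -> List[str]:
    """Return watchlist entries that closely resemble the given name."""
    return get_close_matches(normalize(name), WATCHLIST, n=1, cutoff=cutoff)


def should_hold_transaction(name: str) -> bool:
    # A hit does not prove wrongdoing; it routes the transaction to human review
    # before it completes, while there is still time to act.
    return bool(screen(name))
```

Normalizing before comparing matters because watchlist evasion often relies on accents, punctuation, or spacing that a naive exact match would miss.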

Monitoring. Although it takes different forms, monitoring usually re-profiles data daily to ensure that the data complies with corporate standards for data quality. It typically runs in batch on a 24-hour cycle, but it can also run more often during the business day, as needed.
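Monitoring can be sketched as a profiling function that computes pass rates and compares them with thresholds. The two checks, the `STANDARDS` values, and the function names are hypothetical; the same `check_standards` call could be scheduled nightly or run every hour during the business day.

```python
import re
from typing import Dict, List

# Hypothetical corporate standards: the minimum share of rows that must pass each check.
STANDARDS = {"email_complete": 0.95, "zip_valid": 0.98}


def profile(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Re-profile the current data and return the pass rate for each check."""
    total = len(rows) or 1  # avoid dividing by zero; an empty table scores 0.0 and fails
    email_complete = sum(1 for r in rows if r.get("email", "").strip()) / total
    zip_valid = sum(1 for r in rows if re.fullmatch(r"\d{5}", r.get("zip", ""))) / total
    return {"email_complete": email_complete, "zip_valid": zip_valid}


def check_standards(rows: List[Dict[str, str]]) -> Dict[str, bool]:
    """Compare each metric with its standard; False marks a violation to report."""
    metrics = profile(rows)
    return {name: metrics[name] >= minimum for name, minimum in STANDARDS.items()}
```

Separating `profile` from `check_standards` keeps the raw metrics available for trend charts while the pass/fail result drives alerts.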

Classification. Once a DQ function has identified a person, a classification function may augment their record by classifying their gender, demographics, preferred customer status, and so on. Classifying people in real time is important when an action at one touch-point (e.g., paying a late bill) affects the level of service at other touch-points.
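The late-bill example above can be sketched as an event handler that reclassifies a customer the moment a payment arrives. The `Customer` fields, the spend threshold, and the segment names are hypothetical, chosen only to show rule-based classification being re-run in real time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    customer_id: str
    annual_spend: float
    overdue_bills: int = 0
    segments: List[str] = field(default_factory=list)


def classify(customer: Customer) -> Customer:
    # Hypothetical rules and thresholds, chosen only for illustration.
    segments: List[str] = []
    if customer.annual_spend >= 10_000 and customer.overdue_bills == 0:
        segments.append("preferred")
    if customer.overdue_bills > 0:
        segments.append("collections-watch")
    customer.segments = segments
    return customer


def on_overdue_bill_paid(customer: Customer) -> Customer:
    # Event handler: reclassify at once so every touch-point sees the new status.
    customer.overdue_bills = max(0, customer.overdue_bills - 1)
    return classify(customer)
```

Because classification runs inside the payment event rather than in a nightly job, a call-center agent who takes the customer's next call already sees the restored preferred status.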

Augmentation. A common form of augmentation is to append demographic information to a customer record; this helps complete a 360-degree view of each customer. If the party in question is a company instead of a person, DQ functions may likewise append firmographics, identification numbers, or debt numbers. Geocodes and micromarketing codes may also be appended. Users have been augmenting data this way for years, but usually in batch. Nowadays, the trend is to augment a record in real time, as the record is created or updated, usually through a third-party service.
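Calling a third-party service inside a real-time transaction raises a practical question: what if the service is slow? One possible design, sketched below, bounds the wait with a timeout and marks the record for later batch enrichment if the provider does not answer in time. The `fake_geocoder` provider, its sample coordinates, and the 0.5-second timeout are hypothetical; `ThreadPoolExecutor` and `Future.result(timeout=...)` are standard-library features.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict

Record = Dict[str, str]
Provider = Callable[[str], Dict[str, str]]

_executor = ThreadPoolExecutor(max_workers=4)


def fake_geocoder(zip_code: str) -> Dict[str, str]:
    # Hypothetical stand-in for a third-party geocoding service (sample data only).
    table = {"98101": {"lat": "47.61", "lon": "-122.33"}}
    return table.get(zip_code, {})


def augment(record: Record, provider: Provider, timeout_s: float = 0.5) -> Record:
    """Append provider attributes, but never block the transaction for long."""
    enriched = dict(record)
    future = _executor.submit(provider, record["zip"])
    try:
        enriched.update(future.result(timeout=timeout_s))
        enriched["augmented"] = "yes"
    except FutureTimeout:
        # Commit the record now and leave enrichment to a later batch pass.
        enriched["augmented"] = "pending"
    return enriched
```

The `augmented` flag lets a nightly job find and finish the records that real-time augmentation could not complete, so the batch and real-time approaches complement each other.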

For more information on this topic, read the TDWI Checklist Report Real-Time Data Quality, available as a free download.
