Data Quality Management: Getting More from Your BI Investment
Data quality can put your organization on the defense or offense depending on how well you manage it.
By Julianna DeLua
Most organizations understand the impact of data quality on analysis and decision support. The proliferation of business intelligence (BI), with data drawn from disparate systems and applications, can degrade data quality, lowering users’ confidence in BI reports. However, BI deployed with quality data can help an organization compete more effectively and decisively.
Organizations are under heightened pressure to invest in technologies that drive competitive advantage and improve operational results. Successful deployment of BI can help an organization assess its health, establish suitable key performance indicators, and monitor day-to-day operations to drive top- and bottom-line growth.
Intensified demand for more data is propelling the widespread adoption of BI from the executive suite to business users. This extensive adoption has caused the facilities for BI to advance beyond traditional query, analytical reporting, and online analytical processing (OLAP) functions to include operational dashboards, customizable scorecards, and advanced visualization techniques. From the information supply chain perspective, this implies that supporting data needs to be accessed, aggregated, and rationalized to be consumable for BI, irrespective of format, whenever users need it. The stakes get higher every day.
Unlike traditional BI applications, which focused on queries and analytics, today’s BI is also being used to aid in operational decisions. Every action users take based on the strength of reports and alerts is influenced by the accuracy of the data used and the users’ trust in that data. How often do we feel that the data appears odd or untrustworthy when we look at a BI report? This queasy feeling, whether justified or not, leads to delays and may even stop us from taking business-critical actions.
As the adage goes, trust must be earned, and trust in data is no exception. Obviously, establishing trust in data between the BI teams and the BI report users is critical. If data is incomplete, inaccurate, or full of duplicates, trust is weakened and people become reluctant to use their BI tools. Beyond the obvious data cleansing and matching is the trust network that must be built around data warehouses, operational stores, and other systems and applications generating continuous streams of data throughout an enterprise. This is why an increasing number of organizations are undertaking data quality initiatives as a core tenet of their enterprise BI initiatives.
Defining Data Quality
Data quality is such an easy term to bandy about. TDWI defines data quality as the quality of data’s content and structure (according to varying criteria), plus the standard technology and business practices that improve data, such as name-and-address cleansing, matching, house-holding, de-duplication, standardization, and appending third-party data (see Taking Data Quality to the Enterprise Through Data Governance, Phillip Russom, TDWI March 2006).
An organization can measure data quality as a means to assign the value of data assets against business expectations defined by specific rules and criteria. A data quality dimension framework (see Figure 1) may include a range of parameters that can be used to identify and categorize data quality issues. If someone says “our data is not good,” we can investigate further and describe the levels of data quality with tangible numbers. For example, a data quality score of 80 percent could be an aggregate of measures for key attributes, such as completeness, percentage of conformity, duplicates, and so on.
Organizations usually start with this framework and modify it based on maturity and priorities. Both business users and IT teams must agree on the metrics to measure.
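To make the idea of an aggregate score concrete, here is a minimal Python sketch. The sample records, the three dimensions chosen (completeness, conformity, uniqueness), the validation patterns, and the equal weights are all illustrative assumptions, not part of any particular framework or product.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "zip": "94063"},
    {"id": 2, "email": None, "zip": "94063"},
    {"id": 3, "email": "bob@example", "zip": "9406"},
    {"id": 4, "email": "cy@example.com", "zip": "10001"},
    {"id": 4, "email": "cy@example.com", "zip": "10001"},  # exact duplicate
]

# Assumed format rules, deliberately simple.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ZIP = re.compile(r"^\d{5}$")


def completeness(rows, fields):
    """Share of field values that are present (not None or empty)."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) not in (None, ""))
    return filled / total


def conformity(rows, patterns):
    """Share of present values that match the expected pattern."""
    checked = matched = 0
    for r in rows:
        for field, pattern in patterns.items():
            value = r.get(field)
            if value in (None, ""):
                continue  # missing values are counted under completeness
            checked += 1
            matched += bool(pattern.match(value))
    return matched / checked if checked else 1.0


def uniqueness(rows, key_fields):
    """Share of rows that are distinct on the given key fields."""
    keys = {tuple(r.get(f) for f in key_fields) for r in rows}
    return len(keys) / len(rows)


def quality_score(scores, weights):
    """Weighted average of the dimension scores."""
    return sum(scores[d] * weights[d] for d in scores) / sum(weights.values())


scores = {
    "completeness": completeness(records, ["email", "zip"]),
    "conformity": conformity(records, {"email": EMAIL, "zip": ZIP}),
    "uniqueness": uniqueness(records, ["id", "email", "zip"]),
}
weights = {"completeness": 1, "conformity": 1, "uniqueness": 1}  # assumed equal
score = quality_score(scores, weights)

for dimension, value in scores.items():
    print(f"{dimension}: {value:.0%}")
print(f"overall: {score:.0%}")
```

In practice the dimensions and their weights are exactly what business users and IT would need to negotiate; the arithmetic itself is the easy part.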
As BI deployment becomes increasingly cross-divisional, managing data quality for types of master data beyond customer data has emerged as a best practice. Other types of master data include product data, financial data, asset and other banking data, and HR data, all of which exist in a multiplicity of formats on a wide range of platforms -- from ERP systems and relational databases to semi-structured data sources. Also important is the performance and scalability required of the data quality management techniques due to the increasing data volume after aggregating and consolidating enterprise data for all types of BI.
For this reason, the ability to profile, cleanse, and deliver all types of data with high levels of quality all of the time, regardless of data volume and complexity, is fundamental to the success of BI -- and the pay-off is immense.
With good data quality, business insights from BI become actionable faster, often much faster. With increased confidence in data, executives, managers, and business users can readily recognize and act on new patterns and trends as well as on business warning signs with higher granularity and precision. Identifying overpayments and other cost-mitigation and cost-saving opportunities can be direct results of business users’ ability to use certifiably accurate data from reporting and alerts. End-to-end data quality management also increases auditability and visibility of BI reporting, something that is especially valuable for compliance and risk management purposes.
There is a dark side to data quality, however. Based on a range of studies, the impacts of poor quality data can be wide and deep, such as:
- Inordinate amounts of IT time and resources spent to investigate, cleanse, and reconcile data
- Additional operational overhead to manually pull and correct data for analysis
- Loss of credibility in systems and BI supply chain as a whole
- Slower or erroneous decision-making that negatively impacts customer satisfaction and business performance
- Failures and delays in meeting compliance and risk requirements
Data Quality and Five BI Styles
As discussed, BI has evolved into a variety of styles to address the growing requirements of an organization and the increased level of mission-critical activities. Each of these styles has its own set of data quality requirements.
BI Style #1: Scorecards and Dashboards
Scorecards and dashboards are becoming widely adopted as users increasingly look to gain bird’s-eye views for financial, operational and performance monitoring. With visual graphs, charts and gauges, these delivery mechanisms help track performance metrics and notify staff about trends and decisions that may be required. The data elements required to provide the integrated views typically cut across multiple divisions and disciplines, and need to be absolutely up-to-date to be effective.
Data quality impacts scorecard and dashboard users who must be able to:
- Consume and act on complete data on gauges and dials from dashboards quickly
- Achieve an integrated view and collaborate using standardized data
- Leverage a formal scorecard methodology with consistent data
- Drill down to view accurate data at group or individual level performance
- Pinpoint business processes that are generating notable trends, with minimal duplication of data
- Derive linkages and perform cross-impact analysis through validated data
BI Style #2: Enterprise Reporting
Enterprise reporting provides individuals at all levels with a wide array of operational and other business reporting from ERP, CRM, PRM, invoice and billing systems, and other source systems. Distribution of the reports is broad and compensation and other incentive programs are typically tied to the results reported.
Data quality impacts enterprise reporting in that organizations must:
- Navigate multiple reports and print them in multiple forms that aggregate data from disparate sources
- Select a variety of parameters and customize reports for users with normalized data
- Present multiple tables and graphs with reconcilable data across a variety of performance metrics
- Let business users create their own reports without IT involvement with high fidelity data
- Reduce manual checks and audits with cleansed and matched data for compliance management
- Issue invoices and billing statements directly from BI reporting utilizing financial data with integrity
BI Style #3: Cube/OLAP Analysis
OLAP allows a user to “slice and dice” interrelated subsets of data, or “cubes,” interactively on the fly. For example, users can pre-build “sales by region” views for specific time periods, examine performance by product, look at performance by sales person, and so forth. OLAP features such as drill up/down, pivoting, sorting, and filtering can provide underlying details on performance. Analysis cubes help users conduct first-pass analysis and share insights with colleagues. This helps jumpstart deeper assessment using access to data warehouses and other repositories in lieu of more advanced analysis functionality.
Data quality impacts OLAP analysis as users and their organizations need to:
- Drill across all dimensions for in-depth investigation with complete access to target data
- Ease OLAP manipulations for any subset of dimensions with well-formatted, conforming data
- Minimize conflicting reporting and ensure interactivity with consistent, underlying data objects
- Perform user-driven, right-time analysis with correct data in multiple dimensions
- Deliver updated, synchronized data to handle transactional-level data in cube analysis
- Ensure data security when business users create and maintain the cube data across data warehouses
BI Style #4: Advanced/Predictive Analysis
Advanced and predictive analysis empowers sophisticated users to fully investigate and discover the details behind specific business performance, possibly exceeding the typical limits of OLAP Analysis. The approach may involve advanced statistical analysis and data mining capabilities. To drive proactive decisions and improved postures against potential business threats, predictive analysis may include hypothesis testing, churn forecast, supply and demand prediction, and customer scoring. Predictive modeling can be used to anticipate various business events and associated outcomes.
Data quality impacts advanced and predictive analysis as users seek to:
- Create report filtering criteria across any data elements for customizable reports
- Search patterns and predictive insights by standardized data formats to promote proactive decisions
- Achieve confidence in spotting interdependent trends and expected outcomes with consistent data
- Employ multivariate regression and other techniques on accurate data to achieve better forecasting
- Customize data groupings with minimum conflicts without data duplication
- Test hypotheses and use statistical, financial, and math functions with certified data
BI Style #5: Notifications and Alerts
Using e-mail, browsers, networked servers and printers, PDAs, and portals, notifications and alerts are used to proactively share information across a wide range of user touch points. With timely delivery of target information, key stakeholders and decision makers can identify potential areas of opportunity and detect problematic areas on which to take action. This “front-line” BI delivery mechanism keeps the organization aligned and abreast of business risks and opportunities while events are still fresh and meaningful to warrant responses.
In this arena, data quality impacts organizations as they endeavor to:
- Distribute alerts to the widest range of user touch points from all data sources
- Ensure high throughput for a variety of subscription types on standardized and non-conflicting datasets
- Allow users to open attachments or click links while presenting consistent, integrated data
- Mitigate the risk of distributing incorrect alerts and notifications with pre-measured and approved quality of data
- Enable triggering of alerts when multiple event data meets specific thresholds in real-time
- Leverage authenticated data for content personalization and group affiliation
Techniques to Monitor and Manage Data Quality
Because most organizations deploy a combination of these five BI styles, data quality improvements benefit all of them at once, so a successful BI program must include a data quality management component run in a metric-driven, programmatic fashion. Maintaining data integrity across extended teams throughout the data lifecycle is required to meet regulatory compliance and governance objectives. To ensure confidence in the validity of enterprise data, information as well as data flows and relationships must be auditable and traceable. For these and other reasons, data quality is better managed as part of an enterprise data integration architecture, with the result that the monitoring and managing of data quality complements the lifecycle of data access, integration, transformation and delivery.
As part of the data quality program, organizations need to establish or re-establish a data quality methodology as illustrated in Figure 2.
Improving data quality must be approached as an ongoing cycle. To start, data profiling (1) is a key element in scoping overall data quality initiatives—it enables you to determine the content, structure, and quality of highly complex data structures and to discover hidden inconsistencies and incompatibilities between data sources and target applications. Establishing metrics and defining targets (2) helps IT and the business measure the results of data quality efforts as part of a BI initiative. Design and implementation of data quality rules (3) help define and measure targets and criteria for data quality. As discussed below in detail, integrating data quality rules and activities (profiling, cleansing/matching, automated remediation, and management) with data integration processes (4) is critical to enhancing the accuracy and value of data assets.
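As a rough illustration of what the profiling step (1) looks for, the Python sketch below summarizes each column of a small, made-up extract. The column names and values are hypothetical, and commercial profiling tools go much further (pattern discovery, cross-table dependency analysis, and so on); this shows only the basic idea of surfacing nulls, distinct values, and mixed types.

```python
from collections import Counter


def profile(rows):
    """Summarize each column: null count, distinct count, observed value types."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as null
        present = [v for v in values if v is not None]
        summary[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return summary


# Hypothetical extract from a source system.
source_rows = [
    {"cust_id": 101, "state": "CA", "opened": "2006-03-01"},
    {"cust_id": "102", "state": "ca", "opened": None},
    {"cust_id": 103, "state": "CA"},
]

result = profile(source_rows)
for column, stats in result.items():
    print(column, stats)
```

Even on three rows the summary exposes the kinds of hidden inconsistencies the text describes: an identifier stored as both integer and string, a state code in two spellings, and a mostly empty date column.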
Reviewing exceptions and refining rules (5) is best accomplished as a joint effort involving core team members and BI stakeholders. In many cases, BI stakeholders have limited control over the business processes and operational systems that cause poor data quality, which is why it is important to involve key stakeholders and executives throughout an organization in documenting data quality issues and launching a formal data quality program. Finally, proactive data quality monitoring (6) through dashboards and real-time notifications is fast becoming a standard best practice. BI stakeholders themselves, if involved in the data quality process as they ought to be, can be given the tools to do this, as they know best what the quality levels of their data need to be.
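Steps (2), (3), (5), and (6) of the cycle can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the `Rule` structure, the rule names, the targets, and the sample account rows are hypothetical and do not represent any vendor's rule engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the row passes
    target: float                  # minimum acceptable pass rate, 0 to 1


def evaluate(rows, rules):
    """Apply each rule to every row; report pass rate and the failing rows."""
    report = {}
    for rule in rules:
        exceptions = [row for row in rows if not rule.check(row)]
        pass_rate = 1 - len(exceptions) / len(rows) if rows else 1.0
        report[rule.name] = {
            "pass_rate": pass_rate,
            "target": rule.target,
            "exceptions": exceptions,           # input to the review step (5)
            "breach": pass_rate < rule.target,  # input to monitoring (6)
        }
    return report


def alerts(report):
    """Messages for rules whose pass rate fell below the agreed target."""
    return [
        f"{name}: pass rate {r['pass_rate']:.0%} below target {r['target']:.0%}"
        for name, r in report.items()
        if r["breach"]
    ]


# Hypothetical account rows and rules.
accounts = [
    {"account": "A1", "balance": 100.0, "currency": "USD"},
    {"account": "A2", "balance": None, "currency": "USD"},
    {"account": "A3", "balance": 250.5, "currency": "usd"},
    {"account": "A4", "balance": 75.0, "currency": "EUR"},
]
rules = [
    Rule("balance_present", lambda r: r["balance"] is not None, target=0.95),
    Rule("currency_valid", lambda r: r["currency"] in {"USD", "EUR"}, target=0.75),
]

report = evaluate(accounts, rules)
messages = alerts(report)
for message in messages:
    print(message)
```

Keeping the failing rows alongside each pass rate is the design choice that matters here: it gives business stakeholders something specific to review when they decide whether a rule or a source process needs to change.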
Enterprise Data Quality and Data Integration Architecture
With a need to manage data quality end-to-end, new thinking about data quality architectures has emerged. Many leading organizations are now implementing data quality as part of their enterprise data integration architecture. There are numerous advantages to this approach.
For instance, an enterprise data integration foundation enables the discovery, access, transformation, semantic reconciliation, management, and delivery of enterprise data in a secure, consistent, and timely manner. With this approach, IT is able to reuse data access, data transformation, and data quality logic across multiple BI (and other) environments, thus reducing the time to implement new BI projects or deliver enhanced functionality.
Another key advantage is the ability to leverage a single environment for change management and impact analysis. The enterprise data integration framework, with its foundational metadata infrastructure, contains information regarding data relationships and lineage that IT can leverage to perform efficient change management.
Finally, an enterprise data integration infrastructure is able to support users who require data at varying levels of latency: batch, real time, and near real time. As business processes evolve and businesses move from traditional decision-support systems to more operational decision making that requires operational BI, the ability to support these latencies continues to grow in importance.
Data Quality Management by BI Type
To see how what we’ve described can be put into practice, let’s look at an example of a large financial institution that implemented a robust data quality management system across multiple key business initiatives. These included initiatives for driving customer cross-sell/up-sell, compliance and risk management, and other operational improvements. The CEO rightly placed a strong emphasis on data quality, indicating that the firm’s ability to execute strategic plans, make sound business decisions, and fully serve customers hinged on the quality of data.
The company also adopted the “Rule of Ten” as a guiding principle for data quality, figuring that it costs ten times as much to complete a unit of work when input data are defective than when they are correct (see Dr. Thomas Redman’s Data Driven: Profiting from Your Most Important Business Asset). In other words, it takes $10 to process an inaccurate order as opposed to $1 to process an accurate order.
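The Rule of Ten lends itself to simple arithmetic. The Python sketch below applies it to a hypothetical batch of orders; the order volume and the 5 percent defect rate are made-up inputs, and only the $1 and $10 figures come from the rule as described above.

```python
def processing_cost(units, defect_rate, unit_cost=1.0, defect_multiplier=10.0):
    """Total cost when defective units cost `defect_multiplier` times a clean unit."""
    defective = units * defect_rate
    clean = units - defective
    return clean * unit_cost + defective * unit_cost * defect_multiplier


# Hypothetical batch: 1,000 orders, 5 percent of them with defective input data.
baseline = processing_cost(1000, 0.0)
with_defects = processing_cost(1000, 0.05)
print(f"all clean:  ${baseline:,.0f}")
print(f"5% defects: ${with_defects:,.0f}")
```

Under these assumptions, 950 clean orders at $1 plus 50 defective orders at $10 come to about $1,450, so a 5 percent defect rate raises processing cost by roughly 45 percent.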
The risks associated with lack of data quality were present across the institution. They included the specter of audit failures and non-compliance penalties for numerous regulatory mandates such as Sarbanes-Oxley, Anti-Money Laundering (AML), USA Patriot Act and Basel II, coupled with resource waste and rework trying to undo the impact of poor quality data. There was also the risk of fraud and misuse, and of the failure of corporate governance and accountability initiatives. The negative impact on customer satisfaction was becoming more evident.
The rewards the institution was aiming for -- and achieved -- included improved service levels; improved understanding of credit, market, and operational risks; and improved customer satisfaction and lower churn. They also encompassed better identification and realization of cross-sell and up-sell opportunities as well as lower operational costs.
The institution ultimately adopted a Six Sigma approach to data quality management to put its principles into practice. The firm set its data quality range, and treated data deviating from the acceptable range as “defects,” just as if it were manufacturing tangible goods. The institution also created statistical process control charts on data quality and developed a lifecycle approach to the data quality management process as illustrated above -- and applied the approach to all major BI types in use across the business. Here’s what the institution accomplished in the context of each BI style, and how it benefited:
- Scorecards and dashboards: The firm enabled continuous monitoring of risks and reporting to auditors and risk officers, with increased visibility and auditability. By doing it as part of the data integration lifecycle, it saved an estimated $3 million in costs compared to implementing a custom AML solution for proactive monitoring. The institution also achieved greater understanding and insight into reserve levels for its executives.
- Enterprise reporting: The firm avoided as much as $20 million in potential regulatory penalties, and minimized duplicative effort including rework and resends that were part of its former manual data cleansing process.
- Cube/OLAP analysis: The institution established validity rules to check hundreds of conditions across dozens of lines of business, and accelerated root cause analysis by making a single data quality management platform available for business users and IT.
- Advanced/predictive analysis: The firm met multiple service level agreements for users requiring on-demand, intra-day or daily data for analysis. As a result, it now responds quickly to organizational changes, new lines of business, or new market segments, and has increased the speed with which BI users are able to understand linkages when investigating suspicious activities.
- Notifications and alerts: The institution automated targeted communications to customers based on asset levels or activities, with assurance of the accuracy of the content of those communications. Internally, it implemented real-time alerts on high-value problematic assets and facilitated better inter-departmental communications.
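The article does not say how the institution built its statistical process control charts, so the following Python sketch is only one plausible reading: a standard p-chart, which tracks the proportion of defective records per load against three-sigma control limits. The daily defect counts and load sizes are invented for illustration.

```python
import math


def p_chart_limits(defects, sample_sizes):
    """Center line and per-sample three-sigma limits for a p-chart."""
    p_bar = sum(defects) / sum(sample_sizes)  # overall defect proportion
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lower = max(0.0, p_bar - 3 * sigma)
        upper = min(1.0, p_bar + 3 * sigma)
        limits.append((lower, upper))
    return p_bar, limits


def out_of_control(defects, sample_sizes):
    """Indexes of samples whose defect proportion falls outside the limits."""
    _, limits = p_chart_limits(defects, sample_sizes)
    flagged = []
    for i, (d, n) in enumerate(zip(defects, sample_sizes)):
        lower, upper = limits[i]
        if not lower <= d / n <= upper:
            flagged.append(i)
    return flagged


# Hypothetical week of daily loads: 1,000 records each, with defect counts.
daily_defects = [20, 22, 18, 21, 19, 60, 20]
daily_records = [1000] * 7

center, limits = p_chart_limits(daily_defects, daily_records)
flagged = out_of_control(daily_defects, daily_records)
print(f"center line: {center:.2%}")
print(f"days outside control limits: {flagged}")
```

With these numbers only the sixth load (index 5), where 6 percent of records were defective, falls outside the limits; the other days vary within what the chart treats as normal process noise.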
Timely, accurate data used for BI applications is critical to the workings of many organizations. Without the facilities to consistently deliver and act upon trusted, high quality data, BI systems can threaten to undermine the ability of organizations to assess the true state of organizational health and take the right actions to run their businesses and compete effectively. By adopting an enterprise data quality approach, BI solution strategists and architects can design and implement the five prominent styles of BI, including scorecarding, detailed analysis, and on-demand alerts, with much improved confidence.
Data quality solutions can also synergize with existing enterprise data integration processes and solutions that have the ability to access and manage all types of master data in a metric-driven approach. The results can be far superior compared to the use of traditional data quality technology, typically limited to cleansing of customer data.
Successful deployment of data quality at this truly enterprise level helps an organization maximize the returns on its BI investments by improving its ability to leverage BI to drive competitive advantage and market leadership.
Julianna DeLua is enterprise solutions manager at Informatica. You can reach the author at firstname.lastname@example.org.