In-Depth
Data Governance: Managing Data for Profit
Good business decisions depend on the quality and accuracy of your data, which in turn depends on how you apply standards and controls—the process of data governance.
- By Vinod Badami
- 07/08/2003
Webster’s defines governance as “the persons (or committees, or departments, etc.) who make up a governing body and who administer something.” The quality of governance applied to an entity directly affects the value derived from that entity. For example, the standard of education in a state school district depends directly on the policies, regulations, and standards adopted and enforced by that district’s governing body. Similarly, the value derived from enterprise data depends directly on the quality and accuracy of the data, which in turn depend on the standards and controls applied to it.
However, applying the appropriate governance mechanism to data distributed across an enterprise is not an easy task. In most organizations, the data warehouse becomes the point at which data governance is applied to ensure data quality and accuracy. But this is a reactive step, because the problems associated with the data arise well before the data reaches the data warehouse. Data governance, when applied across the enterprise, not only reduces data errors at the data warehouse level but also makes the enterprise data supply chain more efficient, reducing the high cost of conducting business with inaccurate or uncleansed data.
Why Data Governance?
In most organizations, the inventory of data and applications resembles a spider web of stand-alone, piecemeal applications that were never architected to ensure enterprise-wide data standardization.
Take, for example, an insurance company that operates multiple lines of business (such as auto, home, and life). Each line of business may manage a customer independently and maintain its own separate set of information on that customer. Because of differences in the data entry standards and data controls implemented by the applications in each line of business, the customer’s information may be recorded differently in each application. As a result, downstream applications that use customer data may interpret the three records as three different customers, compromising business decisions made with that data.
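A minimal sketch (in Python, with hypothetical field names and values) of how this plays out: the same customer, captured under three different entry conventions, looks like three customers to any application that keys on the raw fields.

```python
# Hypothetical records for one customer, entered by three lines of business
# under different data entry conventions.
auto_record = {"name": "Robert J. Smith", "dob": "1960-04-12", "addr": "12 Elm St."}
home_record = {"name": "Smith, Robert",   "dob": "04/12/1960", "addr": "12 Elm Street"}
life_record = {"name": "Bob Smith",       "dob": "1960-04-12", "addr": "12 Elm St"}

records = [auto_record, home_record, life_record]

# A downstream application keying customers on the raw name field
# sees three distinct customers instead of one.
distinct_customers = {r["name"] for r in records}
print(len(distinct_customers))  # 3 -- one real customer counted three times
```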
Situations such as these and others give rise to a host of data issues related to:
- data origination
- data security
- data quality
- data availability
- data usability
- data inconsistency
Data important to an organization is produced both internally and externally. The data supply chain is composed of producers, enrichers, distributors, and customers (users) of the data. Once the data leaves the point of origin, it becomes the raw material for calculated metrics and new data based on business rules (such as status fields and indicators).
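As an illustration of how new data is derived along the supply chain, the sketch below applies a business rule to produce a status field; the field names and the rule itself are hypothetical, invented only for the example.

```python
# Illustrative sketch: an "enricher" derives a new data element (a status
# field) from raw policy data using a hypothetical business rule.
from datetime import date

def derive_status(policy: dict, as_of: date) -> str:
    """Business rule: a policy is 'LAPSED' if its premium is more than
    30 days overdue, otherwise 'ACTIVE'."""
    overdue_days = (as_of - policy["premium_due"]).days
    return "LAPSED" if overdue_days > 30 else "ACTIVE"

policy = {"policy_id": "A-1001", "premium_due": date(2003, 5, 1)}
policy["status"] = derive_status(policy, as_of=date(2003, 7, 8))
print(policy["status"])  # LAPSED -- a new data element created downstream
```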
The data supply chain described above involves multiple departments and data migrations (also known as "extracts") from the source to different applications. Seen from an enterprise perspective, the quality, usability, and other issues described above are multiplied many times over and add significantly to the cost of doing business, not to mention the lost opportunities. This gives rise to questions such as:
- Which originator’s interpretation of the data is right?
- Which data can I believe?
- Which data are current?
- Which data are correct?
- Why aren’t more details about the data available?
Data Governance Dimensions
It now becomes obvious that, for an asset so important to an organization, there is a significant lack of control and management surrounding it. Different approaches have been used in the past to help manage the mass of enterprise data, such as enterprise data models, data dictionaries, and, of course, the data warehouse. Unfortunately, each of these addressed just one aspect of data governance.
To start defining data governance, we must identify a set of dimensions that describe it. The following dimensions describe data governance in a holistic manner:
- Organization and Accountability
- Standards and Procedures
- Audit and Control
- Reference Artifacts
- Technology Support
The quality of data governance can then be measured by the extent to which each of the above dimensions is implemented: the higher the level of implementation of each dimension, the better the quality of enterprise data and information, leading to lower business operating costs and more profitable decisions.
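One simple way such a measurement might be expressed is sketched below, using a hypothetical 0-5 maturity scale and illustrative scores.

```python
# Illustrative sketch: scoring the level of implementation of each
# governance dimension on a hypothetical 0-5 maturity scale.
dimension_scores = {
    "Organization and Accountability": 3,
    "Standards and Procedures":        2,
    "Audit and Control":               1,
    "Reference Artifacts":             2,
    "Technology Support":              4,
}

MAX_SCORE = 5
overall = sum(dimension_scores.values()) / (MAX_SCORE * len(dimension_scores))
print(f"Overall data governance maturity: {overall:.0%}")  # 48% in this example

# The lowest-scoring dimensions are candidates for the next improvement phase.
for name, score in sorted(dimension_scores.items(), key=lambda kv: kv[1]):
    print(f"{score}/{MAX_SCORE}  {name}")
```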
Let’s examine each of the above dimensions in more detail.
Organization and Accountability
This first dimension begins with the creation of the Data Governance (DG) group. This group is responsible for defining the level of implementation of each dimension, based on the unique characteristics of the organization, and for measuring the effectiveness of that implementation on an ongoing basis.
One of the major obstacles to implementing data governance has been the practice of appointing a single group responsible for data standardization across the enterprise. These efforts always failed because the DG group never had operational control over the various departments and applications that make up the spider web. In the proposed DG model, accountability for implementing data governance shifts to those identified as the data originators or owners. Each operational area must now identify a "data champion" whose daily responsibilities include implementing DG policies at the local level.
Standards and Procedures
Most organizations have a set of data standards that should be used and enforced by the local data champion. At the local level, this normally applies to data entry standards because, for example, enterprise ERP applications impose their own standards once the data has been entered. Other data standards govern the development of new applications and the evaluation of packaged applications. Meta data standards are critical to understanding the source, meaning, use, and applicability of data.
Procedures at the local level include the steps for applying the standards, as well as developing and maintaining a list of the locally owned data elements along with their definitions, format, use, completeness, and quality. The local champion is also responsible for reviewing this list with the DG group. At the enterprise level, the DG group is responsible for planning the incorporation of new data elements into the data warehouse and publishing them to the user community.
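As a rough illustration of how locally owned data elements and entry standards might be checked in practice, the sketch below validates a record against a hypothetical element list; the element names, formats, and rules are invented for the example.

```python
# Hypothetical list of locally owned data elements, each with a format
# standard and a completeness requirement, applied as a data entry check.
import re

LOCAL_DATA_ELEMENTS = {
    "customer_id": {"pattern": r"^C\d{8}$", "required": True},
    "postal_code": {"pattern": r"^\d{5}(-\d{4})?$", "required": True},
    "middle_name": {"pattern": r"^[A-Za-z.\- ]*$", "required": False},
}

def validate(record: dict) -> list[str]:
    """Return a list of violations of the local data entry standards."""
    violations = []
    for element, rule in LOCAL_DATA_ELEMENTS.items():
        value = record.get(element, "")
        if rule["required"] and not value:
            violations.append(f"{element}: missing required value")
        elif value and not re.match(rule["pattern"], value):
            violations.append(f"{element}: '{value}' violates format standard")
    return violations

print(validate({"customer_id": "12345", "postal_code": "021"}))
```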
Audit and Control
While standards and procedures should reduce the incidence of data problems and ensure greater data standardization across the enterprise, they are not always adequate. The local champions may not always follow the standards, resulting in multiple data issues downstream. The goal of the audit function of the DG group is to measure and report on the level of adherence to the standards and procedures and on the severity of errors across the data supply chain. The end result is a scorecard for each department or group that is a producer, enricher, or supplier of data. The advantage of this approach to mandating and enforcing governance is that it solves some of the political issues that arise when the data champions in the business areas feel they are not obliged to report to the DG group. The scorecard is sent to the manager of the business area for which the data champion works, and a copy is sent to the executive committee. It then becomes the responsibility of the business area manager to correct the situation.
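A sketch of how such a scorecard might be computed, with hypothetical departments, audit counts, and severity weights:

```python
# Illustrative sketch: an audit scorecard per producing department,
# combining adherence to standards with the severity of errors found.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 9}

audit_findings = {
    "Auto Underwriting": {"records_audited": 500, "errors": {"low": 12, "high": 2}},
    "Home Claims":       {"records_audited": 300, "errors": {"medium": 8}},
}

for dept, result in audit_findings.items():
    error_count = sum(result["errors"].values())
    adherence = 1 - error_count / result["records_audited"]
    severity = sum(SEVERITY_WEIGHT[s] * n for s, n in result["errors"].items())
    print(f"{dept}: adherence {adherence:.1%}, weighted severity {severity}")
# The scorecard goes to the business area manager, with a copy to the
# executive committee.
```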
The control aspect of this dimension is implemented using technology. The challenge is that even with complete adherence to the standards and procedures, legacy, ERP, and packaged applications may lack the design controls to capture all data anomalies. These anomalies must be caught before the data flows into the information infrastructure of the organization, namely the data warehouse and data marts. Using the enterprise data model and business rules as a basis, data quality tools can provide the final layer of data filtering and scrubbing. Reports from these tools can help the IT groups managing the data sources identify and correct the design flaws in their systems.
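The following sketch illustrates the idea of rule-based filtering before the warehouse; the rules and field names are hypothetical stand-ins for what a real data quality tool would derive from the enterprise data model and business rules.

```python
# Illustrative sketch: a final layer of rule-based filtering and scrubbing
# applied to source extracts before they reach the data warehouse.
def quality_rules(row: dict) -> list[str]:
    anomalies = []
    if row.get("policy_type") not in {"AUTO", "HOME", "LIFE"}:
        anomalies.append("policy_type outside enterprise code set")
    if row.get("premium", 0) <= 0:
        anomalies.append("premium must be positive")
    if row.get("effective_date", "") > row.get("expiry_date", ""):
        anomalies.append("effective_date after expiry_date")
    return anomalies

extracted_rows = [
    {"policy_type": "AUTO", "premium": 820.0,
     "effective_date": "2003-01-01", "expiry_date": "2004-01-01"},
    {"policy_type": "BOAT", "premium": -50.0,
     "effective_date": "2003-06-01", "expiry_date": "2003-01-01"},
]

clean, rejected = [], []
for row in extracted_rows:
    problems = quality_rules(row)
    if problems:
        rejected.append((row, problems))  # reported back to the source IT group
    else:
        clean.append(row)                 # flows on to the warehouse
```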
Reference Artifacts
The mother of all reference artifacts for data governance is the enterprise data model. Other artifacts include the data management architecture, the business process model, and the meta data repository, which should serve as the reference data dictionary. The DG group must ensure that these artifacts either exist or have templates that guide other IT groups in new development and in enhancements to current systems.
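For illustration, a single entry in such a reference data dictionary might look like the sketch below; the fields shown are hypothetical but typical.

```python
# Hypothetical meta data repository entry serving as a data dictionary record.
customer_id_entry = {
    "element": "customer_id",
    "definition": "Enterprise-wide unique identifier for a customer",
    "format": "C + 8 digits (e.g. C00012345)",
    "source_system": "Customer Master",
    "owner": "Customer data champion, Operations",
    "business_rules": ["Assigned once; never reused"],
    "last_reviewed": "2003-06-30",
}
```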
Technology Support
This dimension refers to the technology components required to support the data management and delivery aspects of data governance. It includes products such as data modeling tools, EAI and messaging products, ETL tools, data quality products, relational and multi-dimensional databases, meta data products, and OLAP and reporting products. The DG group must describe how these different products connect to and support the data supply chain, and develop standards and procedures for their use and implementation.
Start with the Data Governance Gap
How do you begin implementing enterprise data governance? It's not easy, but the benefits are clear, and the higher the level of governance, the greater the probability of clean, accurate and standardized information for business decisions.
The first step must be to complete a gap analysis to determine the level of data governance within the organization, similar to a baseline scorecard. Then use the results to define a strategy and roadmap for implementing the level of data governance appropriate for your organization. Approach it in a phased manner and measure the level of success at the completion of each phase.
Understand that data governance is a holistic approach, not one limited to standards, architecture, and meta data. It is about data both internal and external to the organization. Management support for a distributed governance model aimed at a single objective is often easier to elicit than a blessing for a single department with ‘enforcer’ powers, which can create significant organizational issues. It is almost a new approach to how organizations view data: their most important and differentiating asset.