
Q&A: Making the Move to Better Data Governance

Data governance expert David Loshin discusses ways to embed data governance practices into systems to institutionalize good practices while working to eradicate bad ones.

In the typical organization, the absence of effective data governance over many years has institutionalized poor data practices across business processes. As more organizations move toward implementing solid data governance programs, they encounter the challenge of overcoming those bad practices in order to ensure information is accessible, usable, and of good quality.

In this interview, data governance expert David Loshin discusses ways to embed data governance processes into the system development framework to institutionalize good data governance practices while working to eradicate bad ones.

A consultant and thought leader in BI, data quality, and master data management, Loshin is president of Knowledge Integrity, Inc. and a prolific author on data management, including the best-selling Master Data Management. His most recent book is Practitioner’s Guide to Data Quality Improvement. Loshin is a frequent speaker at conferences and other events; he last spoke about data governance at a TDWI Webinar on Sept. 20 (Operationalizing Data Governance for Business Process Success). He can be reached at [email protected]

BITW: As more organizations realize the need for a data governance program and begin implementation, one of the challenges is breaking out of old patterns of poor data practices. What are some approaches to overcoming that?

David Loshin: That is a really good question, and almost defines its own parameters: How do we even identify what those poor data practices are? Even calling them “poor data practices,” perhaps, isn’t really fair. A lot of the issues emerge from a “function-centric” approach to developing applications, in which the data is subsidiary to the function, yet we are rapidly moving to a time and place where data sets are always subject to reuse, and that means the rules need to change.

The approach we’ve taken is to encourage our clients to take inventory of their data management capabilities and then do a self-assessment of their own levels of maturity. At the same time, we will review the business objectives and their dependence on usable and accessible data and provide suggestions as to the levels of capability and maturity necessary to meet the business needs. By looking at where the organization needs to be and comparing it to the current state, we can help identify gaps in data management. Information governance, in turn, can be used to address the gaps and transition into something that is more self-reliant with respect to information management.

What are some of the problems you encounter when working with an enterprise to clean up data governance, and what are the impacts?

One of the biggest issues we see is that small variations in structure and formats, along with definitions of data element concepts and reference data concepts, become magnified when data sets are shared across functions. Even the mechanics of data integration are impaired when what is believed to refer to the same concept is represented in different ways.

For example, if different functions of the business have different understandings of what is meant by “customer,” the customer counts in each area will differ, and they will all differ from the count coming out of a data warehouse. Even simple differences have impacts, such as reference concepts with different value domains -- gender codes or country codes, for example. Often there are multiple copies of what are thought to represent the same domains, only to find currency or consistency issues across the board.
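
To make that concrete, here is a minimal, hypothetical sketch (the systems, code values, and crosswalk tables below are illustrative assumptions, not examples from the interview) of how two sets of reference codes for the same concepts can be mapped onto one shared domain before any counts or joins are attempted:

```python
# Hypothetical illustration: two systems encode the same reference concepts
# differently, so naive comparisons and joins disagree.

# System A uses ISO-style codes; System B uses local conventions.
system_a_customers = [
    {"id": 1, "gender": "F", "country": "US"},
    {"id": 2, "gender": "M", "country": "DE"},
]
system_b_customers = [
    {"id": 1, "gender": "FEMALE", "country": "USA"},
    {"id": 3, "gender": "MALE", "country": "GER"},
]

# A governed crosswalk maps each local value domain onto one agreed standard.
GENDER_CROSSWALK = {"F": "F", "M": "M", "FEMALE": "F", "MALE": "M"}
COUNTRY_CROSSWALK = {"US": "US", "USA": "US", "DE": "DE", "GER": "DE"}

def harmonize(record):
    """Return a copy of the record expressed in the shared value domains."""
    return {
        "id": record["id"],
        "gender": GENDER_CROSSWALK[record["gender"]],
        "country": COUNTRY_CROSSWALK[record["country"]],
    }

harmonized = [harmonize(r) for r in system_a_customers + system_b_customers]
# Only after harmonization do counts by country line up across systems.
print(sorted({r["country"] for r in harmonized}))  # ['DE', 'US']
```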

You’ve discussed ways to embed data governance processes and practices into the system development framework. How does that work?

It begins by recognizing that data is no longer used solely as the raw input for operational or transactional systems but is a resource to be reused multiple times. That being said, the first step is to institute a process for soliciting data requirements from any of the potential users any time a data set is created.

This sounds more complicated than it really is, and, in fact, it helps give the data analysts and data stewards a much more comprehensive understanding of the scope of data use across the organization. We have seen some particular success in using this approach when organizations are considering a master data management program -- it helps to flesh out those attributes that are shared and distinguish them from those that are used for only one purpose.
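
As an illustration of that requirements-gathering step, the following hypothetical sketch (the business functions and attribute names are invented for illustration) compares the attributes each function asks for and separates those shared across functions -- the natural master data candidates -- from those used for only one purpose:

```python
# Hypothetical sketch: each business function lists the attributes it needs
# from a "customer" data set; comparing the lists separates attributes that
# are shared (candidates for master data) from single-purpose ones.
from collections import Counter

requirements = {
    "sales":     {"customer_id", "name", "email", "sales_region"},
    "billing":   {"customer_id", "name", "billing_address", "payment_terms"},
    "marketing": {"customer_id", "name", "email", "opt_in_flag"},
}

usage = Counter(attr for attrs in requirements.values() for attr in attrs)
shared = {attr for attr, count in usage.items() if count > 1}
single_use = {attr for attr, count in usage.items() if count == 1}

print("shared:", sorted(shared))          # customer_id, email, name
print("single use:", sorted(single_use))  # function-specific attributes
```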

Where do data governance policies come from?

The approach we take focuses on successive refinement of business policies to “extract” the data requirements, and the data governance policies are refined as a matter of course. A good example starts with government or regulatory requirements that impose rules for statutory reporting. There are often instructions and rules for what those reports must look like. In some cases, those rules are much better described than others, but let’s use that scenario. Compliance with those reporting rules is a business policy, but formatting the report so that it complies with the rules imposes constraints on the information itself.

Ensuring that reports are formatted so that they meet regulatory requirements is essentially a “data governance” policy, implying a set of directives and rules for how the corresponding data sets are managed. The same refinement process could apply for any set of directives, whether they come from business expectations regarding customer service or financial directives imposing rules about prompt invoice payment, and so forth.
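
A hypothetical sketch of that refinement -- with invented filing fields and formats standing in for real statutory rules -- might express the reporting requirement as data-level checks the underlying data set must pass:

```python
# Hypothetical sketch: a statutory reporting requirement ("every filing must
# carry a 9-digit tax ID and an ISO date") refined into data-level rules that
# the corresponding data set must satisfy before the report is produced.
import re
from datetime import datetime

def valid_tax_id(value):
    return bool(re.fullmatch(r"\d{9}", value or ""))

def valid_report_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False

filing = {"tax_id": "12345678", "report_date": "2011-09-30"}

violations = []
if not valid_tax_id(filing.get("tax_id")):
    violations.append("tax_id must be exactly 9 digits")
if not valid_report_date(filing.get("report_date")):
    violations.append("report_date must be an ISO date (YYYY-MM-DD)")

# The business policy is "comply with the filing rules"; these checks are the
# data governance policy it implies.
print(violations)  # ['tax_id must be exactly 9 digits']
```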

For good information governance, where should information monitoring and control take place, and what effect can it have if done correctly?

I have always advocated an approach that abstracts the data rules and lifts them out of the application so that those rules can be managed the same way that the information is managed. That being said, we have used data profiling services that can be integrated into any spot in the information production flow where data sets are shared, as methods for inspecting compliance with data rules.

At the same time, we can augment the work flows with data governance and stewardship: when the data instances do not conform with expectations (namely, when the rules are violated), a data steward can be notified and that data steward can begin the process of analyzing the violation, determining root causes, and attempting to influence process changes that will eliminate the potential for introducing new errors.
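
One way to picture that inspection point -- a rough sketch with invented rules and records rather than any particular profiling product -- is a check at the hand-off that lets conforming records flow on, quarantines the violations, and alerts a data steward:

```python
# Hypothetical sketch of an inspection point: data rules are checked wherever
# a data set is handed off, conforming records flow on, and a data steward is
# notified about the violations for root-cause analysis.
def check_rules(record, rules):
    """Return the names of the rules this record violates."""
    return [name for name, rule in rules.items() if not rule(record)]

def notify_steward(record, failed_rules):
    # Stand-in for an e-mail, ticket, or workflow task to the data steward.
    print(f"steward alert: record {record['id']} violated {failed_rules}")

rules = {
    "email_present": lambda r: bool(r.get("email")),
    "country_known": lambda r: r.get("country") in {"US", "DE", "FR"},
}

incoming = [
    {"id": 101, "email": "a@example.com", "country": "US"},
    {"id": 102, "email": "", "country": "XX"},
]

clean, quarantined = [], []
for record in incoming:
    failed = check_rules(record, rules)
    if failed:
        notify_steward(record, failed)
        quarantined.append(record)
    else:
        clean.append(record)

# `clean` continues through the automated flow; `quarantined` waits on stewardship.
```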

Can good monitoring and control actually help streamline cross-functional processes?

That is what we believe. Identifying issues early in the process reduces the need to unwind processes long after they have finished. When violating data records can be extracted from the process flow, transactions can execute without manual intervention, which increases throughput while decreasing latency.

What is involved in defining and then operationalizing data policies?

We see it as a combination of using metadata and instituting methods for monitoring observance of the rules. Synchronizing and potentially harmonizing business-term definitions will actually get you a long way down the road because that process will expose where variation is leading to flawed operations.

When it comes to operationalization, it also implies the data stewardship processes and procedures for analyzing root causes, determining alternatives for improving the processes, and engaging the process owners to make those changes. At the same time, operationalization will include the integration of inspection, monitoring, and alerts when data issues are detected. Whether this means automated inspection (such as with the use of a data profiling tool) or manual inspection (when that is scalable enough to work), the main idea is marrying the data observation with the specified processes to address any potential issues.
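
A minimal sketch of that idea, with invented rule definitions, keeps the data rules as declarative metadata that a generic inspector interprets, so the rules can be managed by the governance program rather than hard-coded in each application:

```python
# Hypothetical sketch: data rules kept as declarative metadata rather than
# hard-coded in each application, so the same definitions drive automated
# inspection and can be changed by the governance program.
RULE_METADATA = [
    {"element": "invoice_amount", "check": "not_null"},
    {"element": "payment_terms", "check": "in_set", "values": ["NET30", "NET60"]},
]

def inspect(record, rule):
    """Evaluate one metadata-defined rule against a record."""
    value = record.get(rule["element"])
    if rule["check"] == "not_null":
        return value is not None
    if rule["check"] == "in_set":
        return value in rule["values"]
    raise ValueError(f"unknown check type: {rule['check']}")

record = {"invoice_amount": 1200.00, "payment_terms": "NET90"}
issues = [r for r in RULE_METADATA if not inspect(record, r)]
print(issues)  # the payment_terms rule fires, which would trigger an alert
```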

What sorts of data governance directives can help augment the system development life cycle process?

As I suggested earlier, the first directive is to document how the data will -- or potentially could -- be used. In this case, a data governance council consisting of representatives from each business function adds significant value, since it allows one individual to socialize the creation or repackaging of data for different purposes among peers, giving them the opportunity to reflect on how that same data set could be reused.

The second approach is to use metadata management to document the relationship between business policy and the instantiation of data elements within different business processes, and subsequently in their applications. This helps to do impact analysis when policies or business rules change. It helps in figuring out which applications are dependent on which business term definitions, data entity definitions, and data attributes. Using the metadata system for lineage and traceability can significantly reduce the analysis stage of application updating or renovation.
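
As a rough illustration (the terms, data elements, and application names below are hypothetical), a small metadata registry linking business terms to data elements and to the applications that consume them can answer the impact question directly:

```python
# Hypothetical sketch: a minimal metadata registry linking business terms to
# data elements and the applications that consume them, used to answer the
# impact question "what is affected if this term's definition changes?"
term_to_elements = {
    "customer": ["crm.customer_id", "dw.dim_customer.customer_key"],
    "net_revenue": ["erp.invoice.net_amount"],
}
element_to_applications = {
    "crm.customer_id": ["CRM", "MarketingAnalytics"],
    "dw.dim_customer.customer_key": ["FinanceReporting"],
    "erp.invoice.net_amount": ["FinanceReporting", "Billing"],
}

def impacted_applications(term):
    """Trace a business term through its data elements to dependent apps."""
    apps = set()
    for element in term_to_elements.get(term, []):
        apps.update(element_to_applications.get(element, []))
    return sorted(apps)

print(impacted_applications("customer"))
# ['CRM', 'FinanceReporting', 'MarketingAnalytics']
```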

What about metadata? How should it be handled?

Gently? Seriously, we feel strongly that too little metadata is useless and too much metadata is a resource hog. We have to take a step back and get a better understanding of what we want to use metadata for. Like I mentioned before, we think that metadata can be useful in harmonization of business terminology and semantics, and it can provide a mapping between business concepts and the underlying representations, as well as provide lineage and impact assessment -- but that is really a set of metadata disciplines that need to be socialized and brought under control. Again, that shows the value of information governance as a way to harness those disciplines.

What about other new techniques such as text analytics and standardized data quality services? How can they be included in a good data governance program?

Text analytics can be very useful in helping to flesh out underlying semantics within the business context. By determining that “car,” “auto,” “SUV,” and “minivan” all refer to similar concepts, one can use tagging and taxonomies to harmonize the terminology, the attribution, and the relationships associated with these real-world concepts. Think about how many synonyms we use indiscriminately!
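
A toy sketch of that tagging idea, using an invented taxonomy, maps synonymous terms found in text onto one canonical concept so that downstream analysis groups them consistently:

```python
# Hypothetical sketch: a small taxonomy that tags synonymous terms pulled from
# text ("car", "auto", "SUV", "minivan") with one canonical concept.
TAXONOMY = {
    "car": "vehicle",
    "auto": "vehicle",
    "automobile": "vehicle",
    "suv": "vehicle",
    "minivan": "vehicle",
    "truck": "vehicle",
}

def tag_terms(text):
    """Map each recognized term in free text to its canonical concept."""
    tags = set()
    for token in text.lower().replace(",", " ").split():
        if token in TAXONOMY:
            tags.add(TAXONOMY[token])
    return tags

print(tag_terms("Customer traded in an SUV for a minivan"))  # {'vehicle'}
```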

When it comes to standardized data quality services, we feel strongly that if you can reduce the complexity of the ways that you seek to ensure quality, you will end up with greater consistency, which I think goes a long way.

There are also practical aspects -- if we can reduce the number of ways we apply technology, we can unify our vendor management, reduce the variation in defined business data quality rules, reduce the number of people involved in tuning unique aspects of each tool, and reduce costs by consolidating licensing or even eliminating vendors from the environment. How can you go wrong with that?
