
Q&A: Survey Highlights Data Governance Best Practices

Data governance survey shows what works, what doesn't.

Data governance is drawing much attention, but there is little hard information to back up that interest. In late 2010, Andy Hayler and The Information Difference teamed up with the Data Governance Institute to conduct an in-depth study of data governance in practice. To date, 134 companies have participated in the study, which remains open to new participants. In this interview, Hayler discusses what the survey has uncovered about data governance, including the best practices of successful programs.

You can participate in the survey and benchmark your own organization. Respondents will receive a free summary of the report findings.

Hayler, who founded the software firm Kalido, is now CEO of The Information Difference and is a leading expert on data topics including master data management. He is a regular keynote speaker at international conferences and spoke at a TDWI Webinar in April on the data governance survey. An on-demand archive of the presentation is available.

TDWI This Week: Your survey focused on data governance, which is garnering lots of attention lately. Why did you conduct the survey?

Andy Hayler: We conducted the survey in response to a growing number of client inquiries about how to measure the effectiveness of data governance. Companies that have invested heavily in data governance are under pressure to justify the investment and so are looking for ways to compare their initiatives with those of others.

To whom was your survey directed, and what did you hope to discover?

We initially designed a detailed governance framework, jointly with Gwen Thomas of the Data Governance Institute, and validated this with a panel of multinational companies that had already been acknowledged for their mature data governance programs. We tweaked the model in response to their feedback, then designed an in-depth survey around this structure. We sent the survey to all of our contacts, along with those of the Data Governance Institute, inviting companies that had existing, operational data governance programs to participate.

Because of the considerable effort needed to gather the data to fully complete the survey, we initially targeted 25 companies. We were therefore delighted to get 134 companies to respond from a wide mix of industry verticals. Half of the respondents were from the U.S., almost as many were from Europe, and the remainder were from the Asia Pacific region.

Based on the results, what suggestions can you offer for justifying a data governance program to top management?

Broadly speaking, upper management is interested in three categories of projects: those that generate additional revenue, those that reduce cost, and those that avoid risk. Consequently, it's essential, in our view, to build a proper, quantified business case for data governance. Measuring the cost of bad data is part of that.
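
To make the idea of measuring the cost of bad data concrete, here is a minimal back-of-the-envelope sketch in Python. Every record count, error rate, and unit cost below is an illustrative assumption, not a figure from the survey; the point is simply the shape of the calculation that feeds a quantified business case.

```python
# Hypothetical model of the annual cost of poor data quality.
# All figures are illustrative assumptions, not survey results.

records_total = 500_000      # customer records in the operational system
duplicate_rate = 0.04        # share of records that are duplicates
error_rate = 0.06            # share of records with incorrect attributes
cost_per_duplicate = 12.0    # wasted mailings, merge effort, etc. (USD per year)
cost_per_error = 25.0        # rework, mis-shipments, support calls (USD per year)

annual_cost = (
    records_total * duplicate_rate * cost_per_duplicate
    + records_total * error_rate * cost_per_error
)
print(f"Estimated annual cost of poor data quality: ${annual_cost:,.0f}")
# -> Estimated annual cost of poor data quality: $990,000
```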

Despite the importance of that measurement, only 29 percent of respondents said they measure the monetary cost of poor data quality, even though it is a key element in constructing the business case for data governance. It's interesting that financial companies and pharmaceutical firms were two of the most common verticals responding to our survey -- both are heavily regulated industries.

We believe it's likely that many of the companies in these industries were pushed into data governance by regulatory pressure rather than jumping in of their own accord. One example is the European Union's Solvency II legislation in insurance, which mandates data governance. U.S. counterparts include legislation such as HIPAA (the Health Insurance Portability and Accountability Act) and Sarbanes-Oxley.

Can the costs of data governance be quantified somehow in advance for budgeting and planning purposes?

In our survey, we asked about the resources and costs being expended on data governance programs. Obviously, expenditures varied a great deal with the size of the responding companies, but the figures show that the effort is not trivial. The "project office" for data governance had a mean (average) of four full-time people (the median was two), supported by a mean of nine part-time people (median of three) plus a mean of nine data stewards (median of four). The cost of setting up a data governance program averaged $3.5 million, with $1.2 million in annual ongoing costs.

What is the role of a data steward? Is that an absolutely necessary position, and among your survey respondents, who tended to fill it?

In our view, it's essential that the business take ownership of the data rather than just handing the issue off to the IT department. Consequently, we believe that the role of data steward is a key one. That said, it's hard to make generalizations about the people that best fit the role, since that answer depends on issues such as the industry and the culture of each organization. Clearly, it's helpful if those involved feel engaged and are willing to contribute actively.

In the companies you surveyed, who tended to be in charge of a data governance initiative -- IT or business? It sounds like you recommend business leadership here?

Most initiatives were led by the business, or at least jointly by the business and IT, which is a good thing. A minority were led by IT alone, an approach that usually causes problems because IT typically lacks the authority to get business people to change the way they work, and that change is necessary for a data governance program to succeed. This and other surveys we've conducted show a growing recognition of this fact.

How did companies say that data governance rules were enforced? Was that an issue?

By "at source" I mean "in the operational systems" -- e.g., a telesales person taking an order may try and type in a new account name when in fact the account already exists. If the data quality rule is enforced at source then the order taking system will validate the account name and check for duplication (the best way). The alternative is to hope that a data quality audit will find the duplication later, which is obviously less satisfactory.

That's an interesting question. We did ask about the enforcement of business rules. It's a concern to us that only 25 percent of respondents enforced business rules at the source -- in the systems themselves -- for most or all key operational systems. Another 39 percent had few or no business rules enforced at the source. The remainder had patchy implementations. Validating at the source -- enforcing a data quality rule, for example, that prevents a new account from being created if an account already exists in that name -- is the best approach, so it's disappointing that only a quarter of respondents report doing so.

This is definitely a problem, as it turns out that enforcing business rules at the source of the data is a behavior that was highly correlated with the most successful data governance programs.
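
As a rough illustration of what enforcement at the source can look like, the Python sketch below rejects a new account whose normalized name already exists at the moment the order is taken, rather than leaving the duplicate for a later data quality audit. The AccountRegistry class and its normalization rule are hypothetical and stand in for whatever validation the operational system provides.

```python
# Minimal sketch of enforcing a data quality rule "at source": the order-taking
# system refuses to create an account whose (normalized) name already exists.
# AccountRegistry and its normalization rule are illustrative assumptions.

class DuplicateAccountError(Exception):
    pass

class AccountRegistry:
    def __init__(self):
        self._accounts = {}  # normalized name -> account record

    @staticmethod
    def _normalize(name: str) -> str:
        # Simple normalization: collapse whitespace and ignore case.
        return " ".join(name.split()).casefold()

    def create_account(self, name: str) -> dict:
        key = self._normalize(name)
        if key in self._accounts:
            # Rule enforced at source: refuse to create the duplicate.
            raise DuplicateAccountError(f"Account already exists: {name!r}")
        record = {"name": name}
        self._accounts[key] = record
        return record

registry = AccountRegistry()
registry.create_account("Acme Industries Ltd")
try:
    registry.create_account("ACME  Industries Ltd")  # telesales typo/variant
except DuplicateAccountError as exc:
    print(exc)  # caught when the order is taken, not in a later audit
```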

Are there any new trends happening right now in data governance (for example, the use of master data management, metadata repositories, or policy hubs)?

Certainly master data management and data quality can go hand-in-hand with data governance (although we believe that data governance has a wider scope than this, encompassing data security and archiving as well). More companies are deploying MDM technology in their businesses (41 percent of respondents in our survey said they are doing so, with 23 percent more in development).

However, the picture is different for metadata repositories. In the survey, 13 percent of respondents used them effectively, while a full 20 percent have metadata repositories but find them ineffective. (The rest didn't have them at all.)

Policy hubs are a fairly new area, so it's not surprising that deployment here is rare (just 17 percent), although respondents showed lots of interest in acquiring such capability.

With your survey, you were able to directly correlate a company's data governance behaviors with successful programs. Can you share some of those behaviors?

The large sample size we obtained meant that we were able to conduct rigorous statistical analysis of the behaviors of data governance programs, calculating which characteristics were highly correlated with successful programs.

Successful programs had the following:

  • A data governance mission statement
  • A clear and documented process for resolving disputes
  • Good policies for controlling access to business data
  • An active risk register
  • Effective logical models for key business data domains
  • Business processes that were either defined at a high level or fully documented at several levels, and that were available to the data governance program
  • Data quality assessments that were undertaken on a regular basis
  • A documented business case
  • A link between program objectives and team or personal objectives
  • A comprehensive training program
  • A Web site alongside a broader range of communication methods

It's as simple as that. That list summarizes what effective and successful data governance programs do. Again, this isn't a so-called expert opinion -- it's the statistically validated behavior of a successful program.
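
As an illustration of the kind of analysis described here, the Python sketch below correlates one binary behavior flag (whether a program has a documented business case) with a self-reported success rating. The data and the 1-to-5 rating scale are invented for the example; they are not the survey's data set, and the survey's actual statistical method may differ.

```python
# Illustrative correlation of a behavior flag with program success.
# All data below is made up for demonstration; it is not the survey data.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 1 = program has a documented business case, 0 = it does not (hypothetical)
has_business_case = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
# Self-reported success rating on a 1-5 scale (hypothetical)
success_rating    = [4, 5, 2, 4, 3, 2, 5, 4, 1, 3]

r = pearson(has_business_case, success_rating)
print(f"Correlation between 'documented business case' and success: {r:.2f}")
```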

Conversely, what behaviors should companies avoid?

Clearly, if you avoid the behaviors listed above, you are setting yourself up for trouble. Of course, it's much easier not to take the time to document program goals, not to set up dispute-resolution procedures, and not to bother with documented processes and data models. Similarly, you can save time by skipping data quality assessments and by not calculating the cost of poor-quality data.

It's certainly much easier to take that path, which is why only 23 percent of the data governance programs we measured are regarded by their own organizations as "quite" or "highly" successful, and why 41 percent are regarded as somewhat unsuccessful or worse.
