Q&A: Why Data Quality is a Measurable, Ongoing Effort
The value of data quality can be measured as you work.
When she began working on data quality at Hewlett-Packard in the early 1990s, Danette McGilvray held a common misconception about an initial project: "We were going to go in, take care of it, and check it off our list." The "aha" moment came later, when she recognized data quality as an ongoing discipline. These days, McGilvray, as president of Granite Falls Consulting, offers advice on data quality and governance to Fortune 500 companies.
When McGilvray consults with companies on data quality, her favorite quote comes from J.R.R. Tolkien's The Lord of the Rings: "The burned hand teaches best," the wizard Gandalf remarks to one of the hobbits. "After that advice about fire goes to the heart."
That sums up what McGilvray, a long-time data quality consultant, often sees in the field -- those who understand the value of a data quality effort most are those who have had to struggle with bad data.
As president and principal of Granite Falls Consulting, a firm specializing in information quality management, McGilvray consults and speaks regularly on data quality and data governance issues. Projects have included enterprise data integration programs, data warehousing strategies, and best practices for ERP migrations for Fortune 500 organizations. She has also led webinars for TDWI about data quality.
BI This Week met with McGilvray recently to discuss the importance of data quality, and how easily companies can start small to begin reining in data quality.
BI This Week: You've been working on data quality issues since the early 1990s. Have you seen changes in how companies approach data quality initiatives? Have they become more sophisticated?
Danette McGilvray: It's no longer unusual within a company, particularly large, global companies, [to find] more than one group that has decided data quality work is really important. As one group starts investigating what they're going to do, they end up finding other groups that are also working on data quality.
That's a good thing because it shows that we're starting to get more people recognizing the importance of data quality. I see that as an advantage. Some might say it's a disadvantage because different groups will want to be in charge, but it really can be an advantage as long as you coordinate the efforts.
Where do these types of efforts tend to begin?
In a company of any size, a focus on data quality often begins in one particular area. That area tends to be wherever the pain point is or wherever someone in that company has recognized that there's a need. Then, once they get their feet wet, [those groups] start to understand what they're doing; they begin to figure out an approach. They get a few projects under their belt, then roll things out on a larger basis.
It may be the same subject area, maybe customer data, but it now includes other regions. Sometimes, people start to hear about the success of one group and want to expand [the effort] to their area.
A data quality effort tends to start somewhere with that need and then moves out in one of two ways. One is purposeful in that someone is pushing it, and the other is when someone with a pain point seeks help.
Has that pattern changed over time, or has data quality always spread through a company in that way?
I've seen that pattern repeated. That's not to say that somebody might not want to start out with a big bang approach instead. I just see that much less often than what I just described.
What are some of the tougher aspects of data quality? What do you see teams, team managers, and companies really stumbling over?
One issue is the need to show the value of information quality -- the business impact and why we should care about it.
Fortunately, I've learned some lessons over time and we have some techniques to do that -- real ways that people can learn to assess the business impact and be able to show it.
A very close relative to that is the ability to communicate about data quality, including the business impact. It includes being able to connect what is happening with the data to some real business needs. There's a handshake among these three things [business need, business impact, and data quality]. People need to realize that data quality should never be done just for the sake of data quality. There must be a business need -- it's not a theoretical exercise just for the sake of doing it.
Are there ways to quantitatively measure the return on investment (ROI) of a data quality initiative?
By that question, you've pointed out a common perception -- that we must always get a full ROI, or we must always be able to quantify [a data quality project]. ROI is good, and I think most people want to measure it, but there are plenty of things we can learn that may be more qualitative.
The two aspects can work together. I have techniques on a continuum moving from less complex to more complex and time-consuming. Often, when people think of showing the value or business impact, the first thing that comes to mind is exactly what you said -- I need to figure out ROI, and that may take me months. It's going to be too much work. Consequently, they do nothing.
What I try to show people is that there is something you can do to show business impact and business value. I'll give you an example on either end of the continuum.
The first technique, the easiest, is what I call anecdotes. It's simply gathering stories. Every company has what I call urban legends, the stories that go around about this or that, that happened because of the data quality. This technique means doing a little research and finding out what exactly happened and trying to quantify it in some way. For example, the manufacturing line went down for two days, or a large customer was very unhappy and cancelled an account -- those kinds of things. Any time you can put numbers to it, so much the better. People always like to see that, of course.
Even if you can't completely quantify it, you can tell a story, and it may be something that people within your company can relate to. When they hear the story, they understand: "That's what this quality thing does. Tell me more." So maybe all the anecdote does is help people understand what you're talking about enough to talk to you longer.
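As a rough illustration of "putting numbers to it," the sketch below records a few anecdotes and attaches an estimated cost to each. The `Anecdote` class, its field names, and every dollar figure are hypothetical and chosen only for illustration; they do not come from McGilvray or any real company.

```python
from dataclasses import dataclass


@dataclass
class Anecdote:
    """One data quality story, with whatever numbers could be found."""
    description: str
    hours_lost: float = 0.0      # downtime or rework hours tied to the incident
    cost_per_hour: float = 0.0   # assumed cost of one lost hour
    revenue_lost: float = 0.0    # assumed revenue lost outright

    def estimated_cost(self) -> float:
        # Cost of lost time plus any revenue lost directly.
        return self.hours_lost * self.cost_per_hour + self.revenue_lost


# Hypothetical figures, for illustration only.
anecdotes = [
    Anecdote("Manufacturing line down for two days",
             hours_lost=48, cost_per_hour=5_000),
    Anecdote("Large customer cancelled an account",
             revenue_lost=120_000),
]

total = sum(a.estimated_cost() for a in anecdotes)
for a in anecdotes:
    print(f"{a.description}: ${a.estimated_cost():,.0f}")
print(f"Total estimated impact: ${total:,.0f}")
```

Even a rough total like this gives a story a concrete figure, and the assumptions behind each number stay visible so they can be challenged and refined.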
What's at the other end of the continuum?
At the other end is a full cost-benefit analysis that is quantitative. Once again, relatively speaking, it means more time invested and potentially more complex projects.
Any technique in between those two can also be effective. Less time doesn't mean less effective, just as more time doesn't mean more effective. I really encourage people -- "Let's look at this and let's do something." There's something that you can do now within the time and resources you have. If all you can do is gather some stories right now, let's do that. You can start building on that. Even with the quantitative aspect, are there one or two small things we could quantify?
In short, I think it's so important for people to learn how to show business impact. They just need to know how.
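For the quantitative end of the continuum, a very small cost-benefit calculation might look like the sketch below. It assumes the common definition of ROI as net benefit divided by cost; the function name `cost_benefit`, the category names, and all amounts are hypothetical, and a real analysis would involve far more research into each figure.

```python
def cost_benefit(benefits: dict, costs: dict) -> dict:
    """Summarize a simple cost-benefit comparison.

    ROI here is (total benefit - total cost) / total cost,
    and is None when there is no cost to divide by.
    """
    total_benefit = sum(benefits.values())
    total_cost = sum(costs.values())
    net_benefit = total_benefit - total_cost
    roi = net_benefit / total_cost if total_cost else None
    return {
        "total_benefit": total_benefit,
        "total_cost": total_cost,
        "net_benefit": net_benefit,
        "roi": roi,
    }


# Hypothetical annual figures, for illustration only.
benefits = {"avoided_rework": 150_000, "fewer_returned_shipments": 50_000}
costs = {"profiling_tool": 40_000, "staff_time": 60_000}

result = cost_benefit(benefits, costs)
print(result)
```

Starting with one or two line items, as suggested above, keeps the exercise small; more categories can be added as better numbers become available.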
Do business impact and business value mean the same thing?
To me, those are more or less synonymous. Anyone in the company might use whichever one of those phrases is more meaningful. With one client, I might say business impact. Another client might say, "Value works better for us. We talk about value, not business impact here."
Can any amount of effort in this area be helpful, even something as low level as gathering "urban legends"?
Absolutely, and it can be either internal or external events. … What if one of your competitors shows up on the front page of the Wall Street Journal for a data quality issue? They probably don't want to be there -- you can use that to say, "We might have this same risk within our company."
Your book is Executing Data Quality: Ten Steps to Quality Data and Trusted Information. For whom is it intended and why?
The book is really intended for practitioners who have been given responsibility for carrying out some kind of project related to data quality. Project managers are another audience, though I suggest [in the preface] that project managers might just want to hit some of the overview pages and not get into the detail.
I wrote the book because it appeared to me that within our body of knowledge there was a gap -- lots of very good books talked about methodology and concept, but at a higher level. Then there were plenty of really good books that went into detail in some areas related to data quality. I didn't see anything for those folks who say, "Okay, I have enough support now. We need to get started, so where do I begin?" That was really my intent, to address that group.
In terms of getting started with a data quality initiative, is there a general first step?
The first step for anyone is to have some awareness that data quality could be an issue. Maybe they already know that; maybe they've run into it with other projects. In fact, some of the best people that I work with are people who have been burnt before.
One of my favorite quotes is from The Lord of the Rings by J.R.R. Tolkien: "The burned hand teaches best. After that advice about fire goes to the heart." Sometimes the people who are most enthusiastic and passionate about this are people who have been on a project with no attention paid to data quality, and they saw what it cost. They saw the pain, the price, the deadlines, and so forth.
To sustain data quality over time, do you also need data governance of some sort?
You really need data governance to help sustain data quality. Data quality can be done on a one-time basis or a short-term basis, but I really do think you need governance as well.
Some people may have slightly different definitions of data governance, but to me the important point is that data governance provides the right kind of structure, organization, and implementation of things like policies, procedures, and roles and responsibilities. What do we do with that? We outline and enforce rules of engagement to determine who gets to make decisions -- essentially, who is accountable for effective management of our information assets.
An important key is setting up the right level of structure. It's not just a bureaucracy for bureaucracy's sake. We do it to bring the right people together in the room to make decisions, to make sure that the right kinds of areas are represented. We create venues for communication so people can make decisions, resolve issues, escalate issues if needed, implement changes, and communicate it all.
That's my brief definition of governance -- it's definitely an important component for a management system for data and information.