Q&A: Look Internally for Data Mining Success

Data mining offers fast returns in a slow economy.

Since the mid-1980s, Tony Rathburn has worked with commercial and government clients to develop solutions using data mining techniques. He delivers custom workshops, teaches at TDWI conferences, and consults on a wide range of commercial assignments. Rathburn has extensive data mining experience in the banking, insurance, and financial industries. His background also includes seven years teaching MIS and statistics at Kent State University and service as vice president of Applied Technologies for NeuralWare, Inc., a neural network tools and consulting company. He co-presents a recurring Webinar from his firm, The Modeling Agency (TMA), titled Data Mining -- Failure to Launch. TMA's mission is to guide those who are data-rich yet information-poor to establish their own internal predictive analytics practice.

TDWI spoke with Rathburn recently about a topic that has drawn great interest in a lagging economy -- how to use data mining and predictive analytics to glean better results.

BI This Week: Data mining and predictive analytics are topics that you speak and write about extensively. What are some of the benefits you see at companies that successfully implement data mining?

Tony Rathburn: The obvious benefit is enhanced performance in many different areas. We've seen an interesting surge in interest in predictive analytics and data mining beginning about a year ago as the economy started to change. People began to say, "OK, we can't just keep doing the same things we've been doing. It's time to look at something different."

Can you give an example of a client you've worked with on data mining who is seeing success?

We worked with a company that was doing business-to-business sales using catalogs and a telemarketing department, the same way they've been doing it for 30 years. They're very successful, with roughly $100 million a year in sales and about a $20-million-a-year marketing budget. They were trying to reduce costs and at least maintain commerce in this economy, not enhance it.

What we ended up seeing, after a 10-day project, was a 40 percent reduction in marketing costs, roughly speaking. We took the marketing costs down to around $12 million annually -- and they've seen about a seven percent bump in sales as well. That 10-day project netted them about $15 million a year.
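The arithmetic behind those figures can be checked in a few lines. This is only a sketch using the rounded numbers quoted above. It assumes the "about $15 million" figure is the sum of the marketing savings and the added sales, which is how the quoted numbers appear to add up; it is not a profit calculation, and the variable names are illustrative.

```python
# Rounded figures quoted in the interview, in dollars per year.
annual_sales = 100_000_000
marketing_budget = 20_000_000
cost_reduction_pct = 40   # "a 40 percent reduction in marketing costs"
sales_lift_pct = 7        # "about a seven percent bump in sales"

# Integer arithmetic keeps the results exact.
marketing_savings = marketing_budget * cost_reduction_pct // 100
new_marketing_budget = marketing_budget - marketing_savings
added_sales = annual_sales * sales_lift_pct // 100

# Assumption: the quoted annual gain is savings plus added sales.
total_annual_gain = marketing_savings + added_sales

print(f"Marketing savings:    ${marketing_savings:,}")
print(f"New marketing budget: ${new_marketing_budget:,}")
print(f"Added sales:          ${added_sales:,}")
print(f"Total annual gain:    ${total_annual_gain:,}")
```

Under these assumptions the savings come to $8 million, the new budget to $12 million, and the added sales to $7 million, which together match the roughly $15 million a year mentioned above.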

If we consider predictive analytics as another sort of decision-support system within the company, how does it tend to work with systems already in place such as OLAP, statistical analysis, data visualization -- those kinds of things? How is predictive analytics different?

This is one of the reasons why, at The Modeling Agency, we have a specific definition of predictive analytics and a specific approach to it.

The approach that I've been taking for a couple of decades is that it's not about the technology. We are doing decision support, basically. There are a lot of different decisions in a business environment that are made on a regular basis. The major advantage to the quantitative approach [that we advocate] isn't that you get better decisions, although you can. The major advantage is consistency. The mathematical model that is part of predictive analytics helps you with a particular decision process that makes the same decision based on the same factors over and over and over again. You get a standardization of that decision process.

In most organizations, that simply doesn't occur. You have a number of individuals whose decision process varies depending on their experience, their background, and even with the same individual depending on the day and what they're working on. Developing quantitative approaches helps you establish consistency.

You can then start to develop and compare alternative approaches against a standardized set of performance metrics that will allow you to say, "OK, does strategy A work better than strategy B, and how do A and B stack up against strategy C?"

You get a process for modifying your decision strategies and then a consistent implementation process.
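The idea of comparing strategies A, B, and C against one standardized metric can be sketched as follows. Everything here is hypothetical and for illustration only: the customer records, the contact cost, the three contact rules, and the net-profit metric are assumptions, not data or methods from the project described in this interview. Each strategy is a fixed rule, so the same customer always gets the same decision, and every strategy is scored by the same function.

```python
from typing import Callable, Dict, List

Customer = Dict[str, float]

# Hypothetical customers: orders placed last year and the margin
# earned if the customer is contacted (illustrative values only).
CUSTOMERS: List[Customer] = [
    {"orders_last_year": 0, "margin_if_contacted": 0.0},
    {"orders_last_year": 1, "margin_if_contacted": 4.0},
    {"orders_last_year": 3, "margin_if_contacted": 12.0},
    {"orders_last_year": 6, "margin_if_contacted": 30.0},
    {"orders_last_year": 0, "margin_if_contacted": 2.0},
    {"orders_last_year": 2, "margin_if_contacted": 3.0},
]

CONTACT_COST = 5.0  # assumed cost of contacting one customer

# Each strategy is a deterministic rule: same inputs, same decision.
STRATEGIES: Dict[str, Callable[[Customer], bool]] = {
    "A": lambda c: True,                        # contact everyone
    "B": lambda c: c["orders_last_year"] >= 1,  # contact any past buyer
    "C": lambda c: c["orders_last_year"] >= 3,  # contact frequent buyers
}


def net_profit(strategy: Callable[[Customer], bool],
               customers: List[Customer]) -> float:
    """Standardized metric: margin minus contact cost over contacted customers."""
    return sum(c["margin_if_contacted"] - CONTACT_COST
               for c in customers if strategy(c))


results = {name: net_profit(rule, CUSTOMERS) for name, rule in STRATEGIES.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"Strategy {name}: net profit {value:.2f}")
print(f"Best strategy on this metric: {best}")
```

Because the metric is fixed before the strategies are compared, a change of strategy shows up as a change in one number that everyone has already agreed to measure.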

You've said that data mining is "not about the technology." Are the available tools mature now, so it's not a matter of the right or wrong software?

Don't get me wrong, the tools are very sophisticated. But they are just tools, and I think that's important to understand.

If you were building a house, you wouldn't go down to your favorite hardware store and just buy the most sophisticated tool to help you build your house. You plan; you come up with a design; you understand what you want. You go to the lumberyard and look at available materials.

My materials are my data. Our clients work out an approach to what it is they're trying to build, and then we select the tools to help implement that project.

It's the same with predictive analytics. The critical piece is the project definition. That involves understanding from a business perspective what it is we're trying to achieve, what decision process is being impacted, how we're measuring success, and what we're doing now. We look at our data in terms of what raw material is available to help us potentially enhance our performance, and then we go out and look at the software and the techniques that are available and say, "Do I need a hammer or do I need a screwdriver?"

When you see companies struggling with predictive analytics and data mining, what are they doing wrong? What are some common mistakes?

The biggest problem people run into is this: They get too caught up in the technology. They're looking for the best algorithm or the best piece of software. They start looking at things and saying, "If I can just find the right tool, it's going to revolutionize my business."

It's not. You have to look at it from the business perspective and then evaluate your performance based on your business objectives. … Nobody has ever been given a raise, bonus, or promotion based on R-squared, LIFT, or some statistical technique, but that's where people tend to look.

People attracted to analytics are generally quantitatively oriented; they're used to thinking about issues from a technology perspective. However, predictive analytics is really about enhancing business performance, and as a side note, doing that using technology.

Where should a company start to work successfully with predictive analytics?

For projects that I've worked on that have been truly successful, the best starting point is this: The company really buys into the idea that, here is the business problem that I'm working on and how we measure success. Now, is there a better way to allocate our resources to achieve a performance enhancement by using information in our data and quantitative tools?

I used the analogy of building a house. You want to start in the same place. What is it that you want? What is critical to you? What are you trying to achieve and how are you going to evaluate it? That's the process that most people overlook. They oversimplify it and say, "Of course, everybody knows how we work, what we do, and how we measure success." Challenge them to write it down. In a group of six people in a room, there will be 10 or 12 different definitions of what they do, how they measure success, how well they're performing, and what the decision processes are.

First, stabilize that plan. You need a blueprint and a way of evaluating. That's the best starting point.

What about externally? Where's the best place to start there?

Externally, I'm going to say training -- and not from a quantitative-technique approach. It's not about going out and learning a bunch of algorithms. When I'm talking about training, I'm talking about having people help you with the process that I was just describing. The issues that are particular to the mindset of predictive analytics are foreign to most people. They don't understand the nuances.

We have clients who are project sponsors and need help clarifying exactly what it is they're trying to do. They understand ROI and making an investment, but they don't understand the reality of how the technology might be able to help them and what the related issues are.

You have individual experts in particular domains making these kinds of decisions. They're good at what they do, but they have a different perspective than the project sponsor does. For that reason, they can help you with much of the data and other materials you need.

We want to know what the data elements are that are important to them and that have information content. It's not necessarily about how they make their decision, but where the information content is in their data.

We have the end users. We have to look at people on the front lines making decisions. Are we doing a batch processing delivery system? Is this going to be in real time? There will be different considerations around the user interface based on [these answers]. It's not just developing a mathematical formula -- how do we actually apply it from the end-user perspective?

We also involve the IT department in terms of the data warehouse. Where is the data? How is it organized? In many organizations, what look like data quality issues are just ad hoc queries, with people going in and using a slightly different SQL statement and thus getting variations from report to report. How do we standardize that? Based on needs and desires, what are the IT perspectives going to be?
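The point about ad hoc queries can be illustrated with a small, self-contained sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the "shipped orders only" revenue definition are all hypothetical. The aim is only to show how slightly different SQL statements over the same clean data produce different totals, and how a single shared query removes the variation.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, region TEXT, status TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "East", "shipped", 100),
        (2, "East", "cancelled", 50),
        (3, "West", "shipped", 200),
        (4, "West", "returned", 80),
    ],
)

# Two ad hoc versions of "total revenue" written by different people.
QUERY_ALL = "SELECT SUM(amount) FROM orders"
QUERY_NOT_CANCELLED = "SELECT SUM(amount) FROM orders WHERE status <> 'cancelled'"

# One agreed definition (assumed here: shipped orders only), kept in one place.
STANDARD_REVENUE_SQL = "SELECT SUM(amount) FROM orders WHERE status = 'shipped'"


def run(sql: str) -> int:
    """Run a single-value query and return its result."""
    return conn.execute(sql).fetchone()[0]


total_all = run(QUERY_ALL)
total_not_cancelled = run(QUERY_NOT_CANCELLED)
total_standard = run(STANDARD_REVENUE_SQL)

print("Ad hoc, all orders:      ", total_all)
print("Ad hoc, minus cancelled: ", total_not_cancelled)
print("Standard definition:     ", total_standard)
```

None of the three totals reflects bad data. They differ only because each query encodes a different definition of revenue, which is why the variation can look like a data quality problem from the report side.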

Those factors are unique to each organization and to each project within an organization.

Then you get into the modeling piece of it. If you do all the other pieces right, the modeling becomes almost trivial. The software then can go out and create a number of models using appropriate algorithms. That gives you a number of options, and based on your performance metrics, you determine which one is the best of the group that you've developed -- which one is consistently outperforming what you're currently doing.
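One way to read "consistently outperforming what you're currently doing" is sketched below. The scores are invented, and the selection rule is an assumption for illustration, not a method described in the interview: keep only candidate models that beat the current process in every evaluation period, then pick the one with the highest average score on the agreed business metric.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical scores on one agreed business metric, one value per
# evaluation period (higher is better). Values are illustrative only.
baseline_scores: List[int] = [10, 12, 11]   # the current decision process
candidate_scores: Dict[str, List[int]] = {
    "model_1": [12, 11, 14],
    "model_2": [11, 13, 12],
    "model_3": [9, 15, 16],
}


def consistently_beats(candidate: List[int], baseline: List[int]) -> bool:
    """True only if the candidate beats the baseline in every period."""
    return all(c > b for c, b in zip(candidate, baseline))


# Keep the candidates that outperform the baseline in all periods.
consistent = [
    name for name, scores in candidate_scores.items()
    if consistently_beats(scores, baseline_scores)
]

# Among those, choose the highest average; None if nothing qualifies.
best: Optional[str] = (
    max(consistent, key=lambda name: mean(candidate_scores[name]))
    if consistent else None
)

print("Consistent candidates:", consistent)
print("Selected model:", best)
```

In this made-up data, the candidate with the highest average is not selected, because it falls below the current process in one period. Whether to weight consistency that heavily is a business choice that belongs in the project definition.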

How good a job do companies do in calculating the ROI for these sorts of projects?

There have been a few surveys done in the last year or so -- it's always amazing to me to see what people are interested in and what they're trying to do with data. These surveys tell us that over half of the organizations that get into predictive analytics have no idea at the end of the project what their return on investment was. About a third of them have never even considered measuring it.

Can you imagine any other area of business in which you'd go out and spend tens or hundreds of thousands of dollars having no idea what you got in return? When I talk about performance metrics, that's what I'm talking about -- business decision projects. It's not about exploring some new technology. There are low-risk, high-ROI designs that any organization walking into this should understand right from the beginning, and that's part of what we help people do.