5 Critical Features of Successful Predictive Analytics Projects
Successful project opportunities share five features that help developers achieve business success.
By Thomas A. "Tony" Rathburn, Senior Consultant and Training Director, The Modeling Agency
[Editor's Note: Tony Rathburn is one of two speakers of the keynote, "Where Agile Meets Analytics," at the TDWI BI Symposium in Toronto on June 25.]
Predictive analytics is receiving growing attention throughout the business community. Unfortunately, most of this attention has been directed at the technical and mathematical aspects of the tools and techniques.
Predictive analytics is not a "cure-all" technology. Successful project opportunities share five key features that allow developers to achieve real-world business success.
Feature #1: A Clear Business Objective
The first priority of any predictive analytics project is a clear understanding of the business objective being supported. Predictive analytics has been applied to customer/prospect identification, attrition/retention projections, fraud detection, and credit/default estimates.
The common characteristic of these opportunities is that individuals vary in their propensity to display a behavior that affects a business objective.
Traditional statistical analysis is best suited for opportunities where enhanced understanding of the population is desired. The best predictive analytics opportunities involve the identification of the extremes of behavior propensities.
Feature #2: Defined Performance Metrics
A critical factor for successful development of predictive analytics projects is a well-defined set of business performance metrics specific to the organization's business objectives.
The models developed in predictive analytics must be evaluated in the context of these performance metrics. Reliance on traditional statistical performance metrics, such as R² or lift, results in models that appear to be viable but ultimately fail to perform in the business decision environment.
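One way to make this concrete is to evaluate a model by the net value of the decisions it drives rather than by a fit statistic. The sketch below is illustrative only: the function name, the dollar figures, and the sample scores are assumptions, not values from any real project.

```python
def net_business_value(scores, outcomes, budget, value_per_hit, cost_per_action):
    """Net value of acting on the `budget` highest-scored records.

    scores   -- model scores, one per record (higher = more likely)
    outcomes -- 1 if the behavior actually occurred, else 0
    """
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    # The business can only act on a limited number of records.
    selected = ranked[:budget]
    hits = sum(outcome for _, outcome in selected)
    return hits * value_per_hit - len(selected) * cost_per_action


# Hypothetical scores and historical outcomes for five records.
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
outcomes = [1, 0, 1, 0, 1]

# Assumed economics: each correct action is worth $500, each action costs $100.
value = net_business_value(scores, outcomes, budget=3,
                           value_per_hit=500.0, cost_per_action=100.0)
print(value)
```

Two models with similar statistical fit can produce different values here, because only the records the organization can afford to act on count toward the result.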
Feature #3: Specific Behavior of Interest
The behavior of interest is what we code as the "desired output" variable in our historical data. It represents whether or not the behavior was displayed for each record in our data set.
It is critical that we take the time to ensure that this behavior is defined in a way that supports our business objectives and that is consistent with our evaluation performance metrics. Failure to correctly define and code our behavior of interest virtually guarantees significantly reduced performance or complete project failure.
As an example, most fraud detection projects are actually intended to identify those instances where additional investigation is required. Many organizations are willing to let relatively minor errors escape additional investigation to allocate their investigation resources to the more significant cases.
An organization may determine that it wants to investigate cases where errors and incorrect payments were potentially $10,000 or more. Suppose the analyst codes the desired output variable so that all historical records containing errors and incorrect payments have a value of "1" and correct payments have a value of "0." The resulting models would achieve a different level of performance (based on our business performance metrics) than models built with a coding scheme where a "1" represents historical errors and overpayments of $10,000 or more.
In short, models will learn to identify what is defined as the behavior of interest in the historical data -- the 1's. Inaccurate definition of the behavior of interest in the historical desired output attribute results in models that learn to identify what you specifically stated, not what you meant.
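The difference between the two coding schemes can be sketched in a few lines. The records, field layout, and function names here are hypothetical; only the $10,000 threshold comes from the example, and "at or above" is assumed as the cutoff rule.

```python
# Hypothetical historical records: (record_id, overpayment_in_dollars).
records = [
    ("A", 0.0),        # correct payment
    ("B", 250.0),      # minor error
    ("C", 10_000.0),   # exactly at the threshold
    ("D", 48_500.0),   # significant error
]

INVESTIGATION_THRESHOLD = 10_000.0  # threshold taken from the example


def code_any_error(overpayment):
    """Desired output = 1 for any incorrect payment, regardless of size."""
    return 1 if overpayment > 0 else 0


def code_significant_error(overpayment, threshold=INVESTIGATION_THRESHOLD):
    """Desired output = 1 only for overpayments at or above the threshold."""
    return 1 if overpayment >= threshold else 0


any_error_labels = [code_any_error(amount) for _, amount in records]
significant_labels = [code_significant_error(amount) for _, amount in records]

print(any_error_labels)
print(significant_labels)
```

Record "B" is a "1" under the first scheme and a "0" under the second. A model trained on the first scheme learns to flag minor errors that the organization never intended to investigate.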
Feature #4: Identified Resource Allocation Decision
Far too often, we get caught up in the technical aspects of an opportunity and lose sight of the business motivation. In general, the single most effective characteristic of solid business opportunities for predictive analytics is the identification of a decision process that involves the allocation of scarce resources.
Any time we are faced with the allocation of time or money in our business relationships, there are some relationships that will benefit us and others that are either less productive or that have a negative impact on our performance metrics. These are the ideal opportunities for the application of predictive analytics.
By effectively handling the complexity of the large number of attributes associated with each of these relationships, we often derive a scoring system that ranks various groups by their propensity to display a behavior of interest. That ranking allows us to adjust our resource allocation strategies and enhance business performance.
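A relative ranking of this kind might look like the following sketch, which sorts scored records and splits them into groups so the observed behavior rate can be compared across groups. The function names and the sample data are assumptions made for illustration.

```python
def rank_into_groups(scored_records, n_groups):
    """Sort (score, outcome) pairs by score, highest first, and split them
    into n_groups groups of roughly equal size."""
    ranked = sorted(scored_records, key=lambda record: record[0], reverse=True)
    size, remainder = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # Spread any leftover records across the first groups.
        end = start + size + (1 if i < remainder else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


def behavior_rate(group):
    """Share of records in the group that displayed the behavior."""
    if not group:
        return 0.0
    return sum(outcome for _, outcome in group) / len(group)


# Hypothetical (score, historical outcome) pairs.
scored = [(0.95, 1), (0.10, 0), (0.60, 1), (0.85, 1), (0.20, 0), (0.50, 0)]

groups = rank_into_groups(scored, n_groups=3)
rates = [behavior_rate(group) for group in groups]
print(rates)
```

If the top group shows a much higher rate than the bottom group, resources can be shifted toward the top of the ranking and away from the bottom.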
Feature #5: Sufficient Data
Predictive analytics projects commonly make use of an approach referred to as "supervised learning." Developers use a set of historical data to complete their analysis. Each record in this data set consists of attributes of the individuals under analysis and a "desired output" attribute that corresponds to the behavior supporting our business objective.
The desired output variable typically takes a value of "1" or "0" -- a binary representation distinguishing between individuals who display the behavior and those who do not.
The algorithms utilized to develop our models search the available candidate attributes and develop a mathematical formula. Our models effectively become a scoring system indicating the propensity of an individual to display the behavior affecting our business objective.
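As an illustration of what such a formula can look like, the sketch below uses a logistic function over two attributes. The logistic form is only one common choice and is assumed here, not prescribed by the approach; the attribute names and coefficients are invented, and in practice a learning algorithm would estimate the coefficients from the historical data.

```python
import math

# Hypothetical coefficients; a real project would learn these from data.
INTERCEPT = -2.0
WEIGHTS = {
    "months_as_customer": -0.05,
    "support_calls_last_90d": 0.6,
}


def propensity_score(attributes):
    """Return a score between 0 and 1 for one individual's attributes."""
    linear = INTERCEPT + sum(WEIGHTS[name] * attributes[name] for name in WEIGHTS)
    # The logistic function maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


quiet_customer = {"months_as_customer": 36, "support_calls_last_90d": 0}
frustrated_customer = {"months_as_customer": 36, "support_calls_last_90d": 5}

print(propensity_score(quiet_customer))
print(propensity_score(frustrated_customer))
```

The score is most useful as a relative ranking: the second individual scores higher than the first, which is the kind of ordering a resource allocation decision needs.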
Without sufficient data, it is impossible for us to make use of the techniques in predictive analytics. However, there is often a misconception about the volume of data required. For many business opportunities, as few as several thousand records can be used to develop quantitative models that significantly enhance business performance.
Predictive analytics has been plagued by exaggerated expectations, hype, and an over-emphasis on sophisticated quantitative techniques. The appropriate identification of opportunities where predictive analytics can be successfully applied is often the determining factor in the success of projects.
Thomas A. "Tony" Rathburn is a senior consultant with The Modeling Agency (TMA) and has over 25 years of applied predictive analytics development experience across a broad range of industries and application areas. He has been a regular presenter of predictive analytics and data mining courses at the TDWI World Conferences since 2003.
Tony is a presenter in TMA's popular data mining Webinar, Data Mining: Failure to Launch – How to Get Predictive Modeling Off the Ground and Into Orbit. Tony may be reached at firstname.lastname@example.org