Q&A: Predictive Analytics Hot and Getting Hotter

The increased interest in predictive analytics points to its power and possibilities, says analyst Fern Halper. Vendors are eagerly jumping on board, and open source is becoming increasingly important.

"There has been such an uptick in predictive analytics ... even though the technology has been around for decades," observes analyst Fern Halper, with Hurwitz & Associates. Although she used the technology at Bell Labs back in the 80s, Halper says, businesses today are beginning to understand the value of analyzing data in advanced ways -- including the economic returns possible. "The technology has become top of mind with a lot of companies," she says.

Halper is a partner at the consulting, research, and analyst firm Hurwitz & Associates. She has over 20 years of experience in data analysis, business analysis, and strategy development, and has held key positions at AT&T Bell Laboratories and Lucent Technologies. At Bell Labs, Halper spent eight years leading the development of approaches and systems to analyze marketing and operational data.

She is also the author of numerous articles on data mining and information technology, and an adjunct professor at Bentley College, where she teaches courses in information systems and business. She blogs about data and analytics at http://fbhalper.wordpress.com/.

In this interview, part one of two, she talks about the growing interest -- and changes -- she sees in the predictive analytics market.

BI This Week: What is your definition of predictive analytics? Is it the same as advanced analytics?

Fern Halper: I usually define predictive analytics as an advanced analytics technique. At Hurwitz & Associates, we define predictive analytics as a statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data, together or individually, to determine future outcomes. Predictive analytics can be deployed for prediction, optimization, forecasting, simulation, and many other uses.

I was looking at a TDWI report on advanced analytics, and its definition in some ways sounded similar to my definition of predictive analytics. Advanced analytics includes text analytics and it includes predictive analytics -- it's just very advanced algorithms -- so I have a somewhat different definition for it.

Explain the Hurwitz Victory Index project you recently completed.

The Victory Index is a new assessment tool that we developed at Hurwitz & Associates. It analyzes vendors across four different dimensions: vision, viability, validity, and value. What we're trying to do with the index is take a holistic view of the value and benefit of predictive analytics. We're not just looking at the technology -- the technical capabilities of the technology -- but also its ability to provide value to customers.

We used a weighted algorithm that has 40 different attributes across four different dimensions. The first two dimensions, vision and viability, are all about the market perspective. The vision is the strength of the company strategy, and the viability is the strength and vitality of the company.

The other two dimensions, validity and value, are more of a customer-product perspective. Validity is the strength of the product that the company delivers to its customers; the value is the advantage the technology provides. For validity and value, we use primarily data from customer surveys about how customers feel about different vendors that are part of the Victory Index. Viability and vision [rely more on] secondary sources.

We also used social media analysis as part of the Victory Index. We looked at what people were saying about different products on blogs, tweets, and so forth. ... We used all different sources of data and really tried to make the index as comprehensive as possible.
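To make the weighting idea concrete, here is a minimal sketch in Python of how a multi-dimension, weighted index like this can be rolled up. The dimensions follow the four named above, but the attributes, weights, and scores are purely hypothetical; they are not the actual 40 attributes or weights Hurwitz uses.

```python
# Minimal sketch of a weighted, multi-dimension vendor index.
# Dimensions mirror those named in the interview (vision, viability,
# validity, value); attributes, weights, and scores are hypothetical.

# Weight given to each dimension (sums to 1.0 in this sketch).
DIMENSION_WEIGHTS = {"vision": 0.25, "viability": 0.25, "validity": 0.25, "value": 0.25}

def score_vendor(attribute_scores, attribute_weights):
    """Roll hypothetical attribute scores (0-10) up into one composite index.

    attribute_scores:  {dimension: {attribute: score}}
    attribute_weights: {dimension: {attribute: weight}}, weights within a
                       dimension summing to 1.0.
    """
    total = 0.0
    for dim, dim_weight in DIMENSION_WEIGHTS.items():
        attrs = attribute_scores[dim]
        weights = attribute_weights[dim]
        dim_score = sum(attrs[name] * weights[name] for name in attrs)
        total += dim_weight * dim_score
    return total

# Example with two illustrative attributes per dimension.
scores = {
    "vision":    {"strategy_clarity": 8, "roadmap": 7},
    "viability": {"financial_strength": 9, "market_presence": 6},
    "validity":  {"product_depth": 7, "customer_satisfaction": 8},
    "value":     {"roi_reported": 8, "time_to_value": 7},
}
weights = {dim: {name: 0.5 for name in attrs} for dim, attrs in scores.items()}

print(round(score_vendor(scores, weights), 2))  # single composite index score
```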

Do clients come to you for a better understanding of how predictive analytics can add value to their company and data?

Some companies are asking that, but what's interesting is that there has been such an uptick in predictive analytics. People are asking about it even though the technology has been around for decades. I used it back in the '80s when I was at Bell Labs -- all different types of predictive analytics techniques.

Now, though, people are beginning to understand the value, and there are economic imperatives around really understanding your customers. The technology has become top of mind with a lot of companies.

The companies I've surveyed and talked to are looking at predictive analytics in terms of advanced analytics. I also asked about things [such as] text analytics and analyzing data streams in terms of big data analytics.

Trying to find patterns in data was one of the top use cases for advanced analytics, and that is [very much] about the predictive model -- people are using predictive analytics to find those patterns.

The top two drivers are to remain competitive and to better understand customer behavior.

That echoes what we're seeing at TDWI, that predictive analytics is gaining tremendous traction.

In the same study, I was very interested in understanding, as a side issue, companies' understanding of predictive analytics. Who is actually using this technology? As someone who worked with data way back at Bell Labs, I wanted to understand who companies thought would actually be using these products. Was it a statistician or mathematician, [someone who could] really understand what this was all about? These models can be very complex, and if you don't know what you're doing, you can come out with results that you think mean one thing but actually mean something else.

What did you find in terms of who is using predictive analytics?

Two things of interest there: One is that there's definitely a shift by companies to have business users work with these advanced technologies, predictive analytics included. The majority of users of predictive analytics [before] would be mathematicians, statisticians, and quantitative types of people. For companies planning to use it, many of the end users they expect to be working with the tools are actually business analysts. They're not necessarily trained statisticians.

However, of the companies I talked to for the Victory Index, 98 percent said you have to have training on these tools. I completely agree with that.

Now, of course, there's a thrust from vendors to make tools easier to use and to automate certain functions, so you don't have to be a statistician to build basic models -- a business user could do some of it.
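As an illustration of the kind of automation she describes, here is a minimal sketch, assuming scikit-learn purely for demonstration (the interview does not name any specific vendor tool): the data is standardized automatically and a few candidate models are compared by cross-validation, with the best-scoring one "suggested" to the user.

```python
# Minimal sketch of automated preprocessing plus model suggestion.
# scikit-learn is used only for illustration; this is not any vendor's product.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; a real tool would ingest the user's dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# "Suggest" whichever candidate cross-validates best on this data.
for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```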

Are these tools really becoming easy enough for business users to use effectively?

It's interesting. The big vendors have tried to make their tools simple enough so that a business user could actually have a set of data, and the tool automates some of the data preprocessing and then suggests models based on the data -- models that someone could actually use.

On the one hand, these tools are -- in some sense -- becoming easy enough that a business user could use them. It depends on the user. I [worked in] predictive analytics for a long time, and I was not a trained statistician. I was trained in a quantitative field, so I thought quantitatively and I also understood the business.

I do think that they're making the tools easy enough in some respects for business users, especially marketers. However, I would recommend that users be trained. Even if it seems easy to use, you could get yourself in trouble. One of the things that I was taught when I was doing data analysis was, "Look at the data, look at the data, look at the data." ... Don't just start throwing a bunch of algorithms at the data. What's the data telling you? Explore the data in the first place, then generate your hypotheses, and then run your analysis.
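Here is a minimal sketch of that "look at the data first" discipline, using pandas for illustration; the dataset and column names are hypothetical placeholders, not anything from the interview.

```python
import pandas as pd

# Hypothetical customer data; in practice this would come from a file or database.
df = pd.DataFrame({
    "tenure_months": [3, 48, 12, 60, 7, 24],
    "monthly_spend": [20.0, 75.5, 33.0, 81.2, 25.0, None],
    "churned":       [1, 0, 1, 0, 1, 0],
})

# 1. Explore: distributions, missing values, simple relationships.
print(df.describe())
print(df.isna().mean())            # share of missing values per column
print(df.corr(numeric_only=True))  # numeric correlations

# 2. Form a hypothesis from what the data shows
#    (e.g., short tenure relates to churn) ...
print(df.groupby("churned")["tenure_months"].mean())

# 3. ... and only then run your analysis or fit a model.
```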

A lot of attention is being paid to unstructured versus structured data these days. How is predictive analytics being used on unstructured data? What kinds of issues come up?

One of the most popular use cases I've seen is insurance fraud. You have structured data about an insurance claim, for example -- the name of the person, the date of the incident, and so forth. It's really in the unstructured data, the actual verbiage of the claim, that you can find some very useful types of information regarding potential fraud. Companies are marrying the structured and the unstructured data and using text analytics to go through all the claims data. They then pull out important themes, entities, and concepts [that might indicate fraud], then link that with the structured data to get a better lift on the model.

Analytics is also being used in telecom, in customer care centers, in warranty analysis, for example, to understand what problems customers are having.

There are cases where you can marry the structured and unstructured data together because you have a common key of sorts -- a customer, say -- which makes it easier to do that.

Then, of course, you have the whole area of unstructured data analysis in social media, which is another place where companies are using analytics. They may not be able to marry unstructured data together with structured data, but they're using it to get deeper insights.

SAS, for example, has a way to take unstructured data and pull out entities, concepts, and different aspects of insight that they can get from the unstructured data. They can then predict, for example, what the buzz is going to be around a certain product. However, as they are the first to say, this is not for the faint of heart.

You're blending structured and unstructured data?

Yes, marrying the structured and the unstructured. You're essentially making the unstructured structured when you're running text analytics over it. Say I have a bunch of text somewhere and I'm running text analytics over it. I'm pulling out different pieces of information from it, then I'm putting that together with my structured data.

In the insurance case, for example: Someone files a workers' compensation claim, and it turns out that they were called by their supervisor four times and written up for not doing the job to the best of their ability. Those are things that you couldn't get from the structured data. Put them together, run your model, and get much better lift.
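Here is a minimal sketch of that marrying step, assuming Python with pandas and scikit-learn for illustration only; this is not SAS's or any vendor's method. Simple keyword flags stand in for real text analytics, and every column name and record is hypothetical toy data.

```python
# Minimal sketch of "making the unstructured structured": derive simple
# features from claim notes, join them to the structured claim record on a
# common key, and feed both to a model. Keyword flags stand in for real
# entity/theme extraction; all names and records are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

structured = pd.DataFrame({
    "claim_id":     [1, 2, 3],
    "claim_amount": [1200.0, 8500.0, 430.0],
    "days_to_file": [2, 45, 1],
    "is_fraud":     [0, 1, 0],   # historical label for training (toy data)
})

notes = pd.DataFrame({
    "claim_id": [1, 2, 3],
    "text": [
        "slipped on wet floor, reported immediately",
        "written up by supervisor several times before the incident",
        "minor cut, treated on site",
    ],
})

# Turn free text into structured flags (stand-in for text analytics output).
notes["mentions_supervisor"] = notes["text"].str.contains("supervisor").astype(int)
notes["mentions_written_up"] = notes["text"].str.contains("written up").astype(int)

# Join on the common key and fit one model over both kinds of features.
combined = structured.merge(notes.drop(columns="text"), on="claim_id")
X = combined[["claim_amount", "days_to_file", "mentions_supervisor", "mentions_written_up"]]
y = combined["is_fraud"]

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # fraud scores on the (toy) training data
```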

It's still in the early stages, but I've heard people talking about analytics in financial services, telco, various manufacturing industries, and even marketing.

You mentioned data analysis and social media. What is happening there?

That's a good question. Basically, it's a good example of what I've seen happening in predictive analytics.

There are probably 250 social media analysis products out there. First, you have to separate out the listening posts, which don't really analyze anything. Then there's the social media analytics from vendors that are much more sophisticated, using natural language processing, doing sentiment analysis, and all of that.

Beyond that, there are a handful of companies that I would call pure-play social media analysis companies. They do it really well. Then there are business analytics companies that also are doing social media analysis, such as SAS and IBM.

SAS is really the only example I could point to at this point that is actually showing something that's predictive in terms of predicting buzz about a certain brand based on what's happened in the past. That's one use case.

Going back to your Victory Index research, did you uncover anything surprising?

I've been living and breathing analytics for so long that there was nothing incredibly surprising. But ... how far companies are going to make predictive analytics easier to use was interesting to me, because [the vendors] were all talking about it.

Also, we can't finish the discussion without talking about how open source models are becoming more prevalent -- that was another interesting finding.

What did you find out about open source and predictive analytics?

Open source is becoming increasingly important because it enables a wide community to engage in innovation, and it's offered at academic institutions. [The open source predictive analytics language] R, for example, is becoming really popular. What's happening is that there is an ecosystem of vendors sprouting around these open source solutions to make them easier to use.

R, for example, basically [requires] command-level sorts of input. It's not intuitive, it's not easy to use, and it's also not that scalable. Vendors are popping up -- Revolution Analytics is one -- to try to make R easier to use. Even the more established vendors are incorporating open source such as R and wrapping around those platforms.
