Q&A: Data Mining for the Masses?

SAS, SPSS, and others say they’re making the Gandalf-the-White world of data mining more accessible. Call it data mining for the masses.

By day, Mary Crissey is director of marketing with SAS Institute Inc. But that doesn’t quite capture Crissey’s responsibilities with the Cary, N.C.-based data mining and BI powerhouse. A veteran of the U.S. military and an avowed data mining junkie, Crissey currently works most closely with customers on the business side of the divide who aren’t as well versed in ars data mining-ica. It’s her job to make the often arcane technology and practices of data mining intelligible, in business terms, to folks who want more than anything else to see results. If it’s results they want, it’s results they’ll get, Crissey says, in part because SAS and other vendors are making the Gandalf-the-White world of data mining a lot more accessible to the non-cognoscenti. Call it data mining for the masses.

BI vendors have been talking up “BI-for-the-masses” for at least a decade now. And there’s a lot to like in this vision, especially because more and more BI-like capabilities are being exposed to more and more users. I don’t know that you’d say the same about statistical analysis, data mining, or any of the other traditional “wizard-like” technology practices, though. And yet SPSS in its most recent analytic platform release and SAS in its own marketing efforts have both touted improved ease-of-use as a strong selling point. Isn’t this somewhat unrealistic given the complex scenarios for which these tools are typically used?

I spent a whole career on active duty in the military so I was familiar with governmental agency kinds of things, and in military assignments we had to move every two or three years. So I know that in business today it’s very hard to maintain on the payroll a very highly educated PhD or masters candidate who can do those [statistical] models. Often the person who wrote it in the military would be reassigned, and it’s the same in the business world. On another note, it’s hard to keep these real creative mathematician thinkers off to the side in a busy business world, and lots of times they get tasked to do things that they’d rather not do, but they’re tasked to do it because it’s still beyond the capabilities of the [average business] user. That’s what we at SAS mean when we talk about making data mining more user-friendly.

In the BI space, ad hoc BI solutions still abound – whether it’s a homegrown reporting tool or a script-driven ETL solution. Is it the same in the data mining and statistical analysis spaces? Or are these technology domains so specialized that most companies have already standardized on tool offerings from you, SPSS, or other players?

In Las Vegas [last fall], we had our annual SAS Data Mining Conference. A high amount of attendees were not relying totally on our Enterprise Miner solution, but they had the homegrown variety, and they came to us because they needed a [data mining] solution they could embed in their [homegrown] software. They wanted a library of [algorithms] … rather than having somebody create it longhand and write out these lengthy syntax statements and having somebody who needs to keep [these statements] current. So the short answer, I think, is that even if they are using our Enterprise Miner tool, a lot of [customers] are looking for [additional] tools to help keep the collaboration going accurately and smoothly. That gets back to the ease of use that we talked about. You mentioned that more and more BI is going out to more and more users; you could say the same thing about [data mining]. So it’s important that we make it so… the non-PhD can use [these tools], too.

What’s driving this phenomenon, this data mining for non-traditional consumers, do you think?

I think [companies have] always wanted to have smarter decisions to be doing things the best they could, but in the past we haven’t had actual data available, and now we’ve got data all over the place. So we used to be making bad instinct decisions and just running with it, and then it came to be, ‘Oh, we’ve got all this data to analyze.’ And you could spend a lot of time analyzing it, but often you still need your answer by tomorrow morning, and often these people were still running off their instincts, and that’s where I think these [data mining] solutions and these software tools can help.

I think all along people have wanted computers to help them do their job better, but before we had the data, it wasn’t so much of an issue there. Because then the intelligence was in the manager, or the individual analyst. It’s still there; that intelligence is never going away, but it can be significantly enhanced by making this [data mining technology] available to them.

You mentioned information overload, a sheer abundance of data. That’s clearly driving the need for more and more BI, data mining, and statistical analysis, too. You also pretty clearly said that the individual manager, the individual analyst, the person who has the intelligence isn’t going anywhere. But because we’ve got so much data, and so little time in which to make sense of it, do you think we’re going to have more and more automation, where in many cases this BI or data mining software is making decisions—via pre-set triggers or rules—that used to be made by a physical human being?

I would really want a human being [involved]. I think most of our customers expect that [the output of] analysis will be interpreted by a human being. The point is that the machines are not running by themselves. I am strongly in favor of the human participant being a valuable part of the decision-making process.

But what about a rule-based system, where the software is basically triggering rules that have been defined by human beings? A sort of IF/THEN conditional approach to automation, perhaps with inferencing thrown into the mix, too?

In different banks or different companies that have their own internal regulations, in addition to federal regulations, we can specifically go in and tailor it and say, ‘We’ve noticed this new trend, let’s go in and hardcode it.’ That’s something we have done along those lines.
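[Editor’s sidebar] The IF/THEN approach discussed in this exchange can be pictured with a minimal Python sketch. This is an illustration only and does not show how SAS Enterprise Miner or any SAS product works. The rule names, thresholds, and record fields are all hypothetical. Each rule pairs a human-defined condition with an action, and the actions route a record to a person rather than deciding on their own, in keeping with Crissey’s preference for human involvement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # the IF part, written by a human
    action: str                        # the THEN part, a label for follow-up


def evaluate(record: dict, rules: list[Rule]) -> list[str]:
    """Return the action of every rule whose condition holds for the record."""
    return [rule.action for rule in rules if rule.condition(record)]


# Hypothetical rules an analyst might hardcode after noticing a trend.
RULES = [
    Rule(
        "large_transfer",
        lambda r: r["amount"] > 10_000,
        "flag for analyst review",
    ),
    Rule(
        "new_account_burst",
        lambda r: r["account_age_days"] < 30 and r["transfers_today"] >= 5,
        "hold and notify compliance",
    ),
]

# Hypothetical transaction record; only the first rule's condition holds.
record = {"amount": 12_500, "account_age_days": 400, "transfers_today": 1}
actions = evaluate(record, RULES)
print(actions)
```

Keeping the rules in a plain list means a newly observed trend becomes one more entry rather than a change to the evaluation logic, which is one simple way to read “let’s go in and hardcode it.”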

You mentioned hardcoding this logic in. That suggests a kind of exceptional scenario. Or, at least, a customer-specific scenario. Are there cases in which you’re incorporating rules and logic into Enterprise Miner, such that if a certain trend is observed in the retail or financial services industries, your customers in those industries can enable that logic, which is already built in?

If you can get it tailored and get it specific, say, for this type of industry, in this type of setting, then we can automate it. I have been working more on the general analytics technology, and so I’m trying to talk to a lot of the business guys who don’t like math, don’t want to see it, but if I can tell them something to make it work; if we have a specific repeatable thing that we can hardcode into logic, that’s what our solutions are all about. We have to go right with the consulting team. We usually want specialists; we want careful expertise; we would want to make it very specific.

Part of SAS’ strength is that we do have very rich and robust and complicated algorithms. We’ll have a few smart defaults, but if you want to go in and really make it unique, you can put in some very complicated and accurate code; you can get in and tweak it to your heart’s desire.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.