Microsoft Sets Sights on Data Mining Dominance

Microsoft hopes to challenge established powers SAS and SPSS for data mining and predictive analytic bragging rights

When it comes to data mining and predictive analytics, Microsoft Corp. might not be the first company that comes to mind.

That could change, however, especially if Donald Farmer, Redmond's principal program manager for SQL Server Data Mining, has his way.

Microsoft has come a long way in the data mining and predictive analytics segment, Farmer says, and with a game-changing Excel 2007 release under its belt -- and a promising SQL Server 2008 revision in the pipeline -- Redmond hopes to challenge established powers SAS Institute Inc. and SPSS Inc. for data mining and predictive analytic bragging rights.

"[We don't] have all the functionality of something like a SAS or an SPSS, because that's just not our market," he concedes. It comes down to a difference of scale, Farmer argues: SAS and SPSS typically target larger, more expensive deployments whose users are well-versed in their tools. Microsoft is targeting a different kind of data mining consumer: the Excel analyst, for example, who might not have much (if any) experience with data mining, predictive analytics, or statistical analysis, for that matter.

"I was looking at the SPSS figures from last quarter that came through, and it was one of their best quarters ever. They added 16 new customers. Obviously 16 customers is a pretty good number [for a company like SPSS], but for Microsoft, 16 customers doesn't even cover my travel expenses for a year. Our market just has to be a much larger market," Farmer says.

"By the way, I don't mean to say we can't hit the high-end. Within Microsoft, we have our own database marketing team. We're one of the largest companies in the world. We have a huge database marketing team who do classic customer analysis. These guys were all SAS users, but when they joined Microsoft, they started using our tools. The entire process runs on our database, they actually use the Excel [data mining] add-ins to do it. It's not that there's nothing they don't miss, [it's that] they are able to achieve the same business results using our tools."

Last year, Microsoft released a data mining and predictive analytic add-on for its Excel 2007 product. The add-on, which is similar to Microsoft's well-known SQL Server BI Accelerator products, integrates natively with Excel 2007. It introduces a new "Data Mining" tab that exposes several pre-built functions, including forecasting, accuracy charting, cross-validation, exception highlighting, category detection, key influencers, and shopping basket analysis (the last is a SQL Server 2008-only function), among many others.

The key, Farmer argues, is that the transition from straightforward analysis in Excel to data mining is relatively seamless: a user has only to click on the Data Mining toolbar and select from one of several canned functions (e.g., "Accuracy Chart," "Highlight Exceptions"). The next step uses a wizard interface to walk a user through the rest of the process.

In other words, he maintains, it's organic: as far as the Excel user is concerned, he or she isn't even doing data mining.

"For [a function such as] 'Detect Categories,' what [the add-in is] doing is building a clustering model in the background [either on a local or remote SQL Server instance], but we don't expect the Excel user to understand that. We just [call it] 'Build Categories,'" Farmer explains.

"What this does is actually build a clustering model on the server. It finds the five most significant clusters, and then returns that to the user again in Excel, so the user gets that in a user interface that lets them understand what the clusters mean, just using standard Excel visualization."

Ditto for a function such as exception highlighting. "When I run 'Highlight Exceptions,' … we're actually building a clustering model [on the server] that looks for outliers from the data. But the user doesn't have to understand any of that. The idea here is to use standard Excel features to give them a sense that they already know how to do this."
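
For readers curious about what "building a clustering model" and "looking for outliers" can mean in practice, here is a minimal Python sketch of the general idea. It is not Microsoft's algorithm or API: the plain k-means routine, the function names, the sample rows, and the "three times the median distance" cutoff are all assumptions chosen for illustration.

```python
from math import dist
from statistics import median


def kmeans(points, k, iterations=20):
    """Plain k-means (illustrative only). Returns (centroids, labels)."""
    # Assumption: seed with the first k rows so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def highlight_exceptions(points, centroids, factor=3.0):
    """Flag rows that sit unusually far from every cluster centre."""
    nearest = [min(dist(p, c) for c in centroids) for p in points]
    cutoff = factor * median(nearest)  # hypothetical rule of thumb
    return [i for i, d in enumerate(nearest) if d > cutoff]


# Hypothetical (age, spend) rows; the last one is deliberately odd.
rows = [(25, 30), (27, 32), (24, 29), (55, 90), (58, 95), (53, 88), (40, 140)]
centroids, labels = kmeans(rows, k=2)
flagged = highlight_exceptions(rows, centroids)
print(labels)   # category assigned to each row
print(flagged)  # indexes of rows treated as exceptions
```

The point of the sketch is the division of labor Farmer describes: the model does the clustering, and the user sees only a category label per row and a short list of highlighted rows.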

Microsoft's Accuracy Charting feature lets users compare the efficacy of different data models. This can also be a boon to data mining hotshots, says Farmer, because they can compare the effectiveness of the models that they design vis-à-vis those which Microsoft provides out of the box.
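
One common way to compare models in this fashion is a cumulative gains (lift-style) comparison: rank the rows by each model's score and see how quickly each ranking captures the real positives. The Python sketch below assumes that approach; the function name, the scores, and the outcomes are made up for illustration and do not describe how the add-in computes its charts.

```python
def cumulative_gains(scores, actuals, steps=4):
    """Share of all positives found in the top 1/steps, 2/steps, ... of rows ranked by score."""
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)]
    total = sum(ranked)
    gains = []
    for s in range(1, steps + 1):
        top = ranked[: len(ranked) * s // steps]
        gains.append(sum(top) / total)
    return gains


# Hypothetical outcomes (1 = customer responded) and two models' scores.
actuals = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.2]
model_b = [0.5, 0.6, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]

gains_a = cumulative_gains(model_a, actuals)
gains_b = cumulative_gains(model_b, actuals)
print(gains_a)
print(gains_b)
```

A model whose curve climbs faster finds more of the responders in the first slice of the ranked list, which is the kind of side-by-side comparison an expert would want between a hand-built model and an out-of-the-box one.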

Microsoft isn't leaving data mining experts in the lurch, he maintains. They can design their own custom functions and embed them in the Excel toolbar, Farmer says, as well as design (and refine) their own data models.

"The functions that we use are all public functions. We have not extended anything here. There are no private [i.e., proprietary] protocols. It's all publicly available interfaces, so, in theory, anybody could write this," he comments.

Surprisingly, Microsoft's SQL Server Data Mining Add-In isn't based on technology Redmond picked up via its acquisition of the former ProClarity Corp. two years ago. Instead, Farmer maintains, it's all homegrown.

"All of this has been developed in-house by the data mining team. There's nothing of ProClarity in there," he asserts. "It's starting to get very interesting, [because] now that we have ProClarity in-house, we're talking about how we [can] get these capabilities together. We're doing a lot of work behind the scenes on that."

Another important trend, Farmer points out, is cross-pollination between the Excel and SQL Server Analysis Services teams.

"I now have a couple of program managers on my team who came from Excel and who are now working with us designing Analysis Services tools. Suddenly we now have people from Analysis Services who've gone and joined the Excel team," he explains.

The takeaway, Farmer stresses, is that Microsoft doesn't hope to compete with SAS or SPSS on a feature or functionality basis; its angle is usability. Redmond's Excel-based, wizard-driven approach to data mining and predictive analytics might lack some of the analytical heft of solutions from SAS or SPSS, but it's far more usable, Farmer contends.

On the programming side, too, Microsoft is making it easier for developers to expose data mining or predictive analytic functionality to non-traditional users (via portals, dashboards, or Web applications), according to Farmer. In this respect, programmers can use OLE DB or ADO.NET to embed analytic capabilities in their custom applications. Redmond's competitors are doing this, too, Farmer concedes, but few other companies can claim to be as developer-friendly -- or, for that matter, as developmentally popular -- as Microsoft.

"What we've been saying is that in many ways we don't compete with SAS or SPSS. If we're being cheeky about it, we say, 'We're just doing interventions,'" he comments. "We're seeing a lot of interest in the Excel-side [data mining], for one thing, but we're also seeing [interest] in the embed-ability, too. The people who are actually pushing this are from the developer side," he says.

"We just did a 25-city road show in Europe. Everywhere we did this it was absolutely packed with developers. They're continually looking for new ways … [to] offer differentiating functionality to their internal users, and [our] message of not [having to] recode your business logic [to expose new data mining functionality] is really resonating with them.

"For a developer, capturing business logic is actually very difficult. Understanding the business case and hard-coding it into applications -- that's very difficult. What we're doing [with SQL Server-based data mining] offers them a way to do that with lower maintenance costs, and they're getting very excited about that."
