Microsoft Adds Data Mining to OLAP Services

REDMOND, Wash. -- When SQL Server 2000 ships, the OLAP Services within, formerly known as Plato, will have data mining capabilities.

Microsoft Corp. (www.microsoft.com) officials said here late last month at the SQL Server 2000 Reviewer’s workshop that OLAP Services will be renamed Analysis Services, and will include both the OLAP Services engine and the data mining engine.

Kamal Hathi, the program manager for SQL Server Analysis Services at Microsoft, says that Analysis Services is an evolution of Plato.

"We are moving beyond OLAP into a data analysis platform," he says.

Indeed, OLAP and data mining will be tightly coupled in Analysis Services. Data mining, for its part, examines data by scanning samples of known cases and picking up on the patterns within them.

Microsoft’s architect and development manager of Analysis Services, Amir Netz, says the data mining technology Microsoft built into SQL Server differs from most data mining products on the market.

Data mining in SQL Server is a feature, not a standalone product. According to Netz, standalone data mining products are so complex that it practically takes a Ph.D. in statistics to really know how to use them. "We build our analysis platform and integrate analysis tools," he says.

For instance, Microsoft built Analysis Services with standard APIs, on which the company expects developers to build.

Analysis Services also supports OLE DB for Data Mining. "In order to be extensible we have to support the OLE DB for Data Mining interface," Netz says.

Microsoft’s intention in supporting the interface is to make data mining a mass-market technology. The OLE DB interface, according to Netz, reduces the cost and risk for users because one tool works with multiple providers.

Further, data mining will be integrated with the RDBMS so users can build data mining models from within their database, train the models directly off their relational tables and, ultimately, perform predictions as relational queries.
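
To make the idea of "predictions as relational queries" concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an analogy only: it does not use Analysis Services or the OLE DB for Data Mining syntax, the table and column names (customers, prospects, model_content) are invented for illustration, and the "model" is nothing more than the most frequent outcome per age band.

```python
import sqlite3

# In-memory database standing in for a relational store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (age_band TEXT, buys_bike TEXT);  -- known cases
    CREATE TABLE prospects (name TEXT, age_band TEXT);       -- outcome unknown
    INSERT INTO customers VALUES
        ('young', 'yes'), ('young', 'yes'), ('young', 'no'),
        ('senior', 'no'), ('senior', 'no'), ('senior', 'yes');
    INSERT INTO prospects VALUES ('Ann', 'young'), ('Bob', 'senior');

    -- "Training": count outcomes per age band, then keep the most frequent one.
    CREATE TABLE counts AS
        SELECT age_band, buys_bike, COUNT(*) AS n
        FROM customers
        GROUP BY age_band, buys_bike;
    CREATE TABLE model_content AS
        SELECT c.age_band, c.buys_bike AS predicted
        FROM counts AS c
        WHERE c.n = (SELECT MAX(i.n) FROM counts AS i
                     WHERE i.age_band = c.age_band);
""")

# "Prediction": an ordinary join between new cases and the learned patterns.
predictions = conn.execute("""
    SELECT p.name, m.predicted
    FROM prospects AS p
    JOIN model_content AS m ON m.age_band = p.age_band
    ORDER BY p.name
""").fetchall()

print(predictions)
```

The point of the sketch is only that both training and prediction can be phrased as statements against relational tables, which is the kind of workflow Netz describes.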

Netz says there are a number of advantages to integrating data mining with OLAP. Relational data, for instance, is highly geared toward static reports, batch predictions fed into an OLTP system, and real-time singleton prediction in an operational environment.

OLAP, on the other hand, is geared toward interactive analysis by a knowledge worker, consistent and convenient navigational models, embedded semantics in OLAP for easy model creation, and pre-aggregations of OLAP that enable faster training time for Data Mining Models (DMMs).

Microsoft defines a Data Mining Model as a table. Thus, training a DMM is a matter of passing it data for which the attributes to be predicted are known. The DMM will not persist the inserted data but, instead, will analyze it to build the DMM content.

DMM content consists of the patterns that the data mining algorithm detects in the data. That content can be explored via an OLE DB schema rowset or as Predictive Model Markup Language, otherwise known as PMML, delivered as an XML string.

The resulting predictions apply the rules of a trained model to a new set of data in order to estimate missing attributes or values.
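
The train-then-predict cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Microsoft's implementation: ToyMiningModel, insert and predict are hypothetical names, and simple vote counting stands in for whatever algorithm a real provider would use. What it does mirror is the behavior the article describes: the model analyzes the cases it is given, keeps only the resulting patterns, and later applies them to cases whose target attribute is missing.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM: stores learned patterns, not cases."""

    def __init__(self, inputs, target):
        self.inputs = list(inputs)   # attributes used to make a prediction
        self.target = target         # attribute to be predicted
        # content[(column, value)] -> counts of target values seen with it
        self.content = defaultdict(Counter)
        self.overall = Counter()     # target counts across all cases

    def insert(self, cases):
        """Train: analyze each case, update the patterns, keep nothing else."""
        for case in cases:
            label = case[self.target]
            self.overall[label] += 1
            for column in self.inputs:
                self.content[(column, case[column])][label] += 1

    def predict(self, case):
        """Estimate the missing target attribute for a new case."""
        if not self.overall:
            raise ValueError("model has not been trained")
        votes = Counter()
        for column in self.inputs:
            votes.update(self.content.get((column, case.get(column)), {}))
        if not votes:                # no known pattern matched: fall back
            votes = self.overall
        return votes.most_common(1)[0][0]


# Cases for which the attribute to be predicted (buys_bike) is known.
training_cases = [
    {"age_band": "young", "owns_home": "no", "buys_bike": "yes"},
    {"age_band": "young", "owns_home": "yes", "buys_bike": "yes"},
    {"age_band": "senior", "owns_home": "yes", "buys_bike": "no"},
    {"age_band": "senior", "owns_home": "no", "buys_bike": "no"},
    {"age_band": "middle", "owns_home": "yes", "buys_bike": "no"},
]

model = ToyMiningModel(inputs=["age_band", "owns_home"], target="buys_bike")
model.insert(training_cases)

# New cases with the target attribute missing.
first = model.predict({"age_band": "young", "owns_home": "no"})
second = model.predict({"age_band": "senior", "owns_home": "yes"})
print(first, second)
```

Note that the model object holds only aggregated counts after training; the individual training cases are not retained, which is the sense in which a DMM "will not persist the inserted data."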

In building the data mining capabilities, Microsoft chose the SQL language with extensions. Netz says database administrators (DBAs) who know SQL will find the data mining model to be a familiar interface. "The goal is that every DBA and VB developer can become a data mining developer," Netz says.