Data Mining Gets Real

A few years ago, the concept of data mining was introduced to the corporate world as a means to exploit the most valuable and often overlooked asset in business: data. The term was vague, even mysterious, and early advocates sounded more like 900-number psychics than technologists, with their unsubstantiated claims that nestled away inside all of the data from all of the IT applications deployed by a company were secret and valuable tidbits of information.

Early data mining efforts produced rather dismal results, which data mining advocates quickly attributed to faulty methods of building a data warehouse. Detailed information had not been captured sufficiently, or had been lost through summarization and other flawed "data cleaning" or "normalization" efforts. Without the details, the divinations of data mining could not be applied successfully.

Despite some rather harsh characterizations of early data mining efforts, the tide began to turn in the mid-1990s. Theoretical arguments about the potential value of data mining were supplemented by a growing volume of anecdotal, then documented, cases of data mining successes. Data mining suddenly got real.

Today, an entire market of products has formed around the technology of Business Intelligence (BI). According to Jackie Sweeney, Senior Research Analyst for International Data Corporation (Framingham, Mass.), data mining has "matured," producing fortunes for the "Big Three" vendors (SPSS, Inc., IBM and SAS Institute) and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.

IDC categorizes data mining tools as a subset of the information access tools market, which grew an average of 16 percent to $6.3 billion in 1997, according to Sweeney. The data mining tools segment of the market showed a very strong rate of growth, 69 percent in 1997, passing the $500 million revenue mark in the same year.

Market growth has fueled the consolidation of vendors offering data mining and business intelligence products and services. Says Sweeney, "Over the past few years, smaller vendors have either been acquired by large players or have abandoned the marketing of horizontal data mining products, those that can be used in any environment and which place them into direct competition with the Big Three, in favor of solutions targeted to specific vertical markets. Now, many smaller vendors offer specific products that cater to sales and marketing, pharmaceutical research and testing, or to other vertical applications."

At the same time, Sweeney reports, "a convergence is occurring between the two branches of data mining techniques: statistical technology (neural networking and statistical techniques) and rules-based technology (data mining based on rules inference engines)."

The result is a growing number of easier-to-use data mining tools, says Sweeney, and a Business Intelligence market populated by a variety of desktop and server-based products.

SPSS Goes Mining with Clementine

It doesn't take a crystal ball for Carolyn Calzavara to see that the market for business intelligence products is growing. As Director of Marketing for Data Mining and Business Intelligence at SPSS (Chicago, Ill.), Calzavara oversees a market-leading array of Business Intelligence products and services. SPSS leads the pack in data mining, according to several research firms, owing in part to the company's acquisition late last year of Integral Solutions Ltd. (ISL), a UK-based company with U.S. headquarters in King of Prussia, Pa.

"When we acquired ISL, we got their Clementine data mining workbench, which is becoming the framework for our data mining solutions for the future," says Calzavara.

Clementine provides an interactive, point-and-click interface for modeling business processes in order to solve business problems. It serves as a convergence point for the statistical- and neural networking-based offerings from SPSS (including SPSS itself and the company's Neural Connection product) and Clementine's own data visualization technology, which depends heavily on rule induction, according to Calzavara.

"Basically, Clementine is helping to change the attitude that many had about data mining in the past. It allows users to put their business knowledge to work to answer a business problem using both algorithms and query methodologies."

Calzavara adds that data mining is a subset of business intelligence in the SPSS taxonomy: "To SPSS, Business Intelligence is an overarching umbrella that includes online analytical processing (OLAP) tools for slicing and dicing data, statistical reporting tools and data mining products. Data mining tools differ from the other components of Business Intelligence because data mining is used to generate more specific, in-depth, analytical models. Some business applications require a combination of these technologies."

Smaller Firms Also Looking for Nuggets

Michael Gilman, President and CEO of Data Mining Technologies, Inc., would agree with Calzavara's characterization of data mining. He is unhappy with the casual use of the term by numerous vendors in marketing their non-data mining products.

"There is a basic educational problem out there regarding the meaning of data mining. While there are many definitions, the fundamental difference between true data mining and other techniques is data mining is the automated discovery of patterns of data. OLAP extracts data through queries. It is interactive and human-driven, thus it is not data mining."

Gilman observes that true data mining is guided by a question, "but it is a general question; it is not a series of queries, it is autonomous data discovery."

True data mining, says Gilman, consists of the application of symbolic and statistical techniques to identify relationships in a database, "to discover rather than to confirm trends or patterns in data and to present solutions in usable business formats."
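
The distinction Gilman draws between query and discovery can be illustrated with a small sketch. The Python below is purely illustrative and does not represent Nuggets or any other vendor's algorithm; the customer records, the field names and the choice of "lift" as a ranking measure are all assumptions made for the example. The first function answers a single question that an analyst has already thought to ask, much as an OLAP query would. The second enumerates every attribute-value condition in the data and ranks the conditions by how strongly they are associated with the outcome, with no hypothesis supplied in advance.

```python
from collections import Counter

# Hypothetical customer records; every field name and value is invented.
RECORDS = [
    {"region": "east", "plan": "basic", "churned": True},
    {"region": "east", "plan": "premium", "churned": False},
    {"region": "west", "plan": "basic", "churned": True},
    {"region": "west", "plan": "premium", "churned": False},
    {"region": "east", "plan": "basic", "churned": True},
    {"region": "west", "plan": "premium", "churned": False},
    {"region": "east", "plan": "premium", "churned": False},
    {"region": "west", "plan": "basic", "churned": False},
]


def query_rate(records, field, value, target="churned"):
    """Human-driven query: the analyst chooses the one condition to test."""
    matching = [r for r in records if r[field] == value]
    if not matching:
        return 0.0
    return sum(r[target] for r in matching) / len(matching)


def discover_patterns(records, target="churned", min_support=2):
    """Automated search: score every field=value condition against the base rate."""
    base_rate = sum(r[target] for r in records) / len(records)
    if base_rate == 0:
        return []  # the outcome never occurs, so there is nothing to rank
    support = Counter()  # how many records satisfy each condition
    hits = Counter()     # how many of those records also show the outcome
    for record in records:
        for field, value in record.items():
            if field == target:
                continue
            support[(field, value)] += 1
            hits[(field, value)] += record[target]
    patterns = []
    for condition, count in support.items():
        if count >= min_support:
            rate = hits[condition] / count
            patterns.append((condition, rate, rate / base_rate))
    # Strongest association (highest lift) first.
    return sorted(patterns, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    print(query_rate(RECORDS, "region", "east"))
    for condition, rate, lift in discover_patterns(RECORDS):
        print(condition, round(rate, 2), round(lift, 2))
```

On this invented data, the query reports only on the region the analyst asked about, while the search surfaces the plan type as the strongest pattern even though nobody asked about it.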

Unlike many smaller data mining vendors, Data Mining Technologies continues to offer a general-purpose data mining tool, Nuggets. Nuggets uses patented data learning algorithms to sift databases and to identify underlying business rules, so they can be analyzed. A PC-based system, it is designed to facilitate use by less-technical end users, according to Gilman.

According to Gilman, companies evaluating general-purpose data mining products need to consider eight key factors in selecting an appropriate product for their application.

1. Does the company already have "results" and "input variables"? Results are historical data that serve as a basis for analysis. Input variables are data elements that will be used to specify a data mining question, such as income level, number of credit cards that a customer has, etc.

2. Will data mining involve historical data that is numerical, nominal, or both? A good data mining tool should be able to handle both data types.

3. Does the data mining tool provide the capabilities to build initial models from historical data and variables? Does it provide tools to generalize, predict and validate results?

4. Can the tool handle the amount of data that the company wishes to use? This criterion determines how large a problem the data mining tool can handle.

5. How does the tool handle missing data? Often, historical records are not complete. The data mining tool must be able to cope with missing elements.

6. Can the data mining tool handle noisy data? Historical databases may contain considerable detailed information that is not rigorously cleaned or normalized. Says Gilman, "This is sometimes for the best, since the axiom 'garbage in, garbage out' does not always hold in data mining."

7. Can the tool provide the level of granularity sought from data mining? "With statistical and neural networking techniques," says Gilman, "small patterns in the data often disappear or are overlooked. Rule induction methodologies deliver a much finer degree of granularity and are capable of discovering more relationships in the data that may be of value."

8. How much technical knowledge is required to use the product? Data mining should be conducted by persons with good knowledge of the domain or business area in which research is being conducted. Gilman argues that statistical methodologies require end users to learn new languages and acquire new skills before they are able to perform business-relevant data mining. Rule induction, he says, is a less difficult methodology to master.
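
Two of these criteria, the mix of numerical and nominal data (2) and the extent of missing data (5), can be checked against a company's own records before any product is evaluated. The Python sketch below uses only the standard library to label each column as numerical or nominal and to count its missing entries. It is an illustration, not part of any product mentioned here: the sample rows, the column names and the convention that None or an empty string marks a missing value are all assumptions.

```python
MISSING = (None, "")  # assumed markers for a missing value


def profile_columns(rows):
    """Return {column: {"type": ..., "missing": ...}} for a list of dict rows."""
    # Collect every column name seen in any row, in first-seen order.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    profile = {}
    for name in names:
        # A key that is absent from a row counts as a missing value.
        values = [row.get(name) for row in rows]
        present = [v for v in values if v not in MISSING]
        # Numerical only if every present value is an int or float (bools excluded).
        # A column with no present values falls back to "nominal".
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        profile[name] = {
            "type": "numerical" if numeric else "nominal",
            "missing": len(values) - len(present),
        }
    return profile


# Hypothetical historical records, loosely modeled on the input variables
# named in criterion 1 (income level, number of credit cards).
ROWS = [
    {"income": 42000, "credit_cards": 2, "segment": "retail"},
    {"income": None, "credit_cards": 1, "segment": "retail"},
    {"income": 58500.0, "credit_cards": 4, "segment": ""},
    {"income": 31000, "segment": "student"},
]

if __name__ == "__main__":
    for column, summary in profile_columns(ROWS).items():
        print(column, summary)
```

A profile of this kind only describes the data; how a given tool copes with the gaps it reveals still has to be tested with the tool itself.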

Vertical Products

The last criterion has been seized upon by a number of smaller vendors who have optimized their products for specific industry segments. Trajecta, Inc. (Austin, Texas) offers three data mining products based upon an underlying proprietary data mining engine: Decision Optimizer (strategic decision-making software), CreditPRO (software and consulting designed specifically for credit portfolio re-pricing optimization) and Intellect Optimizer (software designed to optimize resource allocation in the pharmaceutical industry).

Another example of a market-customized tool is Customer Care Suite from Data Distilleries (Amsterdam, The Netherlands). According to the vendor, the product is an integrated suite of modules specifically developed for business users that enables the analysis of customer data throughout the entire customer life cycle, "from identifying prospective customers to extending and maintaining customer relationships." The product uses a general-purpose architecture developed by the company but features predefined algorithms and a graphical user interface intended to make data mining more accessible to the marketing and sales professional within a business.

Other companies now offer data mining technology-driven products that are designed to provide specific task-oriented solutions. For example, Relativity Technologies (Cary, N.C.) uses "knowledge mining techniques" embedded in its RescueWare product to aid companies in migrating legacy applications to client/server platforms. Data mining is used to ferret out underlying business rules in older COBOL programs so that "they can be converted through RescueWare to re-useable components in COBOL, C++ or Java."

Computer Associates (Islandia, N.Y.), too, has harnessed neural networking data mining techniques to implement "Neugent" (neural networking-enhanced agents) technology for systems monitoring and management. Says Steve Mann, Vice President of Product Strategy, "CA views enterprise management as an information management problem. Systems monitoring produces a lot of data on assets that needs to be managed and made meaningful in order to predict system events. Neural networking and data mining helps to automate decisions about the thousands of metrics so that what is useful can be culled from the rest."

Mann reports that Neugent technology is being extended to the broader realm of business data mining. Currently, "One financial services firm is using Neugents in connection with a 30 million customer database to look for patterns affecting the results of various product marketing strategies. Another entertainment company is using Neugents to understand customer preferences in regard to various entertainment offerings." Mann says that a general-purpose data mining product from Computer Associates, based on the neural networking agent technology, should reach the market within 18 months.

The Corporate Oracle

The mention of data mining still conjures images of fortunetellers and tarot cards in the minds of some IT professionals. Others view data mining as a low-priority endeavor within the context of day-to-day mission-critical business system and network operations.

Today, it's clear that the software tools used in enterprise technology management are borrowing from data mining methods to become more intelligent. IT pros owe it to themselves to become more familiar with the current state of the art in business intelligence.

About the Author:

Jon William Toigo is an independent author and consultant. He can be
reached at or visit