Data Mining Growth Reflects Strategic Mission

IDC predicts a 23 percent annual growth in the data warehousing tools market. This article presents strategies in data mining, developed by IBM and SAS Institute, as well as up-and-comers NeoVista Software and HNC Software.

Not so long ago, data mining was little more than a gleam in the eye of the information technology industry, a promising technology in search of a killer app. These days, however, the technology is increasingly seen as a strategic business tool that enables organizations not only to manage statistical functions, but also to boost marketing and customer care activities.

The data mining story is one of promise and possibility, colored by the search for a mission. The process of gathering, cleaning, amassing and evaluating reams of data from diverse sources initially promised to grow into its own specialty. When the technology’s early growth rate could not be sustained, however, venture capitalists withdrew.

The fallout in data mining technology occurred because it started as "a technology in search of a problem," one analyst said. "It coincided with the Internet Gold Rush," said Brian Murphy, an analyst with the Yankee Group in Boston. "People got carried away in the hysteric moment."

Specifically, companies that rushed out with a generic data mining algorithm failed, said Teresa Wingfield, senior industry analyst with Giga Information Group in Santa Clara, Calif. In contrast, firms that offered vertical applications brought new insights into resolving problems, such as insurance and credit card fraud.

Today, data mining is making its presence felt as a subspecialty of the business intelligence field. International Data Corp. (IDC) is bullish on the technology, predicting that the data warehousing tools market – which includes data mining – will grow by 23 percent annually, reaching $8 billion by 2002. According to Dataquest, IBM held the largest market share in the data mining industry in 1997 at 14.2 percent, followed in the top five by Information Discovery, 9.0 percent; Unica Technologies, 8.3 percent; Cognos, 6.7 percent and Silicon Graphics, 5.5 percent.

Key players in the market offering comprehensive solutions include IBM, SAS Institute and two up-and-coming California companies – NeoVista Software, Inc., in Cupertino, and HNC Software Inc., in San Diego.

Vendor Strategies

As it approached the data mining market, IBM discovered two types of customers – the highly technical statistician and/or programmer with deep mathematical skills, and the other half of the market that is customer-driven. That latter half consists primarily of marketing department representatives. IBM’s strategy is to make data mining more palatable to companies that have no mathematicians or statisticians on staff.

The Intelligent Miner for Relationship Marketing is designed to help marketing professionals discover niche markets, capture the most profitable customers, identify targets for cross-selling opportunities, catch customers before they leave and win back profitable customers who’ve left.

"In some cases, it’s a strategy, and in other cases, we’ve been forced to learn from our customers that the ease-of-use factor is more important than the data mining tool itself," said Dan Graham, Global Strategy and Operations Executive for IBM Global Business Intelligence Solutions in Somers, N.Y. As a result, IBM uses a browser-based graphical screen on top of its Intelligent Miner solution. The Intelligent Miner family uses analytical software tools to help customers identify and extract such value-added information as customer buying habits, hidden relationships and new trends.

The latest in IBM’s series of solutions aimed at targeted industries’ marketing departments is DecisionEdge. It applies new algorithms from IBM Research to insurance companies’ enterprise data. It lets those companies gain a comprehensive view of their customers and helps determine how individual customer relationships can be maintained and enhanced.

Two insurance companies – Farmers Group Inc. and Switzerland-based Winterthur Insurance – announced this past November that they were implementing DecisionEdge for relationship marketing.

Intelligent Miner also has specific data and text applications. Intelligent Miner for Data Version 2.1 searches for hidden information stored in traditional files, databases, data warehouses and data marts. The newest version of the Intelligent Miner includes an improved user interface, increased parallelization, new platform support, statistical functions, a neural net value prediction technique and optimization of algorithms.

The Intelligent Miner for Text 2.1 uses data mining to gather information from text documents and data sources, such as e-mail, Web pages, customer correspondence and online news services. The tool identifies a document’s language, gleans patterns from text, clusters similar documents in groups, categorizes documents by content and builds a dictionary of names, terms or other vocabulary. It includes an advanced text search engine and a Web text search facility.

To ensure that its products work at maximum efficiency, IBM sends consulting teams – one based in Dallas, the other in Paris – to guide its clients, including well-known retailers, such as Safeway. Indeed, implementing data mining solutions with no guidance can produce "horrendously bad results very fast," Graham said. "You can get results that sort of make sense, that cause a great deal of grief."

SAS Focuses on ‘Open Box’

Cary, N.C.-based SAS Institute Inc. offers up an "open box" technology that automates the data mining process and makes it available to the user at every step so the user understands how the answers were generated, according to Mark Brown, the company’s Data Mining Program Manager.

Brown believes some of the now-defunct data mining companies relied too heavily on algorithms that claimed to be either new or replacements for traditional techniques. SAS instead stuck with tried-and-true predictive modeling techniques, and includes in its solution a complete range of algorithms: decision trees, clustering, neural networks, data mining regression and associations.

That makes sense for SAS, which has been building predictive models since it was founded 22 years ago. The company now has the largest installed base for analytic processing in the analytic software market. "We think data mining is part of a larger process – customer relationship management," Brown said. The relationship requires that an organization learn from its customer and vice versa. The company uses the compiled data to the customer’s advantage, such as identifying its most profitable clients. The customer then provides the data mining company with more data, which leads to more detailed business intelligence, such as the best time to call those clients at home. The result is any marketing department’s Holy Grail – a one-to-one marketing approach.

"One of the technologies that will allow them to get there is data mining," Brown said. "It’s the next step up from traditional database marketing. In broader terms, it’s moving from account management, otherwise known as product-centric marketing, to customer-centric marketing."

SAS’s solution, Enterprise Miner, features a graphical user interface that automates the company’s data mining process of sampling, exploring, modifying, modeling and assessing (SEMMA) the data. Enterprise Miner virtually eliminates manual coding so business technologists, quantitative experts and IT professionals have an easy-to-use solution that doesn’t sacrifice analytical power.
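The five SEMMA steps can be pictured as a simple chain of functions. The sketch below is only an illustration in plain Python, not SAS code or anything drawn from Enterprise Miner; the customer records, the field names and the one-rule "model" are all hypothetical assumptions made for the example.

```python
import random

# Hypothetical customer records; every field name here is an assumption.
records = [
    {"tenure": 1 + i % 10, "churned": i % 5 == 0}
    for i in range(100)
]
for r in records:
    # Assumed pattern: customers who left spent less per month.
    r["spend"] = r["tenure"] * (10 if r["churned"] else 30)

def sample(rows, n, seed=0):
    """Sample: draw a manageable subset of the data."""
    return random.Random(seed).sample(rows, n)

def explore(rows):
    """Explore: compute simple summary statistics."""
    return {
        "mean_spend": sum(r["spend"] for r in rows) / len(rows),
        "churn_rate": sum(r["churned"] for r in rows) / len(rows),
    }

def modify(rows):
    """Modify: derive a new variable, spend per month of tenure."""
    return [dict(r, spend_per_month=r["spend"] / r["tenure"]) for r in rows]

def model(rows):
    """Model: a one-rule predictor (a stand-in for a real algorithm)."""
    threshold = sum(r["spend_per_month"] for r in rows) / len(rows)
    return lambda r: r["spend_per_month"] < threshold

def assess(predict, rows):
    """Assess: fraction of rows the model classifies correctly."""
    return sum(predict(r) == r["churned"] for r in rows) / len(rows)

subset = modify(sample(records, 40))
summary = explore(subset)
predict = model(subset)
accuracy = assess(predict, modify(records))
print(summary, accuracy)
```

A real tool would replace the one-rule model with decision trees, neural networks or regression, but the order of the steps would stay the same.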

NeoVista Technology Approach

Of the new players, NeoVista Software, Inc. (Cupertino, Calif.) approached data mining as a technology, not as a desktop-oriented algorithm. That means its solutions are business intelligence-driven and target vertical industries. For about 10 years, the company focused on delivering high-performance, highly parallel, mission-critical pattern recognition solutions to the defense, science and intelligence communities. Projects ranged from matching gene sequences for the Human Genome Project to MPEG decoding for post-production studios.

NeoVista switched to software and commercial applications in 1996. Its Decision series software suite ferrets out non-obvious relationships in corporate data warehouses. Release 3.0 makes building business models easier with DecisionAssistant and adds three new data mining engines, as well as individual case factor analysis to the tree induction and neural network mining engines. The data mining engines include DecisionKmeans, a clustering algorithm that lets companies market their products or services to previously unidentified customer segments; DecisionBayes, a predictive algorithm that is based on probability theory; and DecisionCubist, a predictive algorithm that is based on regression trees and replaces probabilities with regression equations.
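To give a sense of what a clustering engine of this kind does, here is a minimal sketch of the textbook k-means procedure in plain Python. It is not NeoVista's DecisionKmeans implementation, whose internals the vendor has not described here; the customer data and starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (store visits per month, average basket in dollars).
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 85)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)   # cluster index for each customer
print(centers)  # the "typical" customer of each segment
```

Each resulting cluster is a candidate customer segment; whether a segment is worth marketing to remains a business judgment the algorithm cannot make.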

DecisionAssistant also provides native database connectivity for Oracle, Informix, Sybase and DB2 relational databases. The relationships the Decision series uncovers are incorporated into real-world applications that improve operational efficiencies and customer intimacy. The Decision series is the basis for NeoVista’s Retail Decision Suite (RDS), which uses point-of-sale data to determine the most attractive products to offer and at what time of year in a way that is unique to each store. The process hinges on a wide range of variables.

In the retail market, for example, NeoVista takes into account such things as the weather, a company’s competitors and product histories to generate insights. The store uses the information to order products when they’re most in demand. Wal-Mart, a NeoVista customer, saved $1 billion in inventory and increased sales 12 percent in 1997, according to the June 8 issue of Discount Store News.

Another feature of DecisionAccess is target dependency analysis (TDA), which identifies suspicious attributes, ranks attributes based on their predictive power and determines optimal split points for continuous attributes and groups for categorical attributes. These functions help automate the model building process and cut computing requirements.
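Ranking attributes by predictive power and choosing split points for continuous attributes can be illustrated with information gain, a standard measure from decision-tree learning. The sketch below is an assumption about how such an analysis might work, not NeoVista's TDA algorithm, and the promotion-response data is invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def best_split(values, labels):
    """Return (threshold, information gain) for the best binary split
    of one continuous attribute; (None, 0.0) if no split helps."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (
            len(left) * entropy(left) + len(right) * entropy(right)
        ) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

def rank_attributes(table, labels):
    """Rank attributes (name -> list of values) by their best-split gain."""
    scored = {name: best_split(vals, labels) for name, vals in table.items()}
    return sorted(scored.items(), key=lambda item: item[1][1], reverse=True)

# Hypothetical data: did each customer respond to a promotion?
responded = ["no", "no", "no", "yes", "yes", "yes"]
table = {
    "age": [22, 25, 31, 45, 52, 60],  # separates the two classes cleanly
    "visits": [3, 9, 4, 8, 2, 7],     # only weakly related to the outcome
}
ranking = rank_attributes(table, responded)
for name, (threshold, gain) in ranking:
    print(name, threshold, round(gain, 3))
```

Scoring every attribute this way before building a model lets weak predictors be dropped early, which is one way such an analysis could cut computing requirements.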

Two new products in the Retail Decision Suite are RDS Assort and RDS-Profile. The assortment tool evaluates the performance of each product in every store in a chain, groups the stores that have similar preferences, and determines which products will sell the best, and in what relative quantities, said Judson Groshong, Vice President of Marketing for NeoVista Software Inc.

The companion RDS-Profile targets just-in-time delivery by determining how much of each product to offer at certain times so stores keep just the right amount of inventory. The tool also can be used to choose which products should be promoted together so one can be put on sale.

In addition, NeoVista targets the insurance, financial services and retail banking industries. Its applications pinpoint such problems as "premium loss," in which parents of teenagers take out auto insurance policies without claiming the child, thus fraudulently lowering their premiums, Groshong said. In the banking industry, data mining tools can determine which customers with credit cards and money market accounts are most likely to want a home mortgage. "The ability to predict behavior is critical," Groshong said.

HNC Focuses on Real Time

HNC Software, Inc. touts its ability to retrieve and mine data in real time. Its original foray into data mining centered on credit and debit card fraud, but it has since expanded into card management and lending.

HNC’s data mining product generates extensive reports that give managers detailed results, and includes a relational online analytical processing tool (ROLAP) so managers can get ad hoc statistics on their own. Its offerings include Falcon, ProfitMax and SelectMarket Profiles. Falcon is a neural network-based system that examines transaction, cardholder and merchant data to detect credit card fraud. It uses predictive software techniques to capture relationships and patterns that traditional methods miss. The Falcon Expert subsystem lets fraud managers define and deploy rules to automate fraud prevention procedures.

ProfitMax provides transaction-based, real-time authorization and action decisions from within a complete infrastructure to manage credit card portfolio profitability. It uses neural networks, expert rule bases and HNC’s cardholder behavior profiling technology to analyze cardholder accounts and predict future profits. The profit evaluation is customized to each issuer’s own definition of financial profit.

SelectMarket Profiles is a predictive customer relationship management tool that uses transaction data to segment cardholder accounts for targeted marketing programs. It is based on high-volume, real-time customer transactions and unearths numeric data and textual information from cardholder transactions. The information lets marketers optimize retention, build cardholder balances and cross-sell other bank products.

As the possibilities of data mining continue to be unearthed, users must dig through the various strategies and products that have risen to the surface.

About the Authors:

S.D. Rodefer and Lane F. Cooper are freelance writers. They can be reached at
