Standard for Sharing Data Mining Models Falls Short

Predictive Model Markup Language gives applications a way to define statistical and data mining models and to share them with other PMML-compliant apps

Like it or not, data mining has long been viewed as the province of pointy-headed wizard types versed in the inner workings of products from SAS Institute Inc., SPSS Inc., and others.

It’s an unfair stereotype, to be sure, especially in light of the work these companies have done to make their products easier to use. Now there’s an XML-based data mining standard, dubbed the Predictive Model Markup Language (PMML), that gives applications a way to define statistical and data mining models and to share them with other PMML-compliant applications. With it, the stereotype may finally be put to rest.

Wayne Thompson, product manager for Enterprise Miner with SAS, says there’s a clear need for a standard like PMML. “A lot of times someone will develop a predictive model and they may hand translate that model into another language, like COBOL, or perhaps into C, for deployment into another operational system. They may develop a model in SAS, for example, and want to deploy that directly in the database, which in this case (with IBM) is DB2,” he explains. “[PMML] provides them kind of an interoperable code translation of the model that can be deployed across these various systems without recoding, and in turn reduce human risk. So it’s a more foolproof way of automating a manual process.”

PMML is a good start, but it’s also a work in progress. That’s why two purveyors of widely used data mining solutions, SAS and IBM Corp., say it may be necessary to deviate from the PMML standard, at least until version 3.0 is approved. The two companies have developed a set of PMML extensions that, they say, are designed to help SAS’ Enterprise Miner interoperate more effectively with IBM’s Intelligent Miner and DB2 database.

The appearance of proprietary extensions has sounded a death knell for many a hopeful standard, but in this case Thompson is emphatic that neither IBM nor SAS intends to circumvent the PMML standards process, which is managed by the Data Mining Group (DMG), of which both IBM and SAS are members in good standing.

“A lot of these extensions that we proposed are being adopted for the next release [of the PMML standard],” he asserts.

So why offer the extensions now, before the next version of the PMML specification has been approved? Anne Milley, director of analytical strategy with SAS, says the two companies are responding largely to customer demand, particularly from the cutting-edge financial services community.

In this regard, she says, the PMML extensions developed by IBM and SAS make it possible for customers to automate the exchange of predictive and descriptive models in their operational systems. “One reason is to support the richness of the SAS analytical algorithms, which go beyond what the current PMML specification can support,” she comments, noting that the proposed IBM-SAS extensions cover modeling features such as interaction terms in regression models and multi-way splits in decision trees. “Our algorithms do that, so we went ahead and extended the way in which this PMML represents the model to accommodate these additional features that are important to our customers.”
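To make the idea of a multi-way split a little more concrete, here is a minimal Python sketch, using only the standard library, of how a scoring engine might read a decision tree from a PMML-style document and score a record without any hand recoding. The XML is a hand-written, simplified illustration: the model, the field names ("region", "risk") and the helper functions are hypothetical and are not taken from SAS or IBM output. Element names such as TreeModel, Node and SimplePredicate follow PMML’s tree-model conventions, but the evaluator handles only equality tests and is far from a complete PMML consumer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML-style model (not exported from any product).
# The root node splits three ways on "region", i.e. a multi-way split.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-3_0" version="3.0">
  <DataDictionary numberOfFields="2">
    <DataField name="region" optype="categorical" dataType="string"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="region"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <Node score="medium">
      <True/>
      <Node score="low">
        <SimplePredicate field="region" operator="equal" value="north"/>
      </Node>
      <Node score="medium">
        <SimplePredicate field="region" operator="equal" value="south"/>
      </Node>
      <Node score="high">
        <SimplePredicate field="region" operator="equal" value="west"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# ElementTree expands namespaced tags, so lookups need a prefix mapping.
NS = {"p": "http://www.dmg.org/PMML-3_0"}


def predicate_holds(node, record):
    """Evaluate a node's predicate; this sketch supports only True and 'equal'."""
    if node.find("p:True", NS) is not None:
        return True
    pred = node.find("p:SimplePredicate", NS)
    if pred is None:
        raise ValueError("unsupported predicate in this sketch")
    if pred.get("operator") != "equal":
        raise ValueError("only operator='equal' is handled here")
    return record.get(pred.get("field")) == pred.get("value")


def score(node, record):
    """Descend into the first child whose predicate holds.

    If no child matches, this sketch falls back to the current node's score;
    the real PMML specification defines its own rules for that case.
    """
    for child in node.findall("p:Node", NS):  # direct children only
        if predicate_holds(child, record):
            return score(child, record)
    return node.get("score")


def score_record(pmml_text, record):
    """Parse the document and score one record against its TreeModel."""
    root = ET.fromstring(pmml_text.strip())
    top = root.find("p:TreeModel/p:Node", NS)
    return score(top, record)


if __name__ == "__main__":
    print(score_record(PMML_DOC, {"region": "west"}))
```

Because the model travels as data rather than as COBOL or C, any system that can parse the XML can apply the same tree. Extensions like the ones IBM and SAS describe would show up as additional elements or attributes that a consumer must also understand.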

Why do IBM and SAS believe that the DMG will approve their proposed PMML extensions? “These are the kinds of things that the Data Mining Group recognizes should be added going forward,” asserts Milley.

Adds Thompson: “These extensions that are algorithmic-related are being adopted, and other vendors that want to use the SAS model, too, that are specific to our language—they’re working on adopting these, too.”

In addition to customers, Thompson says the two partners will make their PMML specification available to database vendors. “It’s primarily the database vendors that are doing this, because they’re the ones that have the scoring engines. So we’re actually making our specification available to these vendors, and many of them are working on it at this time, and when they get closer to supporting the PMML specification that we produce, there’ll be an announcement for that,” he confirms.

About the Author

Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.