JDM API Promises “Data Mining for the Masses”
Proponents say the new standard includes algorithms for classification, regression, association, clustering, and attribute importance, and is designed for both novice and expert-level users.
Last week, the Java Community Process (JCP) Executive Committee announced the unanimous approval of Java Specification Request (JSR) 73—a.k.a., the Java Data Mining (JDM) API.
The JDM API has been a long time coming: Development began in July of 2002, and the specification itself has been available for public review since October of 2002. Its supporters have never minced words about its intent: To enable ubiquitous connectivity among data-mining applications, much as JDBC did for relational databases. Several vendors have backed the JDM API, although Oracle Corp. leads the charge. Other prominent industry backers include BEA Systems Inc., Hyperion Solutions Corp., SAS Institute Inc., and SPSS Inc.
Supporters say that JDM gives developers a single API for embedding analytics in Java and non-Java applications alike. In addition, JDM can leverage other data-mining standards, including the Data Mining Group’s (DMG) Predictive Model Markup Language (PMML), the Object Management Group’s Common Warehouse Metamodel (CWM), and the International Organization for Standardization’s SQL Multimedia (SQL/MM) standard.
"JSR 73 is an important step in enabling production data mining," said Jacek Myczkowski, vice president of data-mining technologies and life sciences with Oracle, in a statement. "Widespread adoption of Java Data Mining will bring data mining to the masses because developers can learn one API and embed analytics in any application, regardless of vendor."
So what kind of data-mining functionality does JDM API support? Proponents say the new standard includes algorithms for classification, regression, association, clustering, and attribute importance. What’s more, they note, the API is designed for both novice and expert-level users.
Recently, the data-mining space has seen a flurry of activity around open data-mining standards. This April, for example, IBM Corp. and SAS Institute announced proprietary extensions to the DMG’s XML-based PMML standard. Like JDM, PMML gives applications a way to define statistical and data-mining models and to share those models with other PMML-compliant applications.
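To make the idea of an XML-based model format concrete, here is a minimal Java sketch of how one application might read a model that another application exported. The document below is a simplified, hand-written illustration in the style of PMML, not a schema-valid file: the field names, model name, and coefficients are invented, and required parts such as the header are omitted. The class and method names are likewise hypothetical, and only the standard Java XML parser is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PmmlSketch {

    // Hypothetical PMML-style document: a linear regression that predicts
    // "spend" from "age". All names and numbers are invented for illustration.
    static final String SAMPLE_PMML =
        "<PMML version=\"2.1\">"
      + "<DataDictionary numberOfFields=\"2\">"
      + "<DataField name=\"age\" optype=\"continuous\"/>"
      + "<DataField name=\"spend\" optype=\"continuous\"/>"
      + "</DataDictionary>"
      + "<RegressionModel modelName=\"spendByAge\" functionName=\"regression\">"
      + "<RegressionTable intercept=\"12.5\">"
      + "<NumericPredictor name=\"age\" coefficient=\"0.5\"/>"
      + "</RegressionTable>"
      + "</RegressionModel>"
      + "</PMML>";

    // Parse the XML text into a DOM tree using the JDK's built-in parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Collect the names of the fields declared in the data dictionary.
    static List<String> fieldNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList fields = doc.getElementsByTagName("DataField");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Read the mining function (here "regression") declared by the model.
    static String modelFunction(Document doc) {
        Element model =
            (Element) doc.getElementsByTagName("RegressionModel").item(0);
        return model.getAttribute("functionName");
    }

    // Apply the shared model: intercept + coefficient * age.
    // Assumes exactly one regression table with one numeric predictor.
    static double predict(Document doc, double age) {
        Element table =
            (Element) doc.getElementsByTagName("RegressionTable").item(0);
        Element predictor =
            (Element) doc.getElementsByTagName("NumericPredictor").item(0);
        double intercept = Double.parseDouble(table.getAttribute("intercept"));
        double coefficient =
            Double.parseDouble(predictor.getAttribute("coefficient"));
        return intercept + coefficient * age;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE_PMML);
        System.out.println("Fields: " + fieldNames(doc));
        System.out.println("Mining function: " + modelFunction(doc));
        System.out.println("Predicted spend at age 40: " + predict(doc, 40.0));
    }
}
```

The point of the sketch is that the consuming application needs no knowledge of the tool that built the model; it only needs to understand the shared XML vocabulary.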
Why introduce proprietary extensions to an open standard? “One reason is to support the richness of the SAS analytical algorithms, which go beyond what the current PMML specification can support,” said Anne Milley, director of analytical strategy with SAS, at the time. Milley pointed out that the proposed extension supports algorithms such as regression interaction and multi-way splits in the decision tree. “Our algorithms do that, so we went ahead and extended the way in which this PMML represents the model to accommodate these additional features that are important to our customers.”
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.