SAS Revamps Data- and Text-Mining Tools
The last fortnight has seen a flurry of activity in the once-sleepy text-mining space
The once-sleepy text-mining space has been energized by new data- and text-mining announcements this week from SAS Institute Inc., which make text mining suddenly seem like a sexy technology.
Kicking things off was IBM Corp., which donated its Unstructured Information Management Architecture (UIMA) to the open-source community. UIMA describes a powerful search technology that parses text within documents and other content sources to discover latent meanings, buried relationships, and relevant facts.
Big Blue also announced a new enterprise search tool, WebSphere Information Integrator OmniFind Edition, as well as a new partnership (involving SAS, Cognos Inc., and SPSS Inc., among others) to promote innovation on top of the UIMA platform. Speaking of SPSS, that company, long a dominant player in the information mining space, recently announced a new version of its flagship statistical analysis and mining platform, SPSS 14 (see separate story).
In this respect, SAS’ announcement of enhancements to its data- and text-mining tools is well timed (and forward thinking): SAS’ revamped Enterprise Miner and Text Miner tools won’t start shipping until this fall. According to the company, both tools will provide new statistical and visualization options to help users uncover trends and patterns. That means neither tool is available right now. Why the lengthy lead time? Did IBM’s UIMA announcement or SPSS’ new platform announcement factor into the company’s thinking?
Not at all, says Mary Crissey, data-mining and text-mining strategy manager at SAS. To be sure, Crissey says, SAS is on board IBM’s UIMA standards bandwagon, but her company insists there’s a lot more to information mining than supercharged enterprise search.
“We are participating in those discussions and we’re very anxious to participate in those discussions, but we don’t foresee [UIMA] being a [competitive] issue. IBM might be saying that with this one piece you can do a lot, but we’re trying to say that you still need to do a lot of data management [and] data access in addition to that,” she comments.
Instead, Crissey says, SAS believes that initial interest in enterprise search technologies such as UIMA will almost certainly encourage customers to consider best-of-breed search and information management tools, such as those marketed by SAS. “We like to say that we have the whole SAS Enterprise BI platform [to draw from],” she observes, noting that users can feed search or information-mining results to the SAS Enterprise BI Server, or—by means of SAS’ market-leading ETL tool—to other BI reporting systems. “Sometimes you may do a Google search on the Web, or use IBM’s new search tool to find these key words. Then you still might get a lot of information, so you can do a further scrub. You can put it through our processing tools also.”
Nevertheless, Crissey treads carefully on the subject of SAS’ relationship with IBM. After all, the Cary, N.C.-based BI giant has several long-standing partnerships with Big Blue. “I kind of see some of our offerings as complementary [to those of IBM]. They [IBM] stop in certain places that we carry on with, and they have some tools that might be a good place to start. But with SAS, we have so many tools that you can do so many different things with,” she says.
What’s New
The Enterprise Miner 5.2 release SAS plans to deliver this fall features improved interactive data analysis and modification capabilities, which Crissey says make it easier for users to identify data anomalies. New interactive data preparation features include support for interactive plots that identify outliers and specify valid ranges (for both interval and categorical variables); an interactive expression builder that lets users incorporate custom business rules, including interactions between factors; and improved automatic decision-making features (based on customized profit-and-loss decision matrices), which now support finer resolutions of decision definitions.
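To make the idea of a profit-and-loss decision matrix concrete, here is a minimal Python sketch of how such a matrix can turn a model's predicted probabilities into a decision. It is not SAS code and does not reflect how Enterprise Miner implements the feature; the outcome names, decision names, and dollar figures are all invented for illustration.

```python
# Hypothetical profit matrix: outer keys are the true outcome, inner keys the
# decision taken. Every figure here is an invented example value.
PROFIT = {
    "responder":     {"mail": 40.0, "skip": 0.0},
    "non_responder": {"mail": -2.0, "skip": 0.0},
}


def best_decision(posterior, profit=PROFIT):
    """Return (decision, expected_profit) for the most profitable decision.

    posterior maps each outcome to the model's estimated probability.
    """
    decisions = next(iter(profit.values())).keys()
    expected = {
        d: sum(posterior[outcome] * profit[outcome][d] for outcome in profit)
        for d in decisions
    }
    choice = max(expected, key=expected.get)
    return choice, expected[choice]


# A customer with a 10% chance of responding versus one with a 2% chance.
likely = best_decision({"responder": 0.10, "non_responder": 0.90})
unlikely = best_decision({"responder": 0.02, "non_responder": 0.98})
print(likely, unlikely)
```

With these made-up numbers, the first customer is worth mailing and the second is not, even though both are far more likely to ignore the offer than to respond. The matrix, rather than a fixed probability cutoff, drives the choice.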
Enterprise Miner 5.2 also ships with a raft of improved visualization options, including area bar, scatter plot matrix, lattice, parallel axis, and 3D charts.
One of the biggest new deliverables in Enterprise Miner 5.2 is an improved Web mining feature, which Crissey says can rapidly identify significant Web path sequences and track the navigational patterns of Web site visitors. This lets data miners incorporate Web traffic behavior, time-series transactions, and market basket contents into their modeling processes, resulting in richer prediction models that can in turn identify highly complex customer behaviors.
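As a rough illustration of what identifying significant Web path sequences can mean in practice, the Python sketch below counts how many visitor sessions contain each contiguous run of pages and keeps the runs that meet a minimum support. It is a simplified stand-in, not SAS' algorithm; the function name, the session data, and the support threshold are assumptions made for the example.

```python
from collections import Counter


def frequent_paths(sessions, length=2, min_support=2):
    """Count contiguous page sequences of a given length across sessions.

    Each sequence is counted at most once per session, so its count is the
    number of sessions that contain it (its support).
    """
    support = Counter()
    for pages in sessions:
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        support.update(seen)
    return {path: n for path, n in support.items() if n >= min_support}


# Invented clickstream data: one list of page names per visitor session.
SESSIONS = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["search", "product", "cart"],
]
paths = frequent_paths(SESSIONS)
print(paths)
```

Here the step from a product page to the cart appears in all three sessions and the step from search to a product page in two, so both survive the threshold. Paths like these could then be fed into a model as features alongside transaction data.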
SAS’ Text Miner 2.3, which is sold only with the Enterprise Miner tool, also boasts improved data-mining capabilities. It offers macro support for the automatic creation of synonym lists from misspellings (including transposed, inserted, and deleted letters), acronyms, and embedded punctuation. It also supports Singular Value Decomposition, a proven dimension-reduction technique that processes “like terms” together. Still another improvement, says Crissey, is a streamlined setup process. Customers no longer have to manually enter industry-specific vocabularies.
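One common way to build a synonym list from misspellings, sketched below in Python, is to map each unknown term to a known vocabulary word that lies within a small edit distance. The sketch uses the optimal string alignment distance, which treats an inserted, deleted, substituted, or transposed letter as a single edit. It is an assumption about how such a feature could work, not a description of SAS' macro, and it does not attempt acronyms or embedded punctuation; the vocabulary and sample terms are invented.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def synonym_list(terms, vocabulary, max_distance=1):
    """Map each out-of-vocabulary term to its closest vocabulary word,
    provided that word is within max_distance edits."""
    synonyms = {}
    for term in terms:
        if term in vocabulary:
            continue
        closest = min(vocabulary, key=lambda word: osa_distance(term, word))
        if osa_distance(term, closest) <= max_distance:
            synonyms[term] = closest
    return synonyms


# Invented vocabulary and raw terms, e.g. from call-center notes.
VOCAB = ["refund", "invoice", "delivery"]
found = synonym_list(
    ["refnud", "invoicee", "delivry", "refund", "warranty"], VOCAB
)
print(found)
```

In this example the transposed "refnud", the extra letter in "invoicee", and the dropped letter in "delivry" are each folded into the correct term, while "warranty" is left alone because nothing in the vocabulary is close enough.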
Other enhancements include support for as many as eight languages (Dutch, Italian, Portuguese, and Swedish, in addition to English, French, German, and Spanish). The revamped Text Miner can now handle more than a dozen document formats and can analyze text culled from call-center notes, survey comments, and blogs, among other sources.
SAS’ data-mining tools grew out of the statistical analysis space, which—in the popular imagination, at least—is populated with math whizzes, academics, and power users. Over the last few years, SAS has struggled to expand the reach of its data-mining tools by making them easier to use. The upcoming Enterprise Miner 5.2 and Text Miner 2.3 releases continue this trend.
“We’ve polished it more. Part of this thing is we’re trying to allow these really smart Ph.D. statisticians to communicate and show their insights that they’ve found to the decision-maker who’s got to run a daily business,” Crissey says, citing improved interactive graphics in the product. “There’s been this problem of carrying the results to where they need to be acted on, to business users or decision-makers, so we’ve put a lot of emphasis on making [these tools] easier to use.”
What’s Next?
Now that enterprise search is emerging as a sexy technology area, does SAS plan to trumpet its own expertise in this area? Crissey is noncommittal.
“We’re meeting with [SAS CEO] Dr. [Jim] Goodnight, and he’s going to be talking about the priority of where [enterprise search] fits. We’re seeing what the business market is doing. We’ve had a lot of interest in mining from financial customers and the medical field. They have lots of detail data. We have data coming from the medical industry, time-series data, [and] other textual information that can be combined with the structured data. We’re looking at our existing customers and seeing where it makes sense.”
About the Author
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.