In-Depth
Aster Data Doubles Down on MapReduce
With the recent announcement of nearly three dozen new SQL-MapReduce functions, Aster Data's message is clear: MapReduce is a boon to analytics.
Criticism of MapReduce has long centered on its analytic applicability. The argument reprises one made against grid computing: yes, the technology's neat, and it's doubtless powerful for certain applications, but of what real use will it be in the enterprise?
Some MapReduce supporters agree with this argument, outlining very specific MapReduce use cases that -- critics allege -- have the effect of pigeonholing the technology.
For example, recent MapReduce convert Teradata Corp. -- which supports the open source Hadoop MapReduce implementation -- has described MapReduce as a kind of ETL on steroids.
It's a viable use case, and it certainly describes an enterprise (if not an analytic) application. According to Steve Wooledge, senior director of marketing with Aster Data Systems Inc., however, it's nothing less than weak tea. Aster markets nCluster, a massively parallel, columnar analytic database that implements MapReduce.
"I think there are basically two camps [of proponents of MapReduce]. There's the camp of people who will never architect MapReduce in their architecture because it's just too bloody hard, and then there are people like us who say MapReduce can fit in well with a relational data store and produce some additional additives," he observes.
"Look at one of the examples where [a competitor] will say, 'We have a connector built for Hadoop!' What this means is that they're going to use Hadoop for a lot of the data preprocessing to get it in a structured format and then load it into their warehouse for business analysts to run queries against. From our perspective, this isn't making effective use of [MapReduce]."
Admittedly, Wooledge's take is self-serving. Aster and rival Greenplum Software Inc. were the first analytic database players to deliver a native (i.e., database-level) implementation of MapReduce. (Both vendors announced native MapReduce on the same day, within hours of one another.)
Both vendors likewise position MapReduce as a disruptive technology in its own right -- one with a distinct (and undeniable) analytic aspect.
By contrast, Wooledge argues, the growing ranks of MapReduce-come-latelies -- including such luminaries as IBM Corp., Netezza Inc., and Teradata Corp. -- are effectively co-opting MapReduce. According to Wooledge, these vendors tend to promote non-analytic MapReduce use cases (such as ETL on steroids) that are less likely to disrupt their existing business models.
He cites Aster's delivery this week of more than 1,000 analytic MapReduce functions as a clarifying contrast.
All told, Aster announced almost three dozen new SQL-MapReduce functions (available as part of its "Aster Analytic Foundation" library) and more than 40 MapReduce-ready "automatically parallelized" packages -- which collectively comprise more than 1,000 MapReduce functions -- that are available in either Java or C.
Wooledge positions the former as "business analyst-ready" offerings; the latter, he concedes, are geared more toward developers or (from Aster's perspective) "power users."
Aster's new SQL-MapReduce functions support activities such as path analysis, relational analysis, clustering analysis, text analysis, and (of course) statistical analysis. New nCluster-powered analytic applications include location analysis (via an offering developed by Aster partner Cobi Systems); call detail record and/or netflow analysis; and fraud detection.
Wooledge points to the example of Aster reference customer MySpace, which is using MapReduce to power its advanced analytic efforts.
"What MySpace is doing is … looking for ways to optimize their site to improve usability for end users. They look for the way people navigate the site and they do a lot of multivariate testing: [they identify] if there are certain paths through the site that become dead ends or certain components or logic that aren't being used. This way, they can make decisions about how to optimize the site. They have a lot of SQL-MapReduce that they're using [to do this]."
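The kind of path analysis Wooledge describes can be sketched in a few lines. What follows is a plain-Python illustration of the idea, not Aster's SQL-MapReduce syntax and not MySpace's code: it assumes each visit has already been reduced to an ordered list of page names, and the function names, sample pages, and 50 percent exit threshold are all hypothetical.

```python
from collections import Counter


def map_session(pages):
    """Map step: emit one 'visit' per page view and one 'exit' for the last page."""
    for page in pages:
        yield page, "visit"
    if pages:
        yield pages[-1], "exit"


def exit_rates(sessions):
    """Reduce step: for each page, the share of its views that ended a session."""
    visits, exits = Counter(), Counter()
    for pages in sessions:
        for page, kind in map_session(pages):
            if kind == "visit":
                visits[page] += 1
            else:
                exits[page] += 1
    return {page: exits[page] / visits[page] for page in visits}


def dead_ends(sessions, threshold=0.5):
    """Pages whose exit rate meets the (assumed) threshold, sorted by name."""
    rates = exit_rates(sessions)
    return sorted(page for page, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    # Hypothetical sample data: three visits, each an ordered list of pages.
    sample = [
        ["home", "search", "help"],
        ["home", "help"],
        ["home", "profile", "home"],
    ]
    print(dead_ends(sample))
```

In a parallel setting, the map step would run independently on each session and the per-page counts would be combined in the reduce step, which is what makes this style of analysis a natural fit for MapReduce.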
Other applications for MapReduce include sessionization -- a use case that rival Vertica Inc. is also targeting -- and usage tracking.
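Sessionization means grouping a user's raw clicks into discrete visits, usually by starting a new session whenever the gap between two clicks exceeds a timeout. A minimal sketch in plain Python follows; it is an illustration of the technique rather than any vendor's implementation, and the record layout (user, timestamp in seconds, page), the 30-minute timeout, and the function names are assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout


def map_click(record):
    """Map step: key each click by user so one user's clicks reach one reducer."""
    user_id, timestamp, page = record
    return user_id, (timestamp, page)


def reduce_sessions(user_id, events, gap=SESSION_GAP_SECONDS):
    """Reduce step: sort one user's clicks by time and split them on long gaps."""
    sessions = []
    last_ts = None
    for ts, page in sorted(events):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])  # gap exceeded (or first click): new session
        sessions[-1].append(page)
        last_ts = ts
    return [(user_id, index, pages) for index, pages in enumerate(sessions)]


def sessionize(records, gap=SESSION_GAP_SECONDS):
    """Run the map step, group by key, then reduce each user's clicks."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_click(record)
        grouped[key].append(value)
    result = []
    for user_id in sorted(grouped):
        result.extend(reduce_sessions(user_id, grouped[user_id], gap))
    return result


if __name__ == "__main__":
    # Hypothetical click log: (user, seconds since start, page).
    clicks = [
        ("u1", 0, "home"),
        ("u1", 600, "profile"),
        ("u1", 4000, "home"),
        ("u2", 100, "home"),
    ]
    for row in sessionize(clicks):
        print(row)
```

Because each user's clicks can be processed independently once they are grouped by key, the work spreads across nodes without coordination.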
Aster will continue to double down on MapReduce. Wooledge cites the technology's popularity among developers, who like MapReduce because they can program for it in the (supported) language of their choice. He also points to the efforts of partners that are building advanced analytic applications on top of Aster's nCluster analytic database.
One such partner is Fuzzy Logix LLC, which develops "large libraries of analytic functions for financial services companies," according to Wooledge. "There's a lot of sort of data application developers that want to build solutions for customers on top of these [MapReduce] platforms, because it enables them to build applications more easily, because they can program them in common languages like Java, or [using] standard data mining desktop tools," he explains.
"Our partners are also looking to build more packaged applications that are reusable [and] resalable across many industries. They want to take the platform much further. There are other partners like SAS [Institute Inc.] where we're doing work with them around pushing down SAS functions into our platform as well."
Wooledge declined to elaborate on Aster's work with SAS, although he conceded that the ongoing collaboration between the two companies "is similar to what they're doing with Teradata." Two years ago, SAS and Teradata announced an ambitious effort to run SAS analytics in the context of the Teradata database.