Crunching the Numbers on Big Data

What’s so big about Big Data? It’s probably bigger than you think.

For some time now, the specialty data warehouse (DW) has been a technology proposition in search of a market. DW appliance pioneer Netezza Inc. -- acquired by IBM Corp. in mid-2010 -- hemorrhaged cash for the first few years of its existence. (Netezza enjoyed its first yearly profit 18 months after its IPO.)

Ditto for DATAllegro Corp., which Microsoft Corp. picked up 28 months ago. At the time of its acquisition, in fact, DATAllegro's customer base consisted of fewer than a half-dozen shops.

Something happened in 2007 and 2008, however. The specialty DW found itself a market opportunity: Big Data.

Two years on, it's now obvious what's so big about Big Data: it’s bigger than you think.

The Backstory on Big Data

The traditional data warehouse -- powered, in most cases, by a commercial, off-the-shelf (COTS) database package -- just isn't up to the task of crunching Big Data. What's needed is either a row-based data store powered by massively parallel processing (MPP) engines, or -- even better, according to some -- an MPP-based columnar data store.

Notionally dispassionate industry analysts even concede as much. "The analytical techniques and data management structures of the past no longer work in this new era of big data," concluded Wayne Eckerson, former director of education and research with The Data Warehousing Institute (TDWI), in TDWI's recent Big Data Analytics Checklist Report.

Not only has the DW space seen an influx of new competitors -- companies such as Aster Data Systems Inc., InfoBright Corp., ParAccel Inc., and Vertica Inc. -- but computing giants Oracle Corp. (which introduced Exadata, the Oracle Database Machine, specifically to target Big Data), Microsoft (which tapped DATAllegro to power a massively parallel version of its SQL Server database), IBM Corp. (which acquired Netezza), and Hewlett-Packard Co. (which positions its Neoview platform as a Teradata-like DW replacement) have also jockeyed to reposition themselves. Over the same period, DW mainstay Teradata Corp. went public and introduced its first branded DW appliances.

Factor in additional acquisitions (storage giant EMC Corp. recently picked up DW specialist Greenplum Software Inc.; applications giant SAP AG nabbed Sybase, which markets a respected columnar analytic offering in Sybase IQ); plenty of new start-up activity (e.g., Algebraix and VectorWise); an explosion in MapReduce-oriented activity (with IBM, Netezza, Talend, Teradata, and other vendors embracing the open source Hadoop framework); and agitation and innovation on the part of both established (Kognitio) and rising (Kickfire Inc.) players, and the result is a fractious Big Data marketplace.

The Numbers

Just about every vendor in the analytic DW segment has tried to make Big Data its own. Aster Data has been particularly aggressive; through late 2009 and all of 2010, it convened a series of "Big Data Summits" in several North American cities.

Like Hadoop World and -- of course -- Teradata's annual PARTNERS conference, Aster Data's Big Data events tend to draw customers interested in using high-end DW technology to tackle Big Data problems of scale. Aster Data claims to have drawn about 800 such customers, most of whom participated in a survey concerning their Big Data needs.

The results of this survey, which Aster Data published recently, are interesting, for a variety of reasons. For example, a surprising percentage of shops are interested in using Aster's nCluster and other Big Data platforms to accelerate Social Network Analysis, or SNA.

BI and DW players have been talking about SNA (in one form or another) for a long time; many have likewise described SNA as something that -- while it's of great interest to customers -- isn't yet widely adopted.

That could change. Big Data adopters, at least, see the value of SNA: almost one-sixth (15 percent) identified it as a "key business opportunity," according to Aster Data's survey. This isn't surprising, says Aster Data CTO and co-founder Tasso Argyros, who suggests that SNA -- which, by definition, deals with enormous data sets -- is an especially promising Big Data technology.

"It's [a matter of] education, [of] awareness, [of] imagination: when [customers] see the things they can do with [Aster Data's nCluster], this naturally leads them to think about other potential uses," he told BI This Week during a meeting at TDWI's Summer World Conference in San Diego.

In the past, most discussions of SNA were relatively limited -- e.g., sentiment analysis. That's changing, too. These days, shops are looking at SNA as a means to support activities such as graph-based analytics (which promises to enrich an organization's understanding of its customer base) or to assist in understanding user behaviors or identifying key influencers.
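To make "identifying key influencers" concrete, here is a minimal sketch in Python. It is an illustration only, not Aster Data's method or any vendor's API: the function name in_degree_ranking, the sample account names, and the choice of a simple follower count (in-degree) as the influence measure are all assumptions made for this example. Production SNA would typically run far richer graph measures over vastly larger data sets.

```python
from collections import defaultdict


def in_degree_ranking(edges):
    """Rank accounts by the number of distinct accounts that follow them.

    `edges` is an iterable of (follower, followed) pairs. Follower count is
    a deliberately simple, hypothetical stand-in for "influence".
    """
    followers = defaultdict(set)
    for follower, followed in edges:
        if follower != followed:  # ignore self-follows
            followers[followed].add(follower)
    # Sort by follower count (descending), then by name for a stable order.
    return sorted(
        ((name, len(who)) for name, who in followers.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical sample data: (follower, followed) pairs.
edges = [
    ("ann", "bob"),
    ("cat", "bob"),
    ("dan", "bob"),
    ("bob", "cat"),
    ("ann", "cat"),
    ("dan", "ann"),
]

ranking = in_degree_ranking(edges)
for name, count in ranking:
    print(name, count)
```

In this toy graph, "bob" ranks first because three distinct accounts follow him. The same counting step is the kind of work that, at Big Data scale, would have to be spread across many nodes.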

Elsewhere, a full one-sixth of respondents say they're interested in tapping Big Data to improve behavioral targeting or boost the efficacy of their advertising efforts. Survey respondents specifically highlighted the importance of "establishing links between purchasing behavior and areas such as advertising spending to better tailor budgets and promotional campaigns."

The biggest anticipated use of Big Data (or of Data Analytics, as Aster Data now describes it) is a surprisingly fuzzy one: almost one-third (30 percent) of respondents expect to use it to "discover new market opportunities."

Call it the Magellan Effect: would-be adopters believe that a deep "exploratory analysis" of data will yield a "big business insight" that (in turn) helps drive new revenue- or income-boosting initiatives. From this, Aster Data officials conclude that the "role of data scientists is key for today's data-driven organizations."

Other expected uses include monetizing data (cited by 15 percent of respondents, who hope to use Big Data to optimize ads and improve "purchase to abandonment" ratios) and fraud detection or risk profiling.

Many of Aster's Big Data event attendees are already grappling with Big Data problems of scale. For example, one-fifth of respondents identified complex query processing performance as "a big obstacle," while almost one-third (30 percent) cited the poor scalability of traditional DW systems -- or the prohibitive cost of scaling traditional systems -- as a pain point.

Officials say that the needs of such customers help underscore the opportunity for Aster Data and other Big Data-oriented players.

"This is where we excel," said Argyros. "We're designed to address specifically this kind of Big Data [use case], where you're analyzing real-time and historical data in the same [context]. Ten years ago, you just couldn't do this -- the technology didn't exist. Now you can bring this real-time [data] into the warehouse and you [can] use MapReduce to analyze it [along with] historical [data]. You don't want [your managers] to be making decisions without seeing the data in context. The historical [data] gives you context."