
IBM Embraces MapReduce for Big Data Analytics

Officials say IBM's new MapReduce-based offering permits customers to tackle Big Data projects that involve petabytes of information.

At its Information on Demand (IOD) conference, held recently in Rome, IBM Corp. unveiled a new analytic offering that leverages Apache Hadoop, an open source implementation of the increasingly ubiquitous MapReduce programming model.

Officials say the new offering, InfoSphere BigInsights, permits customers to tackle Big Data projects that involve petabytes (PB) of information.

Big Blue has been on an analytic roll since it announced a new Business Analytic Optimization (BAO) service last April. In the months since, IBM has racked up one analytic milestone after another: after acquiring SPSS Inc. in July 2009, for example, it unveiled a line of "Smart Analytics" appliances and management software.

Late last year, IBM unveiled Blue Insight, a new internal cloud service hosted on a massive System z10 mainframe and comprising more than 1 PB of data. At the same time, Big Blue trumpeted the availability of private analytic cloud technology (under its "Smart Analytics" brand) for enterprise customers.

More recently, IBM fleshed out both its Cognos and TM1 performance management offerings with CFO-oriented analytic niceties and unveiled several industry- or domain-specific analytic offerings (including RAMP, a Real-Time Analytics Matching Platform for call centers).

An Enterprise Case for MapReduce

Big Blue's new Hadoop-based deliverable, InfoSphere BigInsights, is part of this thrust, says Bernie Spang, director of product strategy for database software and systems with IBM. "This is all about helping our clients reduce the costs associated with dealing with this huge volume and velocity of information and transactions they're dealing with," he says.

Hadoop is the Apache Software Foundation's open source implementation of the MapReduce programming model made famous by Google Inc. It also includes a distributed file system modeled on the related Google File System.
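For readers new to the model, the sketch below shows the basic shape of a MapReduce job as a word count run entirely in memory. It is only an illustration and does not use Hadoop or any IBM API: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample records are hypothetical. In a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an in-memory list.

    Hypothetical driver for illustration; a framework such as Hadoop
    would distribute these phases across the nodes of a cluster.
    """
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big insights", "big blue"]
    print(run_job(sample))
```

Because each record is mapped independently and each key is reduced independently, the work can be split across as many machines as the data requires, which is what makes the approach attractive at multi-terabyte or petabyte scale.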

Since late 2008, several specialty database entrants (including both Aster Data Systems Inc. and Greenplum Software Inc.) have announced MapReduce implementations for their analytic databases.

More recently, Teradata Inc. also endorsed Hadoop, while analytic appliance pioneer Netezza Inc. has made noises about serving up a MapReduce facility of its own. At the same time, no one seems to agree on a silver bullet use-case for MapReduce in the enterprise.

Some proponents talk up its analytic potential in the context of Very Big Datasets -- of hundreds of terabytes or multiple petabytes -- while others champion its use as a kind of supercharged ETL facility.

Spang, for his part, touts several enterprise-ready use-cases for Apache Hadoop. "It's basically a spreadsheet UI approach to working with the data that you have access to and working with [this data] in the [context of the] Hadoop [file] system. It is bringing that familiar spreadsheet paradigm to folks but using it as a front-end to the Hadoop file system-based interface. That's one way we're using it."

The salient point, he insists, is that MapReduce permits shops to tackle hitherto unimagined problems of scale.

"This is a paradigm where you have billions (and potentially trillions) of rows of structured information being analyzed. That opens up a whole new area and set of information [for analysis]. In the Hadoop context, you can be talking about gathering and analyzing both structured and unstructured information," he notes.

"We're talking about giving … enterprise clients the ability to … bring together information from a broader set of environments, including across the Internet [and] across their own internal Intranet-based [networks]. How are you going to analyze that huge volume of information? Are you going to bring it all into a structured form and load it into your warehouse? That just isn't practical."
