The Case for Data Warehousing-in-the-Cloud

Data warehouse players seem to have their heads in the clouds. Are customers ready to follow them? For some applications, the answer is yes.

What do the purveyors of cloud-based data warehouse (DW) services know that the rest of us don't?

Taken as a collective whole, the BI and DW industry seems to have cloud on the brain -- but the head-in-the-clouds effect is particularly pronounced on the DW front. After all, there's an ever-expanding base of cloud-based DW services, and on Monday, DW software specialist Greenplum announced its first cloud-based service, dubbed the Enterprise Data Cloud (EDC) initiative.

Greenplum describes EDC as a "bold" vision for self-service DW and analytics, touting the applicability (and ideal nature) of the cloud model to host and manage data at a petabyte scale. Other recent cloud adopters include illuminate Inc. -- another DW-centric player (see http://www.tdwi.org/News/display.aspx?id=8876) that touted cloud-readiness as part of last week's illuminate 4.0 launch; ditto for Infobright Corp. and its recent Infobright 3.1 release. Other recent come-to-clouders include Sybase Inc., which announced an IQ-based analytic cloud offering; Aster Data Systems (touting that Coremetrics is running its nCluster software in a cloud configuration); Kognitio (which has been talking up data warehousing-as-a-service for more than a year); and Vertica Inc.

Elsewhere, BI players such as Pentaho and SAS Institute Inc. have also shot for the clouds, announcing, respectively, the availability of a cloud-ready version of its software and -- more ambitious still -- a significant investment in cloud infrastructure in the form of a $70 million dedicated cloud computing facility.

One reason that DW players have been out in front in the cloud computing market is that the cloud model -- which promises customers utility-like access to computing capacity at guaranteed service levels (and which lets customers do what they want with that capacity, to the point of bringing up multiple operating system or application instances) -- is well suited to the DW model. The DW appliance, for example, is effectively a single-purpose realization of the cloud vision -- without, that is, the flexibility (in terms of virtualization and provisioning capabilities) offered by cloud services.

Among DW providers, Vertica arguably got there first, introducing its first cloud-based offering early last year. Vertica officials touted the first-gen Vertica-on-the-cloud offering, which ran on the Elastic Compute Cloud (EC2) service from Amazon.com.

At the time, Vertica talked up a number of potential drivers, with data warehouse prototyping, the ability to create inexpensive proofs-of-concept, and quick-and-dirty problem solving topping the list. Fast forward one year -- to last week, in fact, when Vertica announced version 3.0 of its Vertica Analytic Database for the Cloud -- and Vertica officials are trumpeting basically the same benefits.

"To be perfectly honest, that's where most of the traction is," said Dave Menninger, vice president of marketing and product management with Vertica. "We do see commercial traction in other [arenas]. Software-as-a-service companies and other emerging vendors, for example, don't want to invest in the [DW] infrastructure, so they go out and effectively outsource their IT infrastructures. These are production [deployments], as opposed to proof-of-concepts," he continues.

Most production deployments, Menninger continues, take the form of what might be called quick-and-dirty problem solving: one-off or timely projects that -- because of the inertia that's endemic to any organization (IT or otherwise) -- are easier to develop and deploy in the cloud.

"We have household-name financial service companies [that] use the cloud for short-lived analytic projects. You have this bulge in your deployment requirements because of short-lived stuff, where it just costs too much to [provision and] deploy resources for this stuff [internally]," he explains. "Also, you have stuff [workloads] that results from acquisitions -- that's another scenario where we're seeing outsourcing more on a temporary basis than on a permanent basis."

Aster Data recently touted a big-name cloud win of its own -- ShareThis, a developer of a so-called "viral" sharing tool for Web 2.0 services such as Facebook, Digg, MySpace, and others.

Although ShareThis is an intriguing case study -- it uses a multi-TB data warehouse as its analytic backbone -- it's hardly an example of a conventional enterprise use case: on a monthly basis, it collects about 1.5 TB of Web traffic, using nCluster to report against and analyze that data to yield information about trends, topics, traffic, and content to Web 2.0 service providers. It's an intriguing -- and staggering -- example of the new norm in DW scalability: to the extent that ShareThis offers historical insight to its customers, it's necessarily dealing with volumes of data in the double- or even triple-digit TB range.

Broader adoption of DW or BI in the cloud for conventional (production) requirements might well hinge on the appearance of enterprise-specific or "private" clouds -- i.e., cloud implementations that are deployed and managed inside the enterprise firewall by the DM group itself. Vertica and other vendors talk up the private cloud as a legitimate model, noting that their cloud-branded offerings can just as easily be deployed (and managed) internally as externally. Meanwhile, big-name service providers such as IBM Corp. (see http://esj.com/articles/2008/11/24/ibm-offers-cloud-computing-help.aspx) and Hewlett-Packard Co. (see http://esj.com/articles/2009/04/02/hp-cloud-assurance.aspx) also market enterprise-oriented cloud service offerings.

The vision of a private cloud in every enterprise is still far from a reality, however.

The cloud computing craze is still in the early phases of its adoption cycle. Recently, market-watcher Gartner Inc. suggested that the general cloud-computing model -- which includes both public cloud services and private (internal) clouds -- won't become a mainstream phenomenon for at least another five or six years. To that end, Gartner talked up the eventual emergence of a "service-enabled application platform," or SEAP.