Informatica Touts a DI On-Ramp to the Clouds

Informatica's PowerCenter Cloud Edition will support Amazon's Elastic Compute Cloud (EC2) along with Salesforce.com and Web services.

Some BI/DW vendors aren't waiting for cloud computing to catch on. Despite Gartner's prediction that cloud computing probably won't be ready for production work until 2015 (see http://esj.com/articles/2009/02/10/computing-in-the-clouds-enterpriseready-by-2015.aspx), many vendors are acting now.

Informatica Corp. last week announced its intent to deliver a cloud-ready version of its PowerCenter data integration (DI) software. Informatica's PowerCenter Cloud Edition, which officials say will ship in September, will support Amazon.com's Elastic Compute Cloud (EC2) along with Salesforce.com and Web services connectivity. Informatica already offers a version of its DI software for Salesforce.com; in this case, it's the explicit cloud branding -- and support for EC2 -- that are new, officials say.

"The capabilities that we addressed over the last couple of years [with Informatica on Demand] really address the reliability and performance issues that are typical in an environment like that," says Girish Pancha, executive vice president and general manager of data integration with Informatica. "We've created this [PowerCenter Cloud Edition] technology to really treat data in a cloud environment. There was no significant data integration offering [in this space], so this is the first of its kind to be offered up."

That isn't strictly true. A number of vendors -- including DI players Pervasive Software, Cast Iron Systems, and (to a less ambitious extent) Talend -- offer cloud-oriented data integration services. Furthermore, cloud-oriented DW vendors such as Aster Data Systems, Greenplum, Kognitio, and Vertica offer data migration services of varying sophistication. That said, Informatica has a credible claim to the cloud DI avant-garde: it introduced its first cloud-oriented DI service (for Salesforce.com) nearly three years ago.

A Turnkey Appliance

The forthcoming PowerCenter Cloud Edition aims to be a turnkey appliance, with built-in DI and replication features, along with -- in iterative stages -- change data capture (CDC), data quality (DQ), data profiling, and other data management (DM) capabilities. (Informatica expects to deliver CDC, DQ, and similar features within about six months of PowerCenter Cloud Edition's debut in September, Pancha says.) Not that there's any rush, Informatica officials concede: right now, enterprises are making halting ascents into the cloudscape, largely confining their experimentation to test and development or tactical use cases. Pancha, like other cloud futurists, expects that to change.

"Ultimately, I wouldn't call these [use cases of today] 'dirty,' but they're definitely kind of the tactical department-level-one-off-type solutions that are getting put up there, so it's not necessarily moving your enterprise data warehouse to the cloud, but smaller subsets of that," he indicates.

"I think it's just a matter of time. As more and more data goes outside the enterprise into these applications, there will be a point where larger and larger portions of your enterprise analytics will get up in the cloud," Pancha continues, "so the offerings that we have initially [as part of PowerCenter Cloud Edition] do include some of our real-time capabilities, which you can use to solve the synchronization scenarios. Over the course of the next six months or so … we plan to pretty much get as much of our software offerings to work on EC2 as we can, so that for pretty much any data integration use case -- for things like data quality or lifecycle management use cases -- they all can be implemented."

Licensing for the Cloud

Pancha and Informatica talk up a new cloud- or EC2-friendly licensing model, too. Call it a la carte licensing, cloud-style.

"One of the things that we see with some of these tactical uses … is that … they use data integration infrastructure [software] somewhat sporadically; in a lot of these [cloud] use cases, it's just batch jobs that run every so often. We are introducing effectively a pay-as-you-use pricing, which is actually [priced] by the hour," he explains. "Using the Amazon payment infrastructure, our customers would effectively use a credit card, so to speak, and pay Amazon. We would work with Amazon to get prepayments through Amazon."

Clients can also opt for traditional perpetual licenses, Pancha says. "The idea here is that if you are going to use it a significant percentage of the time, probably around 50 percent of the time or more, it would be more cost effective to buy a perpetual [license]. If you're going to be using it sporadically, it's a lower cost of entry to buy it by the hour."
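
To make Pancha's rule of thumb concrete, here is a minimal sketch of the break-even arithmetic in Python. Informatica did not disclose prices, so every figure below is hypothetical: the hourly rate, the annualized cost of a perpetual license, and the function names are all assumptions chosen only so that the crossover lands at the roughly 50 percent utilization he describes.

```python
# Hypothetical break-even sketch: hourly (pay-as-you-use) vs. perpetual licensing.
# All prices are illustrative assumptions, not Informatica's actual pricing.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def hourly_cost(utilization: float, rate_per_hour: float) -> float:
    """Annual cost of hourly licensing at a given utilization (0.0 to 1.0)."""
    return utilization * HOURS_PER_YEAR * rate_per_hour


def break_even_utilization(annual_perpetual_cost: float, rate_per_hour: float) -> float:
    """Fraction of the year at which hourly spend equals the perpetual cost."""
    return annual_perpetual_cost / (HOURS_PER_YEAR * rate_per_hour)


def cheaper_option(utilization: float, annual_perpetual_cost: float,
                   rate_per_hour: float) -> str:
    """Return which licensing model costs less at the given utilization."""
    if hourly_cost(utilization, rate_per_hour) < annual_perpetual_cost:
        return "hourly"
    return "perpetual"


if __name__ == "__main__":
    RATE = 10.0          # hypothetical dollars per hour
    PERPETUAL = 43800.0  # hypothetical annualized perpetual-license cost

    print(break_even_utilization(PERPETUAL, RATE))  # fraction of the year
    print(cheaper_option(0.10, PERPETUAL, RATE))    # occasional batch jobs
    print(cheaper_option(0.80, PERPETUAL, RATE))    # near-continuous use
```

Under these assumed numbers, a shop that runs batch jobs 10 percent of the time comes out ahead paying by the hour, while one running 80 percent of the time does better with a perpetual license. The real crossover would depend on actual list prices, maintenance fees, and the underlying EC2 charges, none of which are given here.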

Amazon's EC2 service is emerging as a popular platform for would-be BI and DW cloud vendors. Columnar DW specialist Vertica last year introduced a version of its analytic database software for EC2; SaaS DW player Good Data also pushes its DW platform-as-a-service (PaaS) offering in conjunction with EC2.

Some cloud players claim that infrastructure-as-a-service (IaaS) offerings such as EC2 are the most (or only) viable model in the still-gestating cloudscape. Others disagree. Dyke Hensen, chief marketing officer with SaaS BI player Pivotlink -- which pursues a build-your-own cloud infrastructure strategy -- cites a recent McKinsey & Co. report (Clearing the Air on Cloud Computing) that he says offers a good dollars-and-cents comparison of EC2 versus traditional (i.e., build-your-own or internally hosted) cloud services.

The point isn't that building on top of EC2 in particular is cheaper than building your own, Hensen maintains; it's that -- at this point -- outsourcing one's BI or DW infrastructure lock, stock, and smoking MPP node is prohibitively expensive regardless of platform.

"I do expect that EC2 and others … will continue to drive down the costs and come up with more innovative pricing models -- it is the future," he points out. On the other hand, he argues, the cost of cloud compute capacity hasn't come down far enough yet -- on EC2 or other services. "There is a reason why you don't see major corporations running large production systems in BI running on EC2 today and it's not about [capabilities like] dynamic provisioning -- it's about the costs of CPU consumption and storage."
