TDWI Report Examines Data Warehousing and Private Clouds

It's time to consider data warehousing in the clouds as a viable option.

It might seem too soon to talk about data warehousing (DW) in the clouds, but a majority of data management (DM) practitioners are already thinking about the technology, and many are already planning "private" or infrastructure cloud deployments -- i.e., internal clouds that are hosted and managed by IT itself.

The upshot, says industry veteran Philip Russom, research director for data management with TDWI, is that it's time to consider data-warehousing-in-the-cloud as a viable option.

"[I]nfrastructure clouds are already entrenched in IT as successful platforms for virtualizing application servers. It's just a matter of time before DW and BI stores move onto some kind of cloud," writes Russom in a recent TDWI Checklist Report, Consolidating Data Warehousing on a Private Cloud.

"Users who depend on DWs need to control this migration by considering database clouds that are optimized for DBMSs and therefore are capable of handling a wide range of database workloads, plus handling them concurrently, out of the box, and with little or no tweaking," he writes.

"There's a new need for the consolidation of diverse BI databases," Russom told BI This Week in a telephone interview. "The rise of advanced analytics in recent years has driven many user organizations to deploy more analytic databases, data marts, operational data stores, and data staging areas outside of their enterprise data warehouse environments. In another trend, many organizations have deployed departmental systems for BI during the recession because departmental budgets were often more fluid than the capital budgets that fund most central BI programs. Instead of consolidating these new BI databases into traditional BI and DW platforms, some users are contemplating future-facing platforms, which include different types of clouds."

Certainly, DM practitioners are aware of cloud computing. In a November 2010 TDWI survey, for example, more than half (53 percent) of respondents said they'd prefer to host their data warehouses in a private cloud context; only 7 percent expressed a preference for public clouds.

Awareness doesn't translate into deployment, of course, but more than one-fifth (22 percent) of respondents said they plan to use a private cloud specifically for data warehousing. "Your peers in other organizations are planning their migrations to private clouds. You should, too," Russom writes.

DW-in-the-cloud makes sense on several levels, he continues. Cloud-based DWs make it easier to consolidate and effectively tune database systems for different kinds of data warehousing and business intelligence (BI) workloads. They should also make it easier to respond to changing capacity requirements, chiefly by exploiting the elasticity of the cloud paradigm to rapidly provision and de-provision new DWs (with workload-specific tunings) as the need arises.

"There are many types of database workloads commonly used in DW and BI, each with its own technical requirements. The diversity of data workloads is a challenge because it's so difficult to design and optimize a single database to run diverse workloads optimally. Database clouds that allocate platform resources based on load characteristics promise to meet the mixed workload challenge," Russom writes. "[C]onsolidating multiple BI data stores into a single DW database is challenging because of the mixed workload problem. A more realistic and modern approach is to consolidate into a private database cloud."

Consolidation should likewise help simplify data management. "When disparate databases are consolidated into a private database cloud, it greatly simplifies data governance, stewardship, master data management, metadata management, and some forms of data quality and data integration," he argues.

Russom lists several use cases that could also benefit from being shifted into the cloud, including bread-and-butter tasks such as reporting ("A cloud can help the DW scale to growing and diversifying report workloads," he points out) and analytics: "Most DWs are optimized for reports, not analytics. Hence many organizations are taking their analytic workloads to various kinds of clouds."

"A bulk data load differs from a real-time trickle feed, although both workloads commit data to a database. Most DWs are optimized for the former, not the latter. So users often build a real-time operational data store ... as a data staging area outside the DW to handle workloads for real-time data," he explains.

The issue is that ODSs of any kind tend to proliferate. New ODSs are typically commissioned in response to new workload challenges; existing ODSs are rarely decommissioned. "This proliferation of BI data stores leads to a heavily distributed DW architecture that's difficult to change, optimize, and govern," Russom notes. Data-warehousing-in-the-cloud can help restore some degree of manageability to this chaotic state of affairs.

"A private cloud can accommodate multiple workloads better than traditional, distributed DW approaches. As DW workloads start up and shut down, the cloud provides generous processor and storage resources to ensure processing speed and volume scalability," he concludes. "The cloud can recover and reallocate these resources efficiently as workload processing ceases or as temporary DW structures -- such as data marts -- are no longer needed. A private cloud has similar advantages for business intelligence ... platforms for reporting and OLAP, where the number of reports and concurrent users varies unpredictably."
