Virtual Data Warehouses: Beware Traditional DW Issues
Data federating technologies—ETL, EII, and EAI—are the Next Big Thing, reports META Group, especially where unstructured content is concerned, but the virtual data warehouse’s biggest selling point—direct access to "fresher" source data—could be overblown.
As more organizations seek to integrate once-disparate data sources into so-called “virtual” data warehouses, data federating technologies will emerge as The Next Big Thing.
That’s one conclusion of a new report from research firm META Group, which says that organizations are typically creating virtual data warehouses in order to realize cost savings and provide real-time access to heterogeneous data sources. What these companies fail to anticipate, a META Group researcher says, is that their virtual data warehouses could be torpedoed by the same problems that sink many enterprise data warehouse (EDW) efforts.
According to META Group, the market for data federating technologies, such as enterprise information integration (EII) tools, will approach that of batch-oriented ETL offerings by 2005. Through 2007 and 2008, META Group says, all classes of data integration solutions, including ETL, EII, and enterprise application integration (EAI) technologies, will evolve to accommodate unstructured content sources. That’s a departure from the situation today, where EII is largely positioned as the best strategy for unifying structured and unstructured data.
There’s a lot at stake, META Group notes, as some IT organizations could devote as much as 60 percent of their budgets to application integration efforts. Against that backdrop, the research firm finds, EII firms may be promising more than they can deliver: improved flexibility through the repurposing of old data sources, faster access to more accurate information, and cost reductions from reusing existing data sources.
According to META Group, the rub isn’t so much physical interoperability of data sources—whether it’s accomplished through JDBC or ODBC connectivity, or via canned adapters into SAP, PeopleSoft, or Siebel enterprise applications—but rather logical interoperability, which META Group analyst Charlie Garry defines as a “meaningful exchange of data.”
In this respect, Garry suggests, we have the problem of physical interoperability just about licked, but we’re just starting to tackle the issue of logical integration. “Through 2006, the major data integration issues will be how to identify which data is relevant, which data is accurate, and where it is located,” he writes in a META Group research note. “The three most frequently mentioned reasons for data warehouse project failures are weak sponsorship and management support, inadequate end-user involvement, and organizational politics.”
What’s more, Garry argues, the notion of the “virtual data warehouse” not only doesn’t address these issues, but actually compounds their difficulty.
In the virtual data warehouse model, he writes, the absence of authoritative, centralized control over the underlying physical data sources is at odds with the goal of logical interoperability. “Certainly, strong central management and governance of the enterprise information integration infrastructure would be a must just as it is for a physical enterprise data warehouse,” he writes.
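The gap between physical and logical interoperability can be made concrete with a small sketch. The Python example below is purely illustrative and is not drawn from the META Group report: the two source systems, their table and column names, the sample values, and the key-mapping rule in `to_common_key` are all hypothetical assumptions. Both sources can be queried without trouble, which is the physical side, but a federated total is meaningless until someone decides how customer keys and currency units map to one another.

```python
import sqlite3

# Two hypothetical source systems. Schemas and values are invented for illustration.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE accounts (cust_id TEXT, revenue_usd REAL)")
crm.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("C-001", 1200.0), ("C-002", 450.0)],
)

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_no INTEGER, amount_cents INTEGER)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 30000), (2, 5000), (3, 9900)],
)


def naive_total():
    """Physical interoperability only: both sources answer, but dollars
    and cents are added together as if they were the same unit."""
    usd = crm.execute("SELECT SUM(revenue_usd) FROM accounts").fetchone()[0]
    cents = billing.execute("SELECT SUM(amount_cents) FROM invoices").fetchone()[0]
    return usd + cents


def to_common_key(customer_no):
    """Assumed mapping rule: billing's integer key 1 corresponds to CRM's 'C-001'."""
    return f"C-{customer_no:03d}"


def reconciled_totals():
    """Logical interoperability: align customer keys and convert cents to
    dollars before combining the two sources."""
    totals = {}
    for cust_id, usd in crm.execute("SELECT cust_id, revenue_usd FROM accounts"):
        totals[cust_id] = totals.get(cust_id, 0.0) + usd
    for customer_no, cents in billing.execute(
        "SELECT customer_no, amount_cents FROM invoices"
    ):
        key = to_common_key(customer_no)
        totals[key] = totals.get(key, 0.0) + cents / 100.0
    return totals


if __name__ == "__main__":
    print("naive total:", naive_total())
    print("reconciled totals:", reconciled_totals())
```

The connection code is the easy part. The mapping rules inside `to_common_key` and the unit conversion are exactly the kind of decisions that need an owner, whether the warehouse is physical or virtual.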
Moreover, Garry says that many of the most frustrating tasks associated with the successful operation of an EDW (identifying operational stores, data cleansing, data transformation, and data reconciliation among source systems) are required for virtual data warehouses as well. He also notes that the virtual data warehouse’s biggest selling point, direct access to source data that is ostensibly “fresher” than in an EDW model, may also be overblown. “Our research indicates that even companies with so-called ‘active warehouses’ often need data to be freshened only every several hours,” he writes. “Very few data items require data freshness within time frames of less than an hour.”
Then there’s the issue of performance, where the conventional EDW typically enjoys major advantages in both runtime operation and tuning.
Not surprisingly, Garry and META Group recommend that organizations continue to build EDWs for the foreseeable future. Nevertheless, he writes, companies shouldn’t dismiss virtual data warehouses as too immature or too high-maintenance for every task. “The federated approach is still a valuable tool as part of an overall EII approach. Enterprise data warehouses are not built overnight … and IT organizations will always have a need to provide some type of bridge solution that enables users to gain access to some critical data in the short term,” he writes. “In addition, organizations may need to incorporate data that does not reside in the EDW or within the internal systems of the organization. In these instances … federated access is a great tool for extending the EDW.”
Stephen Swoyer is a Nashville, TN-based freelance journalist who writes about technology.