Business Intelligence: Options for Handling Federated Data
One of the biggest challenges facing data warehousing managers is how to integrate islands of analytical data to deliver an enterprise view of the organization. Despite their best intentions, most companies have multiple data warehouses or data marts, each of which was designed separately and is difficult to integrate with the others. According to a recent Data Warehousing Institute survey of 341 data warehousing managers, 52 percent of companies have two or more data warehouses, and 23 percent have more than four.
Most companies eventually realize that they must integrate these disparate analytical resources. Often, they are driven by the need to reduce costs. This can be done by eliminating redundant data feeds, merging staffs or reducing expenditures on hardware and software by consolidating operations onto a single platform. In other cases, top executives want a single view of the truth – whether it’s customers or operations or finances – so that they can better steer the firm to growth and profitability. Companies that need to integrate data warehouses and data marts can pursue one of four options: 1) do nothing, 2) centralize, 3) federate or 4) virtualize.
Before launching a complex integration initiative, companies should recognize that they may not need to integrate all analytical resources; for some, doing nothing is the right choice. Typically, there are several types of analytical applications that can and should stand by themselves. Usually, these applications don’t share dimensions or facts with other applications, and they extract data from specialized sources.
Examples might be a factory floor application that analyzes the quality of specific manufactured components, or a fraud detection application that continuously analyzes credit card transactions for anomalies. Integrating these applications creates unnecessary work.
Another tactic is to consolidate all analytical resources into a single, centralized data warehousing environment. This strategy works well in centralized organizations with top-down management structures because it often takes executive fiat to force various groups to share data, agree on data definitions and merge IT groups.
SBC Communications, a telecommunications firm, has used the centralized approach to grow its data warehouse from 200 GB in 1994 to more than 23 TB today. During the past seven years, SBC Communications acquired several telecommunications companies, most of which had data warehouses. After each acquisition, SBC merged the acquired data warehouses into an existing data warehouse.
Companies with a decentralized corporate culture, whose executives can’t dictate tactics, must take a more gradual, grassroots approach. Although these companies see the benefit of having a single version of the truth, they also see the advantages of giving the business units that are closest to customers the leeway to define and attack their markets more autonomously.
Rather than centralizing data, staff and operations, these companies are apt to centralize metadata. Each division or business unit downloads shared data definitions from a corporate metadata repository to keep its data marts in sync with the corporate view of the enterprise. If a unit wants to maintain its own definitions, the repository or some custom code needs to translate between corporate and divisional definitions to preserve the integrity of the data.
This approach gives business units the freedom to maintain their own analytical resources and view of the world while contributing to the creation of an enterprise view of data. The key to a federated approach is having a robust metadata repository and toolset that can maintain and translate multiple sets of data definitions.
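The translation step between divisional and corporate definitions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the division names, the field names, the unit conversions and the in-memory REPOSITORY dictionary, which stands in for a real metadata repository product.

```python
# Hypothetical corporate metadata repository: for each division, how its
# local field names and units map onto the shared corporate definitions.
# Each entry is local_name -> (corporate_name, conversion function).
REPOSITORY = {
    "emea": {
        "cust_no": ("customer_id", str),
        "rev_k_eur": ("revenue_eur", lambda v: v * 1000),  # thousands -> units
    },
    "americas": {
        "CustomerID": ("customer_id", str),
        "revenue": ("revenue_usd", float),
    },
}


def to_corporate(division, record):
    """Translate one divisional record into corporate names and units."""
    mappings = REPOSITORY[division]
    translated = {}
    for local_name, value in record.items():
        if local_name not in mappings:
            # Refuse to guess: an unmapped field would undermine the
            # integrity of the enterprise view.
            raise KeyError(f"{division}: no corporate definition for {local_name!r}")
        corporate_name, convert = mappings[local_name]
        translated[corporate_name] = convert(value)
    return translated


emea_row = to_corporate("emea", {"cust_no": 1042, "rev_k_eur": 12.5})
print(emea_row)  # {'customer_id': '1042', 'revenue_eur': 12500.0}
```

Rejecting unmapped fields, rather than passing them through, is one possible design choice; it keeps a division from contributing data that the corporate view cannot interpret.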
The virtual approach uses distributed data access tools with a global view of the business to assemble data on the fly from wherever it resides. This approach gives users the timeliest data possible, since there’s no need to wait for the data warehouse to be refreshed. It’s also valuable if the data resides outside the corporation, on the Web or in a customer or supplier database.
The virtual approach eliminates the time and money required to build and maintain a large data warehouse, and provides up-to-the-second data. However, this approach can bog down operational systems with lots of queries, and it doesn’t create an integrated set of historical, summarized data that is easily understandable to users.
The virtual approach is best deployed when a company wants to understand user query patterns before building a data warehouse. It is also helpful when users want to retrieve individual records in real time rather than perform a historical analysis. It works best when the volume of data to be retrieved is low and network connections are robust and reliable.
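As a rough illustration of assembling data on the fly, the Python sketch below answers a single-customer lookup by querying two source systems at request time instead of reading a pre-built warehouse. The two in-memory SQLite databases, their table and column names, and the customer_view function are all hypothetical stand-ins for real operational systems and a real distributed data access tool.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical sources: an order-entry system and a billing system.
orders_db = make_source(
    "CREATE TABLE orders (customer_id TEXT, order_total REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", 120.0), ("C1", 80.0), ("C2", 45.0)],
)
billing_db = make_source(
    "CREATE TABLE invoices (customer_id TEXT, balance_due REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [("C1", 50.0), ("C2", 0.0)],
)


def customer_view(customer_id):
    """Assemble one customer's record at query time from both sources."""
    (order_total,) = orders_db.execute(
        "SELECT COALESCE(SUM(order_total), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    (balance_due,) = billing_db.execute(
        "SELECT COALESCE(SUM(balance_due), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {
        "customer_id": customer_id,
        "order_total": order_total,
        "balance_due": balance_due,
    }


print(customer_view("C1"))
```

Each lookup issues one query per source system, which is why this style suits low-volume, single-record requests and can burden operational systems as query volume grows.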
Wayne Eckerson is Director of Research and Education at The Data Warehousing Institute (TDWI). He can be reached via e-mail at firstname.lastname@example.org.