Data Access: Just One Aspect of Enterprise Information Integration

Remember virtual warehouses? Is that what enterprise information integration is about?

Enterprise information integration (EII) gets lots of attention these days as vendors rush to highlight the benefits of integrating federated data sources without the need to first load data into a data warehouse. Some EII proponents even claim that an EII solution can totally do away with the need for a data warehouse, and thus result in quicker implementations while eliminating the need for an additional data store.

Does anyone still remember virtual data warehouses? EII sounds a lot like one.

An EII solution is based on combining and analyzing data contained in multiple data sources, but data integration, although obviously important, is just one component of the solution. Many vendors offer strong tools for accessing and integrating data from disparate sources and creating temporary data caches that minimize the load on network resources. However, consolidating this data for analysis is not just a matter of simple aggregation; it also involves knowing where all the relevant data is located and ensuring that the data is consistent in definition, value lists, and units of measure. Integrating federated data sources requires integrating and understanding the underlying metadata.

The ability to access and logically combine data from disparate sources, in particular operational systems, does not guarantee the quality or consistency of that data. For example, a company using an EII solution to determine sales of a particular item across all of its divisions needs to know which divisional databases to access, the product number that each division uses for the item (never assume, without first checking, that every division uses the same product numbering scheme), and the unit of measure each division uses for recording sales (e.g., U.S. dollars, Canadian dollars, deutschmarks).
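The cross-division example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the division names, product numbers, sales figures, and conversion rates are all made up, and the dictionaries stand in for real divisional databases and a real metadata catalog.

```python
from decimal import Decimal

# Hypothetical metadata: the local product number each division uses for
# the same item, and the currency in which that division records sales.
DIVISION_METADATA = {
    "us": {"product_id": "A-100", "currency": "USD"},
    "canada": {"product_id": "7731", "currency": "CAD"},
    "germany": {"product_id": "WX-9", "currency": "DEM"},
}

# Illustrative conversion rates to U.S. dollars (invented values).
RATE_TO_USD = {
    "USD": Decimal("1.00"),
    "CAD": Decimal("0.75"),
    "DEM": Decimal("0.50"),
}

# Stand-ins for the divisional databases: (product number, sales amount).
# Note that Canada also has an "A-100", but there it is a different item.
DIVISION_SALES = {
    "us": [("A-100", Decimal("1000")), ("B-200", Decimal("50"))],
    "canada": [("7731", Decimal("400")), ("A-100", Decimal("999"))],
    "germany": [("WX-9", Decimal("600"))],
}


def total_sales_usd(metadata, sales, rates):
    """Sum sales of one item across divisions, in U.S. dollars."""
    total = Decimal("0")
    for division, meta in metadata.items():
        rate = rates[meta["currency"]]
        for product_id, amount in sales[division]:
            # Match on the division's own product number, not a shared one.
            if product_id == meta["product_id"]:
                total += amount * rate
    return total


TOTAL = total_sales_usd(DIVISION_METADATA, DIVISION_SALES, RATE_TO_USD)
```

Without the metadata table, a naive query for "A-100" in every division would pick up the unrelated Canadian item and add Canadian dollars and deutschmarks to U.S. dollars as if they were the same unit.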

Among the basic properties of a data warehouse are that its contents are non-volatile and time-stamped. This allows for period-to-period comparisons and the discovery of trends. Direct access to operational systems, an EII mantra, means accessing volatile data that is normally not time-stamped. In an operational system the next transaction replaces the then-current value; in a data warehouse, a new time-stamped record is added. Furthermore, achieving data consistency and metadata integration across multiple data sources within a data warehouse are two of the data warehouse’s most important functions.
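The difference between replacing a value and adding a time-stamped record can be shown with a minimal Python sketch. The item name, quantities, and dates are hypothetical, and the in-memory dictionary and list are only stand-ins for an operational system and a data warehouse.

```python
from datetime import date

operational = {}  # current value only: item -> quantity
warehouse = []    # append-only history: (as_of_date, item, quantity)


def record(item, quantity, as_of):
    """Apply one update to both stores."""
    operational[item] = quantity               # replaces the then-current value
    warehouse.append((as_of, item, quantity))  # adds a new time-stamped record


def change_between(item, start, end):
    """Period-to-period comparison; only possible with the history."""
    by_date = {d: q for d, i, q in warehouse if i == item}
    return by_date[end] - by_date[start]


record("widget", 120, date(2004, 1, 31))
record("widget", 95, date(2004, 2, 29))

CURRENT = operational["widget"]
CHANGE = change_between("widget", date(2004, 1, 31), date(2004, 2, 29))
```

After the second update the operational store knows only the latest quantity, while the warehouse-style history still holds both records and can answer how the quantity changed between the two dates.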

That said, an EII solution, when properly implemented (and when its limitations are understood), can offer true benefits to organizations that deploy it, especially if one of the federated data sources is a data warehouse containing historical data. EII can also be used by IT to produce “quick-and-dirty” prototype reports for the business community.

An EII solution is not an end in itself; rather, it is part of an overall information architecture or enterprise information management framework. Despite some vendor claims to the contrary, organizations need to recognize that EII complements a data warehouse; it does not supplant it. At the very least, ask the vendor proposing an EII solution not just how it accesses the underlying distributed data but, perhaps more importantly, how it keeps track of all the underlying data sources and how it ensures consistency when bringing the data from these sources together.

About the Author

Michael A. Schiff is a principal consultant for MAS Strategies.