In-Depth

Productivity Ain’t Cheap: Exploring the Hidden Costs of Data Warehousing

Strategy Base from ClarusResearch (www.clarusresearch.com) reported that in the second quarter of 1998 global merger and acquisition activity for transactions greater than $500 million totaled $631 billion, as compared to a total of $810 billion for all of 1997 – which was also a record year. Ten of the largest mergers and acquisitions in history were announced during this period. Many of these transactions came from industries where changing regulations have presented the opportunity for offering new services and products.

Consolidation is seen as the best means for achieving quick time-to-market, but the real work of making previously independent organizations function effectively together hinges on IT achieving the data consolidation required to manage the business and its evolution. Layered on top of this are the dramatic changes e-commerce is bringing to the retail and service businesses. It’s enough to make your head spin.

Obviously, the ability to accurately assess a company’s performance and quickly retarget its efforts will depend on users having timely and flexible access to the information they need, and that will not be achieved without data warehouses. On the other hand, most large companies have had sufficient experience with implementing data warehouses to know that they are one of the hardest applications to design and implement – for a number of reasons:

• They require cross-functional cooperation on multiple levels.

• They are built on shifting sands in the sense that the operational systems from which they draw their information are regularly modified to accommodate changes in the business.

• The initial load of data warehouses often takes longer than anticipated because "secret schema changes" are encountered, which result in inconsistencies in the source data values.

• Retrieving the needed data usually involves one or more legacy sources, or merging data from equivalent applications. In both cases, a significant amount of data transformation is required. With legacy systems, values must be transformed into user-friendly formats; with equivalent applications, the challenge is achieving consistency in the way the same data is represented. In addition, a number of sort-merge steps are usually required.

• Likewise, these systems are frequently distributed geographically, thereby complicating the task of maintenance.
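The transformation problem described above can be sketched in a few lines. This is a minimal, hypothetical example, assuming a one-character legacy code table and a spelled-out equivalent-application table; the field and code values are invented for illustration.

```python
# Hypothetical sketch: normalizing the same data value drawn from a legacy
# system and an equivalent application onto one warehouse representation.
# The code tables below are invented for illustration.

# A legacy system might store marital status as a one-character code...
LEGACY_STATUS = {"M": "Married", "S": "Single", "D": "Divorced", "W": "Widowed"}

# ...while an equivalent application spells the same fact out differently.
APP_STATUS = {"MARRIED": "Married", "SINGLE": "Single",
              "DIVORCED": "Divorced", "WIDOWED": "Widowed"}

def normalize_status(value, source):
    """Map a source-specific value onto the single form the warehouse uses."""
    table = LEGACY_STATUS if source == "legacy" else APP_STATUS
    try:
        return table[value.strip().upper()]
    except KeyError:
        # Unmapped codes are exactly the "secret schema changes" that surface
        # during the initial load; flag them rather than guess.
        raise ValueError(f"unmapped {source} value: {value!r}")
```

Real transformation logic runs to hundreds of such rules per source, which is why these steps dominate warehouse load schedules.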

However, there are a number of other reasons that warehouses have proved more costly than anticipated.

Why You Keep Paying

When you consider the large number of technical problems encountered in designing and loading data warehouses, it’s clear that companies did not design many of their operational systems with maintenance in mind. What’s interesting is that many companies are finding that they did not develop their initial data warehouses with the maintenance cost in mind.

It missed the mark. Warehouses allow business users to track trends in key business indicators, variations in behavior across sub-populations of their customer base and the success of marketing or sales initiatives. Companies use various techniques to determine what they should be tracking in their initial warehouse effort, with the goal of having the initial project show enough return on investment to justify further initiatives. Occasionally, however, the initial project is a bust. For example, one banking customer’s initial project was a warehouse that tracked customers’ use of its services. It took about six months to get the warehouse designed and implemented; after six months of use, management decided that nothing of value had been learned and canceled the entire initiative. In this case there were no ongoing costs, but one wonders whether the proverbial baby was thrown out with the bath water.

Data marts don’t scale. In an effort to avoid the "big bang that turns out to be a fizzle," other companies have opted for a data mart approach to reduce both the time and cost of getting something into the hands of users. While this approach reduces the initial risk, if one or more data marts are successful, they can fall victim to their own success: management and users may not understand the need for IT to step back and retool in order to scale. While data marts are frequently implemented as department-level projects, a company that wants to maximize its benefit from warehousing must ensure that:

• The same rules for representing identical information are used across all data marts/warehouses.

• Refresh programs are maximally efficient by acquiring all the information required in a single pass of each source system.

• CPU cycles are used as efficiently as possible.
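The single-pass requirement above can be illustrated with a small sketch. This is a hypothetical example, assuming each mart subscribes to a subset of source columns; the mart names and columns are invented for illustration.

```python
# Hypothetical sketch: one pass over a source extract feeds every data mart
# that needs any of its columns, instead of each mart scanning the source
# separately. The mart subscriptions below are invented for illustration.

MART_COLUMNS = {
    "sales_mart":   ["customer_id", "amount"],
    "service_mart": ["customer_id", "product"],
}

def refresh(source_rows):
    """Single pass: project each source row once for every subscribing mart."""
    loads = {mart: [] for mart in MART_COLUMNS}
    for row in source_rows:  # the source is scanned exactly once
        for mart, cols in MART_COLUMNS.items():
            loads[mart].append({c: row[c] for c in cols})
    return loads
```

The point of the design is that adding a third mart adds a dictionary entry, not another expensive scan of the operational system.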

The hardware costs have been underestimated. OLAP products compute indices that provide fast turnaround on queries that slice and dice information in a variety of ways. Over time, the creation and storage of these indices can become hardware-intensive. As a result, it is important for companies to track what information warehouse users are actually accessing in order to stop supporting unnecessary indices. Likewise, data elements that have not been accessed over some period of time should be dropped from the warehouse.

In order to make these adjustments, the warehouse strategy must include tools and/or strategies for monitoring warehouse use – and ideally (whether through subjective or objective criteria) the business value of the information that is being used. In either case, over time it is likely that a company will want to change the warehouse’s structure and organization. Without a strategy to minimize the cost of these changes, the company faces a hard choice – live with spiraling hardware costs or the cost of maintaining the current warehouses while re-architecting for future efficiency.
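The monitoring strategy described above could be as simple as recording the last time each data element appeared in a query. A minimal sketch, assuming a 90-day retention horizon; the class, field names and threshold are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical sketch: a usage log drives the decision to drop data elements
# no one has queried recently. The 90-day horizon is an invented threshold.

class UsageMonitor:
    def __init__(self):
        self.last_access = {}  # column name -> date of most recent query

    def record_query(self, columns, on=None):
        """Note that a query touched these columns on the given date."""
        when = on or date.today()
        for col in columns:
            self.last_access[col] = max(self.last_access.get(col, when), when)

    def drop_candidates(self, all_columns, horizon_days=90, today=None):
        """Columns not queried within the horizon, including never-queried ones."""
        today = today or date.today()
        cutoff = today - timedelta(days=horizon_days)
        return [c for c in all_columns
                if self.last_access.get(c, date.min) < cutoff]
```

Whether 90 days is the right horizon is a business judgment; the technical point is that without some such log, there is no objective basis for pruning the warehouse at all.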

It needs to change. Finally, warehouses must evolve as quickly as the business environment. If a company merges with or acquires another organization, two disparate warehousing strategies must be consolidated. If the two companies were highly similar, then the data warehouses in use probably track fairly similar information. On the other hand, it is equally likely that the representations of the same data values are not identical, and that the warehouse consolidation effort will have to resolve these inconsistencies.

If the companies were complementary – for example, one in telecommunications and the other in cable – then the warehousing strategies are more likely to have been different, and part of the task will entail reflecting the new product and marketing strategy that actually drove the change in organizational structure. In either case, the company will be spending more on IT for the short term instead of less, thereby reducing any benefits from an economy of scale for some period of time.

Finally, if a warehouse has been effective, it will have provoked as many questions as it has answered. A good part of the goal of the warehouse is to help management discover what they might otherwise have overlooked. Often the pattern that suggests a trend must be verified with new data from operational systems or even data obtained from external sources.

What about data mining? The vision behind data mining is that your warehouse contains important information that lies outside any query results that come from management’s current assumptions and that by looking for statistical patterns between related values, a company may gain important new insights. Or a company may spend a lot of computer cycles and find nothing of interest. In either case, such activity requires significant hardware resources above and beyond those required to support the average business user.
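The "statistical patterns between related values" above can be made concrete with the simplest case, a pairwise correlation scan. This is a hypothetical sketch; the measure names and threshold are invented, and real mining tools use far richer techniques than a Pearson coefficient.

```python
from math import sqrt

# Hypothetical sketch: scanning pairs of warehouse measures for linear
# correlation, the simplest statistical pattern. Data are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def interesting_pairs(columns, threshold=0.8):
    """Return pairs of measures whose correlation exceeds the threshold.
    This brute-force scan is what makes mining hardware-intensive: the
    number of pairs grows quadratically with the number of measures."""
    names = list(columns)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                hits.append((a, b, r))
    return hits
```

Even this toy version shows where the cycles go: every measure is compared against every other, whether or not anything interesting turns up.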

Building Your IT Infrastructure

The idea of most, if not all, strategic IT initiatives is that after an initial investment, the company will operate with greater efficiency. In other words, the assumption is that the cost of the initial implementation will be greater than maintenance. But in the connected world of intranets and extranets, that assumption does not necessarily hold since every change must percolate through to any related system in order to maintain data consistency.

GartnerGroup claims that 35 to 40 percent of all programming effort is devoted to developing and maintaining programs that transfer information between databases. Dain Rauscher Wessels estimates that $85 billion was spent on the manual integration of applications in 1998, and that this figure will grow 30 percent by the end of 1999.

However, while one can argue that implementing a set of ERP applications may reduce the cost of maintenance, such arguments were never the motivation for implementing data warehouses. The data warehouse is an application whose goal is to enable discovery, and the very term "discovery" implies that there are no guarantees. Just as many experiments in scientific research do not result in breakthroughs, it should be expected that some warehousing costs will be incurred with very little benefit. The hope is that when something is discovered, it will be significant enough to more than pay for all the wrong turns.

When considered in light of the technical difficulties in building and maintaining warehouses, there is the potential for data warehousing to be one of the most cost-intensive IT initiatives a company can undertake. Consequently, it is critical that a company choose the appropriate mix of products and methodology to minimize the cost of the change cycle. In fact, an excellent exercise for fleshing out the requirements of any particular organization is to develop several likely change scenarios and walk through the steps required to adjust the warehouse accordingly. If the combination of products and methodology doesn’t support a timely response to change, it isn’t the right combination.

For all the risk and cost involved, however, a company’s data warehouse initiative may prove to be the best means for determining what its IT infrastructure should look like in order to support the distributed, heterogeneous networks that characterize today’s volatile business environments. And if it can help an organization achieve this, it may prove to be one of its most beneficial undertakings.

Metadata – The Obvious?

One key component to any effective warehousing initiative is a sound metadata strategy.

Large companies often have 100,000-line COBOL copybooks or 400-page schema layouts. When database definitions of this complexity are coupled with the "secret schema changes" that have been made over the years to avoid having to reorganize historical data, it is simply easier – and less error-prone – to discover what you don’t know in the process of implementing a new project. However, what you want is to maximize the benefit of having incurred that learning curve.

If the products that you use capture a full metadata audit trail of what you have done and support metadata interoperability, then over time you can accrete a full information map of not only the database layouts of each system, but the interrelationships between systems. With this information, you can perform accurate impact analysis when something changes and anticipate the cost of supporting the change. As a result, the foundation of any sound warehouse methodology is a strong metadata strategy, for it is the key to containing costs moving forward.
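With a lineage map in hand, impact analysis reduces to a graph traversal. A minimal sketch, assuming the metadata records which target fields are derived from which source fields; all field names are invented for illustration.

```python
# Hypothetical sketch: a lineage map recording which target fields are
# derived from which source fields. The field names are invented.

DERIVED_FROM = {
    "warehouse.revenue":    ["billing.amount", "billing.currency"],
    "mart.monthly_revenue": ["warehouse.revenue"],
    "mart.churn_score":     ["crm.last_contact"],
}

def impacted_by(changed_field):
    """Return every downstream field affected when a source field changes."""
    # Invert the lineage map: source field -> targets that consume it.
    consumers = {}
    for target, sources in DERIVED_FROM.items():
        for s in sources:
            consumers.setdefault(s, []).append(target)
    # Walk the dependency graph transitively.
    hit, stack = set(), [changed_field]
    while stack:
        for target in consumers.get(stack.pop(), []):
            if target not in hit:
                hit.add(target)
                stack.append(target)
    return sorted(hit)
```

A real metadata repository holds thousands of such edges, but the principle is the same: the traversal tells you, before any code is touched, exactly which warehouse and mart structures a source-system change will ripple into.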

About the Author:

Katherine Hammer is President, CEO, co-founder and Chairman of the Board of Evolutionary Technologies International (Austin, Texas).
