BI Experts: Validating Your Data Warehouse Direction
Technology changes are helping us analyze growing data volumes. Our data warehousing architecture must remain flexible, and frequent validation of user requirements and data usage will help us achieve this goal.
A primary goal when implementing data warehousing and business intelligence solutions is to support and enhance our organization’s decision-making capabilities. Many years ago, this consisted of determining the most pressing needs and collecting the data needed to perform the requisite analysis. Unfortunately, it was impossible to satisfy everyone's requests, so review processes were established to set priorities.
Two of the criteria frequently used to set these priorities (unless the requests were from the CEO or VP of marketing, but that's another story!) were cost/benefit analysis and ease of implementation. For example, a simple low-payback request that could be implemented relatively easily was often fast-tracked over high-payback requests that required a major effort. This was especially true if the required data elements were already resident in the warehouse, since adding data elements usually made a request more complex.
Many complex requests involved time-consuming evaluations to obtain accurate costs and benefits. Most early data warehouse practitioners and/or their (sometimes irate) users can cite examples of business opportunities lost due to the perceived "non-responsiveness" of the data warehouse team.
Fortunately, technology and economics have combined to make it possible to satisfy requests that in the past would have been difficult, if not impossible. Analysis speed has greatly increased thanks to enhanced computational power, in-memory technology, column-oriented databases, and massively parallel processing. Vast amounts of unstructured data can now be analyzed thanks to technologies such as Hadoop and MapReduce. Data warehouses originally contained summarized data, but now it is not unusual for organizations to store data at the transaction level. Most organizations recognize the value of establishing an environment of complementary data warehouse platforms rather than a single point solution; does anyone still dispute the value of special-purpose data warehouse appliances?
Although technology has improved our analysis capabilities, it has also increased the information that may need to be analyzed while decreasing the timeframe to accomplish it -- for example, cross-selling to Web site prospects before they leave your Web site.
Few, if any, data warehouses meet everyone's needs when first implemented. Furthermore, new data sources (social media in particular) require immediate analysis, for example to resolve potentially viral negative comments quickly. One of the keys to satisfying new analysis demands is to establish a data warehouse environment that meets current requirements and is flexible enough to meet unanticipated future requirements. In today's rapidly changing world, it is more important than ever that we take steps to continually validate our data warehouse direction relative to current and anticipated needs.
We should also reconsider past requests that were previously not feasible. Although many of these requests are no longer relevant, some of them probably still are. However, how many organizations have a process in place to review past requests to see if they are now feasible?
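One way to make such a review routine is to record, for each deferred request, why it was not feasible at the time, so that the list can be filtered whenever a blocker goes away. The following is only a minimal sketch: the `DeferredRequest` fields, the blocker labels, and the sample entries are all hypothetical and invented for illustration, not drawn from any real request log.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeferredRequest:
    title: str
    deferred_on: date
    blocker: str          # hypothetical label for why it was not feasible
    still_relevant: bool  # as confirmed with the requesting users


def requests_to_revisit(catalogue, resolved_blockers):
    """Return still-relevant requests whose original blocker no longer applies."""
    return [
        r for r in catalogue
        if r.still_relevant and r.blocker in resolved_blockers
    ]


# Illustrative entries only.
catalogue = [
    DeferredRequest("Transaction-level sales history", date(2008, 3, 1), "storage cost", True),
    DeferredRequest("Fax campaign response report", date(2006, 7, 15), "storage cost", False),
    DeferredRequest("Customer comment text analysis", date(2009, 11, 2), "unstructured data", True),
    DeferredRequest("Sub-second fraud scoring", date(2010, 1, 20), "latency", True),
]

# Blockers that newer technology is assumed to have removed.
resolved = {"storage cost", "unstructured data"}

revisit = requests_to_revisit(catalogue, resolved)
for request in revisit:
    print(request.title)
```

Recording the blocker rather than a simple "rejected" status is the design choice that matters here: it lets a technology change trigger the review instead of relying on someone remembering the request.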
We must be careful not to become satisfied with the status quo. We should continually poll the user community to better anticipate their future needs and be prepared to quickly meet these needs when they become important. Although positive user feedback is encouraging, negative feedback, especially when accompanied by constructive criticism, can be invaluable.
We should monitor data and report usage to see what our users are really using and what no longer seems to be of value and can be removed or archived. We need to ask our users what additional data would benefit them and whether the data we currently collect is at the right level of detail. We should keep a catalogue of worthwhile (but previously not feasible) requests and periodically revisit them to see if they can now be addressed.
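The usage monitoring described above can start very simply. The sketch below assumes we can obtain the date each report was last run (for example, from a BI tool's audit log) and flags reports that have been idle beyond a threshold. The report names, dates, and the 180-day threshold are hypothetical; flagged reports are candidates to discuss with users, not items to remove automatically.

```python
from datetime import date, timedelta


def archive_candidates(last_used, today, max_idle_days=180):
    """Return report names never used, or not used within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, used in last_used.items()
        if used is None or used < cutoff
    )


# Illustrative data: report name -> date last run (None = never run).
last_used = {
    "weekly_sales_summary": date(2024, 5, 28),
    "regional_returns_detail": date(2023, 9, 14),
    "legacy_fax_campaigns": None,
    "web_cross_sell_funnel": date(2024, 6, 1),
}

candidates = archive_candidates(last_used, today=date(2024, 6, 3))
for name in candidates:
    print(name)
```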
No matter what the future may bring, it is safe to assume that our data warehouse environment must be able to react quickly to a wide variety of analysis needs. We should not forget that data warehousing is an ongoing voyage, not a one-time destination. Organizations must establish a data warehouse direction that enables their members to take the side trips that may be necessary to satisfy currently unknown analysis requirements. Flexibility should be a major goal of our data warehouse architecture. Frequent validation of user requirements and data usage will help us achieve this goal.