Architecture Anarchy and How to Survive It: God Save the Queen
Here it is, the new century, and amazingly enough, we’re still hearing raging debates about "top down" architectures versus "bottom up" construction approaches to building data warehouse and data mart systems. Those of us in the trenches wrote this off as old news two years ago, and have been concentrating on building real business intelligence (BI) system solutions to solve real business problems with whichever architecture best fits the unique characteristics of the site. Those of us who actually build these systems for a living long ago recognized that both approaches are valid, both work and both end up at the same goal: the Enterprise Data Warehouse (EDW). But onward the debate rages, with the gurus surrounded by their polemicists waging battle with speeches, white papers and surveys. The charges and counter-charges fly back and forth, with both sides claiming that theirs is the one, true, single way to build an EDW.
In the end, neither clan has served the industry well. Instead of a well-informed, educated market using both approaches where appropriate, we have a polarized world, with fanatical zealots clustered around their favorite guru, analysts spouting theory, vendors making their usual outrageous claims of certainty and the practitioner community largely left to fend for themselves in picking out the architectural pieces that truly work.
How can you separate the vendor-supplied fear, uncertainty and doubt (FUD) and the hyperbole from the scant facts? Even more critical, how can you wade through the variety of data warehouse/data mart architectures and ensure that your organization is well equipped for today’s needs and tomorrow’s challenges?
This article will examine the alternative architectures available, some of the market forces that are shaping the current and future BI architecture environment, and factors to consider when choosing your architectural path.
Let’s start by reviewing the basic choices in architectures and approaches.
Classic Enterprise Data Warehouse
The classic EDW is a hub and spoke design, built around a single, common repository for enterprise information. It is a read-only environment made up of detailed and aggregated data that is fully cleansed and integrated, and includes extensive detailed history of transaction level data. Architecturally, the classic enterprise data warehouse delivers that priceless and elusive goal, a single version of the truth, primarily through subset, or dependent, data marts. The hub in the design is the EDW repository. The spokes are the feeds that bring data in from the source systems, and the feeds that carry it out to subset data marts to deliver the information to the users.
The EDW is pure elegance in design, and is an easy sell to both technologists and the business. It’s a bit of a "no brainer" to approve the concept of getting all the data in one place, so you can deliver a "single version of the truth."
But, as in all things, anything that sounds too good to be true, probably is.
As it has turned out in the intervening years since the theory of data warehousing arose at IBM and was popularized, commercialized and merchandised by Prism Solutions, very few organizations, if any, were able to fully realize the dream of the über EDW. This was not due to any technical shortcoming, but rather due to non-technical issues that so often tripped up these teams and projects.
If the EDW could be constructed by a technologist team in a vacuum, there would be little problem in creating winning, high-impact EDW systems. Unfortunately, outside of a pure research environment, this is not possible. In the real world, EDW teams are caught in a maelstrom of competing political factions, business rule disparities, crushing deadlines, antiquated source systems, bewildering user demands, faulty tools and overwhelming cultural challenges. While there has been no technical reason to shy away from an EDW approach, the cultural, or "soft" issues have proven very challenging for the average IT team to overcome.
Chief among these is the "cross everything" nature of an EDW system. By its very name, "Enterprise" Data Warehouse, it implies and demands that the EDW team cross every political, functional, cultural, process, fiefdom, semantic, ownership, organizational, geographic, etc., boundary in the entire organization. The successful negotiation of this minefield requires a tremendous level of political acumen, one that is very, very rare in the typical EDW technologist team.
The classic, monolithic EDW has fallen into the chasm between the elegance of the theoretical vision and the political and skill set realities of implementation in the real world. While technologists are very good at a great many things, savvy political expertise and excellent communication skills are not anywhere near the top of the list of our recognized talents. Add to this mix the requirement for the team to be extremely flexible, 100 percent user-oriented, capable of living in constant change and able to constantly, endlessly resell and re-market the EDW system, and you end up with a set of challenges that very few teams are culturally outfitted to overcome.
Also, EDW systems require an absolute commitment of sustainable political will and resources at the highest levels of the organization. These systems often take a long time to develop, and may not demonstrate any tangible ROI for many years. In order to fend off competing initiatives, maintain resources and funding, and maintain the sponsorship and commitment of the organization, the project must enjoy sustained political will at the CEO/Managing Director level. Sustaining that political will in the face of ever-changing executives, ever-shifting corporate priorities and constantly growing demands for resources often proves impossible for an EDW team.
Despite all of these well-known challenges, the EDW architecture remains an attractive choice for sites with long-term, sustainable, life-threatening pain at the CEO/Board level. In cases where the political and cultural factors required for success are present, it is a terrible mistake to throw away the opportunity to realize the upsides of an EDW implementation. The inherently integrated nature of the data in the EDW and the ease of propagating subset data marts make this the ultimate, if extremely politically challenging, path to the goal of the EDW system.
The EDW is the most common implementation architecture on mainframe systems, and is often used as a highly normalized DB2 repository for detailed transaction data, feeding subset data marts on UNIX or NT platforms.
Incremental Architected Data Marts
In an effort to realize the tremendous upside potential of data warehousing, but avoid the very difficult challenges inherent in the "top down" model, data warehousing teams have developed the "bottom up" approach to reach the goal of the EDW system. In the "bottom up" approach, an Enterprise Data Mart Architecture (EDMA) is developed to provide a context for development efforts. While it takes in the entire system scope at a high level, it is not as detailed as an EDW system architecture, so it avoids the "analysis paralysis" so common to those efforts.
Once the EDMA is complete, an initial area of business pain is selected for the first incremental Architected Data Mart (ADM). The EDMA is expanded in this area to include the full range of detail required for the design and development of the incremental ADM. Subsequent phases fill in the EDMA, until the team and the organization are ready to construct the EDW. Typically, a modern, enterprise-class Extraction, Transformation and Load (ETL) tool is used to facilitate an "extract once, populate many" strategy of populating the incremental ADMs.
The incremental ADM approach uses common data staging areas and shared, or conformed, dimensions (customer, product, employee, etc.) to leverage resources across multiple development efforts.
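To make the "extract once, populate many" idea and the role of a conformed dimension a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular ETL tool works: the DataMart class, the extract_once and populate_many functions, and the sample rows are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataMart:
    """A hypothetical incremental ADM holding its own fact rows."""
    name: str
    dimensions: dict           # shared (conformed) dimension tables, by name
    facts: list = field(default_factory=list)


def extract_once(source_rows):
    """Stand-in for a single pass over the source system into staging."""
    return [dict(row) for row in source_rows]


def populate_many(staged_rows, marts, selectors):
    """Load each mart from the same staged rows; no mart re-reads the source."""
    for mart in marts:
        keep = selectors[mart.name]
        mart.facts = [
            {col: row[col] for col in keep}
            for row in staged_rows
        ]


# Conformed dimension: one customer table, shared by reference across marts.
customer_dim = {101: {"name": "Acme"}, 102: {"name": "Globex"}}
conformed = {"customer": customer_dim}

sales_mart = DataMart("sales", conformed)
finance_mart = DataMart("finance", conformed)

# Hypothetical source-system rows.
source = [
    {"customer_id": 101, "units": 3, "revenue": 30.0, "cost": 12.0},
    {"customer_id": 102, "units": 1, "revenue": 10.0, "cost": 4.0},
]

staged = extract_once(source)
populate_many(
    staged,
    [sales_mart, finance_mart],
    {"sales": ["customer_id", "units", "revenue"],
     "finance": ["customer_id", "revenue", "cost"]},
)
```

Both marts are loaded from the same staged rows and point at the same customer dimension, so a customer means the same thing in each of them. That shared meaning is what separates an architected data mart from a silo.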
An incremental ADM approach requires an EDMA to be successful. Non-architected incremental data marts are simply data silos that contribute little or nothing to the goal of an integrated information resource.
Incremental ADM system development also requires a common metadata repository, thus the requirement of an enterprise-class ETL tool. Developing the ETL processes manually is not a viable option in this approach for sustainable systems.
If there are numerous incremental ADMs being developed, the "bottom up" approach also requires excellent communication and coordination across multiple project teams. Again, communication talent is not at the top of our list of inherent skills as technologists, so multiple, simultaneous projects can be difficult for inexperienced teams.
Mainframe systems most often come into contact with incremental ADMs as members of a federated architecture. It is less common to construct a data mart on a mainframe system.
Federated Architecture
In most organizations, multiple teams undertake BI/DW projects, resulting in multiple data warehouse and data mart systems across the enterprise. Although in the strictest sense, there is only one Enterprise Data Warehouse, with all other entities being subset or incremental architected or non-architected data marts, not many organizations are as strict with semantics. Thus we have the majority of medium- to large-sized enterprises around the world with two, six, or a dozen or more "data warehouse" systems, plus scores to thousands of data marts. This proliferation of data warehouses has led to the next evolution of the EDW architecture, that of a federated data warehouse system of data warehouses and data marts.
The Federated Data Warehouse/Federated Data Mart (FDW/FDM) system is marked by the characteristics of sharing common data points between multiple data warehouse or data mart systems, thus eliminating redundancy, and ensuring a consistent and unique version of the "truth" throughout the organization. The federated architecture enables IT organizations to quickly adapt to changing business and system requirements, such as the requirement to implement a turnkey EDW from an Enterprise Resource Planning (ERP) system vendor.
By sharing key metrics and measures across systems, a federated system facilitates the realization of a single version of the truth, while remaining flexible enough to accommodate dedicated data feeds, multiple data warehouses and a wide variety of data mart and analytical application systems. With less than the ideal elegance of the pure EDW solution, the federated architecture represents the best option for most organizations, in terms of a practical, politically viable, real-world path to achieve the goal of an integrated information resource.
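As a rough illustration of sharing key metrics across systems, the Python sketch below has two member systems compute a measure from one agreed definition instead of from their own local formulas. Everything in it is an assumption made for the example: the SHARED_METRICS registry, the MemberSystem class, the "net_revenue" rule and the sample figures are hypothetical, and a real federation would share definitions through metadata rather than code.

```python
# Hypothetical shared registry: one agreed definition per key metric.
SHARED_METRICS = {
    "net_revenue": lambda row: row["gross"] - row["returns"],
}


class MemberSystem:
    """A data warehouse or data mart participating in the federation."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def report(self, metric):
        # Every member computes the metric from the shared definition
        # rather than from a local, possibly divergent, formula.
        compute = SHARED_METRICS[metric]
        return sum(compute(row) for row in self.rows)


# Two hypothetical member systems with their own data.
emea = MemberSystem("emea_dw", [{"gross": 100.0, "returns": 5.0}])
apac = MemberSystem("apac_mart", [{"gross": 50.0, "returns": 2.0},
                                  {"gross": 20.0, "returns": 0.0}])

# Because the definition is shared, the member totals can be added up.
enterprise_total = emea.report("net_revenue") + apac.report("net_revenue")
```

The member systems keep their own data and their own platforms. Only the definition of the measure is held in common, which is why their numbers can be rolled up into one figure for the enterprise.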
The chief challenges of a federated system lie in achieving semantic and business rule consensus between the stakeholders of the component data warehouse and data mart systems. Just as in an EDW project, this is often a fruitless and fatal endeavor.
Mainframe systems are common players in a federated architecture. They often host DB2-based reporting systems, Operational Data Stores (ODS) systems, and EDW systems for functional segments of the enterprise. Connectivity and data throughput in cross-platform data migration are common challenges that must be overcome in these scenarios.
Market Forces
There are powerful market forces that are shaping our BI architectural world.
One of the most powerful market forces is the proliferation of very low-cost, turnkey, non-architected data marts and analytical applications. Complete self-contained solutions, including domain-specific analytical applications, RDBMS, Online Analytical Processing (OLAP) server, ETL capabilities, server, end user tools, etc., are available for prices that are well within the signing authority of mid-level business managers.
These business managers, faced with pressing challenges, will not hesitate to purchase and implement these systems, regardless of enterprise architecture initiatives. Over the past few months, we have surveyed over 750 IT professionals to find out whether they could cite an instance when the business chose to delay or forgo the purchase and implementation of this type of targeted business solution due to architecture considerations. The score so far is business 750 and architecture zero. This does not bode well for rigid BI architectures that force all data and analysis to live within their domain and control.
A second major market force is the factor driving federated architectures: the multitudes of data warehouse and data mart systems in the typical enterprise. For example, at a fairly large U.S. company, we recently documented nine EDWs with a mix of top down and bottom up approaches, one ODS, one mainframe reporting database, and literally countless silo data marts and analytical applications. Interestingly, while the EDW systems were generally serving the needs of their various functional group stakeholders (depending on whether you asked IT, who thought yes, or the business who often thought no), the people at the top of the corporate organization were starved for integrated information, which was not available from this fine collection of silo EDW and data mart systems.
The lesson here is that building an EDW resource for one element of the organization and declaring it a success is not enough. In fact, such a system is but one of many non-integrated EDW silos that are failing to deliver the "one version of the truth" that the CEO needs to move the business forward.
Another market force that is rapidly building is the Application Service Provider (ASP) market. It is estimated that in four years, 60 to 70 percent of new IT system initiatives will be delivered in the ASP model. It is critical that you establish an architecture that enables easy implementation of ASP BI system capability.
In the mainframe segment, ongoing favorable moves in price/performance continue to make this platform more competitive with UNIX systems in this regard. There remains a wide gap in availability of popular BI/DW tools in the mainframe environment, and this trend shows no signs of changing in the near- to mid-term.
How to Choose
To successfully choose which BI architecture is appropriate for your unique needs and characteristics, first consider the following key factors:
• Level of sponsorship. If it is lower than CEO, then you should shy away from a top down/EDW approach. You need long-term, sustainable political will for a top down scenario. If you have long-term pain and buy-in at the top level, then don’t miss the opportunity for an EDW.
• Organization size. If you are a mid-size company with top-level support, you are a good candidate for a top down/EDW system. Large organizations are federated by default.
• Time to market. If time is of the essence, you would be best suited for a bottom up approach, which minimizes many of the "cross everything" tasks that chew up a lot of time in an EDW system.
• Existing systems. If your organization already has multiple EDW systems or fleets of architected or non-architected data marts, your only choice is a federated route.
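The four factors above can be read as a simple set of rules. The Python sketch below is one possible encoding, offered only as a thinking aid. The function name suggest_architecture, its parameters, the size labels and, above all, the order in which the rules are checked are assumptions made for this example, and a real assessment weighs many more factors.

```python
def suggest_architecture(ceo_sponsorship, org_size, time_critical,
                         has_multiple_existing_systems):
    """Return 'federated', 'bottom up' or 'top down' from the four factors.

    org_size is a label such as 'mid' or 'large' (hypothetical labels).
    The order in which the rules are checked is an assumption.
    """
    # Existing fleets of systems, or sheer size, point to a federated route.
    if has_multiple_existing_systems or org_size == "large":
        return "federated"
    # Without CEO-level sponsorship, or under time pressure, go bottom up.
    if not ceo_sponsorship or time_critical:
        return "bottom up"
    # Top-level support and no rush: do not miss the chance for an EDW.
    return "top down"


# A mid-size company with CEO backing and no deadline pressure.
choice = suggest_architecture(
    ceo_sponsorship=True,
    org_size="mid",
    time_critical=False,
    has_multiple_existing_systems=False,
)
```

Checking the federated conditions first reflects the point that an organization with multiple existing systems has no other route, whatever its sponsorship or deadlines look like.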
A full evaluation of the unique needs and characteristics of your site that drive an architecture choice involves many more factors than this space will allow. To examine more of these factors, there is a free automated assessment on choosing an architectural approach at www.egltd.com.
Your next step is to combine your unique characteristics with the information that is available in the market. Start by discarding the rhetoric from the warring clans. Visceral emotion replaced reasoned debate long ago among these players. Instead, look to disinterested third parties, especially practitioners who are not aligned with vendors that have a survival stake in the "top down/bottom up" debate. People who work in the field can give you real-world, experience-based input on which architecture and approach are best suited to your individual characteristics. Again, shy away from vendor employees, as they tend to identify problems that only their products can address, regardless of what is actually best for you, the customer.
Lastly, avoid theory and ivory tower musings and analysis. At best, these sources are far removed from your life in the trenches and at worst, they are working an agenda for their stable of vendor clients. Theory certainly has a place: We wouldn’t have airplanes without theory. But in the end, I’d rather have an experienced pilot at the controls than someone who can describe the theory of flight.
The downside to the current mix of BI/DW architectures is that there is no clear industry consensus, and fact-based decision making is nearly impossible due to the zealotry of the advocates of the various alternatives. The upside is that you have a solid variety of architectural options for your BI system. Properly matched to your unique characteristics, any of the options will bring you and your organization lasting, sustainable success.
About the Author: Douglas Hackney is President of Enterprise Group Ltd. (Hudson, Wis.; www.egltd.com). He can be reached at (800) 428-2005.