Is a Data Mart in Your Future?

The question of the month is, "Is there a data mart in your future?" If you’re in an organization of any reasonable size, the answer is probably "yes."

The distinction between a data mart and a data warehouse has become generally accepted over the past few years. Most observers agree that a data mart is a decision support database with limited analytical scope. A data warehouse contains enterprisewide data, queried by users from different business units. The scope of a data mart is limited to supporting users of a single business unit or business function such as accounting, sales or manufacturing.

Several issues must be considered before implementing a data mart. Two of the biggest issues are the data mart architecture, and choosing between buying or building a data mart. In this column I’ll discuss data mart architectures, and in my next column I’ll discuss build-vs.-buy issues.

There are three data mart architectures: independent, dependent and federated. The independent data mart obtains its data directly from production source systems. There is no coordination of extraction and transformation logic, nor is metadata shared between data marts. Independent data marts often spring up under the guise of a short-term, temporary project. But like the Quonset huts of World War II -- which wound up serving as permanent housing for South Pacific residents for more than 25 years -- the "temporary" data mart often takes on a life of its own.

Since there is no synchronization of development, data and resources are often duplicated, resulting in higher overall costs. Data in different data marts is often inconsistent, because there is no common business metadata. Thus, a user may get two different answers, depending on which mart he or she queries.

The independent data mart approach can be problematic when an organization tries to consolidate data across marts or when an implementation develops scalability problems. In general, the long-term maintenance costs of an independent data mart can often outweigh its short-term benefits.

The dependent data mart, on the other hand, obtains its data from an enterprise data warehouse. The data warehouse is a repository of cleansed and validated data for decision support. Since a large data warehouse can be relatively unwieldy for end users to navigate, an IT department can spin off a data mart that departmental users can query and browse. A dependent data mart will often contain only summary data. If a user needs detailed data, the software can generate the appropriate SQL to retrieve it from the data warehouse.

A dependent data mart is a clean, consistent, uniform, enterprisewide analysis architecture. Data extraction and transformation costs are centralized and metadata is consistent. This architecture also provides flexibility of the data mart tool. The central data warehouse can be a relational DBMS, but dependent data marts can be a mix of relational databases, multidimensional cubes, or desktop cubes. This enables IT to deliver the most appropriate tool to meet the end users' needs.

The bad news is that you have to build a data warehouse first. That makes a dependent data mart solution relatively expensive, although there is an argument to be made that maintenance costs will be lower due to software reuse and metadata consistency.

The federated data mart combines the flexibility of an independent data mart with the discipline of a dependent data mart architecture. A common extraction and transformation platform is used to populate the various data marts. Metadata is stored in a common repository that ensures uniform data access keys for various dimensions of the target databases. Such consistency enables users to submit queries that bridge data marts, ensuring that the data will be meaningful and accurate.

Which approach should you take? That depends on several factors. If your company has distributed operations with lots of local autonomy, I recommend a federated data mart approach. If, on the other hand, the company has centralized control over data access, a dependent data mart approach would be practical. Regardless of the architecture you choose, be sure the transformation software and metadata repository used are supported by the tools you’ll need to build the data marts. My next column will look at buying vs. building a data mart. --Robert Craig is vice president of marketing at Web Engine Inc. (Burlington, Mass.), and a former director at the Hurwitz Group Inc. Contact him at rcraig@webengine-db.com or via the Web at www.webengine-db.com.