Metadata Management
Metadata is the foundation of an effective data-centric information system. In most corporate environments, metadata is used both within a system and between systems. Managing metadata is straightforward when it is homogeneous, that is, when a single application or software from a single vendor creates and manages it. This is the case with packaged ERP applications, such as those from SAP AG, Baan Co., PeopleSoft Inc. or Oracle Corp.
Two common scenarios lead to heterogeneous metadata, which makes metadata management complex and cumbersome. In the first, a company deploys a decision-support database, such as a data warehouse. In the second, it deploys multiple applications. Many companies, of course, do both.
The warehouse environment contains three major categories of metadata. The first is the metadata associated with the decision-support database itself. It describes database structures such as tables, columns and partitions, as well as security settings and operational information. The second category is used by end users to navigate the database. A query and analysis tool, such as BusinessObjects from Business Objects Inc. or PowerPlay from Cognos Corp., usually creates and manages this metadata. The third category is the metadata created by the back-end extraction and transformation tool that moves data from the source systems into the data warehouse. This metadata is primarily concerned with source data definitions, transformation logic and source-to-target data mappings. These tools must also handle process scheduling, data integrity and error management.
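To make the three categories concrete, here is a minimal Python sketch, assuming a simplified model in which each category is a flat record. The class names (`ColumnDef`, `BusinessTerm`, `Mapping`) and the `lineage` helper are hypothetical illustrations, not the schema of any product named in this column. It shows how the transformation tool's source-to-target mappings, combined with the other two categories, let you trace a business term back to the source columns that feed it.

```python
from dataclasses import dataclass


@dataclass
class ColumnDef:
    """Category 1 (database metadata): a physical column in the warehouse."""
    table: str
    column: str
    data_type: str


@dataclass
class BusinessTerm:
    """Category 2 (end-user metadata): a business name the query tool shows users."""
    name: str
    table: str
    column: str


@dataclass
class Mapping:
    """Category 3 (extraction/transformation metadata): one source-to-target mapping."""
    source: str     # "system.table.column" in a source system
    target: str     # "table.column" in the warehouse
    transform: str  # transformation logic, recorded as text


def lineage(term: BusinessTerm, catalog: list[ColumnDef],
            mappings: list[Mapping]) -> list[str]:
    """Trace a business term through the warehouse catalog back to its sources.

    Raises KeyError if the term points at a column the catalog doesn't know,
    the kind of inconsistency that unsynchronized metadata produces.
    """
    if not any(c.table == term.table and c.column == term.column for c in catalog):
        raise KeyError(f"{term.name}: {term.table}.{term.column} not in catalog")
    target = f"{term.table}.{term.column}"
    return [f"{m.source} --[{m.transform}]--> {m.target}"
            for m in mappings if m.target == target]


if __name__ == "__main__":
    catalog = [ColumnDef("sales_fact", "revenue", "DECIMAL(12,2)")]
    mappings = [
        Mapping("orders_db.ORD.AMT", "sales_fact.revenue", "SUM by day"),
        Mapping("billing.INV.TOTAL", "sales_fact.revenue", "convert to USD"),
        Mapping("orders_db.CUST.NAME", "customer_dim.name", "TRIM"),
    ]
    revenue = BusinessTerm("Revenue", "sales_fact", "revenue")
    for step in lineage(revenue, catalog, mappings):
        print(step)
```

Keeping the categories in separate records mirrors the fact that three different tools own them. A lineage query like this only works if someone keeps all three in step, which is the job of the metadata management products discussed later in this column.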
In the application arena, companies often deploy applications from different vendors, each with its own proprietary metadata. In fact, you'll often find multiple applications from the same vendor that can't share metadata. This may be because different product groups follow different development life cycles or, more commonly, because the applications were brought together by a merger or acquisition.
A few vendors, such as Platinum Technology Inc., OneMeaning Inc. and Sterling Software Inc., are tackling this problem, and I expect several other vendors to announce metadata management products before the end of this year. These products offer various mechanisms for exposing, integrating and sharing metadata across platforms and tools. Most rely primarily on extracting metadata from the popular tools and databases into a specialized database. Some go a step further and insert metadata from one tool into another tool's catalog in its native format, thus spoofing the receiving tool into thinking it created the transplanted metadata.
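The central-repository approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the "native" formats of Tool A and Tool B are invented for the example, and the in-memory `Repository` class is a hypothetical stand-in for the specialized metadata database. Each tool needs only an importer into a neutral form and an exporter out of it.

```python
# Hypothetical native formats: Tool A and Tool B are invented for this sketch.

def import_from_tool_a(native: dict) -> dict:
    """Translate Tool A's native table definition into the neutral repository form."""
    return {
        "table": native["tbl"],
        "columns": [{"name": c["n"], "type": c["t"]} for c in native["cols"]],
    }


def export_to_tool_b(neutral: dict) -> dict:
    """Render neutral metadata in Tool B's native catalog format. Writing this
    into Tool B's catalog is what lets Tool B treat the definition as its own."""
    return {
        "TableName": neutral["table"].upper(),
        "Fields": [f"{c['name'].upper()}:{c['type']}" for c in neutral["columns"]],
    }


class Repository:
    """Stand-in for the specialized metadata database, keyed by table name."""

    def __init__(self) -> None:
        self._tables: dict[str, dict] = {}

    def load(self, neutral: dict) -> None:
        self._tables[neutral["table"]] = neutral

    def get(self, table: str) -> dict:
        return self._tables[table]


if __name__ == "__main__":
    repo = Repository()
    tool_a_def = {"tbl": "customer",
                  "cols": [{"n": "id", "t": "INT"}, {"n": "name", "t": "VARCHAR(40)"}]}
    repo.load(import_from_tool_a(tool_a_def))
    print(export_to_tool_b(repo.get("customer")))
    # {'TableName': 'CUSTOMER', 'Fields': ['ID:INT', 'NAME:VARCHAR(40)']}
```

The appeal of routing everything through a neutral form is arithmetic: with N tools you write N importers and N exporters, rather than a separate converter for every ordered pair of tools, N(N-1) in all.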
As these tools come to market, users will benefit from standard interfaces that will enable them to review data lineage and assess data quality from their desktop query and analysis tools. Users will also benefit from the ability to use different query and analysis tools, all sharing a common set of business rules and database schema definitions. IT managers will be able to spend less time manually synchronizing metadata between back-end extraction and transformation tools and decision-support databases. Application implementers will be able to exchange key business rules such as discount schedules, commission structures or product specifications.
Regular readers of this column may wonder why, given my enthusiasm for Microsoft Repository, I don't advocate it as the metadata repository. I do think that Microsoft Repository will eventually become the de facto metadata repository standard for both transaction processing and decision-support solutions. However, the metadata management tools I'm describing don't depend on Microsoft Repository, for two reasons. One is that Microsoft Repository is not yet mature enough to meet their needs. The other, frankly, is that the vendors don't want to give Microsoft Corp. any incentive to play in their sandbox and take over this emerging market niche.
The first generation of these tools won't solve all the metadata management problems plaguing database administrators and end users. For example, the tools don't deal with aggregate management, one of the most pressing issues for data warehouse DBAs. Aggregate management focuses on identifying candidate tables for aggregation, as well as existing aggregate tables that aren't being used. Another issue they don't address is query management, such as blocking or rescheduling queries that will consume extensive resources or take a long time to complete. I'll describe some of the tools that address these issues in a future column.

Robert Craig is director, Data Warehousing and Business Intelligence Division, at Hurwitz Group Inc. (Framingham, Mass.). Contact him via the Web at www.hurwitz.com.