The Data Model’s Role in Data Governance
Data models are an important part of your data governance program. We look at the Zachman Framework, which describes multiple levels of data models.
- By Jonathan G. Geiger
[Editor’s note: Jonathan Geiger is leading three sessions at the TDWI World Conference in Las Vegas, February 12-17, 2012, including TDWI Advanced Data Modeling Techniques and TDWI Data Modeling: Data Analysis and Design for BI and Data Warehousing Systems.]
We’ve seen increasing interest in data governance in recent times. The drivers for this interest have varied. Some efforts were started to support master data management (MDM), business intelligence (BI), or other IT initiatives, while others were kicked off to provide specific business benefits, address perceived business risks, or improve data quality and consistency. Regardless of the catalyst, all data governance programs have several things in common: an organizational structure with visible executive support and cross-functional representation; associated policies, procedures, standards, and practices; and supporting technologies.
In my view, data models should rank right up there with the other components of the data governance program. At the heart of data governance is the need to treat data as an asset. Other asset management programs include a formalized approach for organizing information about the asset: financial asset management uses a chart of accounts, human resource management uses organization charts, and facilities management uses blueprints and maps. Similarly, data management should be supported by its own organizing aid: the data model.
The Zachman Framework describes multiple levels of data (and other) models.
Subject Area Model
It begins with the planner’s perspective (row 1 of the Framework), in which the subject areas provide a framework for organizing the data assets. Just as the chart of accounts distinguishes assets from liabilities on the balance sheet and revenue from expenses on the income statement, the subject areas identify the 15 to 20 major groupings of information. Commonly recognized subject areas include Products, Locations, Customers, and Human Resources. Another important subject area, particularly in the context of data governance, is Information.
Unlike the business data model, the subject area model can be developed quickly, often in a matter of hours or a few days. Each subject area needs to be defined, and the definitions must be mutually exclusive. Including sample entities in each subject area’s definition helps people interpret it consistently. If desired, major relationships among the subject areas can be drawn. In addition to helping in the development of the downstream data models, the subject areas contribute directly to the data governance structure. Specific uses include:
Prioritization support: The subject areas can be used to establish, at a high level, the data areas that need to be addressed most urgently. One technique for doing this is to evaluate each subject area independently on two scales: its potential impact and the current level of satisfaction relative to the data governance drivers. For example, if the driver for data governance is to improve the consistency of business intelligence reports, each subject area can be rated on how much good data in support of BI would help the business (with 5 being high) and on the degree to which that data is available today (with 5 being poor). The product of the two ratings indicates the relative business priority of each subject area in support of the goal.
Data stewardship assignment: One approach for designating data stewards is to assign them based on the data they are addressing. (Other approaches include assigning them based on the organizations they represent.) The subject areas provide a basis for making such assignments.
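To make the prioritization arithmetic and the steward assignment concrete, here is a minimal Python sketch. It assumes that each rating runs from 1 to 5 (the technique above defines only what 5 means), that a subject area’s sample entities can be listed as a set, and that one steward is recorded per subject area. The class and function names (SubjectArea, overlapping_entities, rank_subject_areas), the steward names, and the ratings are all hypothetical. It also flags sample entities claimed by more than one subject area, a rough signal that the definitions are not mutually exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectArea:
    name: str
    definition: str
    sample_entities: set = field(default_factory=set)
    impact: int = 1            # value of good data for the driver: 1 (low) .. 5 (high), assumed scale
    availability_gap: int = 1  # how poorly the data is available today: 1 (good) .. 5 (poor)
    steward: str = ""          # data steward assigned to this subject area (hypothetical)

    @property
    def priority(self) -> int:
        # The product of the two ratings, so scores range from 1 to 25.
        return self.impact * self.availability_gap


def overlapping_entities(areas):
    """Return sample entities claimed by more than one subject area.

    A non-empty result suggests the definitions are not mutually exclusive."""
    seen = set()
    overlaps = set()
    for area in areas:
        for entity in area.sample_entities:
            if entity in seen:
                overlaps.add(entity)
            else:
                seen.add(entity)
    return sorted(overlaps)


def rank_subject_areas(areas):
    """Return subject areas ordered from highest to lowest priority score."""
    for area in areas:
        for label, value in (("impact", area.impact), ("availability_gap", area.availability_gap)):
            if not 1 <= value <= 5:
                raise ValueError(f"{area.name}: {label} must be 1-5, got {value}")
    # sorted() is stable, so subject areas with equal scores keep their input order.
    return sorted(areas, key=lambda area: area.priority, reverse=True)


if __name__ == "__main__":
    areas = [
        SubjectArea("Customers", "Parties that buy our products", {"Customer", "Account"}, 5, 4, "Steward A"),
        SubjectArea("Products", "Goods and services we sell", {"Product", "Price List"}, 4, 2, "Steward B"),
        SubjectArea("Locations", "Places where business occurs", {"Store", "Region"}, 2, 5, "Steward C"),
    ]
    print(overlapping_entities(areas))  # → []
    for area in rank_subject_areas(areas):
        print(f"{area.name}: {area.priority} (steward: {area.steward})")
```

With these hypothetical ratings, Customers scores 20 (5 × 4), Locations 10, and Products 8, so Customers, owned by Steward A, would be addressed first. Multiplying the ratings, as the technique suggests, favors subject areas that score high on both scales: one rated 5 and 1 scores only 5, below one rated 3 and 3 (9).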
Business Data Model
The owner’s view (row 2 of the Framework) is represented by the business data model. This model represents the data of interest to the enterprise and the associated business rules, independent of business processes and organizational structure. Only one such model exists in an enterprise, and it is generally created as a ‘logical’ model in third normal form. This model captures the data structure information gathered through the data governance activities. It includes:
- Individual data attributes grouped within entities
- Definitions for each attribute and entity
- Relationships among entities, along with the optionality and cardinality
- Other “metadata” the modeling process captures
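The components listed above can be represented as simple data structures. The sketch below is a hypothetical Python illustration, not the storage format of any particular modeling tool: entities hold attributes, every element carries a definition, and each relationship records a maximum cardinality and whether the child side is optional. Two small checks, also hypothetical, report blank definitions and relationships that refer to entities missing from the model, the kind of completeness review a data steward might perform before the model’s content is sanctioned.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    definition: str


@dataclass
class Entity:
    name: str
    definition: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    parent: str
    child: str
    max_cardinality: str  # e.g. "1:1" or "1:N" (parent to child)
    child_optional: bool  # True if a parent may exist with no children


def missing_definitions(entities):
    """Return entity names and 'Entity.Attribute' names whose definitions are blank."""
    gaps = []
    for entity in entities:
        if not entity.definition.strip():
            gaps.append(entity.name)
        for attr in entity.attributes:
            if not attr.definition.strip():
                gaps.append(f"{entity.name}.{attr.name}")
    return gaps


def dangling_relationships(entities, relationships):
    """Return relationships that refer to an entity not present in the model."""
    names = {entity.name for entity in entities}
    return [r for r in relationships if r.parent not in names or r.child not in names]


if __name__ == "__main__":
    customer = Entity("Customer", "A person or organization that buys our products.", [
        Attribute("Customer Name", "The customer's legal name."),
        Attribute("Credit Limit", ""),  # definition not yet supplied
    ])
    order = Entity("Order", "A customer's request to buy one or more products.", [
        Attribute("Order Date", "The date the order was placed."),
    ])
    places = Relationship("Customer", "Order", "1:N", child_optional=True)
    print(missing_definitions([customer, order]))             # → ['Customer.Credit Limit']
    print(dangling_relationships([customer, order], [places]))  # → []
```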
The data model is developed primarily by a skilled data modeler and the data steward. Others, such as subject matter experts, may also contribute to the effort. The sanctioned information in this model establishes the basis for the physical data structures.
System Data Model
The designer’s needs (row 3 of the Framework) are addressed by the system data model. This model is typically extracted from the business data model to encompass the data within the scope of a particular system. During this extraction, constraints, entities, and attributes may be adjusted based on the scope and purpose of the system. In addition, since this model will eventually be deployed, it is denormalized based on the performance requirements and usage of the system being developed. Although such adjustments may be made, the resulting model must remain semantically consistent with the business data model.
The dimensional model is also a system model, but unlike its relational counterparts, it should be developed using dimensional modeling techniques. Although it need not represent all of the business rules in the business data model (because it is designed to facilitate data navigation and retrieval), it must not violate those rules.
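Here is a minimal sketch of the extraction step and of one part of the consistency requirement, under simplifying assumptions: the business data model is reduced to a dictionary mapping entity names to attribute names, and each column of a (possibly denormalized) system table records the business entity and attribute it came from. The function names and sample content are hypothetical. Real semantic consistency covers much more than lineage (definitions, rules, relationships), so this checks only one necessary condition: every column traces to an attribute that exists in the business data model.

```python
def extract_system_model(business, in_scope):
    """Copy the in-scope entities (and their attributes) out of the business model.

    business: dict mapping entity name -> set of attribute names.
    in_scope: set of entity names the system needs."""
    unknown = set(in_scope) - set(business)
    if unknown:
        raise ValueError(f"not in the business data model: {sorted(unknown)}")
    # Copy the attribute sets so later adjustments do not alter the business model.
    return {name: set(attrs) for name, attrs in business.items() if name in in_scope}


def untraceable_columns(tables, business):
    """Return 'table.column' names that do not trace to a business attribute.

    tables: dict mapping table name -> {column name: (entity, attribute)}.
    A denormalized table may carry columns from several entities; each one
    must still point at an attribute that exists in the business model."""
    problems = []
    for table, columns in tables.items():
        for column, (entity, attribute) in columns.items():
            if attribute not in business.get(entity, set()):
                problems.append(f"{table}.{column}")
    return problems


if __name__ == "__main__":
    business = {
        "Customer": {"Customer Name", "Credit Limit"},
        "Order": {"Order Date"},
        "Employee": {"Hire Date"},
    }
    system = extract_system_model(business, {"Customer", "Order"})
    # Denormalized table: the customer name is copied onto each order row.
    tables = {"order_summary": {
        "order_date": ("Order", "Order Date"),
        "customer_name": ("Customer", "Customer Name"),
        "discount_pct": ("Order", "Discount Pct"),  # not in the business model
    }}
    print(sorted(system))                        # → ['Customer', 'Order']
    print(untraceable_columns(tables, business))  # → ['order_summary.discount_pct']
```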
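As one illustration of a dimensional model that must not violate a business rule, suppose (hypothetically) that the business data model states that each order is placed by exactly one customer, and that a sales fact table is kept at order-line grain with a customer key. The sketch below reports any order whose fact rows point at more than one customer. The column names order_id and customer_key and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def orders_with_multiple_customers(fact_rows):
    """Return order ids whose fact rows reference more than one customer.

    fact_rows: iterable of dicts with 'order_id' and 'customer_key' keys
    (a hypothetical sales fact at order-line grain). Any id returned would
    violate the assumed rule that each order is placed by exactly one customer."""
    customers_by_order = defaultdict(set)
    for row in fact_rows:
        customers_by_order[row["order_id"]].add(row["customer_key"])
    return sorted(order for order, customers in customers_by_order.items() if len(customers) > 1)


if __name__ == "__main__":
    rows = [
        {"order_id": 1001, "customer_key": 7},
        {"order_id": 1001, "customer_key": 7},  # second line of the same order
        {"order_id": 1002, "customer_key": 7},
        {"order_id": 1002, "customer_key": 9},  # conflicts with the line above
    ]
    print(orders_with_multiple_customers(rows))  # → [1002]
```

The fact table does not have to carry the full customer-to-order relationship from the business data model; the check only confirms that the data it does carry is consistent with that rule.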
Technology Data Model
The database builder (row 4 of the Framework) applies the technology model. This model transforms the system model into a physical model that reflects the platform and DBMS on which it is being implemented. At this level, data types and other physical characteristics may be incorporated, along with other constraints that can be enforced by the DBMS. When this model is developed using a data modeling tool, the tool can often generate the DDL that creates the final schema.
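Modeling tools generate DDL from their own internal representations. The sketch below only imitates that step on a toy scale, so its input format and the create_table_ddl function are hypothetical. It takes column names, DBMS-specific data types, and nullability from a physical model and emits a CREATE TABLE statement with a primary key constraint. The types shown (INTEGER, VARCHAR(100), DECIMAL(12,2)) are common SQL types, but whether they are valid depends on the target DBMS.

```python
def create_table_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement from a simple physical-model description.

    columns: list of (name, sql_type, nullable) tuples; sql_type is passed through
    unchanged, so it must already be valid for the target DBMS.
    primary_key: list of column names forming the key."""
    names = [name for name, _, _ in columns]
    missing = [key for key in primary_key if key not in names]
    if missing:
        raise ValueError(f"primary key columns not defined: {missing}")
    lines = []
    for name, sql_type, nullable in columns:
        # Key columns are always NOT NULL; other columns follow the model's optionality.
        not_null = "" if nullable and name not in primary_key else " NOT NULL"
        lines.append(f"    {name} {sql_type}{not_null}")
    lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


if __name__ == "__main__":
    print(create_table_ddl(
        "customer",
        [
            ("customer_id", "INTEGER", False),
            ("customer_name", "VARCHAR(100)", False),
            ("credit_limit", "DECIMAL(12,2)", True),
        ],
        ["customer_id"],
    ))
    # Prints:
    # CREATE TABLE customer (
    #     customer_id INTEGER NOT NULL,
    #     customer_name VARCHAR(100) NOT NULL,
    #     credit_limit DECIMAL(12,2),
    #     PRIMARY KEY (customer_id)
    # );
```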
An effective data governance program requires a facility for organizing the data and for storing critical information about it, and the data models meet much of this need. At the highest level, the subject area model provides the framework, while at the lowest level the technology model provides the parameters for the physical structure.
The models need to be augmented by appropriate business processes and also by a capability to manage other information about the data (e.g., metadata). Some of the metadata may be stored within the data modeling tool, though a more comprehensive facility (e.g., repository) will often be needed to improve the efficiency with which the metadata is managed and delivered to its ultimate users.
I welcome your feedback.