10 Key Elements of Your Data Strategy
We look at the 10 key details your data strategy must address and suggest a good place to start formulating your strategy.
As data warehouse practitioners, we have a fundamental objective: to help our organizations turn data into actionable information so they can improve the decision-making process, identify new opportunities, and recognize issues and problems in time to take corrective action.
We know that an organization's data is a major asset, and we are well aware of the importance of data quality. However, we sometimes overlook the basic need for an overall strategy for managing our data assets, one that encompasses more than data quality.
A data strategy should help determine these 10 elements:
1. What data should be collected. This may seem obvious, but it is all too often simply answered with "everything".
2. How long to keep the data. You should monitor report usage to determine which analyses and data sources may no longer be needed and could be removed or archived. One of my data axioms is well worth remembering when formulating a data strategy: "Data in a warehouse is like the clothes in a closet; even if we haven't accessed some data in two years, we still tend not to throw it away."
3. Where to store the data. Some data needs to be available on demand, while other data may not be as time-critical. A data strategy should consider what needs to be archived and where to archive it. Your data strategy must also answer the question: on-premises or cloud-resident? As organizations move data to the cloud and/or use SaaS applications, it is important to consider a "worst-case scenario," such as a sudden "cloudburst" caused by, for example, the bankruptcy of their cloud vendor. Make certain you retain all rights to your data, and make sure you have timely backups in your possession.
4. Data privacy and security. Who can access your data and what level of detail can they see? How should data be protected and/or encrypted?
5. Where data can be accessed (trans-border issues). Some countries restrict the data that can be transmitted across their borders. Make sure your organization, or your cloud vendor, is not inadvertently violating these governmental restrictions.
6. What data can be displayed. For example, U.S. federal and various state regulations control the use, storage, and display of Social Security numbers.
7. The level of detail to retain. A major decision concerning data is the level of granularity to keep in a data warehouse. Although most early data warehouses contained summarized data, advances in cost and performance now make it more feasible to capture data at the detail, or individual transaction, level. However, just because this is now possible does not make it desirable. Make sure that you retain data at the level that is required to meet current and anticipated needs.
8. Data governance. Who owns each data element? Who is responsible for the associated value lists? What must be verified (e.g., should addresses be checked against postal databases during data entry)? How soon must a value be updated when it changes? What are the legal requirements? Although many organizations have devoted considerable effort to governance initiatives concerning customer master and financial data, data governance should apply to all data assets.
9. Data integration tactics. What data should reside in data warehouses, and how often should it be loaded? What should the data's source be, and how much history must be retained?
10. Data virtualization. Since the advent of data warehousing, the concept of a "virtual data warehouse," which allows access to source data without first moving it into a data warehouse, has been proposed and, in many cases, implemented. Although the benefits include access to real-time data and reduced storage costs, one overwhelming risk is that data from multiple sources is often inconsistent and inaccurate. One of the benefits of a data warehouse is that data sourced from operational systems built before organizational data standards were established can be transformed to conform to those standards during the data-load process; this is especially important for data sourced from legacy applications. Carefully evaluate the risks associated with data virtualization.
Note that these requirements are neither mutually exclusive (e.g., data governance may dictate minimum privacy, security, and retention requirements) nor collectively exhaustive (i.e., there are other considerations as well).
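To make the advice on monitoring report usage (item 2) a little more concrete, here is a minimal Python sketch that flags reports nobody has opened in roughly two years, echoing the closet axiom. It assumes you can export, for each report, the date it was last accessed; the report names, the `LAST_ACCESSED` data, the `find_archive_candidates` function, and the 730-day threshold are all hypothetical illustrations, not part of any particular BI tool.

```python
from datetime import date, timedelta

# Hypothetical usage log: report name -> date the report was last accessed.
# In practice this would come from your BI tool's audit or usage tables.
LAST_ACCESSED = {
    "daily_sales_summary": date(2024, 5, 30),
    "regional_margin_2019": date(2021, 11, 2),
    "legacy_inventory_extract": date(2020, 3, 15),
}


def find_archive_candidates(last_accessed, as_of, max_idle_days=730):
    """Return, sorted by name, the reports not accessed within max_idle_days of as_of.

    The 730-day (roughly two-year) default is an assumption; set it from
    your own retention policy.
    """
    cutoff = as_of - timedelta(days=max_idle_days)
    return sorted(name for name, last in last_accessed.items() if last < cutoff)


if __name__ == "__main__":
    for name in find_archive_candidates(LAST_ACCESSED, as_of=date(2024, 6, 1)):
        print(f"Review for removal or archiving: {name}")
```

Flagged reports, and any data sources that feed only them, are candidates for review rather than automatic deletion: a report run once a year for an audit can look idle for months and still be essential.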
Where to Start
One place to begin when formulating an overall data strategy is to ask, "What data do we need to run and analyze our organization, and how long do we need it?" In the design of any operational or analytic application, this is one of the most important concerns. I am an admitted "data chauvinist" and maintain that data is more important, or at least more fundamental, than process. If you have the right data, you can always find a way to process it; if you are missing critical data, then processes that depend on it will be compromised or simply not work at all. Furthermore, don't be afraid to retain summary-level data in your data warehouses while archiving transaction-level data in offline storage in case it is needed later. Be aware, however, that if you keep only summary data, you can no longer explore the details behind it.
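The trade-off of keeping summary-level data in the warehouse while archiving transaction-level detail offline can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions: the `Transaction` record, the `summarize_and_archive` function, and the in-memory stream standing in for offline storage are hypothetical, and a real warehouse would normally do this in its load process or in SQL.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction-level record; amounts are in integer cents."""
    txn_id: str
    store: str
    day: date
    amount_cents: int


def summarize_and_archive(transactions, archive):
    """Copy each transaction to `archive` as CSV and return daily totals per store.

    `archive` is any writable text stream (e.g., a file opened with
    newline=""). The returned dict maps (store, day) to
    (transaction_count, total_cents): the summary-level data that would stay
    in the warehouse while the detail goes to offline storage.
    """
    writer = csv.writer(archive)
    writer.writerow(["txn_id", "store", "day", "amount_cents"])
    counts = defaultdict(int)
    totals = defaultdict(int)
    for t in transactions:
        # Archive the full detail row before it is rolled up.
        writer.writerow([t.txn_id, t.store, t.day.isoformat(), t.amount_cents])
        key = (t.store, t.day)
        counts[key] += 1
        totals[key] += t.amount_cents
    return {key: (counts[key], totals[key]) for key in counts}


if __name__ == "__main__":
    txns = [
        Transaction("t1", "Store A", date(2024, 1, 2), 1000),
        Transaction("t2", "Store A", date(2024, 1, 2), 550),
        Transaction("t3", "Store B", date(2024, 1, 2), 200),
    ]
    # In-memory stand-in for an offline archive file.
    archive = io.StringIO()
    print(summarize_and_archive(txns, archive))
```

Keeping the archive is what preserves the option to explore the details later; from the summary dictionary alone, the individual transactions cannot be recovered.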
Recognize that data is the lifeblood of your organization and treat it accordingly. Remember that avoiding "garbage in, garbage out" is just one part of an overall data strategy.