Managing DSS Data
Once you've designed, tested and deployed a decision support software (DSS) application, there comes the problem of managing it. While many of the issues related to managing DSS data are commonly faced with transaction processing (TP) databases, there are a number of issues that are unique to DSS databases.
DSS databases, for example, tend to deal with substantially larger volumes of data than a traditional TP application. Where 10 gigabytes is considered to be a big TP database, that amount qualifies as a small to mid-sized data warehouse. This is because TP systems tend to focus on a specific application, such as order entry, which limits the scope, and therefore the volume, of the data. TP systems also typically purge old data, rather than keeping it online to enhance application performance.
A data warehouse, on the other hand, collects and retains historical data, often from multiple TP systems. Many data warehouses are in the hundreds of gigabytes range. A few are at the multiterabyte level. This data is often kept in a star schema, rather than a third normal form schema, which adds layers of aggregated or redundant data. While this can significantly improve performance, it also imposes greater storage requirements.
DSS application queries tend to be more diverse and wide-ranging, while TP queries are constrained to a specific business function, such as order entry, customer service or cost accounting. DSS users tend to ask broad questions such as, "How many of my customers purchased orange juice and potato chips in convenience stores this quarter, compared with the same quarter last year?" The typical TP query is a record lookup for a single entity, such as a purchase order or customer service record. The variability of DSS queries creates more performance issues on the system, and users may perceive that performance is variable. Ideally, response time should be consistent within a predictable range.
Managers of DSS applications have to deal with data updates, data quality and data transformation as part of data warehouse processes. This creates more complex processes to obtain, scrub, format and prepare data for the target database. Scheduling, error handling and data staging all have to be managed.
DSS applications tend to be used by a broader mix of users, whereas TP applications are typically used by a relatively small, homogeneous group of users. Security issues take on broader ramifications since you need a system that will grant access to users who need to analyze the data, while protecting it against unauthorized usage or tampering.
When you are designing a data warehouse for deployment, part of the design and development process should include developing a list of the kinds of system and data management tools you'll need and analyzing the products from the various vendors.
What kind of tools will you need? Some of the tool categories that should be considered include backup and recovery; data extraction and cleansing; query management and analysis; and security management, including authentication and authorization. If you're in a dynamic environment with rapidly changing business requirements, you'll also want to consider tools to help with database and schema reorganization. If multiple organizations are accessing the data, you may need a mechanism for tracking usage that is capable of managing billing or chargebacks.
Over the next couple of issues, I'll concentrate on the area of decision support system management. I'll describe some of the third-party tools available to help deliver consistent response time, availability and data reliability. If there is a tool you would like to hear more about, send me an e-mail and I'll do my best to dig up some unbiased information on it. --Robert Craig is vice president of marketing at WebXi Inc. (Burlington, Mass.), author of Microsoft Data Warehousing, and a former director at the Hurwitz Group Inc. Contact him at firstname.lastname@example.org or via the Web at www.webxi.com.