Scaleable Data Warehousing: For the Windows NT Server Environment
The evolution of data warehousing and the rise of Windows NT Server both shape the key criteria that should be evaluated when selecting an RDBMS to implement a Windows NT Server-based Data Warehouse or Data Mart.
As we rapidly approach the end of the decade, among the key industry trends that are driving the future direction of computing are the growth and acceptance of Microsoft Windows NT as an enterprise-class computing environment and the acceptance of data warehousing as a mainstream technology to improve business competitiveness. Today, these two trends have converged to make it practical for many new potential users to implement decision support applications like Data Warehousing in a Windows NT Server environment.
This article briefly explores the evolution of data warehousing and the rise of Windows NT Server and will also address the key criteria you should evaluate in selecting a Relational Database Management System (RDBMS) to implement your Windows NT Server-based Data Warehouse or Data Mart.
The Evolution of Data Warehousing
The notion of providing a single source of information that meets all business requirements has been through a number of incarnations. In fact, database products were originally sold on the "Corporate Database" concept where all data items relevant to an entire organization would be stored in a global data model and shared by all users and all applications. Systems built on this notion were first referred to as Executive Information Systems, later evolved into Decision Support Systems, and today are referred to as Data Warehouses or Data Marts.
Most installed Data Warehouses are of the type referred to as Enterprise Data Warehouses. These high-end Data Warehouses, with capacities ranging from 500 gigabytes to multiple terabytes, have historically tended to be expensive, store vast amounts of detailed data about all aspects of an organization's business, and are the domain of the Massively Parallel Processing (MPP) architecture running either a proprietary operating system or UNIX.
Smaller Data Warehouses, with capacities ranging from 50 to 500 gigabytes, are used by many organizations who view the technology as a means of understanding more clearly the causes and effects of swings in business. Aggregated data is normally sufficient here, while the scope of the information can still result in large volumes. The majority of these Data Warehouses have been built on Symmetrical Multi-Processing (SMP) hardware running UNIX operating systems.
The last few years have seen the emergence of a new single subject or single department version of the Data Warehouse called the Data Mart. A Data Mart stores only that information which is needed to address a particular subject area supporting the needs of a specific group of users. Data Marts have become extremely popular because they have a smaller data model, a shorter implementation curve, less data and fewer users, and a lower cost to implement and maintain. These considerations have made data marts a perfect match for Microsoft's Windows NT Server.
The Rise of Windows NT Server
Since its introduction in July 1993, Microsoft Windows NT has gained in popularity, both as a desktop and a server operating system. Windows NT's low price, ease of use and administration have made it a compelling solution for many organizations that previously would have been reluctant to invest in a new operating system. Market acceptance of NT Server has been fueled by a trend towards more distributed computing and application deployment outside of the traditional corporate data center by departmental users. Today, NT Server has evolved to reach the stable, mature status of an industry-standard operating system.
Last year Microsoft released Windows NT Server, Enterprise Edition 4.0, which extended the scalability, availability and manageability of Windows NT Server. Windows NT Server, Enterprise Edition contains key new features designed to support larger mission-critical applications. Those features include:
- Microsoft Cluster Server (MSCS), providing two-node, high-availability clusters on standard PC-server hardware.
- 4-Gigabyte Memory Tuning (4GT), providing up to 50 percent more application memory capacity (3 gigabytes of address space per application instead of 2) for improved performance.
- Eight-way SMP server license, providing the ability to run Windows NT Server, Enterprise Edition on machines with up to eight CPUs.
A research study conducted earlier this year by Unisys Corporation found that many business organizations are now expanding their definition of mission-critical applications beyond traditional high-volume, transaction-intensive applications. According to Unisys' findings, the next generation of mission-critical applications will include decision support and executive information systems. The Unisys study also found that the percentage of mission-critical applications deployed on Microsoft Windows NT would grow from 12 percent today to 39 percent in 2002. The survey respondents see Windows NT as meeting key criteria for supporting mission-critical applications: availability of applications, cost advantages, ease of use and support for the networked computing environments that enterprises need to be competitive.
Which RDBMS for a Data Warehouse on Windows NT Server?
The increased performance, reliability and popularity of Windows NT Server have expanded the selection of operating systems that business organizations can now choose from to deploy their database applications. Today, it is important that customers choose a relational database management system (RDBMS) that will make maximum use of Windows NT Enterprise Server's current capabilities and provide the additional capabilities necessary to support the unique requirements and workload of data warehousing.
Data warehouse workloads differ significantly from those of online transaction processing. Online transaction environments are typified by short, quick retrievals of data where queries are relatively simple, tend to access single tables, and are usually known or pre-defined. An example would be, "What is the current balance in account number 562970?" In the data warehouse environment, by contrast, queries tend to be more complex, access multiple tables and, in most cases, are unknown or ad hoc in nature. An example of a data warehouse query would be, "Which customers, summarized by city, have account balances greater than $500 and make their purchases primarily on weekends?"
To respond to such queries, data warehouses require an RDBMS that can support multiple, complex, ad hoc query processing. Some of the important requirements that should be considered when selecting a database management system for building any Data Warehouse or Data Mart are:
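The difference between the two example questions above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in; the table names, column names and sample rows are all hypothetical, and "primarily on weekends" is interpreted here (as an assumption) to mean that more than half of a customer's purchases fall on a Saturday or Sunday.

```python
import sqlite3

# Hypothetical schema and sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE accounts  (account_no INTEGER PRIMARY KEY,
                        customer_id INTEGER, balance REAL);
CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY,
                        customer_id INTEGER, purchase_date TEXT);
INSERT INTO customers VALUES (1, 'Anaheim'), (2, 'Anaheim'), (3, 'Dayton');
INSERT INTO accounts VALUES
  (562970, 1, 750.0), (562971, 2, 120.0), (562972, 3, 900.0);
INSERT INTO purchases VALUES
  (1, 1, '1998-08-01'),  -- Saturday
  (2, 1, '1998-08-02'),  -- Sunday
  (3, 1, '1998-08-03'),  -- Monday
  (4, 2, '1998-08-01'),  -- Saturday
  (5, 3, '1998-08-04'),  -- Tuesday
  (6, 3, '1998-08-05');  -- Wednesday
""")

# Transaction-style query: one table, one row, known in advance.
oltp_row = conn.execute(
    "SELECT balance FROM accounts WHERE account_no = ?", (562970,)
).fetchone()

# Warehouse-style query: joins three tables, aggregates twice.
# strftime('%w', ...) returns '0' for Sunday and '6' for Saturday.
dw_rows = conn.execute("""
SELECT c.city, COUNT(DISTINCT c.customer_id) AS customer_count
FROM customers c
JOIN accounts a ON a.customer_id = c.customer_id
JOIN (
    SELECT customer_id
    FROM purchases
    GROUP BY customer_id
    HAVING SUM(strftime('%w', purchase_date) IN ('0', '6')) * 2 > COUNT(*)
) w ON w.customer_id = c.customer_id
WHERE a.balance > 500
GROUP BY c.city
ORDER BY c.city
""").fetchall()

print(oltp_row)
print(dw_rows)
```

The first query touches a single row through a key lookup, while the second must scan and group the purchase history before it can join and summarize, which is the kind of work that dominates a data warehouse workload.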
- Scalability
- Availability
- Query Performance
- Ease of Management and Administration
- Total Cost of Ownership
- Reference Accounts and Product Maturity
- Support Infrastructure
NCR Corporation's recently released Teradata for Windows NT delivers the traditional strengths of Teradata. Teradata is the industry's most powerful, proven and cost-effective relational database management system. Teradata's shared-nothing architecture provides superior performance and will far exceed the current scalability limits of Windows NT or other relational database systems.
Why Teradata for NT?
Building a data warehouse is an iterative process, starting small and incrementally adding functionality. Demands on a data warehouse increase exponentially: data volumes grow, access patterns change, updates increase, and the number of users and tools expands, as does the number of operational systems that feed the warehouse. Scalability is the cornerstone of Teradata. Teradata has a proven track record for supporting warehouses from as small as four processors up to some of the world's largest. It is the Teradata software and BYNET hardware that provide the ability to scale, not the operating system and the SMP hardware. Teradata has overcome the NT scaling issues by adding more nodes and allowing Teradata to manage the entire configuration as a single system.
Mission-critical availability means access to data at any time. Teradata NT performs the loading, reorganizing, archiving, restoring, or purging of data in parallel, while users are still running queries. Teradata NT also provides power failure protection, interconnect failure protection, disk failure protection and operating system protection, as well as protection when an entire node fails. Teradata NT automatically rebalances the system without interruption. This kind of robustness is unique in the NT Server environment.
Complex data analysis and drill-down questions require a high-performance RDBMS. Teradata NT gives you the unconditional ability to ask any question of any part of your business. With Teradata all query operations are performed in parallel. This includes joins, sorts, and aggregations, which limit the scalability and query performance of other databases. Unconditional parallelism is the secret behind Teradata NT. Teradata's patented shared-nothing architecture ensures optimum performance with minimum tuning and administrative overhead.
Teradata was designed to facilitate business decision-making in an environment that required little to no administrative overhead. Because of this near-zero-administration design requirement, Teradata provides a number of unique features that reduce many of the time-consuming tasks of the typical database administrator. Most notably, Teradata never requires database reorganization. Rather, Teradata automatically distributes and manages the placement of data across your entire system using an optimized hashing algorithm.
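The general idea of hash-based data placement can be sketched in a few lines of Python. This is not Teradata's algorithm, which the article does not describe; the function names, the use of MD5 and the four-node configuration are all assumptions chosen only to show how hashing a row's key can spread rows evenly across nodes without an administrator deciding where each row lives.

```python
import hashlib
from collections import Counter


def node_for_key(primary_key, node_count):
    """Map a row key to a node number (hypothetical helper).

    MD5 is used only because it is deterministic and spreads keys
    evenly; it is an illustrative choice, not Teradata's hash.
    """
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % node_count


def distribute(keys, node_count):
    """Count how many rows land on each node."""
    return Counter(node_for_key(key, node_count) for key in keys)


# Place 100,000 hypothetical account rows on a four-node system.
placement = distribute(range(100_000), 4)
for node in sorted(placement):
    print(node, placement[node])
```

Because placement depends only on the key, every node can locate a row without consulting a central directory, and each node ends up with roughly a quarter of the rows. This simple modulo scheme would move most rows if the node count changed; a production system needs a mapping that limits data movement when nodes are added, which this sketch does not attempt.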
Teradata, the proven industry leader for data warehousing, with a 15-year track record and more than 850 production sites ranging from 10GB data marts to more than 100TB enterprise warehouses, is now available on Windows NT. Portable and "open," Teradata continues its dominance over all other products in performance, scalability and connectivity, while offering the lowest cost of administration of any warehouse database. NCR Corporation's proven Teradata product surpasses all other databases in independent benchmarks and industry awards.
Conclusion
The lower hardware, software and administration costs combined with the improved performance and availability features of Windows NT Server 4.0 have made data warehousing accessible to an entire range of small to mid-size businesses and departments within larger organizations for which data warehouses were previously too expensive. As with any decision to proceed with a data warehouse or data mart implementation, the question of which RDBMS to use to build the warehouse is critical. With the recent release of Teradata NT, business organizations that are seeking to deploy an "enterprise-class" data warehouse in a Windows NT environment can now choose a relational database management system with proven scalability, availability and performance that, unlike databases from other vendors, can address the unique growth and processing requirements of data warehouses.
ABOUT THE AUTHOR:
Jack W. Wood is Director, Data Warehouse Sales and Marketing for Formula Consultants Incorporated (Anaheim, Calif.). He has over 13 years' experience in the Unisys marketplace. He can be reached at (714) 778-0123 (x128), or via e-mail at [email protected].