DB2 Database Clusters and the New VI Architecture Standard

The move of select mainframe-class relational database management system (RDBMS) applications to clusters of low-cost servers is on. Users are demanding better access to ever-greater volumes of information at the lowest possible cost. In response, software, hardware and middleware vendors are developing significant new cluster-oriented products that are already starting to change the face of enterprise computing.

Microsoft, for example, is directing its Windows NT Server operating system toward clustering technologies. It is moving its low cost per MIPS "upstream" with Microsoft Cluster Server (MSCS) and future Windows NT 5.0 releases.

Hardware vendors, meanwhile, are delivering ever-faster cluster interconnects based on new standards that have emerged to ensure the performance of clustered applications. Chief among these is the new Virtual Interface (VI) Architecture. It is a distributed messaging specification that enables today’s highest-speed, lowest-latency interconnects between cluster components.

IBM DB2 Universal Server: The First Implementation of Cluster-Aware Middleware over a VI Interconnect

Alongside the operating system and the interconnect, the final critical dimension is cluster-aware middleware. This software sits between application programmers and the cluster’s innate complexity. It hides the fact that data, peripherals and other resources are distributed across multiple servers in a cluster.

The most common form of cluster-aware middleware is the distributed RDBMS. Not coincidentally, read-only data warehouse, data mart, online analytical processing (OLAP) and data mining applications are leading the current wave of migration. These applications are moving from mainframe-based transaction systems to low-cost, high-performance Windows NT Server systems.

IBM’s DB2 is a mainframe database system familiar to almost all information technology (IT) professionals. Just as Microsoft is moving Windows NT Server upstream via clusters, IBM is pointing DB2 downstream. The goal is to take advantage of Wintel price/performance and the new clustering paradigm. Its newest implementation, DB2 Universal Database Enterprise Extended Edition, is optimized for Windows NT Server clusters. It is also the first open RDBMS to be ported to the new VI Architecture standard. That standard is a distributed messaging specification for cluster computing endorsed by more than 300 software, hardware and networking vendors. This move aligns DB2 with prevailing needs for better price/performance, portability and standards. It also brings IBM’s considerable experience with clustered mainframes and UNIX-based servers to bear on the new Windows NT Server cluster paradigm.

Clustering 101: Achieving Scalable Performance

A cluster is essentially a group of autonomous servers working together as a single system. It can be grown, or scaled, through the addition of new servers. Applications and data can reside anywhere within the cluster. Even so, the cluster appears to the user or client application as a single computer and can be managed as one. If this sounds suspiciously like parallelism, that’s because it largely is.

Until now, the only way to scale a Windows NT Server system was to add processors within the box. A single copy of the operating system can take advantage of multiple processors, all sharing the same memory and I/O subsystem. Those processors execute applications and Windows NT Server functions symmetrically. This is known as the symmetric multiprocessor (SMP) model, and it is quite different from a true cluster. All of the processors contend for the same system resources. As a result, a commodity-priced SMP system cannot scale with reasonable linearity much past six or eight CPUs. Further SMP scaling requires complex, high-priced memories, caches and system buses, a scenario that would quickly erode the cost-effectiveness of Windows NT Server.

A true cluster, on the other hand, is based on a shared-nothing parallel model. In this model, each server node has its own copy of the operating system and its own memory, I/O subsystem, storage and so on. Clusters avoid the memory contention problems inherent in SMP systems. Users can therefore expand capacity and improve performance by incrementally adding commodity-priced nodes to the cluster and balancing the workload among them. Ideally, each added node contributes a full node’s worth of performance.

DB2 Database Clusters

Microsoft has elected to achieve high-end Windows NT Server scalability by clustering low-cost multiprocessor servers in a shared-nothing configuration. These servers currently have up to four processors, soon eight. Meanwhile, DB2 Universal Database is well positioned to support large Windows NT Server cluster configurations as they appear.

As a shared-nothing parallel database, DB2 breaks a single query into multiple subqueries. These are parceled out to multiple nodes and processed in parallel. Internode data movement could exert a major drag on cluster performance and scalability. DB2 minimizes it by distributing the subqueries intelligently and applying filtering functions on each node, so that only qualifying data crosses the interconnect.
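The following minimal Python sketch makes the shared-nothing query pattern concrete. It assumes hash partitioning on a row key and an aggregate query, the equivalent of `SELECT region, SUM(amount) ... WHERE amount > 100 GROUP BY region`. The function names `partition`, `local_subquery` and `parallel_query` are hypothetical, threads stand in for cluster nodes, and this is not how DB2 itself is implemented. The point is that each node filters and pre-aggregates its own rows, so only small partial results travel to the coordinator.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n_nodes):
    """Hash-partition rows so that each node owns exactly one slice."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts


def local_subquery(part, predicate, group_key, value_key):
    """Runs on one node: filter locally, then pre-aggregate, so only a
    small dictionary of partial sums (not raw rows) leaves the node."""
    partial = defaultdict(float)
    for row in part:
        if predicate(row):
            partial[row[group_key]] += row[value_key]
    return dict(partial)


def parallel_query(parts, predicate, group_key, value_key):
    """Coordinator: ship the same subquery to every node in parallel,
    then merge the partial aggregates into the final answer."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(
            lambda p: local_subquery(p, predicate, group_key, value_key),
            parts))
    result = defaultdict(float)
    for partial in partials:
        for group, subtotal in partial.items():
            result[group] += subtotal
    return dict(result)
```

For example, `parallel_query(partition(rows, "id", 4), lambda r: r["amount"] > 100, "region", "amount")` returns the same totals as a single-node scan. The only rows that cross node boundaries are one subtotal per region per node.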

Shared-nothing RDBMSs, such as IBM DB2, Tandem NonStop SQL and Informix XPS, eliminate a major scalability limitation of traditional shared-disk RDBMSs. A shared-nothing database does not allow disk resources to be accessed directly by more than one node. All other nodes send requests for data to the node that owns a particular part of the database. This eliminates the need for a distributed lock manager (DLM). Shared-disk distributed database systems use a DLM to prevent multiple nodes from updating the same database row simultaneously. DLM contention grows as nodes are added to the cluster and ultimately limits any further growth.
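The request-routing model can be sketched the same way. This is a hypothetical Python model; the `Node` and `Cluster` classes are illustrative, not a DB2 interface. Every key has exactly one owner node chosen by hashing. Any node may originate a request, and the request is routed to the owner. Because only the owner ever touches its rows, a lock local to that node is enough to serialize updates, and no cluster-wide lock manager is involved.

```python
import threading


class Node:
    """One shared-nothing node: it alone reads and writes its partition."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = {}
        # Node-local lock only; there is no cluster-wide (distributed) lock.
        self._local_lock = threading.Lock()

    def handle(self, op, key, value=None):
        with self._local_lock:
            if op == "get":
                return self.rows.get(key)
            if op == "put":
                self.rows[key] = value
                return value
            if op == "add":
                self.rows[key] = self.rows.get(key, 0) + value
                return self.rows[key]
            raise ValueError(f"unknown op {op!r}")


class Cluster:
    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def owner_of(self, key):
        """Each key maps to exactly one owning node."""
        return self.nodes[hash(key) % len(self.nodes)]

    def request(self, op, key, value=None):
        # Whoever issues the request, the work is shipped to the owner,
        # the only node allowed to touch that key's data.
        return self.owner_of(key).handle(op, key, value)
```

In a shared-disk design, the same update would first need a lock granted by the DLM. That lock keeps any other node from modifying the row through its own path to the shared disk.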

From a business perspective, an IT organization can use its mainframe DB2 application programmers to develop for the new clustered environment. The programming interface is the same. Programmers are completely shielded from the complexities of the cluster configuration, because DB2 transparently manages the distributed data.

Advantages of the VI Architecture

Cluster-aware middleware such as DB2 Universal Database, NonStop SQL and the Oracle8 database is now falling into place on Windows NT servers. Application and system designers will therefore be able to take advantage of the new VI Architecture standard for maximum application portability and cluster performance. The VI Architecture is a distributed messaging specification developed by Compaq, Intel and Microsoft. It defines a standard transport layer and a single application programming interface (API). Software and hardware vendors can use them to provide high-speed, low-latency interconnectivity between servers in a database or other cluster.

Database vendors have traditionally written a portability layer of common operating system services. Server and networking vendors then modify the internals of these standard portability modules to optimize them for each target. The targets include the operating system and the type of cluster interconnect to which the database is being ported, such as Windows NT Server, UNIX operating systems and the AS/400 minicomputer. This lets a single set of upper-level code modules always see the same services. The VI Architecture, however, is independent of the operating system, network and chipset. It is therefore now possible to create a single set of OS services for cluster communications. As a result, a database engine ported to this standard interface will be portable to any VI Architecture-enabled environment without changing a single line of code.
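A portability layer of this kind is, at heart, an interface that upper-level engine code programs against. The following Python sketch assumes a hypothetical `ClusterTransport` interface. The in-process `LoopbackTransport` is only a stand-in. Under the VI Architecture, the equivalent could be a single implementation over a VI provider, rather than one module per operating system and network.

```python
from abc import ABC, abstractmethod
from collections import deque


class ClusterTransport(ABC):
    """Hypothetical portability-layer interface for internode messages.
    Engine code is written once against this; only implementations vary."""

    @abstractmethod
    def send(self, dest: int, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self, node: int) -> bytes: ...


class LoopbackTransport(ClusterTransport):
    """In-process stand-in used here instead of a real interconnect."""

    def __init__(self, n_nodes: int):
        self.queues = [deque() for _ in range(n_nodes)]

    def send(self, dest: int, payload: bytes) -> None:
        self.queues[dest].append(bytes(payload))

    def recv(self, node: int) -> bytes:
        return self.queues[node].popleft()


def broadcast_subquery(transport: ClusterTransport, n_nodes: int, sql: str) -> None:
    """Upper-level engine code: it never names a specific OS or network."""
    for node in range(n_nodes):
        transport.send(node, sql.encode())
```

Swapping `LoopbackTransport` for another `ClusterTransport` implementation leaves `broadcast_subquery` untouched. That is the property the portability layer exists to provide.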

Taking the operating system out of the critical path for internode communications also brings important cluster performance benefits. The VI Architecture’s developers found that most interserver messages in a cluster were very short (256 bytes on average). They also found that the latency of these messages correlates directly with overall cluster throughput.

Message latency is the sum of two components. The first is the time it takes to move a message across the interconnect hardware. The second is the time it takes to move the message from the application through the operating system and communications protocol stacks. The short messages typical of interserver communication spend little time in the interconnect hardware. They spend a lot of time being copied from the application to a kernel buffer, which requires a context switch from user mode to kernel mode. The reverse process then repeats at the other end of the connection. This operating system overhead cannot be tuned away in implementations of the traditional communications architecture. The end result is that all traditional LANs and WANs exhibit a fixed latency floor that cannot be reduced by new protocol stacks or faster interconnects.
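The latency argument comes down to simple arithmetic. The sketch below models one-way latency as a fixed software overhead plus wire time for a 256-byte message. The 50-microsecond kernel-path overhead and the 5-microsecond user-level overhead are illustrative assumptions, not measurements from this article. Switch hops and propagation delay are ignored.

```python
def message_latency_us(size_bytes, link_gbps, sw_overhead_us):
    """One-way latency in microseconds = software overhead + wire time.
    Assumes no switch hops and no propagation delay."""
    # link_gbps * 1e9 bits/s == link_gbps * 1e3 bits per microsecond
    wire_us = size_bytes * 8 / (link_gbps * 1e3)
    return sw_overhead_us + wire_us


KERNEL_STACK_US = 50.0  # assumed per-message OS + protocol cost (illustrative)
USER_LEVEL_US = 5.0     # assumed cost with the OS off the data path (illustrative)

for gbps in (1, 10):
    t_kernel = message_latency_us(256, gbps, KERNEL_STACK_US)
    t_user = message_latency_us(256, gbps, USER_LEVEL_US)
    print(f"{gbps:>2} Gb/s: kernel path {t_kernel:.1f} us, user-level {t_user:.1f} us")
```

Under these assumed figures, moving from a 1 Gb/s link to a 10 Gb/s link saves only about 1.8 microseconds per 256-byte message, because the wire time was only about 2 microseconds to begin with. Removing the software overhead saves 45 microseconds regardless of link speed.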

The VI Architecture accommodates traditional LAN and WAN implementations, such as Ethernet and Asynchronous Transfer Mode (ATM). However, the VI Architecture depends on the availability of a very reliable interconnect. A VI interface can be grafted onto a traditional LAN or WAN. On those interconnects, however, the required data integrity and guaranteed in-order delivery of messages can only be achieved with high-overhead transport protocol stacks. There may be a natural predilection to reuse existing network hardware for a cluster interconnect. Even so, the hefty protocol that comes along with it can have a stultifying effect on latency and processor consumption.
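Consider what a "high-overhead transport protocol stack" means in practice. A receiver must do bookkeeping whenever the wire may drop, duplicate or reorder packets. The `InOrderReceiver` class below is a hypothetical, heavily simplified Python sketch, not TCP or any specific protocol. It buffers early arrivals, delivers messages strictly in sequence, acknowledges cumulatively and reports gaps for retransmission. On a SAN that guarantees ordered, reliable delivery in hardware, none of this per-message work is needed in software.

```python
class InOrderReceiver:
    """Software bookkeeping needed when the wire may drop, duplicate or
    reorder packets (a simplified illustration, not a real protocol)."""

    def __init__(self):
        self.next_seq = 0   # next sequence number the application expects
        self.pending = {}   # seq -> payload that arrived ahead of order
        self.delivered = [] # payloads handed to the application, in order

    def on_packet(self, seq, payload):
        """Accept one packet; return the cumulative ack (next expected seq)."""
        if seq < self.next_seq or seq in self.pending:
            return self.next_seq  # duplicate: drop it and re-acknowledge
        self.pending[seq] = payload
        # Deliver every packet that is now contiguous with what came before.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq

    def missing(self):
        """Sequence gaps below the highest buffered packet (to retransmit)."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

Every packet passes through a dictionary lookup, a possible buffer insertion and an acknowledgment. That is the per-message processor cost the VI Architecture pushes down into reliable SAN hardware.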

Creating Low-Latency, High-Performance Switching Fabrics

For the lowest latency, highest performance and most reliable interconnections, the VI Architecture is optimized for the new system area networks (SANs) coming to market. A SAN is designed for efficiency and speed. It does not require a heavy software protocol stack to ensure in-sequence message delivery or data integrity. Low-cost, VI-compliant hardware provides these guarantees instead.

The role of a SAN is to carry all message traffic in a cluster. A SAN must also be able to provide the communication path between the servers and storage area networks. Given the current trend toward intranet applications and multimedia database systems, the ideal SAN is a high-speed, low-latency packet-switched network. A network fabric of cascaded switches can efficiently move interprocess communication (IPC) messages, video, audio and stored data from any endpoint of the network to any other. Another critical attribute of a switching fabric is that its overall bandwidth grows through the incremental addition of switches. That bandwidth can be massively increased without changing any of the existing network interface cards on the servers or any of the existing cabling.

Emerging SAN hardware implements much of the functionality of a four-layer protocol stack, which eliminates the overhead and latency of traditional software protocol stacks. If the SAN also conforms to the VI Architecture standard, the operating system is taken out of the data path as well, and its overhead is eliminated too.
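The VI Architecture’s central idea is that a user process posts send and receive descriptors to work queues that the network interface services directly. Sending a message therefore does not require a system call per message. The Python toy below models that idea. The names `VirtualInterface`, `post_send`, `post_recv`, `poll` and `connect` are hypothetical, and the "NIC" is simulated in software. None of it reproduces the calls defined by the VI specification itself.

```python
from collections import deque


class VirtualInterface:
    """Toy model of a VI endpoint (hypothetical names, simulated NIC).

    The user process posts descriptors straight onto its own send and
    receive queues; in this model no OS call is made per message.
    """

    def __init__(self):
        self.send_q = deque()       # messages waiting for the simulated NIC
        self.recv_q = deque()       # receive buffers pre-posted by the user
        self.completions = deque()  # (buffer, length) pairs ready to reap
        self.peer = None

    def post_recv(self, buf: bytearray) -> None:
        self.recv_q.append(buf)

    def post_send(self, data: bytes) -> None:
        self.send_q.append(bytes(data))
        # "Ring the doorbell": in hardware this is a memory write the NIC
        # notices; here the simulated NIC simply runs immediately.
        self._nic_process()

    def _nic_process(self) -> None:
        while self.send_q:
            data = self.send_q.popleft()
            if not self.peer.recv_q:
                raise RuntimeError("peer has no receive buffer posted")
            buf = self.peer.recv_q.popleft()
            if len(data) > len(buf):
                raise ValueError("message larger than posted buffer")
            buf[:len(data)] = data  # data lands directly in the user's buffer
            self.peer.completions.append((buf, len(data)))

    def poll(self):
        """Return one (buffer, length) completion, or None; never blocks."""
        return self.completions.popleft() if self.completions else None


def connect(a: VirtualInterface, b: VirtualInterface) -> None:
    a.peer, b.peer = b, a
```

Because the receiver must pre-post buffers, the data can be placed directly in application memory. In the kernel path, by contrast, data is first copied into a kernel buffer.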

An early example of a VI-compliant SAN, in which the hardware guarantees reliable delivery and reception of messages, is Tandem’s ServerNet interconnect technology. Its distinguishing component is a six-way non-blocking switch implemented in a single application-specific integrated circuit (ASIC). As discussed above, a SAN scales through the addition of switches. ServerNet can scale massively, to one million endpoints and 150,000 gigabits per second. That capacity supports more complex queries or transactions, higher data volumes or more users.

Proof Point

In February 1998, IBM, Intel and Tandem publicly demonstrated a beta version of DB2 Universal Database Enterprise Extended Edition. It ran on a VI Architecture-enabled Windows NT Server cluster built from standard commodity components. This was the first implementation of an open RDBMS ported to the VI Architecture. Six off-the-shelf Compaq ProLiant 6500 servers running Windows NT Server 4.0 were interconnected using ServerNet technology. The configuration also included 0.25 terabytes of disk storage and 12 I/O channels. The aggregate data movement capacity provided by the SAN was 4.8 gigabits per second, more than four times the 1 gigabit per second of gigabit Ethernet.

The 30-million-row database was implemented on the cluster using the standard VI Architecture API over a ServerNet-based SAN. The database was taken from a retail industry application and contained information on 50,000 suppliers, 750 customers and 1.5 million orders.

The same complex query was run four times against the database: first on a single node, then on two, then three, and finally on all six nodes. With all six nodes, execution time dropped nearly six-fold. This demonstrated the value of efficient (that is, low-latency) message passing between shared-nothing nodes and the near-linear scalability of the cluster.
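The article reports this result qualitatively, so the helpers below only show how such a scaling run is usually summarized. Speedup is the one-node time divided by the n-node time. Efficiency is speedup divided by n, where 1.0 means perfectly linear scaling. This is a minimal Python sketch, `scaling_report` is a hypothetical helper, and any timings fed to it should come from actual measurements.

```python
def speedup(t1, tn):
    """How many times faster the n-node run is than the one-node run."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Fraction of ideal linear speedup achieved (1.0 = perfectly linear)."""
    return speedup(t1, tn) / n


def scaling_report(times):
    """times: {node_count: elapsed_seconds}; must include the 1-node run.
    Returns {node_count: (speedup, efficiency)} rounded to 2 places."""
    t1 = times[1]
    return {n: (round(speedup(t1, t), 2), round(efficiency(t1, t, n), 2))
            for n, t in sorted(times.items())}
```

By this measure, a "nearly six-fold" reduction on six nodes corresponds to an efficiency just under 1.0.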

Putting the Pieces Together

With the VI Architecture, VI-compliant SANs, low-cost Windows NT Servers and a cluster-optimized RDBMS, the picture is complete for an enterprise-ready database cluster. The VI Architecture enhances the database’s portability and ensures its performance and scalability in a Windows NT Server cluster environment.

Cluster-aware database engines with large installed bases, such as IBM’s DB2, NonStop SQL, Informix XPS and Oracle Parallel Server, allow organizations to leverage programming skills and training as they migrate applications from the mainframe to Windows NT Server clusters. Programmers are shielded from cluster complexities while the database transparently manages the data in a cluster environment. When combined with the VI Architecture and its standard API, database applications can benefit from massive scalability at a commodity price and at a measured, cost-effective pace. The ability of the VI Architecture to support SAN technologies means fast, low-overhead system interconnections and even communications fault tolerance.

Windows NT Server clusters now exhibit many mainframe attributes, providing a new low-cost alternative for the IT professional. The demonstration of DB2 combined with the VI Architecture provides a preview of future applications. Massive OLAP databases and fault-tolerant Web server complexes can now reside on low-cost Windows NT Server-based clusters. Data mining and other decision support operations can now be launched against highly detailed data at a reasonable cost. The door is open. Organizations with large DB2 and other open RDBMS installations can better serve their users by considering clustering and beginning to move select applications. The clustering experience they gain today will pay significant dividends down the road.




Patrick Vallaeys is Director of ServerNet Business Development at Compaq Computer Corporation’s Tandem Division (Cupertino, Calif.). He can be reached at (408) 285-7388 or via e-mail at patrick.vallaeys@tandem.com.