W2K Advanced Server Clustering: Better than NT 4, But at a Price

Windows NT 4.0 has been criticized mercilessly for its limited clustering ability. While other operating systems, Unix for example, have been able to cluster multiple nodes for years, NT is limited to just two nodes. In response, Microsoft Corp. has spent a lot of time and energy touting the clustering capabilities that Windows 2000 will bring, particularly in the Advanced Server and Datacenter Server products.

While Datacenter Server, which Microsoft is promoting as its highest-end clustering solution, is still in beta testing, Advanced Server will be available this month. Although it is still limited to two-node clustering, Windows 2000 Advanced Server offers several substantive features that enhance the company’s high-availability portfolio. The price for such capabilities is a longer and more complex installation process.

The Testing Bed

For simple fail-over between clustered nodes, Windows NT 4.0 leverages a shared SCSI bus model that requires some architectural refinements if it is to be effectively implemented.

For testing purposes, we used a pair of Microsoft Cluster Server-approved eXtremeRAID 1100 controllers from Mylex Corp. (www.mylex.com). The eXtremeRAID 1100 packs a number of cluster-friendly features, including a BIOS setup option that enables the controller to operate in cluster-aware mode. Most importantly, the Mylex eXtremeRAID 1100 steadfastly maintains SCSI termination -- even in the event of a power failure on one of the nodes hosting the controllers.

We installed and configured both Windows NT 4.0 Enterprise Edition and Windows 2000 Advanced Server on our two test machines. The first was an AMD Athlon 750-MHz system with 512 MB of RAM and 18 GB of internal storage; the second was a 550-MHz Xeon-based server, also with 512 MB of RAM and 18 GB of internal storage. Each machine was configured in a shared SCSI bus schema and connected to an Ultra2 SCSI 36 GB external storage subsystem from Seagate Technology Inc. (www.seagate.com).

In Microsoft’s shared-SCSI schema for Windows NT clusters, a single RAID or SCSI controller is configured with the default controller ID in a SCSI chain -- #7. The second RAID or SCSI controller is configured with SCSI ID #6 and booted after the default controller. In this regard, the benefit of our Microsoft Cluster-certified Mylex controllers was readily apparent, as the eXtremeRAID controller configured with SCSI ID #6 detected the other eXtremeRAID controller at ID #7 and automatically entered clustered mode during its initialization sequence. Besides preventing damage to the controller or to other devices on the SCSI chain, this type of functionality can also prevent corruption of the shared disk subsystem that underpins storage in the MSCS architecture.
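To make the ID scheme concrete, here is a minimal Python sketch of the kind of sanity check an administrator might run on paper before cabling a shared bus. It is only an illustration: the function name, the 16-ID bus width, and the disk IDs are our assumptions, not anything supplied by Microsoft or Mylex.

```python
def check_shared_bus(initiator_ids, disk_ids, bus_width=16):
    """Return a list of problems found in a shared SCSI bus ID layout.

    bus_width=16 assumes a wide SCSI bus; pass 8 for a narrow bus.
    """
    problems = []
    all_ids = list(initiator_ids) + list(disk_ids)

    # Every device needs an ID the bus can actually address.
    for scsi_id in all_ids:
        if not 0 <= scsi_id < bus_width:
            problems.append(f"ID {scsi_id} is outside 0..{bus_width - 1}")

    # No two devices (controllers or disks) may share an ID.
    duplicates = sorted({i for i in all_ids if all_ids.count(i) > 1})
    for scsi_id in duplicates:
        problems.append(f"ID {scsi_id} is assigned to more than one device")

    return problems


# Layout from the article: controllers at IDs 7 and 6 (disk IDs are hypothetical).
print(check_shared_bus(initiator_ids=[7, 6], disk_ids=[0, 1, 2]))

# Both controllers left at the default ID 7, the conflict the scheme avoids.
print(check_shared_bus(initiator_ids=[7, 7], disk_ids=[0, 1, 2]))
```

The first call reports no problems; the second flags the duplicate controller ID.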

The shared SCSI bus enables the storage side of Microsoft’s clustering solution, but it’s the heartbeat connection between two nodes -- enabled by, of all things, standard fast Ethernet network interface cards (NICs) -- that provides MSCS’ failover mechanism in the first place. We used both 3Com 3C905B-TX standard 10/100 Ethernet adapters and 3Com 3C980B-TX 100-Mbps server NICs.
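The idea behind a heartbeat link is simple enough to sketch in a few lines of Python. The class below is a generic illustration of heartbeat-based failure detection, not MSCS' actual algorithm; the class name, the one-second interval, and the five-miss threshold are all assumed values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeats from a peer node and decides when to fail over.

    interval and max_missed are illustrative values, not MSCS defaults.
    """
    interval: float = 1.0    # seconds between expected heartbeats (assumed)
    max_missed: int = 5      # consecutive misses tolerated (assumed)
    last_seen: float = 0.0   # timestamp of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        """Call whenever a heartbeat packet arrives from the peer."""
        self.last_seen = now

    def missed(self, now: float) -> int:
        """Whole heartbeat intervals elapsed since the last heartbeat."""
        return int((now - self.last_seen) // self.interval)

    def peer_failed(self, now: float) -> bool:
        """True once the peer has been silent for more than max_missed intervals."""
        return self.missed(now) > self.max_missed


monitor = HeartbeatMonitor()
monitor.record_heartbeat(now=10.0)
print(monitor.peer_failed(now=12.0))  # peer silent for 2 intervals
print(monitor.peer_failed(now=17.0))  # peer silent for 7 intervals
```

A surviving node that sees `peer_failed` return true would then take ownership of the shared disks and restart the clustered services, which is why a NIC that fails to initialize can keep a cluster from forming at all.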

Windows NT 4.0 Enterprise Edition

Configuring MSCS in Windows NT 4.0 was a surprisingly trouble-free experience. We’d heard a number of negative accounts -- let’s call them horror stories -- about configuring MSCS in NT 4.0, but we didn’t encounter a single difficulty. From building our servers to completing the MSCS configuration, the whole process took about two hours.

We configured our Athlon-based primary test server as a primary domain controller (PDC) and established our Xeon box as a backup domain controller (BDC). We then invoked the MSCS install wizard on our Athlon PDC, named our cluster, specified our MSCS service account, and defined a partition on our shared storage subsystem for the cluster quorum and log files. Joining our Xeon-based BDC to the cluster was a simple task as well.

Because we were installing Microsoft Exchange Server 5.5 Enterprise Edition on our Windows NT 4.0 Enterprise Edition MSCS cluster, we applied Windows NT 4.0 Service Pack 5 to both our Athlon PDC and our Xeon BDC. Microsoft documentation specifies that Service Pack 4 or later is required to patch specific problems with MSCS and Exchange.

Exchange Server 5.5 Enterprise Edition is a cluster-aware application, and during installation it invoked a special clustering-compliant setup process without our having to lift so much as a finger.

Microsoft explicitly states that it will only support MSCS in select configurations that it has pre-approved as cluster-compliant. This may well be the case, but we suspect that because of the complexity of its shared SCSI bus architecture, the most important single component in any successful MSCS implementation is a cluster-aware SCSI or RAID controller card. The eXtremeRAID 1100 RAID controllers from Mylex that we used were tested and certified with Windows NT 4.0 Enterprise Edition and MSCS, which probably explains why we didn’t experience any substantive problems during our review process.

Windows 2000 Advanced Server

For the Windows 2000 Advanced Server cluster, we configured our test servers as domain controllers in the Active Directory.

A successful installation was not so easy this time. We went through each of the steps: naming the cluster, specifying the MSCS service account, and preparing a partition on our shared storage array for the cluster quorum and log files. But MSCS would not start, and after each failed attempt we were forced to remove the MSCS service and reboot before trying again.

We traced the problem to the network adapter cards that we were using to enable the "heartbeat" connection between our two clustered nodes. These NICs -- the very same 3Com 3C905B-TX 10/100 Ethernet cards that we used with great success in our Windows NT 4.0 Enterprise Edition clustering test -- refused to initialize properly in Windows 2000. We replaced these somewhat plebeian 10/100 adapters with some more exotic 100-Mbps 3Com 3C980B-TX server NICs and, voila, a working cluster heartbeat.

Lesson learned: Not all the products certified to work with NT 4.0 are automatically certified for Windows 2000. It’s a good idea to check this before beginning to build a cluster with Advanced Server.

Whereas installing MSCS in Windows NT 4.0 and configuring one node as a PDC and the other as a BDC was an essentially pain-free task, we ran into a problem in setting up Advanced Server.

We got hung up on a dialog box from one server stating that it was not a domain controller. Our cluster needed to be a domain controller or authenticate against one. We could have authenticated against a Windows NT 4.0 domain. But without such a domain, the only way to make a Windows 2000 Server a domain controller in a new domain is to configure the Active Directory.

Installing Active Directory on the clustered machine is neither beneficial nor necessary, nor is it in any way encouraged by Microsoft. Unfortunately, our only options were to construct a Windows NT 4.0 domain on a third server or to build an Active Directory. We chose to bite the bullet and build the Active Directory.

All in all, it took us the greater part of two hours to get all of the kinks worked out and to enable our Active Directory domain controllers to talk to each other.

By comparison, it took two hours to get MSCS up and running on Windows NT 4.0 Enterprise Edition. On Windows 2000, not counting the time it took to install the Active Directory, we spent three hours installing the operating system and configuring the shared storage devices.

Once we had our Advanced Server cluster up and running, though, a number of improvements became apparent. It supports Microsoft's ClusterAPI, which provides a standard framework interface for writing MSCS cluster-aware applications. The base API is the same in Windows 2000 as it is in NT 4.0, but there are three new APIs that enhance W2K: Cluster Automation Server, an API for cluster backup, and a replication API. The Cluster Automation Server is a set of COM objects that enables developers to automate clustering tasks. The API for cluster backup lets IT back up and restore cluster configurations, and the replication API lets developers write cluster-aware applications for replication.

The new server also offers a number of wizards to more easily configure applications for the ClusterAPI. This enhanced support for clustering at the operating system-level alone should make Windows 2000 an altogether friendlier platform for hosting clustered applications.

Advanced Server’s native TCP/IP load-balancing clustering capabilities look to be phenomenal -- supporting up to 32 nodes -- and could be ideally suited for today’s demanding 24x7 e-commerce implementations. To help administrators keep track of clusters, Advanced Server’s management interfaces are integrated with the Microsoft Management Console and, consequently, have a cleaner, more intuitive look and feel than do their NT 4.0 counterparts.
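One general way a load-balanced cluster can spread clients across many nodes without a central dispatcher is to have every node apply the same hash to each client address. The Python sketch below illustrates that generic technique only; it is not a description of Microsoft's implementation, and the function name, node names, and client address are hypothetical.

```python
import hashlib
from typing import Sequence

MAX_NODES = 32  # node limit cited in the article


def pick_node(client_ip: str, live_nodes: Sequence[str]) -> str:
    """Map a client IP to one of the live nodes using a stable hash."""
    if not live_nodes:
        raise ValueError("no live nodes in the cluster")
    if len(live_nodes) > MAX_NODES:
        raise ValueError(f"cluster is limited to {MAX_NODES} nodes")

    # Sort so every node computes the same ordering independently.
    ordered = sorted(live_nodes)

    # Hash the client address and reduce it to an index into the node list.
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(ordered)
    return ordered[index]


nodes = [f"node{i:02d}" for i in range(1, 5)]  # hypothetical four-node cluster
owner = pick_node("192.168.1.20", nodes)
print(owner)

# If that node fails, the survivors recompute and the client lands elsewhere.
survivors = [n for n in nodes if n != owner]
print(pick_node("192.168.1.20", survivors))
```

Because each node reaches the same answer on its own, no single machine has to route traffic for the others. The trade-off of this simple modulo scheme is that many clients are remapped whenever the node count changes.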

Advanced Server also offers finer-grained clustering configurability, including support for virtual clusters and native fail-over support for DHCP and WINS, which NT 4.0 lacks. Both DHCP and WINS, however, already have a capacity for redundancy built right into them. A DHCP server can be installed in each of two subnets, and a Windows NT Server can be added to each subnet to act as a "DHCP Relay Agent," which passes DHCP requests across to a working DHCP server in the other subnet in the event that the subnet’s default DHCP server is unavailable. WINS, for its part, has always let administrators define primary and secondary WINS servers, so a backup is already in place. In other words, support for WINS and DHCP is not reason enough to upgrade to Advanced Server’s clustering.
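The primary/secondary arrangement amounts to a simple ordered fallback, which the following Python sketch illustrates. It models the general pattern only, not the WINS protocol itself; the function names, the server stand-ins, and the name-to-address entry are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# A resolver takes a NetBIOS name and returns an IP address (or None if the
# name is unknown), and raises ConnectionError if the server is unreachable.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(name: str, servers: Sequence[Resolver]) -> Optional[str]:
    """Query each name server in order and return the first answer received."""
    for query in servers:
        try:
            return query(name)
        except ConnectionError:
            continue  # server is down; try the next one in the list
    return None  # every server was unreachable


def primary_down(name: str) -> Optional[str]:
    """Stand-in for a primary server that has gone offline."""
    raise ConnectionError("primary WINS server unreachable")


def secondary_up(name: str) -> Optional[str]:
    """Stand-in for a healthy secondary server."""
    table = {"FILESRV01": "10.0.0.15"}  # hypothetical registration
    return table.get(name)


print(resolve_with_fallback("FILESRV01", [primary_down, secondary_up]))
```

Here the lookup still succeeds with the primary offline, which is the redundancy administrators already get without clustering.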

One of the things that we liked most about Windows 2000 during prior testing experiences was its voluminous help support, and especially its greatly enhanced ability to make Windows NT’s once cryptic event log notifications much more understandable. But whereas Windows 2000’s support services can be invaluable for an administrator installing a service like Routing and Remote Access Service, for example, we found Advanced Server’s help a little lean where MSCS was concerned.

Concluding Observations

Companies that need only base-level fail-over clustering are better off not moving to Windows 2000 Advanced Server just yet, unless they need the OS for other capabilities. But companies that want better manageability through the MMC, support for more clustering APIs, and native TCP/IP load balancing will find the enhancements Microsoft made to Advanced Server a welcome change from NT 4.0.

Finally, companies that are looking to Windows for failover clustering beyond two nodes will have to wait until Microsoft’s forthcoming Datacenter Server ships later this year.
