Cluster Bombs

What's the best way to scale up to meet growing needs and still provide highly reliable systems? There are two clear choices: SMP and clustering. SMP addresses a single issue, processing power. Clustering has always been a more flexible approach, combining reliability and failover with the ability to share storage among many systems.

Having spent more than 15 years in a VAX/VMS environment, I'm used to having clustered systems. Sometimes it seemed that if we had two machines in the same zip code, we would automatically cluster them without another thought. Even though the architect for Windows NT was the same Dave Cutler who brought us VAX/VMS, clustering isn't the obvious choice for Windows NT systems.

The ability to share storage among several processors is a powerful idea, but many Windows NT administrators still regard clustering as a less desirable cousin of traditional network shares. Despite that, International Data Corp. says enterprises will cluster 60 percent of their systems by 2001. Do analysts think clustering will become prevalent because of the new features in Windows 2000?

Anyone who remembers setting up a cluster with Microsoft Cluster Services (MSCS) under Windows NT 4.0 will appreciate how easy it is to set up new clusters in Windows 2000. When I used the Cluster Setup Wizard to build a simple failover cluster, I was able to define the cluster with eight menu choices and add a second node with four. If you recall clustering under Windows NT 4.0 with the user-belligerent Cluster Administrator, you should celebrate the improvement.

Microsoft recently began delivering cluster-aware versions of key BackOffice software, such as Exchange Server 5.0 and SQL Server 7.0. Adding Windows 2000 to these key applications should let system administrators reach for new targets of reliability.

It sounds great, but it hides one central fact: Microsoft is still behind in developing scalable and manageable clustering technology for enterprises that require highly reliable servers or share critical storage facilities among multiple nodes.

Windows 2000 Advanced Server supports only two-node clustering. Those who want real clusters, with a quorum that keeps the cluster running without reboots, will have to wait until later in the year, when Microsoft is scheduled to deliver a four-node clustering solution with Windows 2000 Datacenter Server. With Windows 2000 Datacenter, every new node joining the cluster gains access to the quorum device that defines the cluster. If one cluster member fails, the other nodes reorganize the workload and reallocate resources. In a nutshell, clustering in Windows 2000 Datacenter is far more capable than clustering in Windows 2000 Advanced Server.
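To make the failover idea concrete, here is a minimal Python sketch of surviving nodes picking up a failed member's workload. It is a toy model and not how MSCS or Windows 2000 Datacenter actually works: the `Node` class, the `fail_over` function, the resource group names, and the least-loaded placement policy are all assumptions made for illustration, and the quorum device is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster member that owns a list of resource groups."""
    name: str
    online: bool = True
    groups: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Mark one node offline and hand its resource groups to the survivors."""
    failed = next(n for n in nodes if n.name == failed_name)
    failed.online = False
    survivors = [n for n in nodes if n.online]
    if not survivors:
        raise RuntimeError("no surviving node to take over the workload")
    # Detach the failed node's groups before reassigning them.
    orphaned, failed.groups = failed.groups, []
    for group in orphaned:
        # Assumed policy: give each group to the least-loaded survivor.
        target = min(survivors, key=lambda n: len(n.groups))
        target.groups.append(group)
    return {n.name: list(n.groups) for n in survivors}


# A four-node cluster with made-up resource groups.
cluster = [
    Node("node1", groups=["sql"]),
    Node("node2", groups=["exchange", "files"]),
    Node("node3", groups=["print"]),
    Node("node4"),
]
placement = fail_over(cluster, "node2")
print(placement)
```

With only two nodes, the survivor has no choice but to take everything. With four, the failed member's work can be spread across three machines, which is one reason the larger cluster is the more capable design.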

The sad truth is that neither is really impressive. The idea that Intel-based Windows 2000 clusters are approaching state-of-the-art levels is nearly laughable. It's impossible to ask Intel-based clusters to compete with, for example, Alpha-based clusters built on carrier-grade 64-bit Unix systems like Tru64.

Some might be tempted to say, "So what? NT clusters are intended to bring reliability to every enterprise with off-the-shelf commodity systems and storage." In fact, significant shared-device clustering capabilities are still some time away.

If you want to see where Microsoft is going with clustering, you only have to look as far as Compaq Computer Corp. After all, Compaq owns Tandem and the rights to Digital's clustering technology. Whether it's shared disk clustering or high-performance duplicate infrastructure, Compaq has the technology that Microsoft wants to capitalize on.

Digital has been working to bring its shared device clusters to MSCS. Oracle Corp. also developed a shared disk strategy called Parallel Server. Both Digital and Oracle are bringing highly scalable clusters to the marketplace.

The leader in high-reliability clustering has always been Tandem with its Nonstop software and duplication of components linked by a high-speed proprietary network.

Shared storage clusters represent one of the most important ways to add reliability to enterprise Windows NT systems. That we need to look beyond Microsoft for those solutions doesn't make them any less important. On the contrary, when thinking about storage, shared high-performance disk-based clusters are going to become an important response to the need for reliability and scalability.

--Mark McFadden is a consultant and is communications director for the Commercial Internet eXchange (Washington). Contact him at mcfadden@cix.org.
