High Availability Solutions Becoming Highly Available

End users who need highly available Windows NT/2000 environments are being flooded with new offerings. The last month and a half has brought a slew of high-availability product and service offerings, most concentrated in the areas of fault tolerance and clustering.

One analyst says the explosion of offerings has little to do with the perceived stability of Windows 2000 and post-Service Pack 4 Windows NT.

"This is not an operating system phenomenon, this is a ‘We need to have high availability’ kind of phenomenon," says Harvey Hindin, vice president of the high availability and clusters service of D.H. Brown Associates Inc. (www.dhbrown.com). "The bottom line is everybody’s discovered with all this 7x24 business and e-commerce, a differentiator is high availability, both in the hardware and software domains."

Helping to drive the demand, Hindin says, is the fact that high-profile system failures are now making the front page of The Wall Street Journal. "There’s a lot more coming from not only the major vendors, but a whole bunch of third-party companies. Wait until you see what’s coming out for Linux," he says.

Fault Tolerance

While high availability is easy to understand -- keeping a system running as much as possible -- fault tolerance is a special subcategory. To qualify as fault tolerant, a system must preserve a specific transaction when part of the system fails. Common clustering configurations require a reboot of the application when a server fails, meaning they lose the transactions that were being processed at the time of the failure. Additionally, those applications remain offline until they are restarted.

For a long time, Microsoft Corp. (www.microsoft.com) could not point to much fault-tolerant computing going on with Windows NT systems, but that is changing.

Marathon Technologies Corp. (www.marathontechnologies.com) has been selling a fault-tolerant system on Windows NT at low volumes since 1997. The company made only 500 sales in the product’s first two years. An OEM partnership with Hewlett-Packard Co. (www.hp.com) in June 1999 helped Marathon make the jump to about 1,300 unit shipments. Marathon also had its product, the Endurance 4000, placed on the Microsoft Hardware Compatibility List this past February.

As Marathon ramps up its sales volume, other vendors are moving into the space. Stratus Computer Systems (www.stratus.com) unveiled a new line of fault-tolerant servers running Windows NT in April, and promises to deliver them this fall (see story in the April 26 issue of ENT).

The Stratus ftServers double up on Intel processors and memory and run each processor-memory module in lockstep. Should one module fail, the other carries on with the application. In the fall, Stratus plans to release two-processor servers, which physically contain four processors, and four-processor servers, which physically contain eight. Customers not comfortable with the two-module arrangement can opt for a third processor-memory module for each functional processor.

Lucent Technologies (www.lucent.com) this month released what it calls Software-implemented Fault Tolerance (SwiFT) for Windows NT. Lucent says SwiFT consists of a collection of software components that add fault-tolerant capabilities to Windows NT-based applications. Functions of some of those components include automatic error detection and recovery, check-pointing/message-logging, event logging and replay, and data replication. The components integrate with existing commercial or in-house applications, according to Lucent. The focus is low-cost, fault-tolerant software.
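Check-pointing combined with message logging is a general recovery technique, and a minimal Python sketch may help show the idea. This is not Lucent's SwiFT code or API. The class, its methods, and the checkpoint interval are all hypothetical, and in-memory fields stand in for the stable storage a real system would use.

```python
class CheckpointedCounter:
    """Hypothetical service that keeps a running total of received amounts."""

    def __init__(self, checkpoint_every=3):
        self.total = 0                    # volatile working state
        self.checkpoint_every = checkpoint_every
        self.checkpoint = 0               # last saved state (assumed to survive a crash)
        self.log = []                     # messages logged since the last checkpoint

    def handle(self, amount):
        # Log the message first, so it can be replayed if the process dies.
        self.log.append(amount)
        self.total += amount
        # Periodically save the state and discard the messages it already covers.
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint = self.total
            self.log.clear()

    def crash(self):
        # Simulate losing the volatile state; checkpoint and log are kept.
        self.total = None

    def recover(self):
        # Restore the last checkpoint, then replay the logged messages in order.
        self.total = self.checkpoint
        for amount in self.log:
            self.total += amount


service = CheckpointedCounter(checkpoint_every=3)
for amount in [10, 20, 30, 40, 50]:
    service.handle(amount)

service.crash()
service.recover()
print(service.total)
```

In this run the checkpoint is taken after the third message, so recovery restores that saved total and replays the two logged messages that followed it, arriving back at the pre-crash total of 150.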

Marathon, the only veteran fault tolerance vendor for Windows NT in a market that is quickly growing crowded, isn’t standing still. This month, Marathon unveiled Endurance 6200.

A criticism of the Endurance 4000 was its inability to scale beyond single-processor systems. Marathon achieves its fault tolerance through a complex splintering of the compute element and I/O system, effectively splitting a uniprocessor machine into four uniprocessor machines to get high availability.

Marathon ensures 99.999 percent availability through the process, but the configurations come at a significant price. The complexity has made it difficult to scale beyond single-processor performance, and the market for the boxes has been limited because the system is expensive yet could not handle SMP workloads.

With Endurance 6200, Marathon is taking a major step toward better scalability by introducing two-processor compute elements into the system.

"Customers are getting three to four times the throughput from the 6200 relative to Endurance ‘Classic,’" says Craig Jon Anderson, world wide director of marketing at Marathon.

At the Marathon announcement, HP unveiled a new line of server packages that combined the Marathon Endurance 6200 with specific HP NetServers and HP support for the whole bundle.

"You can get everything from HP -- the product, the servers -- everything is tested from HP," says Calvin Nieh, worldwide assured availability solutions product manager at HP. "You can buy servers that [Marathon certifies] on competing platforms, but you’ll be dealing with two different companies in terms of support."


Clustering

High-availability enhancements extend past the specialized fault tolerance environments. Standard two-node Microsoft Cluster Services (MSCS) clustering capabilities are also being improved upon by a number of vendors.

NSI Software (www.nsisoftware.com) began shipping software this month that eliminates the shared disk as a single point of failure in MSCS configurations and shatters the current distance limitations between two MSCS nodes.

Under the NSI Double-Take GeoCluster solution, an end user can separate each cluster node by any distance, configure each node with its own local disks, and connect the nodes via IP LAN, WAN, or SAN. The GeoCluster software continuously replicates the clustered data using NSI’s Double-Take replication software.

By using Microsoft’s clustering APIs, the software runs transparently to any application that is already Microsoft cluster-aware. The software works with Windows NT Server 4.0, Enterprise Edition, and Windows 2000 Advanced Server.

According to NSI, GeoCluster is designed to support multinode MSCS configurations. Microsoft is expected to release four-node MSCS clustering with Windows 2000 Datacenter Server later this year.

Moving down the complexity chain, IBM Corp. (www.ibm.com) is flattering a few other server vendors by imitating their packaged cluster offerings. Data General Corp. (www.dg.com) has long been in the habit of packaging nearly every conceivable Windows NT configuration with its "In a Box" product line. A cluster in a box solution from Data General is designed to simplify configuration and speed rollouts of MSCS-based applications. Last year, Compaq Computer Corp. (www.compaq.com) added a packaged cluster of its own.

Last month, IBM Corp. rolled out a similar package, although Big Blue claimed some originality by saying its preconfigured clusters were the first to ship running Windows 2000 Advanced Server. IBM also sells the package running Windows NT Server 4.0, Enterprise Edition.

IBM points out that its package means customers order one part number that consolidates about 50 components. According to IBM, the two-node bundles deliver a 20 percent savings over purchasing and configuring the individual components.

IBM’s HA Cluster Servers come in two flavors: a rack cluster starting at about $24,000 and a tower cluster starting at about $19,500. The rack option builds on a pair of IBM Netfinity 4500R 3U servers, which are two-processor capable. The tower option also scales to two-processor SMP and ships with IBM Netfinity 5100 servers. Both sets of servers include IBM’s X-Architecture features such as a dedicated service processor, Light Path Diagnostics, hot swap redundant power and hot swap fans, redundant cooling, and a cable management system.


In addition to its work packaging Windows NT servers, Data General has built a name for itself over the past 18 months among high-availability seekers with its uptime service offerings. Data General guarantees 99.9 percent uptime on Windows NT, SQL Server 7.0, Exchange Server 5.5, and, more recently, Windows 2000. That is lower than the 99.999 percent Marathon offers, and the 99.9999 percent Stratus promises with its triple-module servers. Nonetheless, three nines is respectable among commodity Windows NT clustering environments.
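To put those percentages in concrete terms, the arithmetic can be sketched in a few lines of Python. The function name is illustrative, and the calculation assumes a 365-day year with availability measured around the clock; vendors' actual guarantee terms may define uptime differently.

```python
# Downtime budget implied by an availability percentage.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(availability_percent):
    """Return the minutes of downtime per year permitted at this availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, percent in [("three nines", 99.9),
                       ("five nines", 99.999),
                       ("six nines", 99.9999)]:
    minutes = annual_downtime_minutes(percent)
    print(f"{label} ({percent}%): {minutes:.2f} minutes of downtime per year")
```

Under those assumptions, 99.9 percent allows roughly 8.8 hours of downtime a year, 99.999 percent allows a little over five minutes, and 99.9999 percent allows about half a minute.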

In an effort to leverage its uptime services for increased sales of parent company EMC Corp.’s (www.emc.com) storage hardware, Data General issued a high-availability strategy with five levels. The strategy builds on partnerships with UPS vendor American Power Conversion, and with NSI for its GeoCluster software.

The five levels are system protection (a single system with Clariion storage, Data General servers and uptime utilities, and an APC UPS); data protection (a single system with backup capabilities and EMC Symmetrix storage); site-level data protection (a single system with replication to support multisite management); application protection (a clustered configuration with Symmetrix storage and NSI distance clustering); and business protection (a clustered configuration with EMC Celerra storage and the 99.9 percent uptime guarantee).