Stratus to Bring Fault Tolerance to Windows 2000 Servers

Stratus Computer Systems wants to bring its fault-tolerant computing technology, a pillar of high-end financial transaction systems, to commodity Wintel servers.

Stratus announced a new line of fault-tolerant servers earlier this month at Windows World in Chicago. The company promises to deliver the servers, previously code-named Melody, this fall. The boxes are designed to run Windows 2000 Server, Advanced Server, and Datacenter Server.

The unveiling earned endorsements from Microsoft Corp. and Intel Corp., in the form of quotes in the press release, and drew praise from several major analyst firms.

Stratus calls the new servers the Stratus ftServer 5200 and the Stratus ftServer 6500. Both servers will achieve fault tolerance by doubling up on Intel processors and memory and running the paired processors in lockstep. If one processor-memory module fails during a transaction, the other is already executing the same transaction and completes it. According to Stratus, this approach, called dual modular redundancy (DMR), will deliver five nines (99.999 percent) of availability, or roughly 5 minutes of downtime per year.
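As a quick check on the "five nines" figure, the arithmetic can be sketched in a few lines of Python. The function name is illustrative only, and the sketch assumes a 365-day year (525,600 minutes) and ignores leap years. At 99.999 percent availability it yields about 5.26 minutes of downtime per year; at 99.9999 percent, about 0.53 minutes (roughly 31.5 seconds).

```python
# Convert an availability percentage into expected downtime per year.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected minutes of downtime per year at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("five nines", 99.999), ("six nines", 99.9999)]:
        minutes = downtime_minutes_per_year(pct)
        print(f"{label}: {minutes:.2f} min/year ({minutes * 60:.0f} s)")
```

Each additional nine cuts the expected downtime by a factor of ten, which is why the jump from DMR to TMR moves the figure from minutes to seconds.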

Customers with higher availability needs, such as securities exchanges or telecommunications companies, will be able to opt for triple modular redundancy (TMR), which keeps three modules running. If one module fails, the remaining two continue in DMR mode until the failed module is replaced. This more expensive approach offers 99.9999 percent availability (six nines), or less than a minute of downtime per year.
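The difference between TMR and DMR can be illustrated with a software analogy. The sketch below is a generic majority-voting model, not Stratus's actual design: in the ftServers, lockstep and fault detection happen in hardware, and every name here (`run_redundant`, `ModuleFailure`, the sample modules) is hypothetical. Each "module" runs the same transaction; a module that fails outright drops out, and the remaining results must form a strict majority. With three modules, one silently wrong answer is outvoted; with two, a disagreement can be detected but not resolved.

```python
from collections import Counter
from typing import Callable, List


class ModuleFailure(Exception):
    """Raised by a simulated processor-memory module that has failed."""


def run_redundant(modules: List[Callable[[int], int]], txn: int) -> int:
    """Run one transaction on every module and return the agreed result.

    Modules that raise ModuleFailure drop out (fail-stop). The surviving
    results must form a strict majority, so three modules can outvote one
    wrong answer, while two disagreeing modules cannot be reconciled.
    """
    results = []
    for module in modules:
        try:
            results.append(module(txn))
        except ModuleFailure:
            continue  # a failed module simply leaves the vote
    if not results:
        raise RuntimeError("no healthy modules left")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("surviving modules disagree; cannot pick a result")
    return value


# Hypothetical sample modules for the usage example.
def good(txn: int) -> int:
    return txn * 2


def broken(txn: int) -> int:
    raise ModuleFailure("module offline")


def faulty(txn: int) -> int:
    return txn * 2 + 1  # silently wrong answer


if __name__ == "__main__":
    print(run_redundant([good, good, good], 21))    # TMR, all healthy
    print(run_redundant([good, broken, good], 21))  # degraded to DMR
    print(run_redundant([good, faulty, good], 21))  # bad answer outvoted
```

The strict-majority rule is a design choice for the sketch: it lets a single surviving module carry on alone (matching the fail-stop behavior described above) while refusing to guess when two live modules return different answers.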

Stratus is counting on price to help it break out of the exclusive high-availability market. The two-way-capable ftServer 5200 will ship in September at a starting price of $23,300 with one 550-MHz Pentium III Xeon and its DMR twin. The four-way-capable ftServer 6500 will ship in October at a starting price of $30,500 with one Xeon running at 700 MHz or faster, plus its twin.

Stratus’ initial goal is to price its server competitively with a pair of clustered servers, says Mike Thompson, the company’s senior vice president of worldwide sales and marketing. Besides the redundant processor-memory units, the Stratus systems include dual power supplies, fans, and clocks.

"This is an opportunity for Stratus to move out of its niche market," says Harvey Hindin, an analyst who focuses on high availability and clustering at D.H. Brown Associates Inc. "They can now offer fault tolerance at the same price as two-node clusters."

Fault tolerance is very different from high availability, Hindin explains. In a transactional environment, a highly available system is nearly always up and ready to accept transactions. A fault-tolerant system completes a transaction even if the system processing that transaction suffers a failure.

In a report on Melody, Vernon Turner, an analyst at market research firm IDC, writes that once cost is factored out, Windows 2000 users will gravitate toward the Stratus fault-tolerant approach because it is simpler to administer. "When the purchase price for a fully fault-tolerant system is the same as a high-availability system engineered out of clustering systems, the discussion over the two approaches quickly moves to another topic," Turner writes.

While a clustered system requires an administrator to use a cluster-aware application, configure the cluster, script the failover, and test the configuration, Stratus’ fault-tolerant systems require administrators only to load the operating system and the application.

Stratus will keep pushing its entry-level server prices lower for Windows 2000 customers, Thompson vows. "It’s our intent to get a server down to around $12,000 in the next 12 months," he says. Stratus told IDC that prices for ftServers will come down to $3,000 by 2002.

One other company already offers fault tolerance to Windows NT customers. Marathon Technologies Corp. offers 99.999 percent uptime and fault tolerance through its Endurance 4000. The product splits a single system into four components: two compute components running in lockstep and two I/O components running asynchronously. If one "tuple" (a compute component paired with an I/O component) fails, the other completes the transaction and works alone until the first tuple is recovered. According to Marathon literature, running I/O asynchronously keeps the lockstepped systems from crashing over the same glitch.

Marathon, which has been around for 5 years, has yet to deliver multiprocessing support in its systems. If Stratus delivers two-way SMP support with the ftServer 5200 in September and four-processor SMP support with the ftServer 6500 in October, it would gain a scalability advantage, provided Marathon does not ship SMP support first.

Within a Stratus box, a DMR system with four processors would contain eight chips, and a TMR system would contain 12 chips.

Marathon, which in the last year signed an OEM agreement with Hewlett-Packard Co. and a services and support agreement with IBM Corp., has seen rapid growth in sales volume, which a spokesman attributes to the growing acceptance of Windows NT in environments where availability is important. Marathon had sold 500 units through May 1999 and now counts about 1,200 installations.

Stratus also hopes to ride that wave. "We think that with what is happening with Microsoft and Windows 2000 -- how they’re putting more into the reliability -- our timing is perfect," Thompson says.

[Infobox] Stratus Fault-Tolerance Roadmap

If Stratus Computer Systems delivers on its roadmap, the company will introduce two new elements to fault-tolerant computing in the Windows environment.

SMP Support

Processors    Timeframe
2             September
4             October
8             2001

Entry-Level Fault-Tolerant Server