Up-Time is a Matter of Endurance
So you want to be up more than 99.999 percent of the time? That’s a tall order, especially in today’s power-hungry, 24-hour-a-day, Internet-driven world. Marathon Technologies may have the solution that provides that kind of reliability. With the Endurance 6200, an upgrade from the Endurance 4000, Marathon believes it can provide near-100 percent uptime on a multiple-CPU Windows NT server platform. And how it does so is unique.
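To put that figure in perspective, here is a back-of-the-envelope sketch of how much downtime a given availability target allows per year. The function name is my own, and I am assuming a 365-day year; nothing here comes from Marathon.

```python
# Downtime budget implied by an availability percentage.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime a year")
```

At five nines, the budget works out to a little over five minutes a year, which leaves no room for a conventional reboot-and-repair cycle.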
To understand what Marathon is doing with the Endurance 4000 and 6200 systems, it is important to understand the difference between a cluster and a fault-tolerant system. In Windows terms, a cluster is a loosely coupled set of independent systems that work cooperatively to provide fail-over capabilities, load balancing, or both. A cluster of Windows NT or Windows 2000 servers typically contains multiple independent systems connected by either a common SCSI bus or a private LAN connection. In contrast, a fault-tolerant system is a tightly coupled set of independent hardware and software modules that work in concert within an individual logical server to provide complete redundancy. There have been only a few true fault-tolerant systems in the general computing marketplace. Clustered systems are targeted at both failure resistance and scalability; fault-tolerant systems are targeted at high availability above and beyond the five-nines category.
Using off-the-shelf Windows NT servers from any manufacturer, Marathon Technologies has built a fault-tolerant platform that can withstand just about anything you throw at it and continue running. What makes Endurance different from the older fault-tolerant platforms from Tandem, Stratus, Digital, or Sequoia is that Marathon does not require a special operating system. The Endurance system runs on out-of-the-box Windows NT servers. The Endurance configuration uses four systems that appear to the clients as one logical server. Two systems operate as Compute Elements (CEs) running in lockstep, while the remaining two systems operate as I/O Processors (IOPs). Because the CEs do not need a CD-ROM drive, hard drive, keyboard, mouse, or Ethernet controller, the cost of an Endurance solution drops to about that of three systems with disk storage. Endurance runs as a concurrently redundant system: duplicate components work in parallel on the same task, and if one fails, the surviving component continues to service requests and maintain real-time data availability without disruption. Applications are loaded normally and run normally. The four systems are tied together as one and act as one to the operating system, applications, network, and client systems.
The CEs and IOPs are tied together with the Marathon Interconnect (MIC) subsystem. Each CE is always connected to each IOP via a MIC connection. A connection typically consists of a physical link and a pair of Marathon-designed MIC adapter cards, one in each of the connected systems. Each MIC adapter plugs into a standard PCI slot in the CE or the IOP.
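The topology described here (two CEs, two IOPs, and a MIC link from every CE to every IOP) can be modeled in a few lines. This is only a sketch under an assumption of my own: that the array keeps serving clients as long as at least one healthy CE can reach at least one healthy IOP over a working MIC link. Marathon’s actual failure-handling logic is not documented here, and the node names and the `array_is_up` function are hypothetical.

```python
from itertools import product

# Hypothetical names for the four systems in one Endurance array.
CES = ("CE1", "CE2")      # Compute Elements
IOPS = ("IOP1", "IOP2")   # I/O Processors


def array_is_up(failed_nodes=frozenset(), failed_links=frozenset()):
    """Return True if some healthy CE still reaches some healthy IOP.

    failed_nodes: names of systems that are down.
    failed_links: (ce, iop) pairs whose MIC connection is broken.
    Assumption: one working CE-MIC-IOP path is enough to keep the
    logical server running.
    """
    for ce, iop in product(CES, IOPS):
        if ce in failed_nodes or iop in failed_nodes:
            continue
        if (ce, iop) in failed_links:
            continue
        return True
    return False


if __name__ == "__main__":
    print(array_is_up())                              # all four systems healthy
    print(array_is_up(failed_nodes={"CE1"}))          # one CE lost
    print(array_is_up(failed_nodes={"CE1", "IOP2"}))  # one CE and one IOP lost
    print(array_is_up(failed_nodes={"CE1", "CE2"}))   # both CEs lost
```

Under this model, any single failure, and any double failure that leaves one CE and one IOP connected, still leaves a working path. Only the loss of both CEs, both IOPs, or every link between the survivors takes the logical server down.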
Marathon’s SplitSite technology delivers site disaster tolerance as well by permitting geographic separation of the two halves of an Endurance array. Each side has a CE and an IOP. When a single component fails, the array behaves as if both halves were in the same room; when a disaster takes out an entire site, the surviving half computes through the disaster without interruption of service.
Is this solution expensive? Yes and no. Yes, you are going to buy four servers instead of one, plus the cost of the Endurance hardware and software. But it won’t go down due to a hardware failure. Considering the cost of servers -- $10,000 for a typical high-end server instead of $100,000 or more just a few years ago -- it’s a bargain at twice the price. What is your data worth? What is your ability to service your customers worth? Marathon Technologies is so confident about its ability to keep your data safe and available that it has built a $250,000 guarantee into the product’s warranty.
Proven Track Record
I recall working on the ftVAX (Fault Tolerant VAX) from Digital Equipment Corp. many years ago. I was impressed with the architecture and design. I remember the demo of the ftVAX that showed how the four boxes consisting of multiple CPUs and mirrored disks could be dismembered piece by piece and, as long as at least one of each critical piece was still functioning and properly connected, the system kept on going without a hiccup. I mention that platform because the same folks who designed it are the founders of Marathon Technologies. These folks never gave up on the idea of a truly fault-tolerant system; they just redirected it to the largest server market that was emerging in the industry: the Intel-Windows NT market.
Marathon’s customers report that the system is as durable as the company claims. William Harris, systems support specialist at Ohio Utilities Protection Service (OUPS), says, “Our Stratus server gave us seven years of good service, but we needed to upgrade to a more powerful server platform. Although the conversion from Unix to Windows NT was somewhat painful to some of our developers, in the end it has been a great move, and we are very happy with it.” OUPS has run the Endurance 4000 series for more than two years without a service interruption and is now migrating its application -- a call center for utilities that pinpoints underground utilities prior to construction -- to the Endurance 6200. The upgrade will provide faster CPUs and dual processors for greater throughput. Harris says that after about six or seven months of use on the 4000, his company experienced a motherboard failure on one of the compute elements in the Endurance array. He was able to replace the motherboard and some other components without a second of downtime. This is an example of what Marathon refers to as being able to “compute through” a failure. Marathon’s customers also find that support is above par and always available. OUPS will not retire the Endurance 4000 when the 6200 migration is completed; the company will use the 4000 as a Microsoft Exchange and IIS server.
Room to Grow
The Endurance 6200 runs only on single- and dual-processor systems, which limits its scalability. Also, the Endurance 6200 does not currently run Windows 2000. Marathon is expected to address both shortcomings, but neither fix is in the currently shipping product. If Marathon is able to combine the Endurance fault tolerance and redundancy with Windows 2000 Datacenter Server’s clustering functionality, it would knock me out cold, but I guess I’ll have to wait for that. Development of the hardware-software combination that controls the fault-tolerant systems seems to lag behind the operating system by 12 to 18 months. Still, I feel the Marathon Endurance 6200 is by far the solution of choice when looking for a solid fault-tolerant and disaster-resistant server platform. It’s worth running Windows NT for a while longer to gain the benefits of Endurance.
Marathon Technologies Corp., Boxborough, MA
Pricing: Preconfigured solutions with a choice of industry-standard servers start at $45,800. Other options are available.
+ True fault tolerance on an Intel-based platform
+ Split-site redundancy provides disaster tolerance at an affordable price
+ Proven track record in the industry
+ Excellent support
- Does not run Windows 2000
- Two-processor limit
- Complex to install; requires significant training or the purchase of a preconfigured system