In-Depth
Clustering: What Some Vendors Don’t Tell You
When managing data I/O becomes problematic, many network managers look to server clustering technology as their solution. In the chicken-and-egg world of vendor marketing and user need, clustering has achieved "hot topic" status.
However, beneath the hype that clouds clustering solutions, we can make out the actual problems they address. The call for server clustering comes from three needs: handling exploding storage requirements, enabling faster data access and increasing data availability. Clustering meets these needs with two basic mechanisms. To keep pace with growing data and speed requirements, it harnesses unused system capacity and bandwidth on selected network servers. And because a clustered server system involves multiple servers, a single server crash no longer cuts off access to data.
As vendors rush to custom-design clustering solutions within these guidelines, various architectures are available with varying degrees of efficacy. A hungry man has many options for what to eat; not all options are nutritious or satisfying.
A network manager’s dream is a networked server solution with no single points of failure, where end users have no downtime (continuous access to data and applications). In order for clustering to make this dream come true, it must resolve several issues.
Making Every Link in the Chain Strong
Depending on individual needs, a variety of vendors manufacture appliances that target specific points of failure. Because a network has multiple points of failure, a solution must cover them simultaneously and still be cost-effective.
There are several high-availability configurations with no single point of failure within the data subsystem. How well a solution resolves multiple points of failure is the larger concern for the network manager. The ideal solution may need to take into consideration any number of these common failure points (a simple audit of them is sketched after this list):
- The server, operating system, applications, CPU and RAM
- Disk drives (failure is minimized by mirroring or RAID 5 configurations)
- Disk controller (RAID 5 controllers do not commonly support dual adapters; on external RAID subsystems the SCSI fan-out, the multiple-channel SCSI cabling connection, remains a common failure point even with redundant RAID controllers, and redundant controllers often share a common battery that is, itself, a single point of failure)
- Power supplies (failure is minimized by redundant UPSs on redundant circuits)
- Fans (if a fan fails, CPUs and drives can fail due to improper cooling)
- Network interface cards (a path must exist from a client to the server, or the server is effectively down)
- Location (failure due to power outage, fire, flood, earthquake, etc., is minimized by keeping backups in separate geographic locations)
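To make that audit concrete, here is a minimal Python sketch; the component names and redundancy flags are invented for illustration and do not describe any particular product. It simply walks an inventory and flags whatever still lacks a redundant counterpart.

    # Hypothetical inventory of one clustered node's components and whether each
    # has a redundant counterpart; every False entry is a single point of failure.
    components = {
        "server": True,               # a second clustered node can take over
        "disk drives": True,          # mirroring or RAID 5
        "disk controller": False,     # single RAID 5 controller
        "controller battery": False,  # shared cache battery on "redundant" controllers
        "power supplies": True,       # redundant UPSs on redundant circuits
        "fans": False,
        "network interface": True,    # dual NICs / dual client paths
        "location": False,            # no backups at a second site
    }

    single_points_of_failure = [name for name, redundant in components.items() if not redundant]
    for name in single_points_of_failure:
        print(f"WARNING: {name} is still a single point of failure")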
Whatever individual problems clustering is asked to "fix" in a storage network, the true measure of a solution is continuous access to all data and applications. In a multi-server environment, when one server goes down, the other servers must fill in for the failed unit and fulfill all data/application requests.
Clustering Architecture
Clustering goes beyond hardware installation. In a complete clustering solution, the proper hardware and software must be combined to resolve network issues; otherwise, the system will be down at least part of the time.
With the correct configuration of hardware and software, each of the failure points can be compensated for and data/application availability can be maintained. Clustering also makes it possible to change components of the system without affecting users. A network manager can sleep more soundly knowing that when one node fails, access to files, printing and other applications is automatically supported on a different clustered node.
Unlike RAID, where a number of vendors claim to manufacture a single, superior product that meets all needs, effective clustering solutions depend on a collection of components that must suit individual uses.
Different clustering solutions are distinguished by examining each individual component and how the complete solution operates as a whole. A typical multiple-node cluster built for continuous data/application availability includes the following configuration (the role of the failover software is sketched after this list):
- Two or more servers
- A server operating system
- A multi-initiator disk subsystem that supports two or more servers
- Clustering/automatic failover software
- Multiple network adapters for "heartbeat/keep alive" networks and client networks
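As a rough illustration of what the clustering/automatic failover software in such a configuration does, the sketch below (a simplification; the node names, the two-second heartbeat interval and the three-missed-beats threshold are assumptions, not any vendor's actual design) has one node watch its partner over the dedicated heartbeat network and take over the partner's services when the heartbeat goes silent.

    import time

    HEARTBEAT_INTERVAL = 2.0   # seconds between heartbeat checks (assumed value)
    MISSED_LIMIT = 3           # missed beats before declaring the partner dead (assumed value)

    def partner_heartbeat_received():
        """Placeholder: in a real cluster this would read the dedicated
        heartbeat/keep-alive network, not the client network."""
        return False  # simulate a failed partner for demonstration

    def take_over_services(partner):
        print(f"Partner {partner} declared down: mounting its shared disks "
              f"and restarting its file/print services locally")

    def monitor(partner="node-b"):
        missed = 0
        while True:
            if partner_heartbeat_received():
                missed = 0
            else:
                missed += 1
                if missed >= MISSED_LIMIT:
                    take_over_services(partner)
                    break
            time.sleep(HEARTBEAT_INTERVAL)

    monitor()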
However, a system that provides connectivity to one or more servers and increased disk capacity is not, in itself, a complete cluster. Critical to proper operation are cluster software, cluster-capable servers, clustering interconnect and a disk storage solution. Without the most knowledgeable I/O engineer on their side, network managers may be surprised to find that certain elements of the clustering solution itself can create a potential point of failure.
Currently, NT and NetWare seem to be the primary markets for failover technology. File and print services are the services most commonly supported for failover and are critical to proper operation. The software vendor and the clustering architecture in place each make a difference when implementing failover for additional applications.
Manufacturers must understand that their varying clustering architectures inherently affect hardware connectivity for the user. How much hardware can be shared and how much hardware must be duplicated? How many nodes per cluster are supported?
Software has to provide for the creation and management of the cluster, with manual failover to let the user upgrade, add software, change hardware or perform other maintenance while data and applications remain available on another node. Under an automated failover system, additional cluster software must be purchased and implemented to allow the same flexibility as manual failover.
Cluster-Capable Servers
Clustering is impossible for some systems due to I/O bottlenecks and a finite number of PCI slots on a finite number of buses. These are vital considerations for network managers who know their clustered server must, for example, have a minimum of six PCI slots:
- One for a hard drive controller for booting
- One for a tape backup subsystem
- One for a disk subsystem that is shared with another server
- One for the connection to the other server's shared disk
- One for a network card to connect to users
- One for a network card to connect to a dedicated heartbeat link to the other clustered servers
Clustering solution vendors know that too much activity on a particular slot creates an I/O bottleneck. By current design, PCI (v2.1) allows only four full-speed PCI slots (33 MHz, 132 MB/sec) per bus without a bridge. In most PC servers with "dual-peer PCI," one bus carries a secondary bus with either ISA or EISA slots, along with an embedded real-time clock; the clock, keyboard interrupts and other 8 MHz devices slow that bus dramatically.
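The bandwidth ceiling behind that warning is simple arithmetic; the sketch below works through the standard PCI v2.1 figures (32-bit data path at 33 MHz) that all four full-speed slots on a bus must share.

    # PCI v2.1 full-speed bus: 32-bit data path clocked at 33 MHz
    bus_width_bytes = 32 // 8        # 4 bytes per transfer
    clock_hz = 33_000_000            # 33 MHz
    peak_bandwidth = bus_width_bytes * clock_hz
    print(f"Peak PCI bus bandwidth: {peak_bandwidth / 1_000_000:.0f} MB/sec")  # ~132 MB/sec

    # All four full-speed slots on the bus share that 132 MB/sec, so two busy
    # fibre channel adapters (100 MB/sec each) would already oversubscribe it.
    print("Shared by up to 4 full-speed slots on the same bus, without a bridge")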
Servers suited to clustered NT solutions include the NetFrame 9000 (4 CPUs, 3 PCI buses, 8/16 PCI slots); the Amdahl/Fujitsu M700I (6 CPUs, 2 PCI buses, 6 PCI slots); and the Tandem/Compaq 7000 (4-8 CPUs, 2 PCI buses, 9 PCI slots, plus 2 EISA slots).
Although clustering keeps data available, data/application access still slows in failover mode, so redundant system components become an important aspect of the clustering solution. To remain operational, a system may also need to include hot-swappable components such as fans, power supplies or drives. If a component fails, hot-swappable elements allow continued access while the failed part is repaired and reconnected to the system.
Clustering Interconnect
There is much to be said for the current preference for the fibre channel interface, usually a 100 MB/sec (200 MB/sec full-duplex) data transfer protocol with as many as 126 nodes per loop; a node can be a disk or a server. Additionally, a built-in fibre channel interface used in clustering solutions can cost significantly less in the long run.
At present, only a few manufacturers offer clustering solutions with redundant host adapters, leaving the operation vulnerable to an additional point of failure. When considering an investment of this magnitude, network managers must be informed that the most trouble-free solutions have as many as four redundant host bus adapters.
However, when a fibre channel "fabric" connection is in place, the fabric itself becomes a point of failure. A point-to-point fibre channel connection, on the other hand, enables instant re-routing of data paths without the fabric needing to be fixed immediately, thereby maximizing data availability. The problem with typical SCSI disk connections is that the configuration is limited to 13 drives per SCSI channel: SCSI Wide provides 15 usable SCSI IDs, and one ID is consumed by each controller. A SCSI interconnect also requires bus termination; if a node fails, the SCSI bus's termination point changes or even disappears, which usually causes the disk array to fail.
A point-to-point fibre channel connection’s success is based on a design that supports up to 64 drives per fibre channel adapter (with 18 GB drives, this allows for more than one TB of data per PCI slot used). The system also has eight slots for host connections, allowing the connection of either eight concurrent servers or four servers with dual path connections. It is engineered to connect to any server that has a free PCI (v2.1) slot, which allows an otherwise unclusterable server to be interconnected. With many other external disk subsystems, a company’s existing servers cannot be interconnected and new cluster-capable equipment must be purchased, significantly raising the cost of the solution.
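The capacity figures above are easy to verify with a little arithmetic, as in this sketch (the 64-drive, 18 GB and eight-host-slot numbers come from the description above; the GB-to-TB conversion is rounded).

    drives_per_adapter = 64          # drives supported per fibre channel adapter
    drive_capacity_gb = 18           # 18 GB drives
    capacity_gb = drives_per_adapter * drive_capacity_gb
    print(f"Capacity per PCI slot: {capacity_gb} GB (~{capacity_gb / 1024:.1f} TB)")  # 1152 GB

    host_slots = 8                   # slots available for host connections
    print(f"Host connections: {host_slots} single-path servers "
          f"or {host_slots // 2} servers with dual-path connections")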
Users need to make sure the clustering interconnect is load balanced to reduce the occurrence of I/O bottlenecks. With load balancing, I/O is continually directed to the least-used host bus adapter. Another important aspect to consider is the ease of adding security and sharing features; support for a large distance between hosts allows these options to be implemented reasonably and inexpensively. The solution should be tested with multi-mode fibre at up to one km between hosts and with single-mode fibre at up to 10 km between hosts. At greater distances, shared storage can also be designed flexibly.
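A minimal sketch of that least-used policy, assuming the driver can see a count of outstanding I/Os per host bus adapter (the adapter names and queue depths here are invented for illustration):

    # Pick the host bus adapter with the fewest outstanding I/Os for the next request.
    outstanding_ios = {"hba0": 17, "hba1": 4, "hba2": 9}   # hypothetical queue depths

    def least_used_adapter(queues):
        return min(queues, key=queues.get)

    target = least_used_adapter(outstanding_ios)
    outstanding_ios[target] += 1        # dispatch the next I/O down the idlest path
    print(f"Routing next I/O via {target}")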
Despite the preference of some manufacturers to promote an Active/Passive cluster, this configuration only allows one-way failover. When a failure occurs, the passive server (the designated target) assumes the identity of the active server. In this Active/Passive configuration, users who were connected to the passive server lose their connections and are not able to access data until the active server is restored.
In an Active/Passive cluster model the second server has a mirror of data, but does not share access to all equipment. Again, the user is obligated to purchase twice as much hardware as will actually be used. When all nodes are active, all equipment is used on a dynamic basis and the total cost of implementation is lower.
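The one-way nature of Active/Passive failover shows up clearly in even a toy model like the one below; the node names and roles are illustrative assumptions, and an all-active cluster would differ in that any surviving node could absorb any failed node's work.

    # Active/Passive: failover only flows from the active node to the designated target.
    cluster = {
        "node-a": {"role": "active",  "failover_target": "node-b"},
        "node-b": {"role": "passive", "failover_target": None},   # nothing covers node-b
    }

    def fail(node):
        target = cluster[node]["failover_target"]
        if target is None:
            print(f"{node} failed and nothing takes over; its users wait for a restore")
        else:
            print(f"{node} failed; {target} assumes {node}'s identity "
                  f"(dropping any work of its own)")

    fail("node-a")   # covered: node-b takes over
    fail("node-b")   # not covered: one-way failover leaves these users down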
About the Author: David G. Swift is a Senior Systems Engineer for XIOtech Corporation (Eden Prairie, Minn.; www.XIOtech.com). David is certified as an MCSE, MCPS, MCNE, ECNE, CNI, CNE, CNA, OS/2E, LSE, LSA and AIX-CSA.
***
SIDEBAR:
Getting Ready for SAN: How to Intelligently Begin Implementing a Storage Area Network
By Don Peterson
With the storage area network market in its infancy, some companies are unaware of how to begin working toward the goal of high-performance, continuous access solutions for distributed networks.
While continuous access to data is essential, building this type of network can be confusing and cumbersome if you don't have the right tools and technologies. Although it is a relatively new idea in distributed networks, the concept behind the SAN is nothing new: all mainframe computing is done with one central storage repository, the DASD (Direct Access Storage Device). Now this architecture is moving to the enterprise.
SANs enable storage to be taken out of the server and pooled so that the storage can be shared by multiple servers without affecting system performance or the primary network. An Enterprise SAN takes SAN technology one step further, in that it also includes universal data access, fault tolerance, a high degree of software integration, and the same scalability and flexibility characteristics required in a centralized enterprise.
The benefits of implementing an Enterprise SAN are many, including:
- Increased application availability: In a SAN, storage is external to the server, making it independent of the application. This allows storage to be accessible through alternate data paths, significantly reducing, if not eliminating, application downtime.
- Easier centralized management: SAN configurations allow centralized management of volumes instead of individual drives, reducing time and costs.
- Greater flexibility: SANs allow higher performance, more scalable, more reliable and more serviceable configurations.
- Disaster protection: SANs offer cost-effective implementations of remote mirrored arrays. This is not only valuable for disaster planning, but also for data transfer, backup and recovery, vaulting, Y2K applications and data exchange with remote sites.
One of the hottest new trends in network computing, and a key application for SANs, is clustering of servers to provide automatic failover in case one server fails.
SANs are ideal for these clustering applications because the storage is shared. Other key applications for SANs, because of their interconnectivity and performance, include disaster recovery, data interchange, data protection and data vaulting.
How to Get Started
A SAN is a high-speed channel that establishes a direct connection between storage elements and servers or clients. The SAN can be interconnected using loop or fabric topologies via hubs or switches. Implementing a SAN is not usually a plug-and-play operation. The wide variety of products from different vendors often exposes interoperability problems. Finding the right SAN configuration for your application can be confusing.
The following elements are necessary to implement a successful SAN strategy:
- Centralized Storage: In order to have universal data access, storage must be centralized and shared by many heterogeneous servers.
- Availability: In a continuous access SAN configuration, it must be possible to add storage and servers online.
- Manageability: Storage must be managed from a centralized management console, with features similar to those of a LAN manager, such as security, configuration, performance and so on.
- Upgradeability: SAN storage must be able to take advantage of newer, faster technologies in order to achieve a greater return on investment (ROI).
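As a rough data-structure sketch of the centralized, shared-by-many-heterogeneous-servers requirement above, the example below models a pooled set of volumes mapped to several servers over the SAN; the volume sizes, server names and platforms are invented for illustration.

    # Hypothetical centralized storage pool, carved into volumes and mapped to servers.
    pool = {
        "vol01": {"size_gb": 200, "mapped_to": ["nt-server-1", "nt-server-2"]},  # shared by a cluster
        "vol02": {"size_gb": 500, "mapped_to": ["unix-server-1"]},
        "vol03": {"size_gb": 120, "mapped_to": ["netware-server-1"]},
    }

    def add_server(volume, server):
        """Map another heterogeneous server to a pooled volume online,
        without touching the primary (client) network."""
        pool[volume]["mapped_to"].append(server)

    add_server("vol02", "nt-server-3")
    total = sum(v["size_gb"] for v in pool.values())
    print(f"{total} GB pooled across {len(pool)} volumes")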
Implementing a SAN configuration does not come without some expense. However, once implemented, a SAN offers a significant reduction in management costs, increased performance, continuous operations and universal access. The cost of downtime varies but is inarguably high for any company: the average financial impact of a system failure can be staggering, from $27,000 to $6.5 million per hour.
A SAN is essentially a high-speed subnet that enables storage to be taken out of the server and pooled so that it may be shared by multiple servers without adversely affecting system performance or the primary network. By centralizing storage, it is possible to improve the availability of application data to the end user (read: less downtime). The storage is external to the server, making it independent of the application.
The most important SAN benefit is accessibility. The technology enables users to share storage and server resources, not only within the SAN, but also across the LAN and WAN. Not only is there an increase in data availability, but also in application performance.
Clustering servers to provide automatic failover results in more robust and accessible data storage. The basic building blocks of a clustering setup are simple: two or more servers, a multi-initiator disk subsystem that supports two or more servers, clustering/automatic failover software and multiple network adapters for a heartbeat/keep-alive network and a client network.
Those installations with a commitment to disaster preparedness will appreciate a robust SAN architecture with reliable clustering features. SANs can also offer cost-effective implementations of remote mirrored arrays. This is valuable for disaster planning, data transfer, backup and restore, vaulting and data exchange between remote sites.
Avoiding the Pitfall
The pacing item slowing the arrival of full-figured mainframe class SANs is interoperability. In order to implement a SAN, the integrator has to juggle fibre channel-based disk arrays, hubs and switches, host bus adapters, tape devices, bridges, storage management software and backup software. All these components have to work together.
With this kind of complexity, mixing and matching components from different vendors must be done with extreme care. What works in a lab may not work in the field. But those who need the functionality of SANs need not wait: because of these interoperability issues, many companies are assembling packages of pretested, pre-qualified SAN storage components.
One offering that has gone well beyond the component level is the MAGNITUDE from XIOtech (Eden Prairie, Minn.). This "SAN-in-a-Box" approach provides immediate gratification for installations that require a SAN right away.
The XIOtech MAGNITUDE is a centralized storage system that provides parallel processing and effective queue management to keep I/O performance high. The servers connect through PCI fibre channel adapter boards, in either a point-to-point topology or a fibre channel arbitrated loop (FC-AL) topology, to attain sustained throughput of over 80,000 I/Os per second. MAGNITUDE supports up to 192 heterogeneous servers and over 3 TB of data.
Integrated hardware means little without stable software to manage the SAN. MAGNITUDE incorporates a storage architecture and software known as Real Time Data Intelligence (REDI). The software combines the performance of many individual disk drives and shares the total available performance with every attached server. The company also offers a clustering software package that enhances system reliability.
Toward Acceptance: Software and the O/S
There are also software products that will bridge the chasm of diverse operating systems. Mercury Computer Systems and Transoft are already selling O/S extensions that add up to "SAN operating systems." It stands to reason that if LANs have operating systems (the NOS), there may be SAN operating systems in the future.
International Data Corp. estimates a single storage administrator can manage 7.5 times more data on a SAN than on a decentralized storage system. IDC predicts that by the year 2003, SAN will push annual sales of storage arrays to $10 billion, SAN hubs to $800 million and SAN switches to $1.6 billion.
About the Author: Don Peterson is a Senior Product Manager at XIOtech Corporation (Eden Prairie, Minn.), responsible for strategic planning and new products.