Fibre Channel Arbitrated Loop: An Adventure in Configuration
ENT and Client/Server Labs review Fibre Channel Arbitrated Loop Technology
As networks grow more complex, sophisticated and powerful, solutions for storing, moving and safeguarding data present a unique set of challenges to system designers. One solution receiving considerable attention today is the use of Fibre Channel technology to create a "network behind the network," also known as a Storage Area Network (SAN).
The unique qualities of Fibre Channel are supposed to provide a host of benefits: increasing data access speed and reliability, moving backup tasks away from the normal data communications network, and creating server-less backup and an entirely sharable SAN. But the relative newness of Fibre Channel SANs also means caution should be taken before adopting this technology for just any application.
To get an idea of what is possible today with Fibre Channel technology, ENT and Client/Server Labs configured and tested a Fibre Channel network with products from a collection of manufacturers and software publishers.
How We Tested
Various components were installed into a small simulation network centered on two matched IBM Netfinity 5500 servers, each running Microsoft Windows NT Server 4.0 with Service Pack 3. Each machine was configured as a stand-alone server in a test domain, connected using TCP/IP networking protocols over common Ethernet 10baseT connections. The testing was conducted in three phases. First, each disk storage product was connected directly to a server. Second, the disk storage products were connected to the servers using hub technology. Finally, the tape storage devices and the servers were connected into a shared loop using hub technology.
When testing the interaction between components, we moved sample data sets of 3 GB to 9 GB across the various devices, first using standard tools such as the Windows NT Explorer, and subsequently using backup software from Legato and Seagate. Because we were focusing primarily on features and compatibility, our testing did not use raw performance as a metric.
What Is Fibre Channel?
Contrary to what you might expect from the name, the Fibre Channel protocol does not require the use of fiber optic cabling or devices. Many, if not most, Fibre Channel implementations run on copper cabling, using common DB9 and High Speed Serial Data Connectors (HSSDC). These connections allow components to be separated by as much as 30 meters, a reasonable distance for most business uses. True distance gains do require the use of fiber optics. Using short wavelength transmission over multimode fiber optic cables, devices can be placed at distances up to half a kilometer apart, while long wavelength transmission over single-mode fiber affords up to 10 kilometers of range.
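For planning purposes, the distance limits quoted above can be captured in a small lookup. The following Python sketch is only an illustration: the media names and the function media_for_distance are hypothetical labels chosen here, and the limits are simply the figures cited in this article, not values taken from a vendor specification.

```python
# Maximum link lengths in meters, as quoted in the article text.
# The dictionary keys are illustrative labels, not standard terms.
MAX_LINK_METERS = {
    "copper": 30,
    "shortwave_multimode": 500,
    "longwave_singlemode": 10_000,
}


def media_for_distance(meters):
    """Return the media types whose quoted maximum reach covers the distance."""
    return [media for media, limit in MAX_LINK_METERS.items() if meters <= limit]


print(media_for_distance(25))
print(media_for_distance(200))
print(media_for_distance(2_000))
```

A run across a machine room fits any of the three media, while a link to a building two kilometers away leaves only long wavelength transmission over single-mode fiber.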
For network engineers and system administrators familiar with SCSI interfaces and Ethernet networks, the structure and organization of a Fibre Channel loop presents some peculiarities. A Token Ring network may present the closest analogy. In a Fibre Channel Arbitrated Loop, communication is not broadcast as it is in architectures such as Ethernet. It is instead transmitted from one device to the next, with each device repeating the transmission around the "loop" until the data reaches its destination.
This use of a logical ring topology is one of the reasons for Fibre Channel’s high speed. The structure eliminates the problem of data collisions and contention among devices for use of the channel. The network structure, however, can introduce some new problems. Situations where a device is receiving but not transmitting could effectively block communication between two other devices. This became unexpectedly important during our test.
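The hop-by-hop behavior described above, and the way a single silent device can break it, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the device names and the deliver function are hypothetical, and the real arbitration and addressing steps of an arbitrated loop are left out entirely.

```python
def deliver(loop, source, destination, stalled=frozenset()):
    """Pass a frame from device to device around the loop.

    loop        -- device names in loop order (hypothetical names)
    stalled     -- devices that receive frames but do not repeat them
    Returns (path, arrived): the devices the frame reached, and whether
    it got to the destination.
    """
    n = len(loop)
    position = loop.index(source)
    path = [source]
    for _ in range(n - 1):
        position = (position + 1) % n
        device = loop[position]
        path.append(device)
        if device == destination:
            return path, True
        if device in stalled:
            # The frame was received here but never repeated downstream.
            return path, False
    return path, False


loop = ["server_a", "disk_array", "server_b", "tape_router"]
print(deliver(loop, "server_a", "server_b"))
print(deliver(loop, "server_a", "server_b", stalled={"disk_array"}))
```

In the second call the frame stops at the stalled device, so two otherwise healthy devices can no longer talk to each other. Blocking of that kind is one reason a hub that can keep a faulty port out of the loop is valuable.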
Plugging It In
The first step in connecting a computer system to a Fibre Channel Arbitrated Loop is to install a host bus adapter (HBA). The process should be familiar to anyone who has installed HBAs for SCSI devices, requiring the installation of the card and the loading of appropriate driver software. In our test, we used two types of HBAs: the LightPulse 7000 from Emulex Corp. and the QLA2100 from QLogic Corp.
Once the HBAs were installed, we experienced two problems with them. One software package in our disk storage testing had a compatibility problem with the Emulex adapter, and the improper installation of a QLogic adapter turned up during our hub setup tests.
Based on what we found in our testing, it seems that while flexible, high-speed, high-availability off-network storage solutions are a reality, the true SAN is not yet here for disk storage in an NT environment. Present limitations prevent multiple NT systems from peacefully sharing multiple disk storage devices on a single Fibre Channel loop.
The first device we connected to our test network was the Clariion FC5703 Fibre Channel disk array from Data General Corp. Housing up to 20 disk drives in a large, two-part chassis, the Clariion is designed with a lot of redundancy -- from dual power supplies to dual ported drives cross-connected to two storage processors in the unit, even multiple levels of data caching and checking. The FC5703 follows Clariion’s philosophy of high-availability. At a retail price of $96,750, this is not an inexpensive system, either.
A key element in the Clariion design is the use of two "storage processors" in the unit. This allows a single unit to support two separate host systems simultaneously. Disks may be joined into logical arrays and assigned to one or the other of the two processors. Each processor can then support a single external host.
Administration of the Clariion is done using an application called NaviSphere. This software runs on any Windows NT system and communicates with the FC5703 through an "agent," which runs as a service on a system directly connected to the Clariion unit itself. NaviSphere connects to the agent via TCP/IP and the agent may connect to the storage array through a serial connection or through the Fibre Channel connection to a storage processor. In an environment where two hosts were connected to the Clariion, an administrator could choose to load the agent service on those two host servers and run the NaviSphere management application from an office desktop system elsewhere.
Using the NaviSphere manager, the administrator can assign disk drives to arrays using a variety of RAID configurations, designate hot spares, check component status and perform recovery operations such as re-synchronizing disks following a failure. We found the software to be quite powerful, even a bit daunting in the number of tasks it handled. Though there seemed to be little that an administrator could not do, finding the right spot to do a particular task wasn't always obvious. For an administrator who only needs to administer the array on rare occasions, a few task wizards might have been nice.
Next we hooked the Fibre Box from Box Hill Systems Corp. to our other NT server. This smaller unit housed eight disk drives without dual power supplies or similar redundancy features. It was also significantly simpler to install and configure. Unlike the Clariion, the drives in the Fibre Box are directly visible to an NT host as individual devices. As such, they may be configured into software arrays using NT's Disk Administrator program or into arrays within the Fibre Box itself using Box Hill's administration utility.
Management of the Fibre Box is done through a utility called the Fibre Box Array Explorer (FBAE). This utility allows the administrator to configure arrays and monitor device status from the host system. When installing it, however, we located our first compatibility problem. We had originally connected the Fibre Box to an Emulex LP7000 HBA. This seemed to work fine, allowing us to see the drives, configure an NT stripe-set and move data back and forth. The FBAE software, though, was not able to see any of the drives and thus could not configure an array within the device. A check with Box Hill's technical support revealed that a support driver for the LP7000 had not been released and that the driver for the older LP6000 was not compatible.
We reconnected the Fibre Box to a supported QLogic HBA, reinstalled the FBAE software and were then able to fully administer the unit. Once it was properly installed, we found the FBAE software to be streamlined, offering few frills. We decided to leave our original four-drive NT stripe-set in place and add a four-drive RAID array under the control of the Fibre Box. Configuring the array took about two minutes, including the time needed to learn the interface.
Once we configured the array, we started an initialization process as instructed by the help screens. We were surprised that this process took over an hour and a half for our four-drive array. It did not, however, interfere with other operations, and once completed everything worked as expected.
Fibre Channel Arbitrated Loop SANs usually have a significant number of devices connected to a single Fibre Channel loop. Unlike SCSI devices, which connect in a single chain from one device to the next, multiple Fibre Channel devices are connected to a hub device, similar to an Ethernet hub or Token Ring MAU.
Once we established direct communication with the disk storage devices, it was time to bring the Fibre Channel hubs into play. Though simple hub devices are available, we elected to conduct our tests using some of the more powerful hubs, which use a component design and offer management capability.
As noted previously, Fibre Channel loops may operate on any of several different types of media. To accommodate the proliferation of connection choices, some hubs use Gigabit Interface Connectors (GBICs). Each GBIC is a module, slightly larger than a small box of matches, that slides into a slot on the chassis of the hub. The exposed portion of the GBIC presents a connector, such as the DB9 or HSSDC for copper-based connection or the dual SC for fiber optic cabling.
Management of a Fibre Channel hub may introduce some important challenges. Because a single loop can support over a hundred devices, isolating a failed component could be time consuming. Each of the hub vendors in our test presented interesting solutions to this issue.
The first hub we installed was the Rapport 2000 from Vixel Corp. Supporting up to 12 devices connected through standard GBICs, the Rapport has a slim profile and the customary mounting hardware for a 19-inch equipment rack. Multiple units may be stacked and connected, and one or more of the units may be equipped with an optional management card, which provides an Ethernet connection and supports Simple Network Management Protocol (SNMP). Management and testing functions are provided through a utility called Loop InSite, which may be run from any NT or Windows 95 system that has TCP/IP connectivity to the Rapport 2000.
When setting up the Rapport 2000 in our network, we installed a pair of copper GBICs and inserted the hub into the existing connection between one of the storage processors on the Clariion and an HBA on our first server. The server had no visibility to the arrays on that processor, though it had functioned correctly the day before. We removed the hub, but visibility did not return. We then inserted the hub into the channel of the other storage processor and discovered that this connection remained stable.
We then moved the Vixel hub back to the original connection, but still did not get data throughput. Curious, we loaded the Loop InSite application and asked it to run some tests on the connection and the devices at each end. Very quickly, the software indicated that the QLogic HBA to which we were connected was not functioning properly. It turned out that the HBA had been pushed loose during our connecting and disconnecting of cables, causing it to malfunction. The Vixel hub had done its job in preventing the malfunctioning device from joining the loop.
Next we installed a Gibraltar GS hub from Gadzoox Networks Inc. The Gibraltar GS hub also employs GBICs to support up to 12 connected devices. An optional IntraCom Management Module provides management and configuration capability over a serial or a standard 10baseT Ethernet connection. A management application named Ventana provides advanced management and analysis capability.
The Gibraltar GS was used to establish a new set of connections for the tape devices we planned to add next. We also loaded and worked with the Ventana 2.0 management tool. The Ventana software communicates with the Gibraltar hub over the Ethernet connection and uses SNMP commands to retrieve information. Though Ventana lacks the explicit testing tools of the competing product, it presents a clear graphic representation of the device and the status of subcomponents.
We liked the fact that the system constructed a diagram of the devices connected to the loop, including location and contact information configurable by the administrator. Unfortunately, that information does not seem to respond well to the standard adds, moves and changes that occur daily in a network. When we disconnected the cable to a device, that device simply disappeared from the display. Reconnecting the cable did not restore the lost information. Removing and reinstalling a GBIC had a similar effect.
When we exited the application and returned, the information for the temporarily removed device was restored, but the information for an entirely different device was missing. This had no effect whatsoever on the performance of either the hub or the connected devices, all of which worked flawlessly during our test. But these quirks could introduce serious confusion when tracking remote equipment.
In preparation for our test of backup software, we decided to make our Fibre Channel network more complicated as we introduced the tape storage devices.
We extended our test network to connect our two servers and two tape libraries into a single SAN. To do this we planned to use the Gadzoox hub and connect an HBA on each of our test servers into the network along with a connection to each of two tape libraries. In planning our group of test products, however, we quickly discovered that there are no tape drives that directly connect to Fibre Channel networks. Most manufacturers instead use some form of bridging or routing technology to bridge SCSI interfaces to Fibre Channel.
The first unit we introduced was a P1000 tape library from ATL Products. This tape library is equipped with four DLT-7000 tape drives, a tape handling robotic assembly and storage for 30 DLT tape cartridges. We used a CrossPoint 4200 Fibre Channel to SCSI router from Crossroads Systems Inc. to establish the connection.
The CrossPoint router has two differential SCSI buses and a single Fibre Channel interface using a standard GBIC, as would a hub. SCSI devices plugged into CrossPoint are detected when the unit powers up and are given appropriate identities on the Fibre Channel loop. How those identities are assigned -- from a defined table or automatically by any of several precedence schemes -- is configurable through a simple, menu-driven interface accessed from a terminal or communications program across a serial connection.
Once the ATL library and CrossPoint router were powered up, the servers were rebooted. The NT control panel applet for SCSI devices showed the drives and library mechanism as standard SCSI targets. Tape drivers could then be loaded through the NT tape devices applet.
Next, we installed the ADIC Scalar 218 FC, which was equipped with two DLT-7000 drives, a tape handling mechanism and storage slots for 18 DLT tape cartridges.
Plug-ready for Fibre Channel networks, the 218FC is equipped with a built-in Fibre Channel-to-SCSI router, which happens to be technology licensed to ADIC from Crossroads Systems. Unlike the CrossPoint router used to connect the P1000 library, the ADIC unit does not use a GBIC for its connectivity. As with the ATL and CrossPoint combination, once the ADIC unit was installed and powered up, rebooting the NT servers gave visibility to the SCSI devices and the tape device drivers could be loaded.
A problem then arose. Because both tape libraries were being connected to the SAN with the Crossroads technology and both units were set up to use automatic assignment of Fibre Channel IDs, the IDs had a nasty habit of changing from one restart to the next. Which ID was assigned to which tape drive and to which library robotics seemed to depend on the order in which devices were powered up. We turned to the Crossroads configuration menu to override this automatic assignment, but we found that care was needed in figuring out the proper assignments.
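The difference between automatic and table-driven assignment can be illustrated with a short Python sketch. It is a simplified model, not the Crossroads firmware logic: the device names, the two functions and the rule "next free ID in power-up order" are all assumptions made here to reproduce the symptom we saw.

```python
def assign_automatic(power_up_order, first_id=0):
    """Hypothetical automatic scheme: next free ID in power-up order."""
    return {device: first_id + i for i, device in enumerate(power_up_order)}


def assign_from_table(power_up_order, table):
    """Hypothetical fixed scheme: each device always gets its table entry."""
    return {device: table[device] for device in power_up_order}


# The same four devices, powered up in a different order on two days.
monday = ["atl_drive_1", "atl_robot", "adic_drive_1", "adic_robot"]
tuesday = ["adic_drive_1", "adic_robot", "atl_drive_1", "atl_robot"]

FIXED_TABLE = {"atl_drive_1": 0, "atl_robot": 1, "adic_drive_1": 2, "adic_robot": 3}

print(assign_automatic(monday)["atl_robot"], assign_automatic(tuesday)["atl_robot"])
print(assign_from_table(monday, FIXED_TABLE) == assign_from_table(tuesday, FIXED_TABLE))
```

Under the automatic scheme the same robot lands on a different ID after each restart, while the fixed table gives identical results regardless of power-up order. The cost is that someone has to work out and maintain the table by hand.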
Tape devices are, of course, only useful when presented with data to record. Therefore our next challenge was to install and operate backup software.
The first package we installed was NetWorker 5.5 from Legato Systems Inc. Supplied on several CDs, the Legato software uses the concept of a server package and a client package to permit backups to be conducted across a network. In a system such as ours, with multiple tape devices on a single communications channel, the software is designed to establish an "ownership" relationship between a specific server and a specific tape library. One server controls one tape library, even if backing up data from another server or a client system.
Our first attempt to install the Legato software was unsuccessful. Although Windows NT could view the various SCSI devices in our tape libraries, they were invisible to the Legato configuration utilities. Consulting with Legato technical support, we discovered and rectified two problems. First, we had neglected to load the NT drivers for the tape devices themselves. Second, we needed to download and install a patch from the Legato Web site. The hardware and driver-level process of mapping SCSI devices to Fibre Channel addresses sometimes leaves gaps in the numbering sequence that would not exist with standard SCSI devices. The patch allowed the configuration utility to pass over those gaps without assuming it had reached the end of the device list.
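The numbering-gap problem is easy to picture in code. The Python sketch below is an illustration only: we did not see Legato's source, so both scan functions and the sample ID set are hypothetical. It contrasts a scan that stops at the first missing ID with one that keeps going.

```python
def scan_stop_at_gap(present_ids, max_id):
    """Scan upward and treat the first missing ID as the end of the list."""
    found = []
    for device_id in range(max_id + 1):
        if device_id not in present_ids:
            break
        found.append(device_id)
    return found


def scan_skip_gaps(present_ids, max_id):
    """Scan the whole ID range and simply pass over missing IDs."""
    return [device_id for device_id in range(max_id + 1) if device_id in present_ids]


# Hypothetical mapping result: ID 3 was left unused by the bridging layer.
present = {0, 1, 2, 4, 5}

print(scan_stop_at_gap(present, max_id=7))
print(scan_skip_gaps(present, max_id=7))
```

With a gap at ID 3, the first scan never sees the devices at IDs 4 and 5. That matches the symptom of devices that Windows NT could see but the configuration utility could not.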
With the corrections in place, we were able to configure the ATL tape library and drives and to make backups from both of our test servers. Neither system, however, was able to detect and configure the ADIC library. We believed this was due to the order of the SCSI IDs being assigned to the subcomponents of the ADIC unit.
The second backup package we loaded was Backup Exec version 7.2 from Seagate Software. This software, along with a component called the Shared Storage Option, had somewhat better success in dealing with the multiple tape libraries, though it too exhibited peculiarities.
During the installation process, the Seagate software also was unable to see the tape drives attached to the Fibre Channel loop. The installation program, however, recommended installing Seagate-provided tape drivers for the DLT drives to correct the problem. Once that was done, the drives became visible.
Even with this correction, though, we could not seem to reliably configure the libraries. Drives previously configured would suddenly appear to go off-line, and new drives not associated with the tape libraries would appear. It was not until the third iteration of this process that we finally realized what was occurring. As noted previously, the assignment of SCSI IDs to Fibre Channel IDs was being handled dynamically at boot time by the Crossroads routers. This meant that a device seen at one logical location might not be at that same location after a future reboot. Once we understood the problem, we learned that we could control the situation either by manually configuring the Crossroads routers or by following a carefully set boot sequence.
Having solved the riddle of the device IDs, we proceeded to configure the system for shared storage. With Seagate Shared Storage Option, the server running the software is considered to be the primary controller for managing the tape libraries and tape drive devices. It is not, however, the only system allowed to use those devices. Instead, it arbitrates among the various servers on the shared SAN, ceding actual device control to first one machine and then another.
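The idea of one controller ceding a shared device to one server at a time can be sketched as a simple queue-based arbiter. This Python example is a toy model built on our own assumptions; the class DeviceArbiter and its methods are hypothetical and do not represent the Shared Storage Option's actual interface or protocol.

```python
from collections import deque


class DeviceArbiter:
    """Toy arbiter: one server at a time controls a shared device."""

    def __init__(self):
        self.owner = None
        self.waiting = deque()

    def request(self, server):
        """Grant control if the device is free; otherwise queue the server."""
        if self.owner is None:
            self.owner = server
            return True
        if server == self.owner:
            return True
        if server not in self.waiting:
            self.waiting.append(server)
        return False

    def release(self, server):
        """Give up control and hand the device to the next waiting server."""
        if server != self.owner:
            raise ValueError(f"{server} does not control the device")
        self.owner = self.waiting.popleft() if self.waiting else None
        return self.owner


robot = DeviceArbiter()
print(robot.request("server_a"))   # first requester gets the library robot
print(robot.request("server_b"))   # second requester has to wait
print(robot.release("server_a"))   # control passes to the waiting server
```

A first-come, first-served queue is only one possible policy. It is used here because it is the simplest way to show control passing from one machine to another without contention.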
In our tests, we easily set up a situation in which each of our servers was sending a backup job to the ADIC and ATL libraries concurrently. With those four jobs running simultaneously, a tape drive was being written to in each library while the systems passed control of the library armatures back and forth smoothly and without contention.
Today and Tomorrow
Fibre Channel SAN technology offers some great benefits, most notably speed and the number of connectable devices. The availability of longer connection distances can help with the development of creative solutions to some significant problems. Copper-based connections of 30 meters make it practical to create storage farms housed in a centralized office location. The multikilometer distance option makes off-site backup and remote fail-over solutions more practical than ever.
On the downside, true SANs, where a single disk storage device can be made to serve the storage needs of multiple NT servers or a heterogeneous collection of servers, are not yet here. Far from being ready to share a storage unit, we found that NT is not even ready to quietly allow multiple servers to each use separate storage units on a single loop.
With multiple NT systems connected to a single SCSI disk device, none of the systems is aware of, or tolerant of, the activity of the others. Each system may attempt to write conflicting ID information from the Disk Administrator utility. Each will also keep conflicting entries in cache memory, and will treat normal disk writing activity by the other systems as disk corruption.
Tape storage systems are much closer to leveraging the full potential of a SAN, at least from a functional standpoint. But even in this arena we found unrealized potential. Many vendors like to talk about "third party copy" or "server-less" backups, referring to the idea that backups might be conducted across Fibre Channel links directly between disk storage devices and tape storage units, without involving servers at all. The necessary technology for this concept is available today, but it is not mature enough -- yet.
In many situations and for numerous uses, Fibre Channel is a useful -- albeit maturing -- technology. Many businesses are investing heavily in it and will continue to do so. It appears that the technology’s growth in the near future will be explosive, both in terms of capabilities and user adoption. More importantly, it appears that growth is going to be more orderly, with more refined standards leading to better interoperability and more stable products than in years past.
For now, the buyer still needs to beware: Choose your components carefully and check reference accounts from potential vendors to make sure what you hope to configure is not ahead of the technology curve.