You Can't Manage What You Can't See
Looking into a SAN is murky business -- and usually unsuccessful.
Last week, I spoke with Mark Urdahl, CEO, and Jim Murphy, vice president of marketing, at Virtual Instruments, a spin-off from Finisar, maker of probes and software for monitoring and troubleshooting Fibre Channel fabrics. Virtual Instruments will continue to pursue the development, marketing, and sales of NetWisdom, a well-designed FC fabric monitoring solution.
Mark has a pedigree in Fibre Channel. He was instrumental in helping set up the FC Systems Initiative while at IBM in the late 1990s and was part of a "rebel" faction at IBM in Austin, TX, when Fibre Channel was developed and submitted as a standard, competing with Serial Storage Architecture (SSA) for use in connecting RS/6000 "Cray on the desktop" workstations to storage. He said that FC was developed to meet a challenge presented by high-performance computing environments in 1992 and 1993, and added that "No one conceived that it would eventually be put to use to handle thousands of switches, servers, storage arrays, and other gear in a fabric SAN."
Given his history at IBM, then at Ancor Communications (an early FC switch maker), then at Finisar, Urdahl's commentary and expertise on FC fabric technology are notable. However, given the nature of the solutions his company sells, Urdahl must walk a fine line between criticizing the hype that surrounds FC "SANs" and working with vendors who sell the gear that makes a fabric. Ultimately, he says, his talking points are aimed at the consumer (who needs better management of fabrics) and at the industry (which needs to acknowledge the management blindness that its FC products introduce, even if they solve other problems).
"The core problem of Fibre Channel," Urdahl noted, "is that you can't see inside FC. Unlike Ethernet, Fibre Channel is a black box." He adds that vendor hardware-specific tools don't provide the solution. Citing Hitachi Data Systems’ (HDS) tuning manager as an example, he said that using such tools requires stopping other applications that use the fabric. The outcome is a trial-and-error approach to troubleshooting and tuning storage resources that takes too much time and effort.
"CIOs need a way to determine what is going on in a SAN. With petabytes of data to manage, no one has time to keep up. The need is for alerts about what is happening in the SAN and how it relates to the applications that are supporting the business."
He noted that Finisar focused on granular tests and measurements, but questioned whether this was really the functionality that the market wanted. "Even tech guys aren't interested in queue depths. Finisar started with an analyzer 'probe' that would take a drink from the FC hose. You had to be an expert to use it."
Since spinning off NetWisdom to Virtual Instruments, Urdahl reported, more attention is being paid to simplifying collected data into dashboards that are easier to understand and interpret. To correlate what is happening in the fabric with what applications are doing, many sources of data are used, including standards-based data provided by vendor SNMP MIBs and proprietary sources such as virtual server data from VMware. NetWisdom acts as a universal metadata collator that enables administrators and IT managers to set meaningful thresholds and issue alerts when they are crossed. Urdahl thinks that this kind of information can have a real impact on disaster prevention.
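To make the threshold-and-alert idea concrete, here is a minimal sketch in Python. It is not NetWisdom's implementation or API. The metric names, source labels, limits, and sample values are all hypothetical, chosen only to show readings from several kinds of sources being checked against administrator-defined thresholds.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """An administrator-defined upper limit for one metric (hypothetical names)."""
    metric: str
    limit: float


def check_thresholds(samples, thresholds):
    """Return one alert string for every sample whose value exceeds its limit.

    samples: iterable of (source, metric, value) tuples collected from
    different places (switch counters, array statistics, virtual server data).
    """
    limits = {t.metric: t.limit for t in thresholds}
    alerts = []
    for source, metric, value in samples:
        limit = limits.get(metric)
        # Metrics without a configured threshold are ignored, not alerted on.
        if limit is not None and value > limit:
            alerts.append(f"{source}: {metric}={value} exceeds {limit}")
    return alerts


# Illustrative readings only; none of these figures come from the interview.
samples = [
    ("switch-01/port-7", "link_utilization_pct", 92.0),
    ("array-02/lun-14", "read_latency_ms", 4.5),
    ("esx-host-03/vm-22", "read_latency_ms", 31.0),
]
thresholds = [
    Threshold("link_utilization_pct", 80.0),
    Threshold("read_latency_ms", 20.0),
]

alerts = check_thresholds(samples, thresholds)
for alert in alerts:
    print(alert)
```

The point of the sketch is the correlation step: once fabric, array, and virtual server readings share one format, a single rule set can flag the two readings that matter and stay silent about the rest.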
"The CIO presumes that SAN architecture is reliable. The truth is that the frequency of failures in SANs is quite high," he said, adding that the time needed to fix SAN problems and rebuild is often protracted.
He did not cite specific data on this point, but a 2005 survey of 680 companies in a popular disaster recovery publication suggested that SAN outages were the third leading cause of downtime in tech shops, just after events such as hurricanes and WAN outages. Interestingly, the survey showed server failures to be the fourth leading cause of downtime.
This data point, perhaps the first of its kind, was significant because early adopters of SANs were led to believe that SAN architecture was a hedge against loss of data accessibility resulting when servers failed and their direct-attached storage became inaccessible. It now appears that SANs fail more often than servers in many shops.
Urdahl also noted that the lack of manageability of Fibre Channel fabrics drives up their costs significantly. He said, "We are overprovisioning SANs today because we can't manage FC fabrics efficiently. The latest numbers show that companies are deploying a lot of SAN ports at $50K per port that are being used at less than 15 percent of capacity. Better optimization could be realized with greater visibility into configuration and traffic."
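A quick back-of-the-envelope calculation shows what those two figures imply. The sketch below takes Urdahl's $50K per port and treats 15 percent as the utilization, which is an assumption: he said "less than 15 percent," so the real effective cost would be higher still. The function name is hypothetical.

```python
def cost_per_utilized_port(price_per_port, utilization):
    """Effective price of one port's worth of capacity actually in use.

    utilization is a fraction between 0 (exclusive) and 1 (inclusive).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return price_per_port / utilization


# $50K per port at an assumed 15 percent utilization (an upper bound
# on the utilization Urdahl described).
effective = cost_per_utilized_port(50_000, 0.15)
print(f"${effective:,.0f} per fully used port-equivalent")
```

Under that assumption, each port's worth of capacity actually in use costs roughly $333,000, more than six times the sticker price.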
Such visibility, he said, would also enable companies to better tier their storage in a fabric: "Using Clariion, for example, instead of a Symmetrix, saves about $10K." Deciding which platforms should host which data in a fabric, he offered, requires "better visibility into the end to end linkage between applications and storage at the I/O level" -- something that fabrics do not provide without management tools like NetWisdom.
Smart companies, such as Bank of Scotland, leverage NetWisdom to drive out costs and downtime from their extensive FC SAN, he noted. "They won't deploy anything without being sure that it can be managed or monitored using NetWisdom." To Urdahl's thinking, this goes beyond the technical concerns about resource utilization efficiency and uptime to bigger business concerns about compliance. If regulatory requirements mandate safe, secure, and disaster-proof storage of data, placing it on an unmanaged SAN may well be a violation of both the letter and spirit of the regulations.
Now, that's something you don't read in the brochures of FC equipment manufacturers. Says Urdahl, "Well, everyone is guilty of a little oversell. I usually find that hardware vendors get it when I suggest that improving visibility increases the utility and longevity of Fibre Channel technology in the marketplace." In the long run, he says, that is better for everyone's sales models than hyping FC SANs as unbreakable and self-managing, which they most certainly are not.
Given the pushback from consumers, reported here and elsewhere in the trade press, regarding Storage Resource Management (SRM) software -- that it takes too much effort to deploy and inundates the administrator with too much uncorrelated data to be of much use in actual day-to-day work -- Urdahl does not want to call NetWisdom an SRM tool. "We want to create a new category of product to describe what we do."
Indeed. Perhaps SAN Metrics would be the new category. That would fit with the unofficial tag line for the new company, "If you aren't measuring, you aren't managing."
Virtual Instruments is worth a look. Your comments are welcomed at firstname.lastname@example.org.