Getting the Real Data about Your Data
SRM is probably the most important investment that storage managers can make, but it’s riddled with problems.
In a recent telephone interview with Finisar, makers of storage protocol analysis and traffic generation equipment, I was introduced to several storage terms I hadn’t heard before. One of these new expressions was “fabric blindness,” which Finisar’s spokesperson referred to as a condition that occurs far too often with Fibre Channel “SANs” when fabric complexity, switch, storage, and host bus adapter heterogeneity, software- or hardware-enabled virtualization, and technology-change-over-time combine to make it impossible for a storage administrator to understand the cause of any “SAN” issue he is facing.
Simply put, according to the fellow, “You don’t know about issues that you can’t see (at the protocol level, for example, or in equipment firmware) and you don’t know what to do when things go very wrong. That’s fabric blindness.”
I’m not kidding. Finisar, the company that cajoled an industry to embrace short wave fiber optics as a lower-cost transport for FC Fabrics, thereby helping to encourage the adoption of SANs as a one-size-fits-all storage topology in larger companies worldwide, was now cautioning about fabric blindness. As the presentation continued, I heard additional expressions such as “brown outs” and “black outs” to describe other Fibre Channel foibles. I didn’t ask for technical explanations of the terms because he gave me a few war stories to describe what he meant.
In one recent case, he said, Finisar was contacted by a customer who was dealing with significant performance problems on his SAN, which comprised hardware from several vendors—each of whom was pointing an accusatory finger at the others to explain the problems. Finisar went directly to the wiring, deploying “taps” (which they use as an acronym for “traffic analysis points”) to “listen to the conversations” between servers, switches, and storage devices. They quickly discovered that the brown out (I’m assuming that means fabric performance slowdown) was being caused by a host bus adapter that wasn’t properly handling its communications. The HBA was changed out. The problem was resolved. Case closed.
Another case involved a customer having a blackout (fabric completely down, I’m assuming). They had some Finisar equipment in their shop, but they were confronting a lengthy delay before a vendor could come out, install the gear, and troubleshoot the problem. Finisar walked them through the steps on the phone for deploying taps, then helped them to interpret the data they were seeing. Again, the problem was resolved quickly once the errant device was discovered.
Bottom line: Finisar is moving up the food chain in storage today, from a provider of test and analysis gear to a management software vendor. Their NetWisdom Enterprise software correlates information from taps, and also takes data from their ProbeV switch monitoring software and ProbeFCX appliance, to provide an ongoing wire-level inspection of the health of the fabric. Their key challenge is to convince companies to buy their wares and to implement them as an ongoing source of proactive intelligence on the health of their SANs.
A key obstacle in their path is their dependency on original equipment manufacturers, their primary path to market and their bread and butter, to help promote their sales. In a sense, their products run afoul of vendor marketing around SANs—the idea that customers might be smart to deploy fabric monitors suggests that all is not right in the house of FC fabrics, and that they are more prone to failure (fabric blindness, brown outs, and blackouts) than the vendors might have us believe.
Bottom line: Finisar has some great tools that, while not inexpensive, would certainly pay for themselves in terms of fabric visibility and the reduced downtime that vigilant monitoring and proactive management enable. To succeed in their efforts to build out their product offering, the company will need to do some diplomatic dancing to avoid upsetting their channel partners while convincing consumers to shell out bucks for their useful tools.
Northern Software, in contrast to Finisar, doesn’t confront channel obstacles, only consumer reluctance to invest in and deploy third-party storage resource management (SRM) software. A couple of weeks ago, they took a novel approach to demystifying SRM: they announced that a capacity analysis utility was available for free download from their Web site. The idea is to give consumers “a taste of SRM,” according to Lee Taylor, product manager for enterprise storage, who spoke to me from Stockholm (the company’s U.S. headquarters is in Tampa, FL).
The free utility can perform a scan of a designated storage path, discover the storage volumes available, and return some data on capacity utilization, types of files, their ages, and whether they are orphaned (belonging to accounts that are unknown or obsolete according to Active Directory). Said Lee, “You just point the utility toward a storage path, drink some coffee, and wait for the response. The process takes about five minutes.”
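To make the idea of a capacity scan concrete, here is a minimal sketch in Python of the general technique: walk a storage path and total the bytes in use by file type and by age. This is not Northern's utility or its method. The function name scan_path, the age buckets, and the output layout are all assumptions chosen for illustration, and the sketch makes no attempt at orphan detection, which would require looking up file owners in a directory service.

```python
import os
import time
from collections import defaultdict


def scan_path(root, now=None):
    """Walk `root` and total bytes used by file extension and by age bucket.

    Hypothetical helper for illustration only; the bucket boundaries
    (30 and 365 days) are arbitrary assumptions.
    """
    now = time.time() if now is None else now
    by_type = defaultdict(int)
    by_age = defaultdict(int)
    total = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it

            # Group by lowercase extension; files without one share a label.
            ext = os.path.splitext(name)[1].lower() or "(none)"

            # Age is measured from the last modification time.
            age_days = (now - info.st_mtime) / 86400
            if age_days < 30:
                bucket = "under 30 days"
            elif age_days < 365:
                bucket = "30 to 365 days"
            else:
                bucket = "over 365 days"

            by_type[ext] += info.st_size
            by_age[bucket] += info.st_size
            total += info.st_size

    return {
        "total_bytes": total,
        "by_type": dict(by_type),
        "by_age": dict(by_age),
    }
```

Calling scan_path on a mounted share returns a dictionary of totals that could be printed or fed to a report. Note that a scan like this only sees what the file system presents at that path, which is exactly the limitation discussed in the rest of this column.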
It’s a good introduction to Northern, whose Northern Storage Suite has enjoyed a quietly profitable sales record for several years. However, it is also a good introduction to the vicissitudes of SRM.
Obviously, many storage administrators face a lack of insight into the status of storage devices and their capacity utilization. Often, admins build a repertoire of scripts cobbled together over time to collect whatever information they can glean about the arrays they have deployed.
SRM vendors such as Northern, CA, Symantec/Veritas, Tek Tools, and others have long argued that such a “quiver of arrows” approach, while a testament to administrator tenacity, is the wrong approach. Scripts are rarely documented and knowledge about their use must be handed down like so much oral history when a storage admin passes the torch to his or her successor. The real deal, they argue, is a professional, well-documented, and consistent view offered in the form of shrink-wrapped software and backed by a cadre of developers who keep it up to date with the latest storage devices.
Such platforms have a cost to acquire and deploy that makes them difficult to sell to the bean counters. Moreover, deploying SRM generally takes a bit more work than brewing a pot of coffee. Northern’s freebie underscores some of the challenges.
For one thing, SRM tools only see what the equipment vendor allows them to see. In some cases, they can only see the volumes that are advertised to servers and applications, which may not be a reflection of the actual capacity of the array for many reasons.
Some vendors carve out a fairly sizeable chunk of their arrays for their own use. It is termed “reserved but not allocated” and more than a few consumers have bristled when they discovered how much of their expensive storage capacity is being held in reserve by their vendor. In the case of several name-brand vendors, such reserves cannot be seen by SRM tools but only by using utilities that the vendors make available only to their system engineers.
Bottom line: SRM might not capture a complete picture of array capacity versus “customer usable” array capacity (you might hear the expression “T bits,” the technical capacity of the array, versus “B bits,” or “business bits”: how much of the array capacity the vendor lets the consumer use).
In addition to this bit of sleight of hand, there is, increasingly, a set of value-add processes on arrays, referred to collectively as “thin provisioning,” that masks the actual capacity of the array as a matter of function. Vendors argue that thin provisioning enables companies to operate as though they have 50 Terabytes of usable capacity when they actually have only 20 Terabytes. Thin provisioning is voodoo that helps them avoid overallocation and underutilization.
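The arithmetic behind that 50-versus-20 claim is worth a quick sketch, because it shows why a tool that sees only the provisioned figure can report a misleadingly comfortable number. The Python below is illustrative only: the function names are hypothetical, and the 14 Terabytes of data actually written is an assumed figure, not one supplied by any vendor.

```python
def oversubscription_ratio(provisioned_tb, physical_tb):
    """Capacity promised to hosts divided by capacity physically installed."""
    if physical_tb <= 0:
        raise ValueError("physical capacity must be positive")
    return provisioned_tb / physical_tb


def utilization(written_tb, capacity_tb):
    """Fraction of a given capacity figure consumed by written data."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return written_tb / capacity_tb


PROVISIONED_TB = 50   # what hosts believe they have (from the example above)
PHYSICAL_TB = 20      # what is actually installed (from the example above)
WRITTEN_TB = 14       # assumed amount of data written so far

ratio = oversubscription_ratio(PROVISIONED_TB, PHYSICAL_TB)
apparent = utilization(WRITTEN_TB, PROVISIONED_TB)  # view from the hosts
actual = utilization(WRITTEN_TB, PHYSICAL_TB)       # view from the disks

print(f"oversubscription: {ratio:.1f}x")
print(f"apparent utilization: {apparent:.0%}")
print(f"actual utilization: {actual:.0%}")
```

Under these assumptions the array is 2.5 times oversubscribed. A tool that sees only the provisioned 50 Terabytes would report 28 percent utilization, while the physical disks are already 70 percent full.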
Thin provisioning might be just what the doctor ordered in some cases, especially given the massive amount of waste you find in many shops where capacity has been allocated to application bosses or database admins who never fully use the space provided to them but reserve it just the same.
That isn’t a knock on DBAs, of course. At a retail company I visited recently, DBAs asserted that they routinely ask for more capacity than they need because it takes quite a while to get additional capacity signed, sealed, and delivered via internal approval processes. Some of this has to do with bureaucracy, of course, but it also has a lot to do with cost and obtaining approval. Their last capacity upgrade to a Hewlett-Packard-branded Hitachi Data Systems array cost $53,000 for two Terabytes (one Terabyte usable), about the price of a Lexus.
Thin provisioning might help redress some of the capacity reservation issues that companies experience, but it can also obfuscate the measurement of real capacity to SRM tools. Bill Chambers, CEO of LeftHand Networks, which offers thin provisioning as a value-add feature of its iSCSI storage clusters, says that he can see that such a problem could present itself, but offers that LeftHand provides actual capacity data on its SAN/iQ cluster management software and in the form of an SNMP MIB, usable by any SRM product, to help redress the problem. Doubtless other thin-provisioning array manufacturers would make the same claim.
Even in a perfect world, where vendor space reservation and thin provisioning functionality did not obscure the view of actual storage capacity, you still have the issue of storage manager tricks. At the same retail shop, I was told by one very bright storage admin that he routinely made disk images of software CDs and stored them out on his arrays. “That way,” he explained, “when I get an urgent call about space problems in the middle of the night, I can just log in remotely, delete a few of my disk images, and free up space for use.”
Nifty strategy though that may be, your standard SRM tool has no way to discriminate between capacity that is being used by legitimate data and that which has been allocated to storing disk images so the admin can get some uninterrupted sleep. This is only a small sample of the amount of capacity wasted by stale data, multiple data copies or versions of files, and “contraband” data (for example, someone in HR thinks that his next wife’s last name is JPEG and he has built up a sizeable cache of pictures of her collected from all over the Internet).
While SRM tools are getting better at detecting such junk data, they generally require the manual setup of policies, a process that is about as enjoyable as configuring a spam blocker or junk filter in your e-mail inbox. It makes for another nail in the coffin of efforts to cajole consumers to invest in SRM.
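For a sense of what setting up such policies involves, here is a minimal sketch in Python of rule-based classification. It does not reflect how any particular SRM product defines policies. The FileRecord and Policy types, the rule names, and the sample file records are all hypothetical, and the first matching rule wins.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    age_days: float


@dataclass(frozen=True)
class Policy:
    name: str
    extensions: frozenset  # an empty set means "any extension"
    min_age_days: float

    def matches(self, rec):
        ext = os.path.splitext(rec.path)[1].lower()
        ext_ok = not self.extensions or ext in self.extensions
        return ext_ok and rec.age_days >= self.min_age_days


def classify(records, policies):
    """Total the bytes flagged by each policy; the first matching policy wins."""
    flagged = {p.name: 0 for p in policies}
    for rec in records:
        for policy in policies:
            if policy.matches(rec):
                flagged[policy.name] += rec.size_bytes
                break
    return flagged


# Hypothetical rules: disk images, pictures, then anything over a year old.
POLICIES = [
    Policy("disk images", frozenset({".iso", ".img"}), 0),
    Policy("pictures", frozenset({".jpg", ".jpeg"}), 0),
    Policy("stale", frozenset(), 365),
]

# Hypothetical scan results, made up for this example.
RECORDS = [
    FileRecord("/vol1/install_cd.iso", 700, 10),
    FileRecord("/vol1/photo.JPG", 5, 400),
    FileRecord("/vol1/old_report.doc", 20, 500),
    FileRecord("/vol1/new_report.doc", 30, 3),
]

print(classify(RECORDS, POLICIES))
```

Even this toy version shows where the tedium comes from: someone has to decide which extensions count as junk, how old is too old, and in what order the rules apply. Rules like these also cannot tell an admin's emergency stash of disk images from disk images that someone actually needs.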
Still, for all these foibles, SRM is probably the most important investment that storage managers can make. By purchasing an SRM suite, then making manageability with those tools a criterion in storage platform purchases, storage managers have a greater chance of being able to contain storage costs and reduce downtime.
That’s my view. I am always interested in hearing yours: firstname.lastname@example.org.