The Holy Grail of Storage Efficiency (Part 1 in a Series)
Why is storage efficiency so abysmal?
This summer I will be moderating a "supercast" for ESJ.com and 1105 Media on storage efficiency -- a term that has suddenly become a recurring theme in the latest marketing messages of many storage vendors. In preparation for that event, it seems appropriate to develop this subject as a series of columns to help all of us to understand what storage efficiency means and to address the not-so-subtle thesis at the argument's core: that data storage, as done today, is terribly inefficient.
Storage efficiency can mean different things depending on whom you ask. To the IT architect or engineer, efficiency comes down to a simple calculus -- a ratio of output to input. Efficient storage infrastructure delivers great amounts of valuable output from a limited amount of input. To the engineer, storage infrastructure efficiency is evaluated in much the same way as you evaluate an automobile -- the focus being on performance and tuning. The goal of efficiency is to refine the infrastructure model so that it delivers performance to meet the needs of applications in a consistent and predictable way and so that it lends itself to monitoring and metrics-based analysis on an ongoing basis.
To the storage administrator, the engineering view may seem a bit too geeky to capture the real meaning of efficiency, which comes down to operational efficiency. Administrators link the idea of efficiency to a set of processes, augmented by automated tools, that enable a lot of work to be performed at the end of the day with the least amount of human effort.
The reason is simple. Generally, storage administration is a poorly staffed activity. In fact, these days, the job of administering storage often falls on the shoulders of server administrators because the storage staff have been downsized out of existence. Their view of storage efficiency is concerned less with the nuances of infrastructure design than with the practical need to resolve storage problems quickly, the first time, whenever they arise.
The administrators I talk to are concerned that storage downtime reflects negatively on management's view of their job performance. Repeated "disk full" errors that interrupt applications or dismounted file folders that elicit aggravated help desk calls from end users will, over time, cost them their job. Worse yet, such events could encourage management to take seriously the articles they see in Forbes, Barron's, or The Economist suggesting that outsourcing everything to the cloud is the one true path to IT cost-reduction and service improvement.
That takes us to senior management's view of storage efficiency. They see the world in terms of a simple business case. You can think of this in terms of a triangle, much like the diagrams featured in every issue of Harvard Business Review, bounded by the terms "cost-containment," "risk reduction," and "improved productivity." From this perspective, storage efficiency comes down to driving down CAPEX and OPEX costs to benefit the bottom line while promoting top-line growth. Viewed this way, storage represents a horrible investment to management: while they understand that hardware acquisition costs will grow in response to the business data being captured and stored, they see little indication that the investments they are making in storage infrastructure are the right ones, let alone that storage resources are being used and managed in a way that delivers the best return on investment.
Such concerns are reinforced by reports in the business trade press and the steady stream of commentary from vendor sales teams that routinely circumvent the back office (where IT lives) in an effort to sell their wares directly to the business decision-maker. Consuming 33 to 70 cents of every dollar of the IT hardware budget, storage looks like a big nail in search of a cost-cutting hammer.
Defining storage efficiency creates a situation reminiscent of the old story of the blind men and the elephant. In that tale, each man touches a different part of the animal -- a trunk, leg, ear, etc. -- and believes it to be a different object -- anything but an elephant. Like that conundrum, each of the three perspectives on storage efficiency is merely a partial description of the same beast. Combining the perspectives, a consolidated description emerges: efficient storage comprises the smooth operation of well-designed and highly manageable infrastructure that delivers measurable business value in the most cost-effective manner.
Unfortunately, in most organizations, storage isn't efficient. It's a mess and, by most estimates, getting messier. That assertion, which has been underscored by multiple reports from server virtualization advocates, is likely to receive nods of agreement regardless of whom you ask in the business.
In distributed computing environments, as has been pointed out here and by analysts time after time, storage is a very poorly leveraged resource. Industry insiders claim that storage allocation efficiency hovers at less than 17 percent of the optimal level. In normalizing the findings from storage assessments conducted in more than 10,000 large, medium, and small companies, we reported last year that roughly 40 percent of the space on every hard disk deployed in a firm is used to host data that is of archival quality -- which is to say, data that could be offloaded to tape or some other low-cost archival media. Another 30 percent of the space is occupied by what could be characterized as junk data and orphan data that could be deleted outright, and by space that is being held back by vendors in the form of "hot spares" or "capacity reserves."
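To put those percentages in concrete terms, here is a minimal sketch of the arithmetic. The 40 and 30 percent figures are the ones quoted above; the function name and the 100 TB estate are hypothetical, and treating whatever is left over as a single "remainder" bucket is my own simplification, not a finding of the assessments.

```python
# Percentages are the assessment figures quoted in this column;
# the function name and the 100 TB example are hypothetical.
ARCHIVAL_PCT = 40        # archival-quality data that could move to tape
JUNK_AND_HELD_PCT = 30   # junk/orphan data plus vendor-held spares and reserves


def capacity_breakdown(deployed_tb: float) -> dict[str, float]:
    """Split deployed capacity (in TB) into the categories described above."""
    archival = deployed_tb * ARCHIVAL_PCT / 100
    junk_and_held = deployed_tb * JUNK_AND_HELD_PCT / 100
    # Assumption: everything not in the two buckets above is lumped together.
    remainder = deployed_tb - archival - junk_and_held
    return {
        "archival_tb": archival,
        "junk_and_held_tb": junk_and_held,
        "remainder_tb": remainder,
    }


if __name__ == "__main__":
    for category, terabytes in capacity_breakdown(100.0).items():
        print(f"{category}: {terabytes:.1f} TB")
```

On a hypothetical 100 TB estate, that leaves only about 30 TB outside the archival and junk-or-held-back categories.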
Reclaiming the space you already own and returning it to productive use would require better data management and archiving. Unfortunately, data management isn't even among the tasks that most storage administrators believe to be within their pay grade. Their hands are full managing capacity, performance, and data protection processes -- the three tasks that define our concept of "storage management" today.
As a result of unmanaged data growth, the demand for increased capacity keeps climbing. In a recent discussion with an IT manager at Graybar, he noted that his capacity requirements have doubled year over year behind the company's SAP enterprise resource planning implementation. This is in line with IDC's claim that every firm will grow its storage infrastructure capacity by an average of 300 percent over the next three to five years.
Most of this capacity growth will occur in the absence of intelligent data management and without any sort of coherent or unified storage resource management capabilities. Without meaningful hardware management, the difficulties in managing capacity growth are skyrocketing, and so, of course, is the total cost of ownership (TCO), which includes "soft costs" such as administration, warranty and maintenance, environmental controls, and backup. Gartner says that storage TCO, on an annual basis, is 4 to 8 times the acquisition price of the storage kit.
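For a sense of scale, the Gartner multiple can be applied to any purchase price. This is only a back-of-the-envelope sketch: the 4x and 8x multipliers are the ones cited above, while the function name and the $250,000 array are invented for illustration.

```python
# Multipliers are the Gartner range cited in this column;
# the function name and the example price are hypothetical.
TCO_MULTIPLE_LOW = 4
TCO_MULTIPLE_HIGH = 8


def annual_tco_range(acquisition_price: float) -> tuple[float, float]:
    """Return the (low, high) annual TCO implied by the 4x-8x multiple."""
    if acquisition_price < 0:
        raise ValueError("acquisition_price must be non-negative")
    return (
        acquisition_price * TCO_MULTIPLE_LOW,
        acquisition_price * TCO_MULTIPLE_HIGH,
    )


if __name__ == "__main__":
    low, high = annual_tco_range(250_000)
    print(f"Annual TCO: ${low:,.0f} to ${high:,.0f}")
```

By that rule of thumb, a hypothetical $250,000 array would carry an annual cost of ownership somewhere between $1 million and $2 million.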
The industry seems to like it this way. As a stopgap against inadequate management, consumers are advised to purchase warranty and maintenance contracts with their gear, providing a lucrative annuity for the vendors. The price to renew these agreements when they reach maturity is typically equal to the original purchase price of the rig itself, again according to data from IDC. Too often the warranty is framed as a hedge against TCO expense.
To be sure, there have been efforts over the past two decades to develop a common management model for storage infrastructure, the latest being the Storage Networking Industry Association's Storage Management Initiative Specification (SMI-S). As I have documented in my columns, SMI-S adoption by vendors has been halting at best. Part of the explanation from vendors is the technical challenge and effort required to implement the standard. Another reason for such lackluster adoption is that storage resource management has not been a hot-button issue for consumers, who rarely make adherence to a common storage management approach one of the top ten checklist features when they buy equipment.
Management pain and high TCO are exacerbated by a heterogeneous infrastructure, of course. Administrators must endeavor to manage capacity, performance, and data replication/protection processes using multiple point management products that quickly become unwieldy. Without unified management, analyst proclamations, such as Forrester's recent statement that buying all storage from a single vendor is the only certain way to drive down storage infrastructure cost of ownership, might seem to make sense.
The problem with the idea of "better management through the single sourcing of storage infrastructure" is that it doesn't address the core problem of storage efficiency. (The same applies to the idea of outsourcing storage infrastructure to a cloud storage provider, by the way.) This idea embodies a lock-in strategy that eliminates any leverage consumers may have in bargaining over hardware prices. Single sourcing also negates the consumer's ability to take advantage of best-of-breed technologies if these technologies do not come from the chosen vendor. Single sourcing likewise implies that all of the lines of gear offered by a single vendor can be managed using the vendor's own unified management utility software, which is typically not the case.
For their part, vendors have stepped up with their own storage products that purport to address the problems of capacity management through on-box functions such as thin provisioning, or to tackle issues of performance management with controller-based technologies such as sub-LUN tiering (using Flash SSDs to augment disk storage by temporarily relocating "hot" data to Flash, then moving it back once it cools), or to solve data protection management issues with automated box-to-box mirroring (provided all boxes have their name on the bezel). As appealing as it may seem to automate these functions on an array controller, the functionality tends not to scale.
Moreover, all of these value-add features (1) add complexity to the array controller, increasing the likelihood of downtime due to software faults, (2) make capacity management of the infrastructure as a whole even more challenging, especially when services do not scale and the only way to grow capacity is to roll out another box, and (3) drive up acquisition price of each rig.
TCO is hard hit, too. Adding capacity to a value-add array requires the extension of value-add software licenses and warranty and maintenance contracts, driving up the cost by a factor of more than 100. To paraphrase a recent observation from a consumer, there is something intuitively wrong when adding two 1TB SATA drives to an existing rig -- drives that can be purchased at retail for less than $200 -- costs $53,000!
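The consumer's complaint is easy to check. A quick sketch of the arithmetic follows; the dollar figures are the ones quoted above, the function name is hypothetical, and because the drives cost "less than $200" at retail, the result should be read as a lower bound on the markup.

```python
# Dollar figures are the ones quoted in this column; the function name is
# hypothetical. $200 is an upper bound on the retail price, so the computed
# factor is a lower bound on the real markup.
def markup_factor(installed_cost: float, retail_cost: float) -> float:
    """Return how many times the retail price the installed price represents."""
    if retail_cost <= 0:
        raise ValueError("retail_cost must be positive")
    return installed_cost / retail_cost


if __name__ == "__main__":
    factor = markup_factor(installed_cost=53_000, retail_cost=200)
    per_tb = 53_000 / 2  # two 1TB drives
    print(f"Markup: at least {factor:.0f}x retail, or ${per_tb:,.0f} per TB")
```

That works out to at least 265 times the retail price, or $26,500 per terabyte of added capacity.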
All of these issues contribute to abysmal storage efficiency. They conspire to drive up the cost of acquisition and ownership, increase risk of outages and investment obsolescence, and reduce data availability and responsiveness in ways that impact users. What's needed is a strategy that will help companies improve the efficiency of their storage.
We will begin exploring the options in upcoming columns, culminating in a supercast in midsummer. Stay tuned.
Your comments are welcome: firstname.lastname@example.org.