The Holy Grail of Storage Efficiency, Part 3
The impact and importance of virtualized storage and the cloud.
In a previous installment of this series, I argued that storage virtualization has the potential to improve the architectural, operational, and economic efficiencies of storage infrastructure. This column will expand on that topic and bring in a discussion of virtualized storage's cousin: the storage cloud.
Storage virtualization has a lot to offer on the economic front. It breaks hardware vendor lock-ins, enabling planners to purchase what they need from the source they want at prices they can afford. It can also boost the performance of the underlying spindles by 200 to 300 percent, letting discount JBODs match the speed of a brand-name rig costing several hundred times the price. From a CAPEX perspective, then, a key contribution of the technology is that virtualization lets you leverage what you already have today and buy only what you need (and can afford) tomorrow.
Architecturally speaking, storage virtualization sets the stage for the deconstruction of proprietary array controllers, allowing storage architects to abstract away from the "heads" of pricey storage systems those "value-add" functions they find useful. Doing so, in turn, enables the value-add functionality to scale outside the confines of a single hardware rig and to be applied to the entire infrastructure.
An example is thin provisioning, a "value-add" function that has lately found itself embedded on proprietary array controllers. The basic idea of thin provisioning is that it lets companies capitalize on that time-honored dynamic of disk drives -- they double in capacity roughly every 18 months and cost half as much per GB each year -- which makes it extraordinarily cost-inefficient to buy capacity before you actually need it. Thin provisioning addresses inefficient capacity acquisition by allowing administrators to oversubscribe their arrays -- that is, to assign users or applications a lot of storage, but to apportion space from the disk array only as it is actually consumed. A forecasting engine tracks how space is being provisioned and alerts the administrator, well in advance of total resource depletion, when more disk needs to be purchased.
Thin provisioning is a great idea if it can be done across your entire storage infrastructure. Conversely, it is a really bad idea inside a specific rig, given (1) the fixed capacity of the rig itself -- the number of shelves and drives it can contain -- and (2) the inability of forecasting algorithms to anticipate "margin calls": sudden, unforecastable resource demands from applications that believe they have already been provisioned the terabyte of storage they are requesting. When a margin call occurs and capacity is unavailable, someone is likely to have a really bad day.
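The bookkeeping behind oversubscription -- and the margin-call failure mode -- can be sketched in a few lines. This is a minimal, hypothetical model for illustration; the class and method names are not any vendor's API, and the "forecasting engine" is reduced to a simple utilization threshold:

```python
# Minimal sketch of thin-provisioning bookkeeping (illustrative only).

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb   # disk actually installed in the pool
        self.allocated_gb = 0            # space apportioned as data is written
        self.subscribed_gb = 0           # capacity promised to applications

    def provision(self, gb):
        # Oversubscription: promise more capacity than the pool physically owns.
        self.subscribed_gb += gb

    def write(self, gb):
        # Space is drawn from the pool only when data is actually written.
        if self.allocated_gb + gb > self.physical_gb:
            # The "margin call": an application asks for space it was
            # promised, but the pool cannot deliver it.
            raise RuntimeError("pool exhausted: margin call")
        self.allocated_gb += gb

    def needs_more_disk(self, threshold=0.8):
        # Crude stand-in for a forecasting engine: warn once utilization
        # crosses a threshold, before total resource depletion.
        return self.allocated_gb / self.physical_gb >= threshold

pool = ThinPool(physical_gb=1000)
pool.provision(3000)           # 3:1 oversubscription
pool.write(850)
print(pool.needs_more_disk())  # True: time to buy more disk
```

In a single rig, `physical_gb` is capped by the chassis; at the virtualization layer, the same pool spans every spindle in the infrastructure, which is the whole argument of this column.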
Fans of on-array thin provisioning claim that margin calls won't happen if application administrators and storage administrators coordinate their work. Maybe they are right, but the real world is not always filled with happy people holding hands. Others say they avoid the problem by keeping disks loaded in reserve on the array, which basically compromises the value case for thin provisioning.
Bottom line: in the on-array implementation of thin provisioning, inefficiencies arise because of the fixed size of the storage pool. Moving this service onto a virtual array controller set up across all storage in the infrastructure gives this functionality more efficacy, reduces risk, and prevents thinly provisioned storage islands from becoming the architectural meme of your storage infrastructure.
Truth be told, the concept of thin provisioning was first advanced by ... wait for it ... the storage virtualization people. It took about a half decade for it to find its way onto array controllers.
As the thin provisioning example illustrates, hosting services at the storage virtualization layer enables them to be used across all storage rather than being dedicated to a specific rig. That extensibility delivers a greater return on investment. It also makes it easier to apply services selectively -- to just the data that require them, and not to the rest.
For example, the storage virtualization layer provides the ideal location for instantiating services related to data protection. To be sure, many hardware vendors offer a range of data protection features on their arrays, from RAID to controller-based value-add services such as point-in-time mirror splitting (a form of continuous data protection that uses your most expensive disk to make copies of your most expensive disk) and inter-array mirroring -- synchronous and asynchronous -- using embedded controller software.
That's a very comprehensive set of functions, to be sure, but the downsides of this architecture are typically (1) that all disks in the array must be "signed" by -- and purchased only from -- the vendor to participate in a RAID set, and (2) that a premium price must be paid for the PIT mirror splitting and sync/async replication software. Signed drives usually carry a substantial markup over the price for which the drives could be had from their manufacturer or a discount reseller.
Furthermore, the value-add software, in addition to being a price accelerator for the array, is usually a harbinger of a hardware lock-in strategy: in most cases, the only target rigs that can be used for mirroring and replication are those that are identical to the source rig. This is another huge cost, doubling or, in the case of multi-hop mirroring, tripling the basic cost for a storage array.
With best-of-breed storage virtualization products such as DataCore Software's SANsymphony-V, "defense-in-depth" strategies for data protection are handled by the virtual controller using any and all disks in the infrastructure. Out of the box, SANsymphony-V provides I/O operation logging (a granular form of continuous data protection) to safeguard against data-level disasters, plus synchronous mirroring to protect against localized disasters involving equipment failure or computer room outages, and asynchronous replication for keeping data safe from CNN-style disasters with a big geographical footprint. These services can be turned on and off for different data selectively -- in the case of CDP, by clicking a box next to the virtual storage volume mapped to a particular application.
In short, storage virtualization nails the three operational management requirements most commonly associated with storage: capacity management, performance management, and data protection management. With a bit of work, other services such as data tiering and archiving can also be provided on the abstraction layer -- comprising, collectively, that elusive fourth storage management function: data management.
Server Virtualization Helps Drive Storage Virtualization
To some extent, storage virtualization has been gaining mindshare thanks to server virtualization. Marketing around hypervisor computing has contributed to the foothold of storage virtualization in two ways. First, it has familiarized everyone with the "V" word, which not so long ago was a pejorative in the storage realm. When software storage controller plays such as DataCore, FalconStor, and others first appeared in the market in the late 1990s, the hardware vendors pushed back with vigor. It was only when the likes of EMC and IBM created their own storage virtualization gateways to make their FC fabrics more management-friendly that storage virtualization actually became part of the lexicon of storage. However, even then it was treated as a stepchild of the hardware vendors that preferred the pricing model of on-array services to a virtual storage infrastructure approach.
Server virtualization marketing also drove home the idea that hardware was becoming commoditized and was being used very inefficiently by the applications and operating systems of the early Aughties. Abstracting software away from hardware, it was argued, would fix the problem. That opened the door to some intelligent discussions of storage resource utilization inefficiencies, and of the potential to do in storage what was already being done in servers and desktops: separating software from commodity hardware as a means to drive up utilization efficiency. As a result, storage virtualization drafted, NASCAR-style, off of the lead cars on the server virtualization track.
Server virtualization helped advance storage virtualization in other ways as well. For one thing, server virtualization projects have created massive issues within storage infrastructure. Gartner recently projected a 600 percent increase in storage-capacity demand brought about by server virtualization, a reflection of the need to replicate guest-machine image files on multiple storage volumes serving multiple virtual hosting machines. For virtual server "motioning" and failover to work, every possible server host for a guest machine must have the data required by that guest machine. That's a huge capacity driver, and the added cost is causing many virtualization projects to stall before they are 20 percent complete.
With virtualized storage, the placement of data within the storage infrastructure, the pathways to access the data, and the flexibility in assigning and re-assigning volumes to hosts and guests on the fly make much of the storage pain of server virtualization disappear. Moving data between different hardware platforms, even those connected by different interconnect protocols, is a no-brainer. In addition, when data must be replicated for high availability, the storage virtualization kit provides the gains without the pain.
Storage virtualization is becoming, in companies pursuing server and desktop virtualization, a complementary technology that is quickly moving from "nice-to-have" to "must-have" status. Without it, many server and desktop virtualization projects are DOA -- not to mention many "cloud" initiatives.
Into the Clouds
In the great taxonomy of technology, cloud storage is a misanthropic subspecies of an equally misanthropic genus called cloud computing. Just as 19th century paleontologists frequently assembled the dinosaur bones they had unearthed into creatures that never actually existed -- in truth, early diggers identified hundreds of species that were later found to be built from parts belonging only to a few big creatures -- so, too, cloud computing has produced many kludgy outcomes.
The term "cloud" is a hodgepodge of ideas, mostly meaningless and mostly derivative of other things already well understood in the Great Tree of Information Technology. There is no agreement on acronyms (does SaaS mean software-as-a-service or storage-as-a-service?), which is the first sign of either a nascent technology meme or one manufactured by vendors seeking to sell old equipment using a new moniker.
Plus, the more you listen to the wooing about the future of cloud computing, with its "dynamic allocation of pooled resources," its "predictable service levels," its "promised security," etc., the more you confront the feeling that you have heard it all before -- under the name application service providers (ASPs) and storage service providers (SSPs) in the late 1990s or service bureau computing in the late 1980s. In reality, all of the properties attributed to future cloud computing offerings are available today: the platform is called, simply, a mainframe.
That said, storage virtualization will likely be required to make possible the golden dream of "cloud storage," in which multiple service providers offer capacity and users purchase "futures" to meet capacity requirements next month, three months from now, or a year out. Getting to this Enron-like scheme of spot-priced storage (readers may recall that Enron tried the same thing with utility power) will require interoperability between storage clouds that doesn't exist today.
In fact, the idea of shared, interchangeable cloud services is hampered generally by the interests of key vendors in the market that are busily cobbling together several proprietary hardware and software stacks that are less open than mainframes are today to serve as cloud hosting platforms. Against this backdrop, a common, extensible storage virtualization platform may well be needed to provide a Rosetta Stone between non-compatible infrastructures within competing service providers.
For now, storage virtualization, properly deployed, could be used as a foundational component of internal clouds, enabling the dynamic allocation of storage resources from a common pool and the selective application of services for managing data across infrastructure and for providing storage in the form requested by applications.
The final issue that stands in the way of storage efficiency is one of coherent systemic management. We will turn our attention to that issue in the last part of this series.
Your comments are welcome: email@example.com.