In-Depth
Bright, Shiny Things for Storage Managers
When designing storage infrastructure architecture to achieve efficiency, performance, cost-containment, scalability, and ease of management, it is easy to become distracted by the latest announcements from brand name vendors -- the "bright, shiny things" phenomenon. Like clockwork, every six months vendors insist on rolling out a new feature that they hope will differentiate their commodity wares from their competitors'. It can be a real task to separate the important technological developments from those that cater to the "storage is just an accessory for my iPod" attitudes of certain segments of the marketplace at any given time.
De-duplication, for example, may fall into the shiny-new-thing category, especially when it moves beyond the original use case -- squeezing down the backups performed each day so that 30 days of backup data can be stored cost-effectively in a near-line repository, enabling faster file restores than tape backups allow -- to the "de-dupe everything" use case currently being advanced by most brand-name vendors.
The first use case makes some sense to me. De-duplication after write is done just as efficiently using free software included with CA Technologies' ARCserve product as it is using expensive rigs that perform this operation as a function of a proprietary array controller. The latter approach is preferred by hardware vendors, of course, because it enables them to jack up the cost of the underlying disk by 100x or more.
However, from an architectural standpoint, it makes more sense to perform de-dupe on production data (if you really want to do that at all) using the file system itself. Significant work is being done by just about every file system developer today to enable this functionality in the next evolution of their products. Given that no special hardware kit would be required to "re-hydrate" de-duplicated production data with file system native de-duplication functionality, it strikes me that this method might be more appropriately categorized as "strategic" and "efficient."
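To make the idea concrete, here is a minimal Python sketch of how file-system-level de-duplication works in principle: hash each block, store unique blocks only once, and reassemble ("re-hydrate") files from block references. The class, block size, and hash choice are illustrative assumptions, not any vendor's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real file systems vary


class DedupStore:
    """Toy content-addressed block store illustrating file-system-style
    de-duplication: identical blocks are stored once and referenced
    by their SHA-256 digest."""

    def __init__(self):
        self.blocks = {}   # digest -> block data (each unique block stored once)
        self.files = {}    # filename -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicates cost nothing
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # "re-hydration" is just reassembling blocks by digest
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

Write two files containing the same 8 KB payload and the store consumes only 4 KB of physical space, because all four logical blocks hash to the same digest -- no special re-hydration hardware required, which is the architectural point.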
Someone a lot smarter than I am once said that truly strategic technology is ubiquitous and invisible. Native file system de-duplication would fit this definition.
The appeal of shiny new things such as de-dupe has much to do with capacity management, one of the three core functions of storage administration today (the other two being performance management and data protection management). Capacity management challenges are driving a lot of shiny (if not so new) features on the latest arrays -- including not only de-dupe but on-array thin provisioning and tiering. These features address one issue in storage today: lack of knowledgeable storage managers.
Lately, organizations have reduced their storage operations staff, those who managed capacity manually, and assigned their tasks to server people who know very little about storage. Under the circumstances, automating functions such as the allocation and de-allocation of storage capacity intrigues many consumers because it seems to enable more work to be done by fewer staff.
The simple fact, however, is that technologies such as de-dupe are not the things that need to be -- or should be -- performed by array controllers. From an architectural standpoint, these functions don't need to be done on the array at all. Moreover, when they are part of the hardware kit, they tend to increase storage costs in many ways -- offsetting the labor cost savings they purport to provide.
New Stand-out Products
Certain things need to be done close to the disk to guarantee performance, resiliency, and bang for the buck. Most other "storage functions" need to be done off array -- delivered as shareable services across infrastructure broadly rather than being dedicated to a single stand of disk drives.
My friends over at Xiotech understand this, as do those at DataCore Software. Each company has recently announced new products with all the ruffles and flourishes of a "shiny new thing" -- but which stand out in my view as truly strategic.
Xiotech makes storage hardware -- the Intelligent Storage Element (ISE) line that I have discussed before -- but they eschew most of those "value-add" software features that their brand-name competitors insist on embedding in their internal controllers. They have focused instead on building a product and a value case that appeals to intelligent, not desperate, consumers.
Xiotech's latest entrée is Hybrid ISE, which integrates some Flash SSD components with its disk, as well as technology for using the SSDs not as disk targets but as adjuncts that optimize subsystem performance. The company has, for the past several years, utilized RAGS, their own patented array-making technology, rather than RAID as a means to pool disk resources and to provide a basis for data migration and replication (protection). RAGS uses disk cell coordinates instead of traditional cylinders and tracks as a means to "virtualize" all available disk space. You hear ISE mavens using the metaphors of "books," "pages" and "sheets" to describe read/write operations.
With the latest Hybrid ISE, when a sheet of space containing data gets "hot" -- that is, when multiple concurrent accesses are made to the data on a specific location of the disk -- ISE automatically moves that data into an SSD where access-related I/O can be handled more effectively. When the data "cools," it is moved once again down to a spindle. The result is another bump in the already industry-leading performance story around Xiotech rigs, which engineers at the company have lately begun calling "storage blades" or "bricks."
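The hot/cold promotion scheme described above can be sketched in a few lines of Python. This is a generic illustration of access-frequency tiering -- the threshold, the extent granularity, and the demotion trigger are my assumptions, not Xiotech's RAGS implementation.

```python
PROMOTE_THRESHOLD = 3  # accesses before promotion to flash (assumed value)


class TieringStore:
    """Toy two-tier store: frequently accessed ("hot") extents are
    promoted to a fast SSD tier; when they cool, they are demoted
    back to spinning disk."""

    def __init__(self):
        self.disk = {}   # extent id -> data on spinning disk
        self.ssd = {}    # extent id -> data promoted to flash
        self.hits = {}   # extent id -> recent access count

    def write(self, extent, data):
        self.disk[extent] = data
        self.hits[extent] = 0

    def read(self, extent):
        self.hits[extent] = self.hits.get(extent, 0) + 1
        if extent in self.ssd:
            return self.ssd[extent]          # fast path: served from flash
        if self.hits[extent] >= PROMOTE_THRESHOLD:
            self.ssd[extent] = self.disk[extent]  # extent is hot: promote it
        return self.disk[extent]

    def cool(self, extent):
        # called when accesses taper off: demote back to the spindle
        self.ssd.pop(extent, None)
        self.hits[extent] = 0
```

After three concurrent-style reads an extent lands on the flash tier; a cool-down demotes it, mirroring the up-and-down data movement the Hybrid ISE performs automatically.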
Bricks is a good description. The Hybrid ISE fits well with ISE 1.0 infrastructure (bricks without SSD) and enables the creation of a field of storage building blocks that can be addressed as one or more logical disk drives. This infrastructure can be "surfaced" for use as NAS or block storage using 1U rack servers designed by Xiotech for this purpose, or using any number of available virtual controllers or software NAS plays offered by third parties. Bottom line: the field of bricks scales effortlessly and presents itself to applications using an elegant meme.
In the past, I have gushed over Xiotech's innovative application of RESTful Web services management, which it calls CorteX -- substituting this standards-based approach for a proprietary API or a clumsy SMI-S provider. I last wrote that I could manage several petabytes of storage from an iPhone, iPad, or any browser-based client using ISE Manager.
It turns out that this smart design choice keeps paying dividends. ISE is getting smarter both as a function of innovative management apps like ISE Analyzer, and customer-developed apps contributed at CorteXdeveloper.com -- an open forum where Xiotech publishes its code for everyone's use.
On the Drawing Board
At Xiotech's development center in Colorado Springs, they are working on making bricks capable of "friending" each other -- that is, making each brick capable of understanding the capabilities and status of other bricks to better load balance and tier data across an ever-expanding infrastructure.
They are planning to leverage this capability for capturing granular information about how apps use the ISE field of bricks so they can auto-provision the right flavor of storage to an application based on what the application historically requires.
That blows the socks off HP's claims to provide "disk profiles" to more readily allocate the right storage to applications. It also lends credence to Xiotech's claims to have a three-decade lead over its competitors in terms of dynamic storage provisioning and to possess a native ability to provide VMware with the storage services its guest machines need automatically -- while competitors scramble to support the hypervisor's new vStorage APIs for Array Integration (VAAI) primitives. (VAAI is a clumsy workaround to the knotty problems that VMware has created by getting in the path of all guest machine I/O. It now enables certain storage instructions such as back-end replication to be offloaded to an array that is smart enough to understand yet another storage API developed for vSphere.)
ISE is an example of what a vendor can do when its sales model is less about recurring warranty and maintenance revenue and more about delivering to the customer a top-notch product that doesn't break, doesn't quit, performs well, and delivers what is likely to be the best return on investment of any "enterprise-class" storage rig on the market. However, before I sound too much like a cheerleader, it is worthwhile to look at what Xiotech's failure to load up their rigs with "value-add" capacity management functionality means to the consumer.
In a word, nothing. Conversations with the firm's VP of engineering, David "Gus" Gustavsson, CTO Steve Sicola, and maestro technologist Rich Lary suggest that quite a lot of careful consideration has been given to what is and what is not "constitutive" of storage subsystem hardware. Things such as thin provisioning, de-duplication, on-array tiering, etc. -- much as consumers may view them as desirable components of a one-stop-shop approach to building storage -- have nothing whatsoever to do with the foundational operation of storage itself. In fact, such technologies often fall into the categories of "eye candy," "workarounds to other problems that vendors have created for themselves on their arrays," or "tactical fixes to data management failures."
More often than not, value-add functions are placed on array controllers to increase the price of commodity disk while providing no real additional value. Over time, they lead not to the realization of good storage architecture -- in Xiotech's vision, a generic "field of bricks" comprising components with predictable performance and linear scalability and all managed in common to simplify provisioning and de-provisioning as needed -- but to a profoundly different storage model, characterized by isolated islands of storage with no predictability in terms of performance and scalability, no common management method, and a huge associated labor and warranty cost model.
DataCore’s Virtual Storage Controller
If I am not buying thin provisioning on the rig, where do I get it? DataCore Software's latest product, SANsymphony-V, may provide an answer.
Last month, the Ft. Lauderdale storage virtualization software maker announced its new flagship product, which combines the best of its previous generation products into one easy-to-use "virtual storage controller" that works with Xiotech's (as well as everyone else's) storage hardware with hand-in-glove simplicity.
SANsymphony-V requires little storage knowledge to deploy and configure. Wizards and extensive layman's-language help, presented in a graphical user interface similar to the latest Microsoft Office products, discover storage assets and combine them effortlessly into managed pools. The software provides a platform for most of the capacity management, performance management, and data protection management tasks that must be performed to keep a storage infrastructure at business-ready service levels.
With SANsymphony-V, functionality such as thin provisioning (which the company invented, by the way) isn't isolated to a single hardware rig; it is a service extensible to all capacity in the infrastructure, regardless of whose name is on the bezel of each rig. In effect, software value-add that need not be joined at the hip to proprietary array controllers is hosted instead on the storage virtualization layer created by SANsymphony-V, where it can be shared effectively and simply, and implemented readily for any application that needs it.
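The mechanics of thin provisioning are simple enough to sketch: a volume advertises a large logical size but consumes physical capacity only for blocks that have actually been written. The class below is an illustrative toy (block-aligned, single-block writes assumed), not DataCore's engine.

```python
class ThinVolume:
    """Sketch of a thin-provisioned volume: logical capacity is
    promised up front, physical blocks are allocated on first write."""

    BLOCK = 4096  # assumed allocation granularity

    def __init__(self, logical_size):
        self.logical_size = logical_size
        self.allocated = {}  # block number -> data, allocated lazily

    def write(self, offset, data):
        # simplification: writes are block-aligned and one block long
        if offset + len(data) > self.logical_size:
            raise ValueError("write past end of logical volume")
        self.allocated[offset // self.BLOCK] = data

    def read(self, offset):
        # unwritten blocks read back as zeros, as on a real thin LUN
        return self.allocated.get(offset // self.BLOCK, b"\x00" * self.BLOCK)

    def physical_used(self):
        return len(self.allocated) * self.BLOCK
```

A volume can promise a terabyte while consuming 4 KB after its first write -- which is exactly why thin provisioning belongs in a shared virtualization layer rather than locked inside one array's controller.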
I tested SANsymphony-V recently and found several of the vendor's assertions to be absolutely true. First, the allocation of storage to servers is simple: once storage is pooled, just drag the storage icon over to the relevant server icon in the GUI and you're basically done. A logical, thinly provisioned disk is presented to the server and its apps or guest machines. Moreover, DataCore keeps track of the paths being used by data traveling between the storage and the server and automatically load-balances traffic to avoid latency.
I can testify to the fact that storage I/O experiences a 3-5x performance bump when SANsymphony-V is managing it, primarily because all reads and writes are cached -- so I/O works at memory speeds and not at the speed of spinning rust. The practical ramifications of this are several: for one, the overall speed of infrastructure is not determined by the slowest spindle. For another, older gear can be kept in service longer, even as newer and speedier disk is added to infrastructure, which helps to bend the storage cost curve. Best of all, brand doesn't matter. You can build a better infrastructure with the combination of Xiotech ISE and DataCore Software than you can with most of the available brand-name rigs from the standpoint of resiliency, manageability, value-add service sharing, performance, and labor.
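The caching principle behind that performance bump is easy to illustrate: serve repeat reads from memory and touch the spindle only on a miss. Here is a minimal LRU read-cache sketch -- my own illustration of the general technique, not DataCore's caching engine.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache in front of a slow backing store, showing
    why cached I/O runs at memory speed instead of spindle speed."""

    def __init__(self, backing, capacity=1024):
        self.backing = backing   # dict standing in for disk
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)      # mark most recently used
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]            # slow path: go to disk
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return value
```

The second read of any block never touches the backing store, and the cache hit/miss counters make the effect measurable -- the same logic, scaled up with server RAM, is what decouples infrastructure speed from the slowest spindle.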
I/O caching and pooling also mean that functions such as data replication (even across different-brand rigs), tiering, and the like are a snap to implement. You want continuous data protection to safeguard a volume against a data corruption event? Easy -- just tick the check box on that volume as presented by SANsymphony-V, and every write made to the volume is logged, enabling you to roll back to a point in time before the corruption event occurred. Want high availability? Easy -- just mirror data synchronously between two or more volumes on different rigs in the same room or the same metro area network. Concerned about CNN-style disasters? Asynchronous replication and snapshotting -- even between heterogeneous hardware -- are readily provided.
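The write-logging idea behind continuous data protection can be sketched in a few lines: journal every write with a sequence number, then rebuild the volume's state as of any earlier point by replaying the journal. This is an illustrative toy of the general CDP technique, not SANsymphony-V's implementation.

```python
import itertools


class CDPVolume:
    """Toy continuous-data-protection volume: every write is journaled
    with a sequence number, so the volume can be rolled back to its
    state before a corruption event."""

    def __init__(self):
        self.journal = []            # (seq, block, data) for every write
        self.seq = itertools.count(1)

    def write(self, block, data):
        n = next(self.seq)
        self.journal.append((n, block, data))
        return n  # caller can remember this as a recovery point

    def state_at(self, upto_seq):
        # replay the journal up to the requested point in time
        image = {}
        for n, block, data in self.journal:
            if n > upto_seq:
                break
            image[block] = data
        return image
```

If a later write corrupts block 0, replaying the journal only up to the earlier sequence number recovers the good data -- the same mental model as ticking the CDP check box on a volume.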
Bottom line: the tools for building a strategic storage infrastructure are available to intelligent consumers. Fortunately for all of us, intelligence is becoming the new meme as cost-containment initiatives get into full swing.
Your feedback is welcome: [email protected].