Really Big Storage on the Cheap
How a pair of protocols and unique interconnect software can lead the way to inexpensive storage
The revenue objectives of big storage vendors and the cost containment priorities of their consumers are often at odds with each other. Talking with many consumers this past week in London and New York has confirmed a growing perception that storage costs too much. While disk prices have been dropping fairly consistently over the past 10 years or so—about a 50 percent decline per year per GB—the price of disk arrays has actually risen by as much as 120 percent. (Some consumers tell me that it is more like 200 percent when you factor in the cost of money.)
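The divergence those figures describe can be put in back-of-envelope terms. The sketch below simply compounds a 50 percent annual decline in raw disk cost per GB; the $10/GB starting point is an illustrative assumption, not a figure from the column.

```python
# Back-of-envelope sketch of the raw-disk price trend the column cites:
# roughly a 50 percent decline per year per GB. The $10/GB starting
# figure is purely illustrative.

def disk_cost_per_gb(start_cost, years, annual_decline=0.50):
    """Project raw disk cost per GB after `years` of steady decline."""
    return start_cost * (1 - annual_decline) ** years

raw_then = 10.00                                  # hypothetical $/GB, 5 years ago
raw_now = disk_cost_per_gb(raw_then, years=5)     # 10 * 0.5**5 = 0.3125
print(f"raw disk: ${raw_then:.2f}/GB -> ${raw_now:.2f}/GB")
```

Run over five years, the raw media falls to about 3 percent of its starting price, which makes an array whose street price has risen 120 percent over the same period all the more striking.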
The explanation storage hardware vendors offer for this disparity is almost always the same: “value-add software.” This is consistent with the stated goals of many vendors over the past few years.
Back in 2001, then-EMC CEO Mike Ruettgers addressed a conference of investment bankers and venture capitalists in California, where he stated that the only way for vendors to maintain their revenue growth on increasingly commoditized arrays would be to join software functionality at the hip with array controllers. Otherwise, he said, everyone would just be selling a box of Seagate hard drives with little or no product differentiation.
Ruettgers’ observation sticks in my mind as I consider the problem of storage costs. To be sure, acquisition costs are a small portion of the overall cost of ownership in storage (which is primarily a reflection of the limitations of current storage management technology). However, acquisition costs are nonetheless significant to companies where the money to buy more disk arrays and storage hardware accounts for between 35 and 75 percent of all IT hardware spending.
With International Data Corporation currently projecting a nearly 500 percent increase in storage capacity purchases between now and 2008, many IT managers are scratching their heads looking for ways to build really big storage on the cheap. They may need to look to the Boston area for a clue.
Not Hopkinton, but Cambridge: the home of the Massachusetts Institute of Technology’s Media Lab. A project there, intended to facilitate our understanding of early childhood development, involves the collection and mining of massive collections of speech and video data. Both the research project and its enabling data storage technology are breaking new ground in terms of very large scale audio-visual data collection and analysis. To host the data and its post-processing products, Media Lab will be deploying a lot of storage. And I mean a lot of storage—in the multi-petabyte sense of the term.
While there is nothing new about academic researchers using lots of disk capacity, their infrastructure economics are usually irrelevant to the business world—especially because arrays are usually donated by one vendor or another as a marketing tactic. Such is the case with the Media Lab project: several vendors are donating equipment. The architecture of the repository, however, is interesting. MIT also wanted to demonstrate that effective storage could be built inexpensively, capturing the underlying commodity pricing of disk itself.
Stepping up to the plate is Bell Micro, the value-added distributor of IT hardware which now seeks to push its own branded arrays (called Hammer Storage) into the market. What is unique about Bell Micro's offering (which is otherwise a box of Seagate SATA hard disks) is the connection method. The arrays connect to servers not through an expensive controller architecture with specialized “value-add” software, but via the Internet Protocol (IP) and the User Datagram Protocol (UDP), harnessed by unique interconnect software developed at Zetera in Irvine, CA.
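To make the idea concrete: with storage-over-IP, a block read or write is just a datagram addressed to a network-reachable partition. Zetera's actual Z-SAN wire format is proprietary and not described here; the packet layout below is a purely hypothetical illustration of addressing storage blocks over UDP.

```python
# Hypothetical illustration of block I/O carried in UDP datagrams.
# The opcode values and header layout are invented for this sketch;
# they are NOT Zetera's Z-SAN protocol.

import struct

OP_READ, OP_WRITE = 1, 2
HEADER = struct.Struct("!BQI")  # opcode (1 byte), 64-bit LBA, 32-bit block count

def build_read_request(lba, count):
    """Pack a block-read request into a datagram payload."""
    return HEADER.pack(OP_READ, lba, count)

def parse_request(payload):
    """Unpack opcode, logical block address, and block count."""
    return HEADER.unpack_from(payload)

# A request for 8 blocks starting at LBA 4096 would travel to the
# partition's own IP address via an ordinary UDP socket sendto().
req = build_read_request(lba=4096, count=8)
print(parse_request(req))  # (1, 4096, 8)
```

The point of the sketch is the economics: once the target is just an IP endpoint, the expensive controller, HBA, and Fibre Channel switch drop out of the bill of materials.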
The details of the project and the storage-over-IP infrastructure will shortly be available on-line at http://www.media.mit.edu/events/movies/video.php?id=zetera-2006-05-15. You owe it to yourself to take a moment to view and listen to the playback.
I have previously discussed Zetera in this column. I think that the company’s storage-over-IP interconnect, called Z-SAN, is about the most important thing to happen in storage since the invention of the disk drive. By eliminating the need for an expensive controller, a host bus adapter, a Fibre Channel switch, virtualization software, and even RAID, Zetera-attached Hammer storage chases enormous cost out of storage. What you are really paying for is exactly what you get: solid performance at a price that captures the falling cost of disk capacity.
I would assume that MIT could have nudged Hopkinton or HDS or IBM or some other vendor of proprietary array technologies into contributing their wares to this project, appealing as it is from a sales and marketing perspective. It is to their credit, however, that they chose to use the project as a test bed for real innovation in cost-effective, massive, scalable storage technology.
I can already hear the pushback on what I am saying: Big Iron is expensive because it needs to operate at performance levels that are the stuff of legend. You don’t use a SATA array to support the capture of millions of credit card transactions per second. That much is true.
However, the number of business applications today that present truly high-performance requirements can be counted on one hand. In most cases, companies that have deployed FC fabrics have not done so after a careful consideration of the performance requirements of their apps, but because a vendor convinced management that their overpriced technology was strategic, scalable, and one-size-fits-all from a data-hosting perspective. In other words, they've been selling fiction as truth.
If MIT’s Media Lab project is any indication, even massive-scale retention storage requirements can be handled on much less expensive wares. Retention storage hosts data that isn’t being updated a million or more times per second. It doesn’t require the speeds and feeds, or the costs, of FC fabrics. What it really requires is a storage modality that captures the underlying economic dynamics of disk: media is cheap and getting cheaper every day.
The MIT project also suggests another important consideration in massively scalable storage architecture: the array controller doesn’t need to be the point of intelligence in the storage infrastructure. Services such as RAID or virtualization (LUN aggregation, in truth) may be better provided as services of a good old-fashioned LAN switch (think IP multicasting) or in the form of an IP-connected storage server.
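The claim that the controller needn't be the point of intelligence is easy to demonstrate: RAID parity, for instance, is just XOR, computable equally well by a host, a storage server, or switch-resident software. The sketch below is a RAID-4-style toy, not any vendor's implementation.

```python
# Toy illustration that RAID parity is plain arithmetic, with no need
# for a proprietary array controller. RAID-4-style layout, purely
# illustrative.

def make_parity(stripes):
    """XOR equal-length data stripes together to form the parity stripe."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving, parity):
    """Reconstruct a lost stripe from the surviving stripes plus parity."""
    return make_parity(list(surviving) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data stripes
p = make_parity(data)
# Lose the middle stripe; recover it from the other two plus parity.
assert rebuild([data[0], data[2]], p) == b"BBBB"
```

Whether this XOR runs in an array controller, a storage server, or a LAN switch changes nothing about the result—only about who charges you for it.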
Microsoft, I believe, would agree with the latter, judging from its presentations in London and New York last week around Windows Storage Server R2. At a presentation on May 11 at the Rainbow Room in New York City by Redmond’s Group Product Manager for Storage, Claude Lorenson, it was abundantly apparent that Microsoft sees storage services as its domain, not EMC’s or NetApp’s or any of the big iron storage vendors out there.
Over time, I expect to see more and more of the “value-add” functionality that is propping up insanely expensive storage hardware sales models co-opted by smart guys in the network storage services (Zetera) and operating systems (Microsoft) space. Third-party independent software developers will have mechanisms for plugging their wares into this new model just as they did when IBM was the dominant force in computing in the 1970s and 80s.
What remains to be seen, and the one caveat that must be made to the opinion above, is whether consumers will be willing to get back to work on doing IT. Recently, I had an argument with an IT maven for a large financial firm who told me that he really had no interest in cobbling together cost-effective storage infrastructure. To him, it was all about managing a couple of name-brand vendors. He bristled when I told him that he was no longer doing IT, but vendor management. His point was that the front office was not going to allow him to innovate, but only to choose the safe path: buying everything from name brands regardless of the cost.
That kind of thinking, while understandable, also contains the seeds of its own undoing. If storage growth comes anywhere near IDC’s projections, failing to treat storage disks as commodity inventory, subject to the time-honored rules of cost-effective inventory management, will cost this guy his job.
Your opinion is invited. firstname.lastname@example.org.
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.