In-Depth
Managing Storage, Part 2 of 2
Size is NOT everything … it's all in how you manage your data.
In last week’s column, we discussed some of the challenges confronting present-day storage management, including the myopic focus on hardware platforms and the lack of an “industrial era” disk-as-inventory metaphor that might help us to allocate capacity more efficiently to data. In this column, I want to talk about the other half of storage management: application-facing management, or more simply, data management.
Companies do an abysmal job of managing storage capacity today. In many client shops I visit, tools provided with arrays must be supplemented by manual spreadsheets to keep the storage administrator up to date about available and used capacity. This is usually a reflection of limited information supplied by the vendor through device managers, APIs, and MIBs provided on the array—mainly to obfuscate the consumer’s view of how much space the vendor is using on the array to support its own “storage management value-add software.”
Consumers have accepted this model based on a traditional view of storage as a set of “dumb” peripheral devices slung off the back of a mainframe, whose operating system provided all of the necessary discovery, monitoring, and control of direct access storage devices (DASD). The problem is that distributed computing isn’t the mainframe platform, and the move to distributed environments saw the loss of management and control over the storage infrastructure as a function of the processor OS. This is especially true because storage has been detached from direct-attached configurations and placed into complex FC fabric interconnects, oxymoronically referred to as SANs.
Today, simply putting the data from a specific application onto a set of disks offering specific services (such as archiving, compression, or encryption) has become a daunting task. Lack of visibility into the storage hardware is part of the problem, as is contending with overlay technologies, both hardware- and software-based, in a complicated FC SAN infrastructure. It is difficult enough to keep this storage infrastructure (with its myriad device incompatibilities) up and running, let alone manage capacity so that the dreaded “disk full” error message does not stop a mission-critical application in its tracks. The result has been the need to spend inordinate amounts of money on more support personnel and more disk capacity at an accelerating rate. Forget purpose-building storage to support the optimal performance of applications. Today, storage management is all about cost containment, and it is a war that is being lost.
Only part of the solution is storage capacity management. The other part of storage management is data management.
To effectively manage storage, we need to manage both infrastructure and the use made of infrastructure by data itself. Ideally, we need to separate and classify data: to segregate the bits that require special handling from the bits that don’t. Some regard this as “boiling the ocean,” given the current state of data-classification technology. File systems do not make data self-describing, and correcting the situation is no easy task. Network Appliance’s senior director of strategic technology, Bruce Moxon, observed at an event a few weeks ago that information lifecycle management, a process for managing data itself, is still waiting for formalized standards from ITIL, ISO, and other bodies.
In the meantime, if we can’t manage data by class, we can manage it by type. This means creating silos for archival data in the major categories we find in businesses today: databases, workflow content, e-mail, and user files. Database archiving tools are available from numerous sources, including Hewlett-Packard (which just bought OuterBay’s tool set), Princeton Softech, and a comer I really like, GridTools. These tools enable you to automate how stale data is extracted from production systems and stored in a nearline (or offline) silo for later use or regulatory compliance.
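To make the manage-by-type idea concrete, here is a minimal sketch in Python of sorting file names into the four silos named above. The extension-to-silo mapping and the function names are assumptions made purely for illustration; they are not taken from any of the products mentioned, and a real shop would build its own mapping.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping of file extensions to archive silos.
# These extensions are illustrative assumptions, not a vendor's rule set.
SILO_BY_EXTENSION = {
    ".mdf": "databases",
    ".dbf": "databases",
    ".pst": "e-mail",
    ".eml": "e-mail",
    ".tif": "workflow content",
    ".xml": "workflow content",
}

DEFAULT_SILO = "user files"


def silo_for(path):
    """Return the silo name for a path, based only on its extension."""
    extension = PurePath(path).suffix.lower()
    return SILO_BY_EXTENSION.get(extension, DEFAULT_SILO)


def group_by_silo(paths):
    """Group an iterable of paths into a dict keyed by silo name."""
    groups = defaultdict(list)
    for path in paths:
        groups[silo_for(path)].append(path)
    return dict(groups)
```

Anything the mapping does not recognize falls into the user files silo, which mirrors the point that user files are the catch-all category and the hardest to classify.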
A growing number of vendors are available to help handle e-mail archiving, from Ilumin (now a Computer Associates brand) to newbie Mimosa Systems. The same holds true for workflow content, where vendors ranging from the well-known (Documentum, now an EMC brand, FileNet, and others) to the newly hatched (Xenysys, for example) are providing pretty good archive creation and management tools.
User files remain the bugaboo of the manage-by-data-type world. There are currently many ways to skin that particular cat, ranging from technologies out of UK-based BridgeHead Software and Symantec/Veritas to those from little startups you’ve never heard of, which are leveraging a lot of out-of-the-box thinking and a few open-source search tools to sort and classify user files. Alternatively, if you can get the necessary user buy-in, you can leverage a file folder methodology like the one embedded in NuView’s global namespace manager.
Data management by type is not as granular as data management by class, but it does introduce foundational concepts and gets folks primed for more granular data classification schemes as they become ready for prime time. Another caveat worth mentioning is that there is no one-size-fits-all archive manager or, more precisely, no good manager of managers, though BridgeHead and others are advancing the ball in this arena.
It is only a matter of time until data management decides what it wants to be when it grows up—whether it should be implemented in data itself or in an externalized directory service somewhere. For now, by archiving selectively using the best available tools, you may be able to boost the utilization efficiency of storage hardware investments while improving the performance of application software—simply by continuously moving stale data out of your most expensive disk.
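As a rough illustration of what “moving stale data” starts with, the Python sketch below walks a directory tree and reports files that have not been modified within a given number of days, along with their sizes. It is only a sketch under stated assumptions: the function name and parameters are hypothetical, last-modified time is used as a simple stand-in for staleness, and the code only identifies candidates rather than moving anything.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days, now=None):
    """Yield (path, size_in_bytes) for regular files under root whose
    last-modified time is older than max_age_days.

    The `now` parameter (seconds since the epoch) is an assumption added
    so the cutoff can be fixed for repeatable runs.
    """
    current_time = time.time() if now is None else now
    cutoff = current_time - max_age_days * SECONDS_PER_DAY
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

Summing the reported sizes gives a quick estimate of how much capacity on expensive disk an archiving pass could free up, before committing to any particular archive tool.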
Going forward, substantial work is needed to unite the tools from the storage-centric and the data-centric worlds. Efforts like CAPSAIL at StorageRevolution.com, an open-source initiative launched last month to build an application-facing storage management framework, are a beginning. In the end, it is storage consumers who will need to ride herd on vendors and drag them kicking and screaming into the new storage management.
Comments are welcomed at jtoigo@toigopartners.com.
About the Author
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.