The Interconnected Dimensions of Grid Storage
Despite its name, grid storage has little or nothing to do with grid technology. But the technology holds promise.
Grid storage, as the term is currently used by many storage vendors, has little or nothing to do with grid technology. At best, it is an amorphous marketing term that borrows many of the buzz phrases once attached to early SANs.
Asked recently about the commercial possibilities of grid storage, two storage integrators offered their insights. Said Mike Linnet, President of Zerowait, a high-performance storage engineering firm in Newark, DE, “The promise of grid storage is an infinitely scalable storage farm comprised of heterogeneous storage that presents a single system image with manageable quality of service to each host for prioritization by task.”
Dr. Scott Carson, CTO of Silver Spring, MD-based integrator IMS Systems Inc., added: “Grid storage is definitely not here yet, but [here are the promises]: 1) a distributed storage facility that seamlessly integrates new members (i.e., new capacity) on an ad-hoc basis and integrates them (it) into a common pool; 2) a self-healing facility that can withstand the loss of some number of components or connectivity with a graceful loss (and eventual restoration) of functionality; and 3) a self-managing facility that automatically creates and manages replicas of information for performance and resilience.”
Carson goes on to say, “You can picture a grid storage facility being useful within one data center, where you could simply add capacity and have it integrate itself into the pool (kind of analogous to the way Beowulf clusters work in the Linux world). You could also picture it being useful across geographic locations, where users have a view of a common pool from different places and ‘the grid’ places (and manages) replicas to provide a consistent view.”
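Carson's three promises can be made concrete with a toy model. The Python sketch below is purely illustrative and describes no vendor's product: the class names (Node, Pool), the two-replica default, and the "most free space first" placement rule are all assumptions chosen for brevity. It shows new capacity joining a common pool, replicas being placed automatically, and the pool restoring its replica count after a member is lost.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One pool member; capacity is in abstract units (here, bytes)."""
    name: str
    capacity: int
    blobs: dict = field(default_factory=dict)

    @property
    def free(self):
        return self.capacity - sum(len(b) for b in self.blobs.values())


class Pool:
    """Hypothetical self-managing pool that keeps `replicas` copies of each item."""

    def __init__(self, replicas=2):
        self.replicas = replicas
        self.nodes = {}

    def add_node(self, name, capacity):
        # Promise 1: new capacity joins the common pool on an ad-hoc basis.
        self.nodes[name] = Node(name, capacity)
        self._heal()

    def total_capacity(self):
        return sum(n.capacity for n in self.nodes.values())

    def put(self, key, data):
        # Promise 3: replicas are created automatically on write.
        for node in self.nodes.values():
            node.blobs.pop(key, None)          # drop stale copies first
        for node in self._candidates(key, len(data))[: self.replicas]:
            node.blobs[key] = data

    def get(self, key):
        for node in self.nodes.values():
            if key in node.blobs:
                return node.blobs[key]
        raise KeyError(key)

    def holders(self, key):
        return sorted(n.name for n in self.nodes.values() if key in n.blobs)

    def fail_node(self, name):
        # Promise 2: losing a member degrades the pool, then it heals itself.
        del self.nodes[name]
        self._heal()

    def _candidates(self, key, size):
        # Nodes that lack the item and have room, emptiest first.
        fits = [n for n in self.nodes.values()
                if key not in n.blobs and n.free >= size]
        return sorted(fits, key=lambda n: n.free, reverse=True)

    def _heal(self):
        # Re-create any replicas that fell below the target count.
        keys = {k for n in self.nodes.values() for k in n.blobs}
        for key in keys:
            data = self.get(key)
            missing = max(self.replicas - len(self.holders(key)), 0)
            for node in self._candidates(key, len(data))[:missing]:
                node.blobs[key] = data
```

A real facility would also have to cope with partial failures, consistency between replicas, and geographic placement, which are exactly the "very tough problems" the sketch leaves out.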
From these perspectives, I find two separate but interconnected dimensions to the concept of grid storage: scalable, virtualized infrastructure and extensible file system services.
As a description of scalable storage infrastructure, grid storage could describe many storage topologies ranging from a true SAN (the one in the visionary white paper, not the one you have now) to a “clustered NAS” or “NAS/SAN hybrid” system (see last week’s column on Network Appliance’s strategy for an example).
The extensible file system (FS) part is somewhat less well defined and thus more difficult to pin down. Today, it seems that everyone is talking about a clustered file system or extensible file system. Most implementations require that the file system provided with the server operating system be replaced with a “universal” FS, or, at the very least, that the server FS be “augmented” via some special sleight-of-hand software layer.
Usually, these are thinly veiled efforts by a vendor to wrest control of the customer’s data from the clutches of an “evil” server OS vendor. Ostensibly, this is for the customer’s own good, as the new and improved FS will permit better data sharing across heterogeneous servers and/or will alleviate address size restrictions imposed by the native file systems of server operating systems. (Even the renewed fascination within Microsoft and various DBMS vendors with the substitution of object-oriented databases for server file systems fits with this model.)
IBM’s Storage Tank is one example of this strategy. A key part of the IBM universal storage management strategy is the substitution of Storage Tank file system controls for server OS file systems. More important from the standpoint of “grid storage” are a pair of projects underway in Big Blue’s Almaden Research Center.
One project, according to research staffer John Haswell, is focused on exploiting the capabilities of the latest version of the Network File System (NFS version 4 or v.4) to create “a federation of storage devices” with improved file-sharing capabilities. The project is aimed at “large scale enterprises,” where Haswell sees a need for a distributed, standards-based, global namespace capability to help manage the data from numerous storage repositories and to facilitate file sharing between different server operating systems.
Says Haswell, “NFS v.4 offers basic services to enable this functionality, but no one is taking advantage of them right now.” Ease of deployment and ease of management are his stated design goals.
The output of that project will be of use to Leo Luan, research staff manager on IBM’s Distributed Storage Tank (DST) project. IBM’s website refers to DST as “grid storage,” and given IBM’s financial contribution to grid computing development efforts, its interpretation of the term merits closer attention.
Luan says that “baseline Storage Tank” provided enhanced storage management capabilities for smaller shops, including virtualization services, file services, and centralized management. DST takes Storage Tank functionality to the global level by enabling the construction of a much larger grid with a single global namespace across a geographically distributed environment.
The company is endeavoring to create a mechanism for sharing files (rather than copies of files) independently of server operating systems. Ideally, the scheme will also work without the deployment of a proprietary client on servers themselves—a goal that researchers hope to achieve through the creation of a standards-based Lightweight Directory Access Protocol (LDAP) server acting as a master namespace server.
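The idea of a master namespace server can be illustrated with a small sketch. The Python below is not IBM's design and does not use LDAP; a plain dictionary stands in for the directory entries, and the class name (Namespace), the host names, and the longest-prefix matching rule are all assumptions made for illustration. It shows how a single global path might be resolved to whichever server currently holds the file.

```python
from typing import NamedTuple


class Location(NamedTuple):
    server: str       # host that currently holds the data
    local_path: str   # path as that host exports it


class Namespace:
    """Hypothetical master namespace: maps global prefixes to servers."""

    def __init__(self):
        self._mounts = {}

    def register(self, global_prefix, server, export):
        # Registering a prefix again simply repoints it, so data can move
        # between servers without the global name changing.
        prefix = global_prefix.rstrip("/") or "/"
        self._mounts[prefix] = (server, export.rstrip("/"))

    def resolve(self, global_path):
        # Try the longest matching prefix first, then progressively shorter.
        parts = global_path.rstrip("/").split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i]) or "/"
            if prefix in self._mounts:
                server, export = self._mounts[prefix]
                rest = "/".join(parts[i:])
                local = f"{export}/{rest}" if rest else (export or "/")
                return Location(server, local)
        raise LookupError(f"no server registered for {global_path}")


ns = Namespace()
ns.register("/corp/eng", "nas1.example.com", "/vol/eng")
ns.register("/corp/eng/archive", "nas2.example.com", "/vol/old")
```

With these two registrations, a client asking for /corp/eng/specs/disk.txt would be sent to nas1.example.com, while anything under /corp/eng/archive would go to nas2.example.com. Clients see one tree regardless of which operating system or device sits behind each branch.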
Taken from the perspective of file systems, grid storage confronts more daunting hurdles than those presented by hardware topologies. In addition to the reluctance of consumers to deconstruct basic operating system services, alternative global namespace services from vendors such as NuView or Tacit Networks are already in the market today and can be deployed without disruption to existing storage environments.
Despite these practical and technical challenges, Carson remains guardedly optimistic about grid storage: “There are some very tough problems to solve to make this work, but it's a pretty cool vision.”
What do you think? Your comments are welcomed at firstname.lastname@example.org.
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.