In-Depth
Storage Clusters and Beer: It’s All in the Hops
Is a software-only solution to storage manipulation on the horizon?
Oktoberfest is in full swing wherever beer is popular, and enthusiasts will tell you that the secret of a good brew is in the hops. The secret to good storage clustering, by coincidence, also has to do with hops—that is, the number of locations that must be visited to ensure that data gets written to the right place on the right node in the cluster. By this basic measure, not all storage clusters are alike.
A recent visit to Denver reacquainted me with the storage clustering realm by providing the opportunity to chat with insiders at EqualLogic, 3PAR, LeftHand Networks, and others about their clustered storage offerings. The topic was on everyone’s mind because of Network Appliance’s recent announcement about its clustered storage offering, GX, and because of a just-concluded conference dealing with storage clustering issues.
Most of the vendors didn’t want to frame their value proposition in terms of “my cluster is better than their cluster”—too geeky. The exception was LeftHand’s CTO John Spiers, who noted that if the cluster isn’t done correctly, the entire storage solution might be a slow performer.
To Spiers, it comes down to the number of hops. Understanding what he is saying requires a basic understanding of clustered storage as it exists today. Clustered storage, a term often used interchangeably with “grid storage,” describes a collection of network-connected storage arrays (nodes) under a common administrative function that manages writes to the underlying disk drives. Application servers can create volumes from this collection of disk drives on demand, and new arrays can readily be added to the cluster to grow capacity. With added disk- and node-level RAIDing, the infrastructure can protect data and heal itself in the event of a disk, node, or interconnect failure.
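For readers who like to see the moving parts, a toy sketch in Python captures the basic idea: a cluster is a set of nodes plus a data-layout map recording which node owns which block of a volume. The node names and block counts here are invented for illustration, not drawn from any vendor's product.

```python
# A toy model of a storage cluster: a handful of nodes and a data-layout map
# recording which node owns each logical block of a volume. Node names and
# block counts are invented for illustration.
NODES = ["node-a", "node-b", "node-c"]

def build_layout_map(num_blocks):
    """Stripe a volume's logical blocks round-robin across the nodes."""
    return {block: NODES[block % len(NODES)] for block in range(num_blocks)}

layout = build_layout_map(num_blocks=12)
print(layout[7])  # block 7 lives on "node-b"
```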
Sounds pretty good so far: clustering can provide resilient storage capable of scaling to meet the demands of the data explosion. The bad news is that many clustering solutions are poorly designed, exposing the data they hold to potential access interruptions or to declining access speeds over time.
Spiers complains about clustering solutions built on “scale-up” architectures, in which the storage solution depends on a “master controller,” usually the controller on one of the nodes in a multi-node cluster, whose task is to coordinate the work of the other nodes. When you begin to scale such a system, you create a single point of failure in the cluster (the master controller), and you run into performance issues as more nodes are added and compete for the attention of the master.
Spoofing
Some vendors taking this route throw cache memory at the performance problem. They signal the completion of writes that haven’t actually occurred and store the related commands and bits in a memory queue so that the application writing the data can go on its way. This is called spoofing, and it has a long history in computing, beginning with certain types of mainframe channel-extension products in the 1980s. More recently, many vendors of tried-and-true standalone arrays have been using spoofing to speed up the performance of their boxes. The technique also helps to tilt performance tests in the vendor’s favor, since metrics such as “I/Os per second” do not actually measure disk write performance, but only how effectively caching is being done.
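The mechanics are easy to sketch. In a spoofed (write-back) design, the controller acknowledges a write as soon as the data lands in a memory queue and pushes it to disk later, so the latency an application, or an IOPS benchmark, observes reflects cache speed rather than disk speed. A minimal illustration follows, with invented names and no resemblance to any particular vendor's firmware:

```python
import collections

class SpoofingController:
    """Acknowledges writes immediately; the disk writes happen later."""

    def __init__(self):
        self.pending = collections.deque()  # cached writes not yet on disk

    def write(self, block, data):
        self.pending.append((block, data))  # park the write in memory...
        return "ACK"                        # ...and signal completion anyway

    def flush(self, disk):
        # The real disk writes happen here, outside the window that the
        # application (or an "I/Os per second" benchmark) actually measures.
        while self.pending:
            block, data = self.pending.popleft()
            disk[block] = data
```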
Spiers is worried that the approach, when applied to clustered storage, can produce unanticipated consequences—especially as the cluster scales. For one thing, he doubts that consumers are getting what they pay for in terms of real-world performance of the solution as it scales.
The alternative to “scale-up” architecture is to distribute the management function, which is what LeftHand Networks does with its SAN/iQ software. Instead of having all communications to the cluster pass through one node, a distributed architecture would allow I/O to be load-balanced across a federation of intelligent nodes. This, to hear Spiers talk, is LeftHand’s secret sauce.
All you need to do is look at the I/O forwarding schemes (“hops”) in some of the clustered storage platforms out there to see his point. In most clustered storage solutions today, the application host sees a virtual image of a disk or volume as presented by the clustered storage platform; the host has no understanding of where specific data blocks physically reside. When an I/O request is directed to the target volume, the first communications hop goes to one of the nodes in the cluster (or to a freestanding metadata host system) to consult the data-layout map stored there and determine which node in the cluster is the appropriate target for the write. Only after that consultation hop does the I/O make a second hop to the appropriate target node.
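Sketched in the same toy terms as above (the function and variable names are mine, not any vendor's code), the two-hop path looks like this:

```python
# Sketch of the two-hop write path: the I/O first visits whichever node (or
# metadata server) holds the data-layout map, then is forwarded to the node
# that actually owns the block. Purely illustrative.
def write_via_consultation(block, data, layout):
    hops = 1                 # hop 1: visit the node holding the layout map
    target = layout[block]   # look up which node owns this block
    hops += 1                # hop 2: forward the write to that node
    print(f"block {block} written to {target} after {hops} hops")
    return hops

write_via_consultation(1, b"payload", layout={0: "node-a", 1: "node-b", 2: "node-c"})
```

Every write pays the consultation toll before any data actually moves.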
With this architecture, the probability of an I/O being forwarded (an extra hop) to another target array or to the metadata server increases as the number of arrays in the cluster grows. The extra hop to check the data-layout map degrades the performance of the overall cluster as it scales.
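A quick back-of-the-envelope calculation shows why. Assuming, purely for illustration, that a volume’s blocks are spread evenly across N nodes and that each I/O first lands on an arbitrary node, roughly (N-1)/N of writes will need a forwarding hop:

```python
# Back-of-the-envelope: fraction of writes needing a forwarding hop when each
# I/O first lands on an arbitrary node and blocks are spread uniformly across
# the cluster. Illustrative arithmetic, not a measurement of any product.
for nodes in (2, 4, 8, 16, 32):
    forwarded = (nodes - 1) / nodes
    print(f"{nodes:2d} nodes: {forwarded:.0%} of writes take an extra hop")
```

With two nodes, half the writes pay the extra hop; at 16 nodes, nearly 94 percent do.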
LeftHand’s solution to the problem is a patent-pending Device Specific Module (DSM) that plugs into Microsoft’s Multipath I/O (MPIO) framework, which is widely adopted in the storage hardware industry. The MPIO DSM software creates and updates a map of the data layout across the nodes in the cluster and enables data writes to be targeted directly at the specific node where the relevant storage blocks are located. Spiers says this enables LeftHand’s performance to scale linearly as nodes are added to the clustered environment.
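Conceptually, the DSM moves the data-layout map to the host side, so the multipath driver can aim each write straight at the owning node: one hop instead of two. A rough sketch of that idea follows; the map contents and function names are mine, not LeftHand’s actual DSM.

```python
# Rough sketch of host-side routing: the host keeps its own copy of the
# data-layout map and sends each write directly to the owning node.
# Illustrative only; not LeftHand's implementation.
CLIENT_LAYOUT_MAP = {0: "node-a", 1: "node-b", 2: "node-c"}  # block % 3 -> node

def send_to_node(node, block, data):
    print(f"writing block {block} directly to {node}")

def route_write(block, data):
    target = CLIENT_LAYOUT_MAP[block % len(CLIENT_LAYOUT_MAP)]
    send_to_node(target, block, data)   # a single network hop per write
    return target

route_write(7, b"payload")  # block 7 goes straight to "node-b" (7 % 3 == 1)
```

Because the host already knows the answer the consultation hop would have given it, per-write latency stays flat as nodes are added, which is the linear-scaling claim Spiers is making.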
There are other efforts afoot to reduce the hop count, of course. Panasas, an early pioneer, uses its own object-based file system, which targets data writes to specific nodes. Spiers dismisses this approach as a “fat, proprietary stack” that consumers seem disinclined to implement at the moment. The same holds true, he says, for file-system replacements such as Lustre and Ibrix Fusion, which rely on a metadata server to play the role of the data-layout map: “The extra hop to the metadata server creates the same problems as the extra hop to the data layout map in each node.” LeftHand’s patent-pending MPIO DSM plug-in provides the same benefit as these proprietary client-side file-system driver stacks, but it does so through the standard MPIO stack in the operating system. LeftHand’s DSM design for Windows has been approved by Microsoft and is widely deployed, unlike the proprietary stacks, which customers have shunned and which remain relegated to the HPC market.
Even Network Appliance seems to be in a quandary about how to solve the hop-count problem. (NetApp was contacted for comment but did not respond by press time.) After a much-delayed effort to leverage the clustering technologies acquired in its purchase of Spinnaker Networks several years ago, the company instead integrated what can only be described as a “metaphor” from Spinnaker’s Andrew File System-based clustering approach (which is fundamentally incompatible with NetApp’s Berkeley Fast File System-derived file system) and overlaid the result with a global file system to provide the look and feel of a federated cluster.
Judging from the resulting product, GX, Network Appliance’s strategy for storage clustering appears to be a mix of hype and hope: hype for an extension to NFS that will enable something like the creation of a metadata map at the client, and hope that the resulting extension will be fast-tracked as a standard in the next couple of years. In the words of NetApp strategic architect Bruce Moxon, "While we expect to be able to leverage pNFS (parallel NFS extensions planned for NFSv4.1), we do not require those extensions to provide scalable storage system performance." Yet, in documents provided to this writer by sources inside the company, NetApp is clearly advising its sales force not to push GX into service behind applications requiring Windows file services (CIFS), block services (FCP or iSCSI), databases, or other business applications with high rates of random I/O—caveats suggesting the sluggishness of the technology in the face of myriad data-map lookups. (We will gather more insight on this point following an interview with Moxon in the coming week.)
Spiers and LeftHand CEO Bill Chambers seem painfully aware that expounding on the differences in their underlying cluster architecture may not be the key to success in selling their products. Furthermore, the company’s unwillingness to provide money to, and share sales numbers with, Gartner, IDC, and the rest of the analyst community makes LeftHand appear to be the low man on the totem pole in the iSCSI storage solutions space.
In recent analyst charts, Network Appliance and EqualLogic appear to dominate the market. However, NetApp’s positioning is hotly debated by both LeftHand and EqualLogic; the latter notes in its product presentations that just because iSCSI target functionality ships with every Network Appliance Filer doesn’t mean that all (let alone most) customers are using the box as an iSCSI target.
Vulnerabilities
For its part, LeftHand has an impressive installed base and appears to lead the pack in several other areas in addition to hop count minimization. For example, says Spiers, EqualLogic’s architecture appears to be vulnerable to the failure of a nodal box. If one node in a cluster fails, the entire cluster fails. Perhaps this explains, he speculates, why most EqualLogic customers have not scaled beyond three nodes in a group. LeftHand customers, by contrast, have expanded well beyond three-node clusters.
Network Appliance addresses the same single-node/whole-cluster failure vulnerability in its GX cluster by recommending that the controllers on the head of each node be paired up. Dual redundant controllers represent a heavy-handed but necessary hardware-centric approach in the absence of a software-centric federated approach like LeftHand’s.
At the end of the day, hop counts might be of greater interest to beer connoisseurs and geeks than they are to garden-variety storage consumers. From our viewpoint, however, LeftHand’s technical nuances are on point for anyone considering clustered storage. Another plus in LeftHand’s approach: its disks are no longer proprietary. The company has begun selling not only a controller-and-drives, one-stop-shop solution but also a software-only solution that enables consumers to use any drive rig they want. As part of a “meet in the market” deal with HP, you can now run LeftHand’s software on standard HP servers. The stated direction of this Boulder, CO, firm is to provide a software-only offering within a short time.
This comes at a time when many proponents of open storage clusters seem laser-focused on locking customers into their hardware and overcharging for the solution’s commodity components—the disk drives. Sounds a bit like monolithic storage to me: these vendors want to be EMC’s mini-me.
Then again, it’s Oktoberfest. Maybe everyone’s got another kind of hops on the brain. Your comments are welcome: [email protected].