In-Depth
Virtualization, Storage, and the Truth about Hypervisors (Part 2 of 3)
Even if the hypervisor does its job reasonably well, if a resource misbehaves, the consequences for the virtual server environment could be huge.
In my last column I discussed the odd idea of a VMware "strategy." I noted that server consolidation is a strategy, and IT resource optimization might be a strategic objective, but calling VMware itself a strategy is a bit of a misnomer. I explored two benefits of virtualization, but I also noted some problems. For example, no one can readily discover which physical server is currently hosting which virtualized application when a hypervisor is moving VMs around the infrastructure at will.
In this column I will explore hypervisor issues in more detail.
From a technical perspective, hypervisors -- the software engines of virtualization -- are subject to three distinct sets of problems. First, while the hypervisor may be written to respect the extensions that processor manufacturers have built into x86 chips to enable multi-tenancy, virtually no application software has been designed with multi-tenancy in mind. That puts the hypervisor in the role of traffic cop. When an application in a VM issues a resource request, the hypervisor must intercept the request and broker it to actual machine resources. This process works well enough as long as the application's resource requests are understood by the hypervisor and can be handled using well-established routines.
Unfortunately, some applications don't follow the rules. They make "illegal" resource calls. Even "well-behaved" applications are a moving target: requests that were once "legal" are too often modified in the latest round of patches from their developers and become "illegal" resource calls. Curiously, some industry observers view this phenomenon as giving Microsoft's Hyper-V virtualization technology an edge over every other hypervisor vendor, for two reasons: (1) Microsoft applications are notorious for making resource calls that do not respect the x86 extensions, and (2) Microsoft owns its code and is in a better position to accommodate its eccentricities than is a third-party hypervisor vendor.
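To make the traffic-cop role concrete, here is a minimal, purely illustrative sketch of a broker that intercepts resource requests from guests and services only the ones it recognizes. The request names and handler logic are invented for this column; real hypervisors intercept these calls at the instruction level in hardware or microcode, not in application code like this.

```python
# Illustrative only: a toy "hypervisor as traffic cop" brokering guest
# resource requests. Request names and handlers are invented.

class ResourceBroker:
    def __init__(self):
        # Well-established routines for requests the hypervisor understands.
        self.handlers = {
            "read_block": self._read_block,
            "alloc_page": self._alloc_page,
        }

    def intercept(self, vm_id, request, **args):
        handler = self.handlers.get(request)
        if handler is None:
            # An "illegal" call: nothing maps it to a machine resource.
            # The hypervisor must emulate it, fault the guest, or guess.
            raise RuntimeError(f"VM {vm_id}: unrecognized resource call '{request}'")
        return handler(vm_id, **args)

    def _read_block(self, vm_id, lun, block):
        # Map the guest's virtual LUN/block to a real device and issue the I/O.
        return f"VM {vm_id}: read LUN {lun}, block {block} via physical HBA"

    def _alloc_page(self, vm_id, count):
        # Map guest "physical" memory to actual machine pages.
        return f"VM {vm_id}: mapped {count} machine page(s)"


broker = ResourceBroker()
print(broker.intercept("vm1", "read_block", lun=0, block=42))  # handled by a known routine
try:
    broker.intercept("vm2", "undocumented_call")               # the problem case
except RuntimeError as err:
    print(err)
```

The interesting failure mode is the second call: a request the broker has no routine for, which is exactly what a patch that changes an application's resource behavior can produce.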
The second issue with third-party hypervisors has to do with their capabilities (or lack thereof) for dealing with concurrent I/O requests from multiple VM-hosted applications. The CEO of one storage management product vendor has stated that VMware, a company with which his firm partners, is "brain dead" when it comes to balancing I/O traffic, preferring simply to direct traffic to the first available port. Reading between the lines, this insight is reinforced by press announcements from VMware storage partners such as 3PAR.
On April 21, 3PAR bragged in a press release about the fruits of its work with VMware to develop "a new adaptive queue depth algorithm." This technology, according to the release, was "developed in response to direct customer feedback [and] dynamically adjusts the LUN queue depth in the VMkernel I/O stack to minimize the impact of I/O congestion detected by the 3PAR InServ Storage Server. As a result, organizations can increase the number of virtual machines and add higher performing applications to their ESX servers when attached to 3PAR arrays."
This is less a statement of technological improvement than a confirmation that significant problems exist at the heart of VMware's virtual server nirvana. Recognition of the I/O congestion problem introduced by VMware and some other hypervisor products is growing, escalating companies' concerns about whether VMs are actually suitable platforms for I/O-intensive applications (that is, anything more than a low-traffic Web server or file server). Absent an intelligent load-balancing paradigm, the congestion problem only gets worse as you stack more VMs on a single host platform.
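Neither 3PAR nor VMware has published the algorithm, but the behavior described in the release resembles classic congestion control: back the LUN queue depth off sharply when the array signals congestion (e.g., a QUEUE FULL or BUSY condition), then creep back up as I/Os complete cleanly. The sketch below is an assumption-laden illustration of that general pattern, not the actual VMkernel code; the thresholds, step sizes, and congestion signal are all invented.

```python
# Hypothetical sketch of an adaptive LUN queue-depth controller in the
# spirit of the 3PAR/VMware announcement. All parameters are assumptions.

class AdaptiveQueueDepth:
    def __init__(self, max_depth=32, min_depth=1):
        self.max_depth = max_depth
        self.min_depth = min_depth
        self.depth = max_depth          # current outstanding-I/O limit for the LUN
        self._clean_completions = 0

    def on_congestion(self):
        # Array reported congestion: back off multiplicatively.
        self.depth = max(self.min_depth, self.depth // 2)
        self._clean_completions = 0

    def on_completion(self):
        # I/O completed without a congestion signal: recover additively,
        # but only after a run of clean completions, to avoid oscillation.
        self._clean_completions += 1
        if self._clean_completions >= self.depth and self.depth < self.max_depth:
            self.depth += 1
            self._clean_completions = 0

    def can_issue(self, outstanding):
        # The check made before dispatching another I/O to the LUN.
        return outstanding < self.depth


q = AdaptiveQueueDepth()
q.on_congestion()                  # array pushes back: depth drops from 32 to 16
for _ in range(200):
    q.on_completion()              # congestion clears: depth creeps back up
print(q.depth, q.can_issue(outstanding=10))
```

Whatever the real implementation looks like, a reactive back-off of this sort is not the same thing as the intelligent load-balancing paradigm the problem actually calls for.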
3PAR's assertion might lead to the conclusion that the only workaround is for each hardware provider to develop its own proprietary drivers for VMware. Those who have done so are hoping that their work will, in turn, favor the use of only their products with VMware virtualized servers. That's a stovepipe architecture model that I am not sure I want to pursue.
The practical question raised by the 3PAR announcement is whether an innovative adaptive queue-depth algorithm is really all you need to balance I/O loads. I don't think it is; furthermore, I would argue that no one should be deploying VMware for interconnection to enterprise storage architectures such as FC fabrics without using Virtual Instruments' NetWisdom. With that product, you can determine I/O pathing based on 90-plus parameters and target all of the storage devices in your infrastructure, not just the proprietary hardware enabled by a proprietary driver enhancement.
The third big issue confronting VMware and other virtualization products is what I think of as "a revolution from below." Even if problem number one is solved and the hypervisor does a good job of brokering both legal and illegal application resource calls, and even if problem number two is solved and some generic mechanism is found for automatic I/O load balancing and optimization, there remains the potential problem of an advertised resource that isn't actually what it claims to be.
Thin provisioning, pioneered on disk arrays themselves by 3PAR and Compellent but now being adopted by EMC and other name-brand finished-array sellers, provides an illustration of the fundamental problem. Thin provisioning is a high-tech shell game: storage that has been allocated to one application (but is currently unused) is provisioned to another application, and a proprietary demand-forecasting algorithm is then applied by the vendor to anticipate capacity requirements.
The vendors proffering this technology say that over-provisioning storage capacity in this way simplifies capacity management, enabling more storage capacity to be managed by fewer hands. The problem is that the scheme is only as good as the vendor's forecasting algorithm, which is top secret. If there is a sudden demand for space by an application that believes it owns the space -- the I/O equivalent of a margin call -- and that space is not available, the impact could be huge.
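The arithmetic of the shell game is easy to sketch. In the hypothetical pool below, applications have been promised far more capacity than physically exists, and everything works until one of them calls in its "margin." The numbers and behavior are invented for illustration and are not drawn from any vendor's implementation.

```python
# Hypothetical thin-provisioned pool: promises exceed physical capacity.
# Figures and behavior are invented for illustration.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.promised_gb = 0        # capacity applications believe they own
        self.written_gb = 0         # capacity actually consumed

    def provision(self, app, gb):
        # Hand out a promise; no physical space is reserved.
        self.promised_gb += gb
        print(f"{app}: promised {gb} GB "
              f"(total promised {self.promised_gb} GB vs {self.physical_gb} GB physical)")

    def write(self, app, gb):
        # The "margin call": the application uses space it believes it owns.
        if self.written_gb + gb > self.physical_gb:
            raise RuntimeError(f"{app}: pool exhausted -- write fails, and the "
                               "application (and any VM hosting it) is at risk")
        self.written_gb += gb


pool = ThinPool(physical_gb=1000)
pool.provision("app_A", 800)
pool.provision("app_B", 800)        # over-committed: 1600 GB promised, 1000 GB real
pool.write("app_A", 700)            # fine so far
try:
    pool.write("app_B", 400)        # sudden demand: only 300 GB actually remain
except RuntimeError as err:
    print(err)
```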
At a recent conference, a Compellent spokesperson responded jokingly to an audience member's question on this point: "If there were suddenly an unforecasted demand by an application for space that has been thinly provisioned elsewhere, the application would most likely abend. If the application was hosted in a VM, the VM would probably fail, as would all of the other VMs. Then the hypervisor would fail, the underlying server operating system would probably fail, and smoke would start billowing out of the side of the server…"
Nobody found the joke amusing.
The point is that even if the hypervisor does its job reasonably well, if a resource misbehaves, the consequences for the virtual server environment could be huge. The same outcomes of application and server OS failure might arise in a non-virtualized hosting environment, of course, but a catastrophic failure in a virtualized server setting risks jeopardizing not only the single application requesting the resource but potentially every other application hosted in VMs on the same physical box.
At the end of the day, server virtualization doesn't fix the underlying problems of infrastructure at all. It may actually worsen them by adding an abstraction layer that obscures effective resource management and troubleshooting. At least one vendor of storage management software is making the business case for its product by citing the problem of "dark storage": storage allocated by the storage admins to the server admins that is then only partly overlaid with a file system, with the remainder forgotten by both sets of administrators. This problem, which occurs all too often in the physical server world, is even more difficult to ferret out in virtual server environments.
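By way of illustration, dark storage is simple to express as arithmetic even though it is hard to find in practice: it is whatever the storage admins carved out for a server minus whatever a file system actually claims on it. The tally below uses invented figures and server names purely to show the calculation.

```python
# Hypothetical tally of "dark storage": capacity allocated to servers
# minus the capacity actually claimed by file systems. Figures invented.

allocated_gb  = {"server1": 500, "server2": 750, "server3": 300}  # handed out by storage admins
filesystem_gb = {"server1": 500, "server2": 400, "server3": 0}    # actually formatted by server admins

dark_gb = {host: allocated_gb[host] - filesystem_gb.get(host, 0)
           for host in allocated_gb}

print(dark_gb)  # {'server1': 0, 'server2': 350, 'server3': 300}
print(sum(dark_gb.values()), "GB allocated but never overlaid with a file system")
```

The hard part, of course, is collecting those two columns of numbers in the first place, and a layer of VMs between the file systems and the physical LUNs makes that collection harder still.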
I'll tell you more in my next column.
Your feedback is welcome: [email protected]