Virtualization: Where's the Business Value?
How did IT ever fall for the vendor's business value proposition for server virtualization?
The spin on server virtualization begins to wobble when you drill down into both the marketing hype and the technical details. When you consider the difference between the vendor's business value proposition and reality, you might well wonder how the idea managed to get into the orbit of meaningful ideas at all.
The business-value case for server virtualization falls into the typical three domains: cost savings, risk reduction, and improved productivity. From a cost-savings perspective, server virtualization is supposed to reduce the number of physical boxes that need to be managed and powered. Virtualizing gear does reduce the hardware footprint, but the corresponding improvements in management efficiency have not been demonstrated -- at least not consistently.
Virtual server sprawl is only one dimension of the management problem; others are resource allocation inefficiency, application binding, and I/O latency. Most virtualization product vendors are working these issues as I write, but I really need to question whether even the current market-leading VMware has the necessary talent aboard to bring order to the Wild-West, standards-free universe of "open systems" storage. Depending on hardware vendors to work and play well together on a common method for connecting and managing storage in a virtual server world is simply substituting hope for real strategy.
No scientific research has been done to demonstrate any actual improvement in resource allocation or utilization efficiency in a virtual server world. We have only anecdotal accounts from a few happy customers to attest to any real gains.
My take: Given the propensity of the technology to further mask inefficiencies such as dark storage and I/O congestion, we probably aren't going to find a huge improvement in storage allocation efficiency behind virtual servers versus physical servers. As for storage utilization efficiency, data management, not virtualization, is the key to the successful placement of data on infrastructure based on business use.
The second component of a business-value case for technology is risk reduction. The risk reduction claims of VMware and others come down to the high-availability (HA) functionality that has been bolted onto the technology.
VMware boasts several "integral strategies" for failing over VMs to other VMware servers in the same sub-network and to alternative hosts across a WAN. Some of these high-availability solutions are actually third-party products working under the covers, such as data replication tools from EMC or NeverFail Group. These components are bolted on rather than built in.
Same sub-network failover has been an integral part of VMware's ESX product for a couple of years but seems to work best only if all storage is overlaid with the VMware File System -- something you need to think long and hard about doing. VMFS might not deliver what you are looking for in a file system and represents a fairly substantial effort if you want to apply it to terabytes or even petabytes of existing storage infrastructure.
Offsetting any real risk reduction value is the cost. HA features aren't part of the basic software kit from VMware as they are with Virtual Iron and some of the other hypervisors on the market. You need to buy them separately, which represents a huge price hike over the advertised base sticker price of a VMware solution. Plus, you need to send a cadre of IT troops to school to learn the nuances and intricacies of setting up HA functions and maintaining them over time: another unforeseen investment.
Another problem with the VMware risk reduction case is that it is undifferentiated and oversimplified. HA feature discussions suggest that all virtualized applications and their data are of equal importance. Truth be told, ongoing WAN-based data replication is very expensive; add failover, and high availability becomes the most expensive approach ever devised for protecting IT against the remote possibility of a facility disaster. Continuous replication and failover are therefore disaster recovery services that you want to provision only for your most mission-critical applications -- not, as a rule, for a bunch of consolidated file servers and low-traffic Web servers, which may be well served by simple tape backup. And if you are hosting mission-critical applications on virtual servers, consider the many things that can go wrong with the other applications sharing tenancy; maybe you need to rethink the strategy altogether.
Improved productivity is the third component of a classic business-value case, and the server virtualization folks are keen to point out that the automation introduced by the hypervisor reduces the labor force required to manage more gear. That may be true, but it may also be irrelevant.
If a trend toward more virtualization-based server consolidation is the reality, I will be interested to see how much downtime accrues to this strategy. Right now, an hour of downtime to patch a server takes a certain number of users off the line for a certain amount of time. Bringing down a virtual server environment takes down many applications and many more users in that same hour's time.
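The downtime arithmetic above can be sketched in a few lines. This is a hedged illustration, not a measurement: the function name, server counts, and user populations are all hypothetical assumptions chosen to make the blast-radius effect visible.

```python
# Hypothetical downtime arithmetic: the same one-hour maintenance window
# idles far more user-hours on a consolidated virtual host than on a
# single physical server. All figures below are illustrative assumptions.

def user_hours_lost(apps_affected: int, users_per_app: int, outage_hours: float) -> float:
    """Total user-hours of lost work for one maintenance window."""
    return apps_affected * users_per_app * outage_hours

# Patching one physical server: one application and its users go offline.
physical = user_hours_lost(apps_affected=1, users_per_app=50, outage_hours=1.0)

# Patching one virtual host consolidating 10 applications: the same hour
# of maintenance now takes every tenant's users off the line at once.
consolidated = user_hours_lost(apps_affected=10, users_per_app=50, outage_hours=1.0)

print(physical)      # 50.0 user-hours
print(consolidated)  # 500.0 user-hours
```

The point of the sketch is simply that consolidation multiplies the cost of each outage by the number of co-tenant applications, unless workloads can be moved off the host first.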
The server virtualization advocates might argue that dynamic re-hosting of VMs is the solution to this issue because it would (in theory) enable you to shift a workload to other servers while you maintain the primary host. This may well be the case, but it raises the question of how much spare capacity you must build into your infrastructure to give your VMs alternative host locations during maintenance. And where are the tools for coordinating maintenance activity with dynamic re-hosting logistics?
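The spare-capacity question can be made concrete with a back-of-the-envelope calculation. This is a simplified sketch under assumed numbers: it treats all hosts as identical and all load as freely movable, which real placement constraints rarely permit.

```python
# A hedged sketch of the spare-capacity question: if any one host must be
# drainable for maintenance, the surviving hosts need enough headroom to
# absorb its workload. Hosts are assumed identical; figures are illustrative.

def utilization_after_drain(hosts: int, avg_utilization: float) -> float:
    """Average utilization of each surviving host after evacuating one host.

    Total load is hosts * avg_utilization (in whole-host capacity units),
    spread across the (hosts - 1) machines that remain in service.
    """
    if hosts < 2:
        raise ValueError("dynamic re-hosting needs at least two hosts")
    return hosts * avg_utilization / (hosts - 1)

# Four hosts each running at 70% utilization: draining one for patching
# pushes the survivors to about 93% -- little margin left for load spikes.
print(round(utilization_after_drain(4, 0.70), 2))  # 0.93
```

Run the consolidation ratio up and the headroom requirement climbs with it, which is exactly the spare capacity the business-value case rarely prices in.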
Certainly, the metric that matters is how much more work gets done at the end of the day by business users dealing with applications hosted on virtual servers versus those hosted on physical servers. If the answer is less work, because of increased downtime or lower performance from applications hosted in VMs, then server virtualization may become just another passing fad regardless of all of the marketing dollars spent to promote it.
If measurements show no impact at all from server virtualization on top-line growth, interpreted as increased user productivity, then the question must be asked: Why do it at all? By contrast, if user productivity increases measurably as a result of server virtualization, then it may be the right tool for meeting specific business requirements.
For now, I am not seeing the business-value case play out in a meaningful or replicable way -- but that's just me.
Or is it?
The discussion of the real results experienced from server virtualization has been stymied by vendor influence. Analyst reports are paid for by the vendors themselves and, until only recently (and in the wake of some fresh and frank reporting in business publications such as the Wall Street Journal), few trade press publications have carried any content critical of VMware or the other server virtualization players. In the words of one editor, "Why would we write anything critical of VMware? They are buying all of the ads in our book!"
It is essential for factual reports to be made by the companies experimenting with this technology, so we all can learn from collective experience. Your e-mail on this point is invited: jtoigo@toigopartners.com.