Server Virtualization and Storage: Part I

Is server virtualization the answer to storage worst-case scenarios?

Vendors have many reasons for recommending that their customers virtualize their servers—not the least of which appears to be a desire to sell them newer server hardware, blade systems, and storage gear. Some of their arguments make sense, though many strike me as hyperbole.

Case in point: A white paper from VMware suggests that companies are still suffering from their conspicuous consumption of standalone and rack-mounted server technology reminiscent of the 1990s, and that they need V-ware to provide their organizations with “sustainable competitive advantage” by alleviating “high costs, slow response times, and inconsistently managed infrastructure.” I’m not sure that server consolidation does anything meaningful to help companies achieve and sustain competitive advantage, but that’s the claim.

Touting more than 20,000 enterprise-level VMware Infrastructure deployments to date, VMware says it has helped companies do everything from minimizing power requirements and heat generation in support of data center greening, to reducing “pervasive over-provisioning for worst-case workload scenarios,” to facilitating cross-OS management support. If you are seeing these effects in your company, then server virtualization is just what the doctor ordered to cure the ills of contemporary IT.

However, if you are new to server virtualization, if your curiosity is piqued by the success stories being trumpeted in the trade press, there are some gotchas you may need to consider before you jump on the bandwagon. Many have to do with storage—or more specifically how you will connect your storage to your virtual server environment and how you will fail it over to a recovery site should the need arise.

In a previous column, we heard from a CA XOsoft user at the University of Texas, who told us that one strength of the product, which facilitates failover between data centers, is that it will let them take un-virtualized Microsoft Exchange Server clusters connected to a Fibre Channel fabric and fail the whole thing over to a VMware server environment with internal disk at an ISP. The key value of CA XOsoft is that it enables such a failover with minimal hassle; otherwise, a lot of time might be wasted reconfiguring storage for the VMware environment.

That interview was followed by another, this time with Brian Trudeau, CIO of Amerex Energy, a wholesale energy broker based in Sugar Land, TX. I heard Trudeau speak at an event sponsored by disk array maker Compellent, where he gave kudos to VMware and also to Compellent for the ease with which its array (referred to as a “SAN”) could serve up storage to his virtualized server farm. He said he had chosen Compellent over Dell, EMC, and Network Appliance for this very reason.

Trudeau said that Compellent’s array was more compatible with VMware than even the array products of EMC, VMware’s then-parent company. “Sure,” said Trudeau, “you could map virtual LUNs to spindles with EMC, sort of the same way that you can tweak a ’57 Chevy to make it run faster, but why would you want to go to all that trouble? Compellent simplified this process, plus it lets us snapshot LUNs and replay them to other LUNs at the recovery center, making failover processes easier to manage.” The solution, he said, enables him to recover all servers within hours of any disaster.

There are several ways to expose storage to a VMware server. Each one has its pluses and minuses.

The most common method for configuring the server-to-storage connection in VMware environments is to create a VMware datastore (a partition running the VMware file system, VMFS) in your storage environment and to link a server to a virtual disk within this datastore. Once storage has been provisioned to the VMware ESX Servers, the VMware administrator is responsible for provisioning virtual disks to applications, with most operations run through VMware VirtualCenter.

Scott DesBles, principal storage architect at Compellent, notes that the Compellent SAN virtualizes at the disk level, so an administrator can create a volume for VMware in which all the drives are used together as a shared pool. Built-in wizards guide the administrator through connecting the host servers to the SAN.

“You would first create a volume on the SAN and specify its RAID using our GUI,” DesBles says. “The Compellent software will then walk you through mapping the volume to your VMware ESX servers, after which you initialize that volume as a new datastore. Once the datastore is ready to store virtual machines, you simply repeat the process, mapping it to additional ESX servers to create a clustered datastore. The whole thing takes just a few minutes.”

The plus side of this approach is its familiarity to server administrators who know VMware; however, performance may degrade if too many virtual machines (VMs) are connected to the same datastore. This is true of non-VM server-to-storage connections as well, but keep in mind that, with this method, the storage fabric (or network) and the I/O queue are shared among all VMs residing on the datastore. Bottlenecks can be alleviated by balancing virtual machines across multiple datastores.

The downside is that troubleshooting performance issues can be a pain in the neck and scaling requires that you continuously create new datastores and balance the load more or less manually amongst them. VMware advocates are quick to point out that the vendor is adding functionality to alleviate and simplify some of these tasks.
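The manual balancing just described is essentially a bin-packing chore. A minimal sketch of the idea, in Python, is below; the VM names and IOPS figures are hypothetical, and real placement would also weigh capacity, replication groups, and path layout:

```python
# Illustrative sketch only (not a VMware API): greedily place VMs onto
# datastores by estimated I/O load, the way an admin balances by hand.

def balance_vms(vm_iops, datastore_count):
    """Assign each VM (name -> estimated IOPS) to the datastore
    currently carrying the least aggregate load."""
    loads = [0] * datastore_count
    placement = {}
    # Place the heaviest consumers first so they spread out evenly.
    for name, iops in sorted(vm_iops.items(), key=lambda kv: -kv[1]):
        target = loads.index(min(loads))  # least-loaded datastore
        placement[name] = target
        loads[target] += iops
    return placement, loads

# Hypothetical workload estimates for five VMs, spread over two datastores.
vms = {"exchange": 900, "sql": 700, "file": 300, "web1": 150, "web2": 150}
placement, loads = balance_vms(vms, 2)
```

The two heaviest consumers end up on different datastores, which is exactly the separation an administrator would try to achieve by hand.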

The second option is to map raw storage devices manually to the VMware ESX Server using what VMware calls "Raw Device Mappings" or RDMs. Each virtual machine has its own I/O queue mapped directly to a LUN, thereby eliminating I/O contention, because only one virtual machine accesses the LUN. When using a datastore, some of the Fibre Channel or iSCSI target storage is shared, but when using a raw device-mapping scheme, the I/O queue is not, thereby preventing bottlenecks. This is not unlike the way storage is provisioned to servers today: with careful consideration of path diversification and queue depths, headaches can be averted.
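The queue-depth arithmetic behind that tradeoff is simple to sketch. The depths below are illustrative round numbers, not vendor specifications:

```python
# Back-of-the-envelope sketch of why a shared datastore queue can bottleneck:
# if N VMs share one LUN queue of depth D, each VM sees roughly D / N
# outstanding-command slots, whereas a raw device mapping gives each VM its
# own LUN and therefore its own full queue.

def effective_queue_depth(lun_queue_depth, vms_sharing_lun):
    """Approximate per-VM outstanding-I/O slots on a shared LUN."""
    return lun_queue_depth / vms_sharing_lun

shared = effective_queue_depth(32, 8)  # 8 VMs sharing one datastore LUN
rdm = effective_queue_depth(32, 1)     # one VM per raw-mapped LUN
```

Eight VMs sharing a depth-32 queue leave each with roughly four outstanding commands, versus all 32 for a raw-mapped LUN, which is why the RDM approach suits I/O-hungry applications.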

This approach can deliver higher disk performance in certain cases (particularly with high-performance, transaction-oriented applications) than the VMware datastore approach described previously. Raw LUN mapping also lets you retain, inside a virtual machine environment, the physical clustering approach (such as Microsoft Cluster Service) you already use with applications such as e-mail or SQL Server. An increasing number of storage vendors are working to make their hardware easier to set up in VMware environments by adding LUN-mapping aids.

The downside of this approach is that it requires a lot of cooperation between VMware and storage administrators to configure, deploy, and manage over time. Moreover, it may be difficult to replicate data efficiently for DR purposes unless, as in Compellent’s case, wraparound features such as LUN snapshots, remote replication, and replay have been designed into the array itself to make these operations easier. (VMware is also planning to release some new functionality this year to facilitate these tasks.)

In a Compellent environment, the administrator can schedule snapshots at regular intervals—every 15 minutes, for example—so if a data hazard or virus strikes, the admin can revert to a clean point in time. Using snapshots and remote replication together, admins can also replicate multiple datastore volumes to multiple sites. Because space-efficient snapshots of VMware instances are already stored at the DR site, administrators can, in the event of a server failure, map a backup ESX server to the SAN and be up and running in minutes.
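The recovery-point arithmetic for such a schedule is worth spelling out. A toy calculation follows; the five-minute replication lag is my own assumption for illustration, not a Compellent figure:

```python
# Worst-case data loss (recovery point) for interval-based snapshots, as in
# the 15-minute example above. A failure just before the next snapshot loses
# up to one full interval of writes, plus any lag shipping it off-site.

def worst_case_data_loss_minutes(snapshot_interval_min, replication_lag_min=0):
    """Upper bound on minutes of writes lost at the recovery point."""
    return snapshot_interval_min + replication_lag_min

local = worst_case_data_loss_minutes(15)      # reverting at the primary site
remote = worst_case_data_loss_minutes(15, 5)  # DR copy 5 minutes behind
```

The point of the sketch: the snapshot interval, not the failover mechanics, sets the floor on how much work you can lose, so choose it against the business's tolerance.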

The third approach for connecting storage to VMs is to use the Network File System (NFS) to store virtual disks. Vendors of NAS platforms like this approach, which was introduced in VMware ESX Server 3.0; however, its performance, and some loss of functionality compared with the two options above, remain matters of debate among vendors and in the blogosphere.

The thing to keep in mind is that the server-to-storage link that, in the non-virtualized world, was made using a host bus adapter or network interface card and software continues to exist in a virtual server world. IT planners need to concern themselves with the ramifications of this fact before diving into VMware nirvana. Some key questions that must be asked (and too frequently are not) include the following:

First, how do your servers currently access storage? Do they boot from the storage platform or use internal disk? Do the applications you are running share storage? Are they high-performance apps that require the highest I/O rates the storage device can deliver? The answers to these questions might steer you toward one of the three storage connection approaches I've outlined. For shared, high-speed storage requirements, a raw device-mapping scheme might be required.
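Those questions condense into a rough rule of thumb. The sketch below is my own hypothetical decision helper mapping them onto the three options discussed, not VMware guidance:

```python
# Hypothetical rule of thumb only: map the planning questions above onto the
# three connection options discussed (VMFS datastore, RDM, NFS).

def suggest_storage_method(high_performance_io, needs_physical_clustering,
                           nas_only_shop):
    """Return a starting-point recommendation; real planning must also
    weigh boot-from-SAN, path diversity, and queue depths."""
    if high_performance_io or needs_physical_clustering:
        # Per-VM LUN queues; retains MSCS-style clustering.
        return "raw device mapping (RDM)"
    if nas_only_shop:
        # ESX 3.0+ can keep virtual disks on an NFS export.
        return "NFS-backed virtual disks"
    # The familiar default for general-purpose workloads.
    return "VMFS datastore"
```

As with any heuristic, treat the output as a hypothesis to validate against your actual I/O profiles, not a verdict.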

Second, be aware that the VM datastore and raw-attach methods may require significant hardware changes to work well. Depending on your VM configuration and the applications you are running, you may need to retrofit servers with more expensive HBAs to support multi-VM access to multiple datastores or multiple physical LUNs concurrently. Make sure that server hardware upgrades are factored into the cost savings you expect to realize from server virtualization.

Third, ask yourself whether the VMware-storage solution you are considering locks you into a particular vendor’s hardware. Compellent and others make a good case regarding the complementary nature of their products with VMware, and the friendly services they can provide to support resiliency and failover, but the fact remains that you need to deploy only that vendor’s gear for the full solution to work. Lock-in costs apply, so do the math.

Finally, and we will explore this more closely in my next column, consider how you plan to recover this environment if the datacenter ever collapses around your ears. While VMware offers server images as a shortcut to getting new devices up and running at an alternate location, many customers have found that these image files, especially those associated with more complex multiple-VM hosting platforms, are huge. They are difficult to replicate over a wire and may incur a significant recovery-time penalty if you need to write them to tape and read them back.

I will address some of these issues in greater detail next week. For now, your comments are welcome: