Capacity Planning in a Virtual Environment: A New Approach for an Old Problem

More proactive capacity planning can get IT closer to a fully virtualized data center.

by Jon Reeve

Capacity planning isn’t new and it isn’t brain surgery. In fact, whether it’s about trying to fit visiting family into limited guest room space or packing your suitcase for an extended vacation, you’ve been solving capacity management issues for years. Understanding how to maximize space without sacrificing the experience is critical to capacity planning success. Different environments require different approaches, and capacity planning in a virtual infrastructure is no exception.

Capacity planning has become a critical component of virtual deployments, where sharing underlying hardware resources (and the contention that inevitably arises among the workloads competing for them) is built in by design. It requires consolidated views across the many IT silos that make up the virtual infrastructure, so that consumption and waste can be understood in the context of real-world business processes and applications. For this article, we define capacity planning as the process of ensuring that the IT infrastructure can support agreed-upon or target service levels in a cost-effective and timely manner.

At its simplest level, capacity planning can be thought of as the task of balancing supply (CPU, memory, storage, I/O) with demand (applications/SLAs). It seems like simple economics, but the virtual environment brings new twists to traditional capacity planning practices.

First, virtualization involves a different level of abstraction, where the relationship between shared resources is in a constant state of flux. Although enterprises view the supply side of capacity planning as pools or clusters of virtual resources, they also need to understand which cluster or pool to use for a given application and what the broader impact on the virtual environment will be.

Second, although capacity planners have always been concerned about waste (or how much oversupply or “headroom” to have in reserve), the concept takes on a whole new meaning in virtual environments, where the ease with which virtual machines (VMs) and applications can be created has led to a proliferation problem known as VM sprawl. Depending on the environment, waste due to VM sprawl can be considerable. Without proper insight into consumption requirements, enterprises risk vastly overestimating the resources needed to support their virtual environment.

Finally, VMs don’t exist in isolation. They run complex (often multi-tiered) applications that support many lines of business. Being able to profile how applications and business departments utilize (or potentially waste) the underlying physical resources is key. Even without a formal chargeback process, companies need to understand how resources are being consumed across the business so budget contributions can be calculated and sensible use of resources can be encouraged to meet demand.
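
As a rough illustration of how budget contributions might be calculated without a formal chargeback process, the following Python sketch splits a monthly infrastructure cost across departments in proportion to what their VMs consume. The inventory, the department tags, the cost figure, and the 50/50 weighting between CPU and memory are all assumptions made for the example, not figures from any real environment or product.

```python
from collections import defaultdict

# Hypothetical inventory: each VM is tagged with an owning department and
# its average consumption. All names and numbers are illustrative only.
vms = [
    {"name": "web-01", "dept": "sales", "cpu_ghz": 2.0, "mem_gb": 4},
    {"name": "app-01", "dept": "sales", "cpu_ghz": 4.0, "mem_gb": 12},
    {"name": "db-01", "dept": "finance", "cpu_ghz": 6.0, "mem_gb": 32},
]


def showback(vms, monthly_cost, cpu_weight=0.5):
    """Split monthly_cost across departments by blended CPU/memory share.

    cpu_weight is an assumed policy knob: 0.5 weights CPU and memory equally.
    """
    total_cpu = sum(vm["cpu_ghz"] for vm in vms)
    total_mem = sum(vm["mem_gb"] for vm in vms)
    if total_cpu <= 0 or total_mem <= 0:
        raise ValueError("inventory must show some CPU and memory consumption")

    shares = defaultdict(float)
    for vm in vms:
        # Each VM's share is a weighted blend of its CPU and memory fractions.
        shares[vm["dept"]] += (
            cpu_weight * vm["cpu_ghz"] / total_cpu
            + (1 - cpu_weight) * vm["mem_gb"] / total_mem
        )
    return {dept: round(monthly_cost * share, 2) for dept, share in shares.items()}


bill = showback(vms, monthly_cost=10_000)
for dept, amount in sorted(bill.items()):
    print(f"{dept}: {amount:.2f}")
```

Even when no money changes hands, publishing a breakdown like this gives each department a concrete view of what it consumes. A real allocation would likely factor in storage and I/O as well.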

Critical Factors

I’ve worked with countless organizations that assume they can take IT management processes they’ve used in the physical world and easily adapt them for their virtual environment. All too often the result is failure. Capacity planning in a virtual environment requires a different approach altogether.

Following are six critical elements of effective capacity planning in a virtual environment:

1. Identify performance issues. Performance can be affected in a number of ways, including CPU, memory, and storage I/O contention. To understand these bottlenecks, enterprises must track several metrics across clusters, hosts, VMs, and data stores so that as VM demand grows, organizations can answer the question, “Are we running too hot?”

2. Forecast when resources will run out based on historical trending. Accurately predicting the resource needs of an application requires access to historical trend data. Having a holistic view of the virtual environment, complete with critical metrics (such as average CPU utilization), enables data center managers to plot key points relative to the overall cluster utilization and determine when additional resources will be needed.

3. Estimate how many more VMs can be added to the virtual infrastructure. One of the most common questions I hear is, “How many more VMs can I add to my infrastructure?” This is more than just a matter of the resource demands of the average VM. If you are like most administrators, you likely have tiers of VMs and want to understand how many large, medium, and small VMs you have the capacity to support.

One large technology enterprise recently came to us after trying a number of capacity planning strategies. Every solution they tried provided capacity planning estimates based on the hypothetical “average VM.” This didn’t work because, like many organizations, they had different tiers of VMs and needed to understand how many VMs from each tier could fit in their infrastructure. Additional detail, including “usage profiles” that search for and model the usage of arbitrary groups of VMs, provided this company with a more accurate estimate of how many bronze-, silver-, and gold-level VMs would fit in each cluster.

4. Understand capacity usage from both an application and workload perspective. The resource demands of a given VM can vary drastically based on the nature of the workload it supports. For example, is this VM a Web server, application server, or database server, or perhaps a machine being used in dev/test/QA or production? Enterprises need to understand the top-to-bottom stack of the virtual infrastructure as well as the workloads that run across it. Examining the historical demands of existing workloads on the virtual infrastructure helps to more accurately estimate when and how you’ll need to roll out new applications in the future.

5. Tie it back to the business. Companies need better transparency into the “black box” of virtualization to justify existing investments, optimize performance, and ensure costs are divided equitably at budgeting time. Virtualization management solutions can provide open APIs and interfaces, both inbound and outbound, so resource consumption by department or business unit can be calculated and shared with the business.

6. Generate management reports that are actually useful. Reports are only as useful as the information contained in them. Recommended metrics include KPIs such as VM growth, consolidation ratios, and uptime. It’s also important for searches, reports, alerts, and trends to be targeted to specific departments, units, teams, and projects. These insights can be shared in a variety of ways, such as embedding them in external portals so teams can start to get a handle on their respective consumption (and waste) patterns.
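
To make the second element more concrete, here is a minimal Python sketch of trend-based forecasting. It fits a straight line to daily cluster utilization samples and projects when that line crosses a ceiling. The function name, the sample data, and the 80 percent ceiling are assumptions chosen for illustration. A simple linear fit is only one possible model, and real workloads with seasonal or bursty demand may need something more sophisticated.

```python
def days_until_exhaustion(daily_utilization, threshold=0.80):
    """Project when a least-squares trend line crosses the threshold.

    daily_utilization: one sample per day, as fractions between 0 and 1.
    Returns the number of days after the last sample, 0.0 if the trend is
    already at or above the threshold, or None if the trend is flat or falling.
    """
    n = len(daily_utilization)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of utilization against day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_utilization))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    current = intercept + slope * (n - 1)  # fitted value on the last day
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope


# Hypothetical history: average cluster CPU utilization creeping up 1% a day.
history = [0.50, 0.51, 0.52, 0.53, 0.54]
days_left = days_until_exhaustion(history)
print(f"Projected days until the 80% ceiling: {days_left:.0f}")
```

With this made-up history the projection comes out at roughly 26 days. The same function can be run per cluster and per metric (CPU, memory, storage) to see which resource is likely to run out first.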

Capacity planning in virtual environments is a serious challenge to traditional IT management practices. Too many solutions focus only on the virtual infrastructure, but the usage profile of a VM varies dramatically depending on the type of workload it is running. Organizations should seek solutions that can show what is running inside the guest so they can accurately profile the virtual resource usage of Web, application, and database servers, among other workloads.
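
Once usage profiles exist, the question from the third element (how many more VMs of each tier will fit) reduces to simple arithmetic. The Python sketch below assumes three hypothetical tier profiles, a cluster with 42 GHz of CPU and 96 GB of memory free, and a 20 percent reserve. None of these numbers come from a real deployment, and the profile values would normally be derived from measured usage rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Typical resource demand of one VM in a tier (illustrative values)."""
    cpu_ghz: float
    mem_gb: float


# Hypothetical usage profiles per tier.
PROFILES = {
    "gold": Profile(cpu_ghz=4.0, mem_gb=16.0),
    "silver": Profile(cpu_ghz=2.0, mem_gb=8.0),
    "bronze": Profile(cpu_ghz=1.0, mem_gb=2.0),
}


def additional_vms(free_cpu_ghz, free_mem_gb, profile, headroom=0.20):
    """Count how many more VMs of one profile fit, keeping headroom in reserve.

    The scarcer of the two resources determines the answer.
    """
    usable_cpu = free_cpu_ghz * (1 - headroom)
    usable_mem = free_mem_gb * (1 - headroom)
    return int(min(usable_cpu // profile.cpu_ghz, usable_mem // profile.mem_gb))


# Each tier is evaluated on its own, as if it alone used the spare capacity.
fit = {
    tier: additional_vms(free_cpu_ghz=42.0, free_mem_gb=96.0, profile=p)
    for tier, p in PROFILES.items()
}
print(fit)  # {'gold': 4, 'silver': 9, 'bronze': 33}
```

Note how the limiting resource differs by tier in this example: memory caps the gold and silver counts, while CPU caps the bronze count. That is exactly the detail an “average VM” estimate hides.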

To date, virtualization has been slow to deliver on many of its promises. However, by taking more proactive capacity planning steps, we can get closer to our goal of a fully virtualized data center.

Jon Reeve is the director of product management at Hyper9, Inc. You can contact the author at jonathan@hyper9.com.