Q&A: Solving VM Management Challenges

IT has used a mish-mash of tools to manage VMs. It's time to find something better.

IT has been tackling VM management issues with a variety of tools. There must be a better way. We asked Alex Bewley, chief technology officer of uptime software, about what features IT should look for in a VM management tool and about best practices for VM management.

Enterprise Strategies: What VM management issues are the most challenging for IT?

Alex Bewley: A large part of answering this question depends on the level of sophistication of each IT department. However, there are always core challenges present in every environment.

For me, the biggest issue is identifying resource bottlenecks, especially ones relating to storage I/O. Finding CPU/memory issues relating to ESX servers and VMs is fairly simple. If you have a bottleneck with storage, though, it can be very difficult to track down where and why it's happening. In fact, there could be several reasons, including too much network traffic, disks that are not fast enough, and too many VMs on a SAN volume. Resolving this sort of issue can involve a lot of metrics, and some tools just won't provide the depth you need to solve the problem.
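
Editor's note: as a rough illustration of the triage described above, the Python sketch below checks one SAN volume against the three causes mentioned (network traffic, disk speed, and VM count per volume). The field names and thresholds are hypothetical placeholders chosen for the example, not values from up.time, VMware, or any vendor guidance.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions, not vendor guidance).
MAX_NETWORK_UTILIZATION = 0.80  # fraction of storage network capacity in use
MAX_DISK_LATENCY_MS = 20.0      # average disk latency in milliseconds
MAX_VMS_PER_VOLUME = 15         # VMs sharing a single SAN volume


@dataclass
class VolumeSample:
    """One hypothetical snapshot of metrics for a SAN volume."""
    name: str
    network_utilization: float
    disk_latency_ms: float
    vm_count: int


def suspected_causes(sample):
    """Return the bottleneck causes whose threshold this sample exceeds."""
    causes = []
    if sample.network_utilization > MAX_NETWORK_UTILIZATION:
        causes.append("too much network traffic")
    if sample.disk_latency_ms > MAX_DISK_LATENCY_MS:
        causes.append("disks not fast enough")
    if sample.vm_count > MAX_VMS_PER_VOLUME:
        causes.append("too many VMs on the volume")
    return causes


sample = VolumeSample("san-vol-01", network_utilization=0.35,
                      disk_latency_ms=42.0, vm_count=22)
print(sample.name, suspected_causes(sample))
```

In practice the hard part is collecting these metrics in one place and on a common scale; the check itself is simple once they sit side by side.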

Other VM management issues that are challenging for IT include storage allocation and the rapid consumption of any and all storage resources (this leads to very costly storage sprawl), as well as reporting on resource usage trends, monitoring system availability, showing VM sprawl, and displaying resource bottlenecks.

How has IT been managing VM environments? What deficiencies are there in IT's current approach(es)?

Generally, IT is trying to solve these problems with a mish-mash of tools that includes a collection of point tools (aka tool soup) for application monitoring, virtual infrastructure monitoring, and storage monitoring. With this process, there is a considerable amount of manual work to be done. IT teams must correlate data between multiple tools and understand application performance from the end-user point of view all the way down the physical/virtual stack of application servers, middleware servers, databases, and storage.

As you can imagine, finding the true problems by trying to navigate around different toolsets can be a nightmare, especially if the metrics are all different. This can lead to apples-to-oranges metric comparisons and tool users fighting over which tool is right. It's better to get all of this in one tool, so the problem gets solved quickly and there is no finger-pointing or blame game.

What are the key features a VM management tool must provide?

Here's my short list:

  • Monitoring of performance, availability, and capacity across all servers, applications, and services in a single tool, so you have one "view of the truth" from the top of the stack to the bottom.

  • Automatic discovery of virtual instances and their dependencies (e.g., as they come online, they immediately appear in your dashboard with no manual actions needed).

  • Automated monitoring of discovered instances, meaning that monitoring and alerting are automatically applied to all VMs as soon as they spin up. This saves time because admins don't have to continually update the VMware deployment in the monitoring tool. In other words, no VMware blind spots.

  • Understanding of applications running in the virtual environment from an SLA point of view -- e.g., knowing that components of an application can scale up and down, and not alerting on instances that scale down, which would adversely (and erroneously) affect the SLA.
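
Editor's note: the SLA point in the last item can be sketched in a few lines of Python. The state names ("running", "scaled_down", "failed") and the min_running parameter are assumptions made for this example; a real tool would obtain equivalent information from the virtualization platform.

```python
def should_alert(instance_states, min_running):
    """Decide whether an application's instances warrant an SLA alert.

    instance_states maps an instance name to one of the hypothetical
    states "running", "scaled_down", or "failed". An intentional
    scale-down is not treated as an outage; only failures, or too few
    running instances, trigger an alert.
    """
    failed = [name for name, state in instance_states.items()
              if state == "failed"]
    running = sum(1 for state in instance_states.values()
                  if state == "running")
    return bool(failed) or running < min_running


# A planned scale-down from two web instances to one: no alert.
print(should_alert({"web-1": "running", "web-2": "scaled_down"}, min_running=1))

# An instance that actually failed: alert.
print(should_alert({"web-1": "running", "web-2": "failed"}, min_running=1))
```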

What are the best practices you recommend for VM management?

Here are a few:

  • First, understand your applications and how they are built -- which are I/O-intensive and which are compute-intensive -- so you can optimize your VM density without hurting performance.

  • Second, get a tool that understands your virtualized environment and can monitor the physical components as well (nothing is totally virtualized yet, so you still need to monitor the performance, availability, and capacity of the physical components involved in delivering an application).

  • Third, create templates of standard configurations so you can ensure license compliance and correct software and security versioning. Ideally, your monitoring tool can automatically validate allocated VM resources, licenses, etc.

  • Finally, look for wasted storage -- that is, identify VMs that have been powered on recently but are unused (e.g., leftovers from testing, staging, or development). This is a great way to start attacking sprawl. Remember, wasted storage costs real dollars.
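
Editor's note: a minimal Python sketch of the wasted-storage check in the last item might look like the following. The inventory records, their field names, and the 2 percent CPU threshold are all hypothetical; in practice the data would come from whatever reporting your monitoring tool provides.

```python
def find_idle_vms(inventory, cpu_threshold_pct=2.0):
    """Return VMs whose average CPU use is below the (assumed) threshold."""
    return [vm for vm in inventory if vm["avg_cpu_pct"] < cpu_threshold_pct]


def reclaimable_gb(vms):
    """Total storage provisioned to the given VMs, in gigabytes."""
    return sum(vm["provisioned_gb"] for vm in vms)


# Hypothetical inventory; avg_cpu_pct is an average over a recent period.
inventory = [
    {"name": "prod-db-01", "avg_cpu_pct": 41.0, "provisioned_gb": 500},
    {"name": "test-web-07", "avg_cpu_pct": 0.4, "provisioned_gb": 120},
    {"name": "staging-app-02", "avg_cpu_pct": 1.1, "provisioned_gb": 80},
]

idle = find_idle_vms(inventory)
print([vm["name"] for vm in idle], reclaimable_gb(idle))
```

CPU alone is a crude signal; a fuller check would likely also consider disk and network activity before declaring a VM unused.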

What advances have you seen in VM management tools in 2011?

This year, tools have reached a greater understanding of the components of a virtual infrastructure (resource pools, storage performance). IT departments are starting to span the entire virtualization stack from application monitoring to low-level metrics. There's also been a more in-depth understanding of metrics within the virtualized environment, leading to more relevant performance information. Lastly, this year we've really seen a lot more integration with automation tooling.

Where do you see such tools heading in 2012? Do you think they're headed in the right direction?

Providing the "single-pane-of-glass" view across physical, virtual, and cloud environments is key. That's because it's important that vendor tools continue to make the system/VM administrator's life easier. Any tool that is difficult to use just exacerbates the complexity of managing a virtual environment. As Einstein said, "Out of clutter, find simplicity. From discord, find harmony." That's what your tools need to do for your IT environment.

Cloud is also becoming prevalent. Tooling will need to understand how internal virtual environments will burst or expand to the cloud while still providing a consistent reporting method for SLAs (it is crucial to know how applications are doing).

How has up.time 6 responded to current and future demands in the VM management space?

Up.time 6 integrates closely with vSphere's Virtual Center, which enables IT departments to monitor their infrastructure in real time. It includes automatic instance discovery and monitoring as well as sprawl killer reports.

Up.time is an all-in-one, complete VMware management suite that is licensed per physical server, so you needn't pay for instances, sockets, CPUs, etc. Our systems management software monitors, reports on, and provides alerts about the performance, availability, and capacity of virtual, physical, and cloud data centers and IT services. [Editor's note: a full list of features is available here.]

At the end of the day, we designed up.time 6 with three goals in mind for mid-enterprise and enterprise companies: save IT users and managers time with easy administration and automation of common monitoring tasks, save money by improving capacity utilization while ensuring performance, and improve service levels through faster mean-time-to-repair, intelligent outage handling, and automated incident avoidance.
