
Avoiding Buffet-Style Overindulgence in Your Internal Cloud

Although clouds may never provide a truly carefree lifestyle when it comes to resources, these steps can ensure that they don’t create an expensive operational nightmare.

By Andrew Hillier, CTO and Co-founder, CiRBA, Inc.

Private internal clouds and related technologies are becoming a significant focus for IT organizations. The promise of increased agility and standardization in the supply of IT capacity is extremely compelling. Unfortunately, these benefits come at a cost in terms of the technical challenges and the behavioral changes that can arise when users are given “self-service” access to capacity. The low perceived cost of cloud capacity, lowered barriers to access, and a lack of visibility into new application requirements (which leads users to err on the side of caution) can combine to create a situation where too much capacity is deployed. Like an all-you-can-eat buffet, plates will tend to be piled a lot higher when you help yourself than when portions are determined by the chef.

With the technologies available today, organizations can easily offer a menu of options for capacity and server instances to users. The menu itself is the easy part. The real challenges lie upstream and downstream.

On the upstream side, allowing application owners to select the capacity they need can be challenging when they are not familiar with the inner workings of the IT infrastructure. This effectively shifts the sizing problem from IT staff (who are experienced in these matters) to application groups (who may not be), creating inefficiency and inaccuracy.

On the downstream side, internal clouds can also significantly increase the cost of supplying capacity if not managed carefully, as organizations need to maintain spare capacity to ensure they can deal with fluctuating and unpredictable demand. In some ways, an internal cloud takes these two fairly significant IT challenges and links them together with a very efficient automation process, making the problem even worse.

Although most organizations are still in the early stages of planning internal clouds and haven’t yet tackled this particular challenge, some early adopters have developed ways to combat these issues and achieve the right balance. An effective solution emerging in forward-thinking IT groups is the concept of a “soaking pool” – that is, infrastructure earmarked as an incubation center for new or migrating workloads. With plenty of available capacity for profiling workloads before “releasing” them into the general population, this pool acts as a clearinghouse to isolate the risk from the main pools of capacity. This gives infrastructure and operations teams the ability to size and place workloads to ensure they get the right amount of capacity and the right kind of capacity (e.g., internal vs. external, SAN- vs. NAS-based, etc.).
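As a rough illustration of how observations from a soaking pool might drive sizing, the sketch below picks the smallest size on a capacity menu that covers a workload's observed CPU demand. The function name, the nearest-rank 95th percentile, the 20% headroom, and the menu sizes are all assumptions made for this example, not a description of any particular tool.

```python
import math

def pick_instance_size(cpu_samples_ghz, catalog_ghz, percentile=95, headroom=1.2):
    """Pick the smallest catalog size covering observed demand plus headroom.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    if not cpu_samples_ghz:
        raise ValueError("no observations yet; keep the workload in the soaking pool")
    ordered = sorted(cpu_samples_ghz)
    # Nearest-rank percentile: the sample at or below which `percentile`
    # percent of the observations fall.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    demand = ordered[rank - 1] * headroom
    for size in sorted(catalog_ghz):
        if size >= demand:
            return size
    return None  # nothing on the menu is large enough

# Example: mostly quiet workload with one rare spike, menu of 1/2/4/8 GHz.
samples = [1.0] * 19 + [6.0]
print(pick_instance_size(samples, [1, 2, 4, 8]))
```

Sizing to a high percentile rather than the absolute peak is one possible design choice; a workload with strict SLAs might warrant sizing to the peak instead.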

Formalizing this process enables workload placements to be controlled by “cloud routing policies,” and the use of a soaking pool enables these policies to apply to both business criteria (e.g., SLA, data sensitivity) and observed utilization patterns (e.g., bursts vs. constant activity, high vs. low I/O).
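A cloud routing policy of this kind could be expressed as a small rule function. The sketch below is only one possible shape: the `Workload` fields, the thresholds, and the specific rules (for example, sending high-I/O workloads to SAN-based capacity) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    sla_tier: str            # business criterion, e.g. "gold" or "bronze"
    sensitive_data: bool     # business criterion
    peak_to_mean_cpu: float  # observed in the soaking pool
    avg_iops: float          # observed in the soaking pool

def route(w: Workload, burst_ratio: float = 3.0, high_iops: float = 5000.0) -> dict:
    """Return a placement decision for a workload leaving the soaking pool.

    Thresholds and rules are hypothetical; a real policy would be site-specific.
    """
    # Business criteria decide where the workload is allowed to run.
    location = "internal" if (w.sensitive_data or w.sla_tier == "gold") else "external"
    # Observed utilization decides what kind of capacity it receives.
    storage = "SAN" if w.avg_iops >= high_iops else "NAS"
    pattern = "bursty" if w.peak_to_mean_cpu >= burst_ratio else "steady"
    return {"location": location, "storage": storage, "pattern": pattern}

print(route(Workload("billing", "gold", True, 4.2, 8000.0)))
print(route(Workload("wiki", "bronze", False, 1.5, 200.0)))
```

Keeping the business rules and the utilization rules separate makes it possible to route on business criteria alone at request time, then refine the placement once the soaking pool has produced enough observations.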

There are several scenarios where this is beneficial, and there are also several methods of implementing a soaking pool. First and foremost, this model allows application owners to serve themselves, receiving the fast response they want, while allowing IT operations to benefit from reduced risk and increased accuracy in assigning capacity and configuring VMs. It is also extremely useful in the migration of applications from existing legacy infrastructure, where platform changes (or even just the shift to virtual machines) can create uncertainty with respect to how the application will operate. This second benefit should not be underestimated. In many environments the challenge of “on-boarding” existing applications, which can involve migrating hundreds or even thousands of servers, is far more significant than the challenge of servicing net new capacity requests, which are often a trickle by comparison.

When it comes to implementing a soaking pool, there are many options. At its heart, a soaking pool is a concept that guides the process of deploying cloud instances, and the infrastructure to support this can be either physical or logical in nature. At one end of the spectrum, a soaking pool can be a dedicated virtual cluster that has an abundance of CPU, memory, and I/O resources, allowing rapid provisioning and accurate observation of an application’s requirements and patterns of resource usage. It can also be a portion of an existing cluster earmarked for new VMs, created by placing mature workloads on a specific set of servers and keeping other servers clear for volatile activity.

Perhaps the simplest implementation of this concept is to use reservations to set the capacity of new VMs, guaranteeing them an abundance of resources. Over time, the operational patterns of these VMs can be observed and the reservation “walls” can be torn down as confidence increases, allowing them to safely mingle with the mature workloads. Although it does not provide the degree of isolation inherent in the other approaches, this approach is simpler to set up, provided there are tools in place to manage workload placements and resource allocations in a fairly sophisticated way.
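One way to picture tearing down the reservation walls is as a gradual step-down toward observed demand. The sketch below is illustrative only: the function name, the 14-day observation window, the 20% headroom, and the 25% maximum step are assumptions, and it does not call any hypervisor API.

```python
def next_reservation(current_mhz, observed_peak_mhz, days_observed,
                     min_days=14, headroom=1.2, max_step=0.25):
    """Suggest the next CPU reservation for a VM that is being soaked.

    All parameters are hypothetical defaults chosen for this example.
    """
    if days_observed < min_days:
        return current_mhz  # not enough history yet; keep the wall up
    target = observed_peak_mhz * headroom
    if target >= current_mhz:
        return current_mhz  # this sketch only lowers reservations
    # Lower the reservation by at most `max_step` per review cycle.
    floor = current_mhz * (1 - max_step)
    return max(target, floor)

# A VM reserved at 4000 MHz that has peaked at 1000 MHz over 30 days
# steps down gradually rather than dropping straight to its target.
print(next_reservation(4000, 1000, 30))
```

Limiting each step keeps a single unrepresentative observation period from stripping a VM of its guarantee all at once.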

Ultimately, this can be an effective way to enable organizations to optimize agility while providing greater control over resource allocations and capacity demand buffers that significantly impact the costs of internal clouds. A soaking pool will reduce the operational risk of unleashing unknown workloads into environments where they may impact others. Also, by eliminating the need to create elaborate (and likely inaccurate) upfront descriptions of workload demand characteristics, this empirical approach can greatly reduce the overhead of planning and greatly increase the efficiency of the resulting environment.

As J.P. Garbani from Forrester Research stated in his Nov. 10, 2009 report (Vendors Beware: Virtualization, PaaS, And SaaS Are Changing The Capacity Management Tools Market), “While it may seem intuitive that virtualization and Cloud computing would allow a carefree lifestyle for resource provisioning, they do not.” These are wise words, and capture the challenge facing organizations today in moving to an internal cloud infrastructure. By designing environments to include concepts such as soaking pools, and by employing analytics that enable the accurate measurement, placement, and allocation of cloud instances, some of these challenges can be addressed. Although clouds may never provide a truly carefree lifestyle, these steps will at least ensure that they don’t create an expensive operational nightmare.

Andrew Hillier is CTO and co-founder of CiRBA, Inc., a data center intelligence analytics software provider that determines optimal workload placements and resource allocations required to safely maximize the efficiency of cloud, virtual and physical infrastructure. You can contact the author at ahillier@cirba.com.
