In-Depth

Cloud Computing in the Real World: 3 Best Practices for Capacity Management

How performance management issues change -- and remain the same -- in the new world of cloud computing.

By David S. Linthicum, Founder and CTO, Blue Mountain Labs

[Editor’s note: Enterprise Strategies welcomes Mr. Linthicum as a regular contributor to our newsletter. He will discuss contemporary cloud computing issues facing enterprise IT in his new monthly column, Cloud Computing in the Real World.]

With the advent of cloud computing, enterprises are looking for new ways to measure capacity and performance. Although the elastic nature of the cloud means that resource-intensive applications can go from dozens to hundreds of virtualized servers at the press of a button, this is not always the best (or most cost-effective) way to manage overall computing power and performance.

This article takes a look at how performance management issues change -- and remain the same -- in the new world of cloud computing. Additionally, we’ll look at what best practices make capacity management in the cloud work today.

What’s Changed?

Cloud computing provides enterprises with new options for outsourcing both applications, such as enterprise-class business systems (SaaS), and infrastructure, such as storage and compute services (IaaS). Even development and testing have a place in the cloud (platform as a service, or PaaS).

It’s typically accepted that cloud computing provides the following capabilities as defined by the National Institute of Standards and Technology (NIST):

  • On-demand self-service
  • Resource pooling
  • Rapid elasticity
  • Measured service or pay per use

This is typically accomplished through a multitenant, virtualized infrastructure in which many remote tenants (subscribers) share the same pool of servers, provisioning and de-provisioning resources on demand. In some cases, we leverage public clouds, or clouds that can be accessed by any number of organizations and individuals over the open Internet. In other cases, enterprises choose private clouds to serve a particular organization and no others. Sometimes there is a hybrid cloud model, which is a combination of the two.

What changes in terms of capacity management in the emerging world of cloud computing? A few things come to mind.

  • We can no longer assume that computing capacity is dedicated to a group of users or a group of processes. Everything in a cloud computing environment is shared using some sort of multitenant model. This makes capacity modeling and planning much more complex.

  • Auto provisioning makes some aspects of capacity planning less important because capacity can be allocated when needed. However, considering that cost is a core driver for leveraging cloud computing, using capacity that’s not needed reduces the value of cloud computing.

  • We now have the option to leverage cloud computing systems as needed to cost-effectively provide temporary capacity. Called “cloud bursting,” this type of architecture was difficult to cost justify until cloud computing provided us with a cheaper “public” option.
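To make the cloud-bursting idea concrete, here’s a minimal sketch of a threshold-based bursting policy, written in Python. The thresholds, instance counts, and helper function are hypothetical and for illustration only -- they don’t correspond to any particular provider’s API.

  # Minimal sketch of a cloud-bursting threshold policy (illustrative only;
  # thresholds and instance counts are hypothetical, not provider guidance).

  BURST_THRESHOLD = 0.80    # add public-cloud capacity above 80% local utilization
  RELEASE_THRESHOLD = 0.50  # release burst capacity below 50% local utilization

  def burst_instances_needed(local_utilization: float, current_burst: int) -> int:
      """Return how many public-cloud instances to run, given local utilization."""
      if local_utilization > BURST_THRESHOLD:
          return current_burst + 10          # burst: rent temporary capacity
      if local_utilization < RELEASE_THRESHOLD:
          return 0                           # demand subsided: stop paying for it
      return current_burst                   # in between: hold steady

  if __name__ == "__main__":
      print(burst_instances_needed(0.92, current_burst=0))   # -> 10
      print(burst_instances_needed(0.40, current_burst=10))  # -> 0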

What’s the Same?

What has not changed in the world of cloud computing is that it’s still computing. Many in the emerging cloud computing space have a tendency to define cloud computing as the “new disruptive model” that will change the way we do computing from now on. Although the new technologies and service offerings are indeed exciting, cloud computing at its essence is really the same multi-user, multi-tenant hosted computing we leveraged many years ago. Thus, most of the existing rules still apply when considering capacity management.

Moreover, while many would argue that cloud computing does not require as much planning as traditional systems, including capacity modeling and management, the opposite is proving true as more enterprises leverage clouds. Indeed, the core value of cloud computing is the effective and efficient use of resources. You typically pay only for the resources you leverage, and you can allocate and manage only those resources you need, such as storage and compute services. That is how cloud computing can bring a much higher ROI.

The dirty little secret right now is that cloud computing is not always cost effective unless you put together a sound architecture and resource-utilization strategy that includes a capacity-management plan.
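To put rough numbers on that point, consider a back-of-the-envelope comparison (Python, with purely hypothetical hourly rates and instance counts) between a fleet left running around the clock and one that is scaled down off-peak:

  # Back-of-the-envelope monthly cost comparison (hypothetical rates).
  RATE_PER_INSTANCE_HOUR = 0.10   # assumed price; real provider rates vary
  HOURS_PER_MONTH = 730

  # 100 instances left running 24x7, regardless of demand
  always_on = 100 * RATE_PER_INSTANCE_HOUR * HOURS_PER_MONTH

  # 100 instances for 8 peak hours a day, 20 instances the rest of the time
  peak_hours = 8 * 30
  off_peak_hours = HOURS_PER_MONTH - peak_hours
  right_sized = (100 * peak_hours + 20 * off_peak_hours) * RATE_PER_INSTANCE_HOUR

  print(f"Always on:   ${always_on:,.0f}/month")    # $7,300
  print(f"Right-sized: ${right_sized:,.0f}/month")  # $3,380

The elastic infrastructure is the same in both cases; the difference is whether anyone planned how much of it to use, and when.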

Best Practices

Considering the relative immaturity of modern cloud computing technology, a few best practices for capacity management in the cloud are just beginning to emerge. Consider the following:

Best Practice #1: Capacity models should consider the characteristics of a multi-tenant platform.

We’ve been here before with traditional multi-user systems, but the emerging cloud-based systems are a different animal. Clouds typically offer up services or APIs to access very fine-grained and primitive resources (e.g., storage). Of course, the APIs call back to physical resources, typically virtualized servers that many other tenants share. This is the case for both public and private cloud computing.

These services might be multiplexed across many different requesting systems. Access to these services or APIs can be relatively random in terms of patterns. The resulting behavior under load can be relatively sporadic considering that many requesting resources are accessing the same physical resources (disk, memory, and processor) at the same time. Thus, you need to model capacity accordingly, assuming the randomness and sporadic nature of the behavior.
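One simple way to model that behavior is a Monte Carlo-style simulation: draw random demand from each tenant against the shared pool and look at how often the pool is overrun, rather than relying on averages. The sketch below uses hypothetical tenant counts, request rates, and pool capacity purely to illustrate the approach:

  # Minimal sketch: Monte Carlo view of multi-tenant contention on a shared pool.
  # Tenant counts, request rates, and capacity are hypothetical.
  import random

  TENANTS = 40                  # tenants sharing the same physical pool
  MEAN_REQUESTS_PER_TENANT = 5  # average requests per interval, per tenant
  POOL_CAPACITY = 260           # requests the pool can serve per interval
  INTERVALS = 10_000            # simulated time slices

  overloaded = 0
  for _ in range(INTERVALS):
      # Each tenant's demand is random and bursty, not a steady average.
      demand = sum(random.expovariate(1 / MEAN_REQUESTS_PER_TENANT)
                   for _ in range(TENANTS))
      if demand > POOL_CAPACITY:
          overloaded += 1

  print(f"Intervals over capacity: {overloaded / INTERVALS:.1%}")
  # Average demand (40 * 5 = 200) fits comfortably, yet random peaks still
  # overrun the pool some of the time -- which is why you model the
  # distribution of load, not just the mean.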

Best Practice #2: Make sure to account for distribution.

Cloud providers typically don’t centralize your processing in a single physical data center unless you specify that in the agreement (at an additional fee). Thus, your request for 100 server instances to support processing may mean that some virtualized servers are allocated in a primary data center, but dozens of others could be allocated to remote data centers, some perhaps out of the country. This is more of a public cloud issue, but it could be true with private clouds as well.

When modeling for capacity, you need to account for the distributed nature of the cloud computing services, where network latency can become a more important factor. Again, in many instances the cloud provider makes the call dynamically as to what servers will handle your request for service when you invoke an API. The serving of the request could be locally or widely distributed depending upon the architecture (including distribution schemes) leveraged by the cloud-computing provider. You need to understand how all of that works.
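A simple way to fold distribution into a capacity model is to weight each location’s share of your instances by its added network latency. The sketch below uses made-up locations, latencies, and allocation percentages just to show the shape of the calculation:

  # Sketch: fold per-location network latency into an effective response time.
  # Locations, latencies, and allocation percentages are hypothetical.

  service_time_ms = 40  # time to serve the request once it reaches a server

  allocation = {          # share of instances placed in each data center
      "local-dc": 0.60,
      "regional-dc": 0.30,
      "overseas-dc": 0.10,
  }
  latency_ms = {          # round-trip network latency from users to each location
      "local-dc": 5,
      "regional-dc": 35,
      "overseas-dc": 120,
  }

  effective_ms = sum(share * (service_time_ms + latency_ms[dc])
                     for dc, share in allocation.items())
  print(f"Expected response time: {effective_ms:.1f} ms")
  # ~65.5 ms, versus 45 ms if every instance were in the local data center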

Best Practice #3: Focus on understanding, modeling, and monitoring services, not systems.

Most cloud-computing implementations leverage core patterns of SOA, including the decomposition and use of services to create and recreate solutions. Thus, when creating a capacity plan where cloud computing systems are in play, the most productive approach is to focus on the services (APIs to the resources) and how they behave under dynamic loading versus modeling a system holistically.
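In practice, that can be as simple as keeping a small load model per service and composing those models along a request path instead of building one monolithic system model. Here’s a minimal sketch with hypothetical services, latencies, and saturation points:

  # Sketch: compose per-service behavior under load into a path-level view.
  # Service names, base latencies, and saturation points are hypothetical.

  def service_latency_ms(base_ms: float, load: float, saturation: float) -> float:
      """Very rough model: latency inflates as offered load approaches saturation."""
      utilization = min(load / saturation, 0.99)
      return base_ms / (1.0 - utilization)

  # One small model per service the application touches.
  services = {
      "storage-api":  {"base_ms": 8,  "saturation": 5000},
      "auth-api":     {"base_ms": 3,  "saturation": 8000},
      "business-api": {"base_ms": 25, "saturation": 2000},
  }

  def path_latency_ms(load: float) -> float:
      """Latency of one request that traverses all three services in sequence."""
      return sum(service_latency_ms(s["base_ms"], load, s["saturation"])
                 for s in services.values())

  for load in (500, 1500, 1900):   # requests per second, increasing
      print(f"{load:>5} req/s -> {path_latency_ms(load):6.1f} ms")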

This means that we may have as many as 100-200 points to model, including infrastructure and business application services. Although this would seem overly complex in the context of a capacity planning problem, it’s really a matter of understanding the behavior of each service under different loading scenarios, and binding those services into a more holistic model that should provide a good indication of system performance under an increasing or decreasing load.

Path Forward

Capacity planning and performance monitoring in the emerging world of cloud computing add complexity to existing procedures, but it’s really about modeling capacity and performance in a world that shares and distributes more resources. Moving forward, there is good news and bad news.

The bad news: Cloud computing providers and the use of cloud computing within enterprises will make the process of capacity planning much more complex. The modeling approaches and modeling technologies are changing to accommodate these added complexities, but many capacity planning professionals will need to update their skills to keep up.

The good news: Cloud computing providers will provide more performance and capacity monitoring and management services as time goes on. Their users are demanding them, and thus those charged with managing capacity and performance will have better interfaces for their tools.

Many of the rules are changing, but just as many remain the same. The need for a sound capacity planning exercise continues to add a great deal of value and removes many risks, cloud or no cloud.

David S. Linthicum is the founder and CTO of Blue Mountain Labs and an internationally recognized industry expert and thought leader. He is the author and coauthor of 13 books on computing, including Enterprise Application Integration (Addison Wesley). David is the keynoter at the 37th International Computer Measurement Group's annual conference (CMG'11), December 5-9 at the Gaylord National Hotel in the Washington, DC area. You can contact the author at david@bluemountainlabs.com.
