Q&A: Managing Performance of Cloud-Based Applications and Services
Before you leap into the cloud, consider these challenges and best practices for managing the performance of cloud-based services and applications.
The cloud is an exciting place: it can provide temporary resources that free IT from buying and managing seldom-used infrastructure, reduce maintenance costs, and offer the latest technology your enterprise needs to stay competitive. However, the cloud also presents IT with several challenges. How do you set metrics for measuring critical application performance? How can you test this performance and incorporate it into your SLAs?
To learn more, we asked Imad Mouline, CTO of Gomez (www.gomez.com), the Web performance division of Compuware, to answer a few of our questions.
Enterprise Strategies: What should companies consider when deploying applications to the cloud?
Imad Mouline: Companies need to carefully consider how the cloud may affect application performance for their customers and end users -- that is, the response time or speed at which cloud capacity becomes available, ramps up and down, and delivers applications and services.
These performance considerations are extremely important. From the perspective of internal end users (such as employees), poor performance often translates to an application that’s difficult to use, which can lead to impaired productivity. If your end users choose not to use an application due to its poor performance, many sought-after cloud benefits (for example, greater cost efficiencies) are squandered.
From the perspective of external end users (such as customers), if your cloud-based application or service performs poorly, your customers won’t care who’s at fault. Instead, they will simply blame you, which can put your brand, revenues, and customer satisfaction on the line. That’s why enterprises using the cloud need to demand service-level agreements (SLAs) that guarantee specific performance levels all the way to the end user, based on the enterprise’s unique needs.
What applications are suited to the cloud, and which should not be migrated?
Although the potential benefits of cloud computing are well documented -- including pay-per-use billing, scalability, flexibility, and fewer IT headaches -- many businesses are still reluctant to entrust something as important as the performance of their mission-critical enterprise applications and services to an outside vendor. The ramifications for end users’ Web experiences -- and, as noted above, for revenues, brand image, and customer satisfaction -- are potentially huge.
For a business that uses cloud services occasionally to support behind-the-scenes, back-end applications such as number crunching, slight and infrequent dips in application performance may be tolerable. For a business using cloud computing to support a highly visible, revenue-generating application or service for a worldwide base of end users, unpredictable stumbles on the part of the cloud service provider are simply unacceptable. Unless cloud providers are willing to build high-performance guarantees into their SLAs, businesses are ill-advised to rely on consumer-oriented cloud service suppliers for anything resembling a mission-critical application or service.
What are the inherent challenges of managing the performance of cloud-based services and applications?
There are two primary challenges in managing the performance of cloud-based services and applications.
First, your enterprise must manage and ensure performance across an extremely wide range of usage scenarios. Your end users can have vastly different experiences based on the unique combination of their location, connection speed, device, and browser, and you must ensure superior performance across all (or at least the most common) of these possible configurations.
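To make that combinatorial spread concrete, here is a minimal Python sketch; the locations, browsers, connections, and devices are hypothetical placeholders, and a real program would draw them from actual traffic analytics:

```python
from itertools import product

# Hypothetical end-user dimensions; real values would come from
# traffic analytics, not a hard-coded list.
locations   = ["New York", "London", "Tokyo", "Sao Paulo"]
browsers    = ["Chrome", "Firefox", "Safari", "Edge"]
connections = ["fiber", "cable", "DSL", "mobile"]
devices     = ["desktop", "tablet", "phone"]

# Every combination is a distinct usage scenario to measure.
scenarios = list(product(locations, browsers, connections, devices))
print(f"{len(scenarios)} configurations to cover")  # 4 * 4 * 4 * 3 = 192

# In practice, you would rank scenarios by real traffic share and
# test the most common ones first.
```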
Second, managing the performance of any outside vendor takes extra time and money, but it becomes even more complicated when one considers the large number of performance-impacting variables -- ISPs, third-party services, and content delivery networks, for example -- that stand between a cloud service provider’s data center and your end users. This is known as the Web application delivery chain, and poor performance anywhere in the chain can negatively affect the performance of your entire application.
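As a rough illustration of attributing response time to individual links in that chain (a generic sketch using the pycurl library, not Gomez's own tooling), each stage of a single request can be timed separately:

```python
from io import BytesIO
import pycurl  # pip install pycurl

def timing_breakdown(url):
    """Return cumulative per-stage timings (in seconds) for one request."""
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, buf)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.perform()
    stages = {
        "dns_lookup":    c.getinfo(pycurl.NAMELOOKUP_TIME),
        "tcp_connect":   c.getinfo(pycurl.CONNECT_TIME),
        "tls_handshake": c.getinfo(pycurl.APPCONNECT_TIME),
        "first_byte":    c.getinfo(pycurl.STARTTRANSFER_TIME),
        "total":         c.getinfo(pycurl.TOTAL_TIME),
    }
    c.close()
    return stages

# A spike in one stage points to the link in the chain at fault:
# DNS provider, network path, TLS termination, or the server itself.
for stage, seconds in timing_breakdown("https://www.example.com/").items():
    print(f"{stage:14s} {seconds:.3f}s")
```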
The irony is that in the event of application slowness, errors, or, worse, unavailability, end users will not blame the cloud service provider or the faulty element in the Web application delivery chain. Instead, they will hold you responsible, which means you must manage the performance of the cloud service provider as well as all the components of the Web application delivery chain that lie between its data center and your customers.
Although some cloud service providers are already investing in solutions to improve performance management across the Web application delivery chain, many have not yet done so, which places the onus squarely back on you, the cloud customer. Unfortunately, enlisting a cloud service provider does not absolve enterprises of ultimate responsibility: they must manage performance just as they do for on-premises applications and services.
What kinds of performance SLAs do cloud providers offer? If they are insufficient, what should an enterprise do?
Many cloud service providers offer SLAs promising, for example, certain uptime guarantees (such as 99.9 percent), but these SLAs are woefully inadequate. The fact that a cloud service provider’s servers are up and running does not say anything about network connectivity or APIs, which can have a huge impact on the application’s performance for the ultimate end user.
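Some quick arithmetic shows how little an uptime-only guarantee actually promises:

```python
# What an uptime-only SLA permits, ignoring response time entirely.
MINUTES_PER_MONTH = 30 * 24 * 60

for guarantee in (0.999, 0.9999):
    allowed = MINUTES_PER_MONTH * (1 - guarantee)
    print(f"{guarantee:.2%} uptime still allows "
          f"~{allowed:.0f} minutes of downtime per month")
# 99.90% -> ~43 minutes/month; 99.99% -> ~4 minutes/month.
# And a server that is "up" but responding in 15 seconds
# violates neither figure.
```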
In addition, where response time and speed are concerned, many businesses automatically assume Google.com- and Amazon.com-levels of performance from services such as Google App Engine and Amazon EC2, but this can be a mistake. Cloud customers must take the initiative to create a more performance-focused SLA. The keys to doing so are to clearly understand why they’re using the cloud; to evaluate cloud providers against the performance requirements and success criteria that follow from those reasons; and to set SLAs based on these requirements, committing to ongoing performance monitoring to validate the SLAs, ensure requirements are being met, and verify that they’re getting what they’re paying for.
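As one illustrative sketch (the structure, regions, and thresholds below are hypothetical, not any provider's actual SLA format), a performance-focused SLA can be expressed as data so that monitoring results can be checked against it automatically:

```python
# Hypothetical performance-focused SLA: an explicit uptime clause plus
# end-user-facing response-time targets per region.
SLA = {
    "availability": 0.999,            # classic uptime guarantee
    "response_time_p95_s": {          # explicit, end-user-facing targets
        "North America": 2.0,
        "Europe": 2.5,
        "Asia-Pacific": 3.0,
    },
}

def check_sla(region, p95_seconds, availability):
    """Return the list of SLA clauses violated in one monitoring window."""
    violations = []
    if availability < SLA["availability"]:
        violations.append("availability")
    if p95_seconds > SLA["response_time_p95_s"][region]:
        violations.append(f"p95 response time in {region}")
    return violations

print(check_sla("Europe", p95_seconds=3.1, availability=0.9995))
# -> ['p95 response time in Europe']
```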
What metrics and business considerations should enterprises weigh when evaluating cloud providers? Which of these factors should be incorporated into a contract or service-level agreement with the provider?
As I mentioned, the ability to demand and validate performance for cloud-based services and applications lies in understanding why you are using the cloud in the first place; evaluating providers based on your corresponding performance goals; and then establishing SLAs to support these goals and continually monitoring to ensure these goals are being achieved.
Certain reasons for using the cloud, such as trading CapEx for OpEx, do not necessarily raise issues of performance. Other reasons bring critical performance issues to the fore, and these must be addressed for cloud investments to serve the needs of your enterprise.
For example -- perhaps a business uses the cloud:
- For elasticity, meaning speed (fast ramp-up) plus capacity to handle occasional, sporadic overflows
- In a cloud-bursting model, meaning that instead of investing in an internal infrastructure massive enough to handle occasional, extraordinary traffic peaks (but otherwise sitting idle), the business buys just enough infrastructure to handle usual traffic patterns and diverts only peak, special-event traffic to the cloud
- To take advantage of the cost efficiencies of the cloud for full-blown support of business applications, 24 hours a day, seven days a week
Once you have determined exactly why you're using the cloud, your performance expectations follow (they should be based on the unique needs of your business and end users), and you can set up SLAs geared toward these parameters.
For example, businesses using the cloud for elasticity need to understand what capacity must be available, and how quick the ramp-up needs to be (contrary to popular belief, cloud capacity is not limitless). Businesses using a cloud-bursting model need assurances that complex configurations will work properly and seamlessly, with minimal impact on the end-user experience. These businesses must determine how quickly and at what traffic threshold they must cloud-burst, how much capacity they need, and what traffic threshold should initiate “ramp down” (ensuring they pay for only the cloud support that is needed, and no more). Finally, businesses using the cloud to support business applications must first and foremost consider performance requirements for end users in order to reinforce the requirements in SLAs.
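A toy sketch of the cloud-bursting thresholds just described, with a gap between the ramp-up and ramp-down triggers so capacity is not released the moment traffic dips (all numbers are illustrative):

```python
# Hypothetical cloud-bursting decision logic with hysteresis: burst
# when load nears internal capacity, ramp down only once load falls
# well below it, so you pay for cloud capacity only while it is needed.
INTERNAL_CAPACITY   = 1000   # requests/sec handled in-house
BURST_THRESHOLD     = 0.9    # divert to the cloud at 90% utilization
RAMP_DOWN_THRESHOLD = 0.7    # release cloud capacity below 70%

def next_state(bursting, current_rps):
    utilization = current_rps / INTERNAL_CAPACITY
    if not bursting and utilization >= BURST_THRESHOLD:
        return True    # divert overflow traffic to the cloud
    if bursting and utilization <= RAMP_DOWN_THRESHOLD:
        return False   # ramp down; stop paying for cloud capacity
    return bursting    # the gap between thresholds avoids flapping

bursting = False
for rps in (500, 920, 1400, 800, 650):
    bursting = next_state(bursting, rps)
    print(f"{rps:5d} rps -> bursting={bursting}")
```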
How should enterprises test the performance of cloud-based services and applications? How can enterprises ensure that performance will be the same as (or better than) the (artificial) test environment when all data and users move to the application?
End users’ Web experiences are subject to an extremely wide range of Web “noise” -- ISPs, content delivery networks, carriers, connection speeds, browsers, devices, etc. -- which can ultimately color their experience. Any effort to understand, validate, and optimize the performance of a cloud-based service or application requires a clear understanding of how end-user segments around the world are actually experiencing the cloud-based service or application.
Artificial test environments may not deliver this view, and businesses should therefore leverage comprehensive testing networks comprising real-world desktops and devices from around the world. These networks offer a fast and easy glimpse into end-user experiences around the world and help to pinpoint performance-impacting variables across the Web application delivery chain. These “outside-in” testing networks can also help simulate real-world loads to test cloud-bursting configurations prior to putting them into production.
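For a sense of what such load simulation involves, here is a bare-bones sketch using only Python's standard library. Note that it tests from a single vantage point, whereas real outside-in networks measure from distributed real-world desktops and browsers; the URL and request count are placeholders:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles

URL = "https://www.example.com/"  # placeholder target

def timed_request(_):
    """Fetch the URL once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

# Fire 200 requests with 20 concurrent workers to approximate load.
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(timed_request, range(200)))

cuts = quantiles(latencies, n=100)
print(f"p50={cuts[49]:.3f}s  p95={cuts[94]:.3f}s")
```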
What is the optimal timeframe/frequency to test the performance of cloud-based services and applications?
Cloud customers need to conduct ongoing, rigorous performance monitoring -- during the cloud vendor selection process as well as before and after application deployment. After deployment, monitoring is the only way for cloud customers to make sure that both the implicit parameters of their SLAs (general availability and security guarantees, for example) and the explicit parameters (specific, unique, business-goal-driven, performance-focused metrics) are being met. In addition, cloud customers should test every time new, rich features (video, for example) are added to their service or application to make sure the cloud service provider can still ensure speedy delivery. Finally, cloud customers should load test in advance of any peak traffic periods to ensure their cloud service provider can scale to support best-case (that is, highest) traffic scenarios.
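A minimal synthetic monitor along these lines might look like the following; the URL, interval, and threshold are placeholders:

```python
import time
import urllib.request

URL = "https://www.example.com/"  # placeholder target
RESPONSE_TIME_SLA_S = 2.0         # hypothetical SLA threshold
INTERVAL_S = 60                   # probe once per minute

while True:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
        elapsed = time.perf_counter() - start
        if elapsed > RESPONSE_TIME_SLA_S:
            print(f"SLOW: {elapsed:.2f}s exceeds {RESPONSE_TIME_SLA_S}s SLA")
    except Exception as exc:
        print(f"DOWN: {exc}")  # availability breach
    time.sleep(INTERVAL_S)
```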
How can a business evaluate if Web performance issues are due to the cloud or another factor?
By monitoring, cloud customers can proactively identify performance issues and diagnose their root causes (including those that lie both within and beyond the cloud service provider’s data center), ideally before end users are even aware a problem exists. In some cases, this ability can give cloud customers negotiating leverage. For example, a cloud customer may identify application latency for end users in a particular geography and trace this performance problem to a cloud service bottleneck, such as under-provisioning in a regional data center. Armed with this data, the cloud customer can then press the cloud service provider to take the steps needed (such as adding more dedicated capacity) to meet and uphold the customer’s application and service performance requirements.
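One simple way to spot such a regional problem is to compare each geography's median latency against the global median; the sample data below is fabricated for illustration:

```python
from statistics import median

samples = {   # region -> recent response times in seconds (fabricated)
    "North America": [1.1, 1.3, 1.2, 1.4],
    "Europe":        [1.2, 1.5, 1.3, 1.4],
    "Asia-Pacific":  [3.8, 4.1, 3.9, 4.4],   # suspicious outlier
}

global_median = median(t for times in samples.values() for t in times)
for region, times in samples.items():
    m = median(times)
    if m > 2 * global_median:
        print(f"{region}: median {m:.1f}s vs global {global_median:.1f}s "
              f"-- investigate this leg of the delivery chain")
```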
What best practices can you recommend for controlling the performance of applications in the cloud?
Test before deployment, test after deployment, and test often. It’s important for cloud customers to test the right way -- which means, from the real-world perspective of end users in order to gain the truest, most realistic view of application and service performance. Only in this way can cloud customers determine what end user segments may be experiencing a performance issue, and then trace back through their Web application delivery chain to proactively identify and fix performance-impacting variables that stand between their end users and the cloud service provider’s data center.
The day should come when cloud service providers comprehensively test and monitor performance quality themselves (from the actual end user’s perspective, where appropriate) and can provide (and validate) more specific, performance-based SLAs tailored to individual business needs. Ultimately, this ability to put forth more meaningful, relevant SLAs should be a key determinant of whether a given cloud service provider makes it to “round two.” Until then, the onus for testing, measuring, and validating performance lies with the businesses considering or using cloud services.
What role does Gomez play in this market?
Gomez’s Web load testing and performance management solutions use a comprehensive worldwide testing network comprising real user desktops and devices, which gives businesses a critical, realistic, first-hand view into how different end users around the world are experiencing their cloud-based services and applications. These solutions are available in a SaaS, pay-per-consumption model, which makes them fast and easy to use while giving businesses the flexibility to test when and as often as they want. Gomez’s Web performance management services can be key to creating and monitoring SLAs that drive greater confidence in, and wider adoption of, cloud computing -- helping businesses realize the flexibility, scalability, cost, and management benefits they seek while maintaining control over cloud-based application and service performance and end-user satisfaction.