Trust in the Cloud is All About Transparency
Network-based application performance management may be just what service providers need to make the cloud ready for your mission-critical applications.
By Tanya Bragin, Senior Product Manager, ExtraHop Networks
The ramifications of Amazon's EC2 outages in April and August are still reverberating throughout the business and technology communities. Amazon faced significant criticism for its sparse communication during the April disruption but did a laudable job of providing post-mortem analysis. The Amazon episodes and similar cloud mishaps from Microsoft and Google certainly have resonated with organizations considering further adoption of cloud services. Everyone is asking: Is the cloud transparent enough to be trusted with companies' mission-critical applications and processes?
The demand for greater transparency is highlighted in a new study from the Cloud Industry Forum (CIF), which concludes that "there is a need for clarity and transparency in all contracts on key issues that impact and concern end user organizations." The onus lies with cloud service providers, who must take measures to provide even more transparency about service disruptions or poor performance. To gain trust among potential users, these measures must be backed by legal guarantees.
This undertaking is not as easy as it sounds. For example, contract teams must negotiate service-level agreements (SLAs) that ensure quality and transparency. However, the real challenge lies beyond wordsmithing the legalese. On the technology side, delivering on new SLA transparency clauses requires application performance management (APM) technologies that increase visibility into IT systems and operations.
What Does Cloud Transparency Really Mean?
Even though the Cloud Industry Forum's study found that nearly 80 percent of users are "looking beyond an SLA for comfort in the service to be provided," clear SLAs are still needed as a foundation for a trusted relationship between cloud service providers and customers. Joaquin Gamboa and Marc Lindsey, partners at Levine Blaszak Block & Boothby LLC, advise their clients to demand more from their cloud service provider SLAs in the following specific areas (among others):
- Clear thresholds for system availability or uptime, application response time, transaction throughput for software as a service (SaaS) offerings, and incident-response and problem-resolution times
- Proactive fault notification so that customers receive service credits for outages or poor performance without having to ask
- Root-cause analysis of service problems at no additional cost
- Escalation of chronic or critical service quality problems to senior management at both companies
Cloud service providers must deliver the type of transparency required by these SLAs before customers will trust them with mission-critical applications.
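To make the first of these areas concrete, the following minimal Python sketch shows how measurable thresholds might be written down and checked for a reporting period. It is an illustration only: the class name SlaThresholds, the field names, and every number are hypothetical and do not come from Gamboa and Lindsey or from any real contract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaThresholds:
    """Hypothetical SLA terms; all values used with this class are illustrative."""
    min_availability_pct: float   # lowest acceptable uptime for the period
    max_response_ms: float        # ceiling for 95th-percentile response time
    min_throughput_tps: float     # floor for transactions per second


def availability_pct(total_minutes, downtime_minutes):
    """Percentage of the reporting period during which the service was up."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def breached_clauses(sla, total_minutes, downtime_minutes,
                     response_ms, throughput_tps):
    """Return the names of the SLA clauses that were missed in this period."""
    breaches = []
    if availability_pct(total_minutes, downtime_minutes) < sla.min_availability_pct:
        breaches.append("availability")
    if percentile(response_ms, 95) > sla.max_response_ms:
        breaches.append("response_time")
    if throughput_tps < sla.min_throughput_tps:
        breaches.append("throughput")
    return breaches


# Assumed example terms: 99.9 percent uptime, 500 ms at p95, 100 transactions/s.
sla = SlaThresholds(min_availability_pct=99.9,
                    max_response_ms=500.0,
                    min_throughput_tps=100.0)

# A 30-day month has 43,200 minutes; 50 minutes of downtime misses 99.9 percent.
result = breached_clauses(sla, 43200, 50, [120, 180, 200, 250, 900], 150.0)
print(result)
```

Writing thresholds as explicit numbers like this is what allows a provider to detect a breach, and issue any credit that is owed, without waiting for the customer to complain.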
Barriers to Transparency
Unfortunately, because today's IT environments are far more complex and dynamic than they were even three or four years ago, many cloud service providers simply cannot deliver this type of transparency. Cloud applications often span many interdependent tiers within a datacenter, and if performance suffers in one area, it can degrade the overall experience for customers and ultimately cause an SLA breach.
The challenge for cloud service providers and customers is that most network and application monitoring tools do not offer the in-depth visibility across all application tiers needed to provide definitive answers about the root cause of performance problems or errors. For example, with a NetFlow collector, a provider might be able to see where a throughput chokepoint is occurring and degrading the user experience for a SaaS application. However, these legacy tools include little or no visibility at the application layer, so the provider cannot actually see what is causing the slowdown and work to fix it.
Likewise, user-experience monitors offer details about the Web tier but often fall back to a sampled mode of operation when faced with heavy loads, frequently missing critical details. Other legacy APM tools rely on custom performance agents that need to be recertified and redeployed constantly. These agent-based APM tools can end up creating more problems than they solve. In fact, one large governmental organization spent $6 million to deploy 20,000 desktop agents for end-user experience reporting, only to find that its vendor would not support its environment if it upgraded to Windows 7. Such limitations and costly mistakes add to the challenges facing cloud service providers that strive to deliver additional transparency for their customers.
Giving Buyers What They Want
To deliver on these new transparency demands, cloud service providers require a non-intrusive APM solution that spans the entire application infrastructure end to end, providing the crucial visibility they need. A new approach called network-based application performance management (network-based APM) meets these requirements.
Unlike the legacy APM solutions designed to target limited portions of the application, network-based APM can passively analyze all transactions on the wire in real time, providing a single view of the entire application infrastructure across the network, Web, database, and storage tiers. With no agents to configure, deploy, or maintain, network-based APM equips providers with the insight to deliver on Gamboa and Lindsey's vision for a more transparent cloud.
When deployed in a cloud provider's datacenter, network-based APM solutions can do the following:
- Measure performance against SLA thresholds for system availability and uptime, application response time, and transaction throughput
- Enable cloud service providers to use real-time visibility and benchmarked performance data to determine precisely whether an SLA breach has occurred, so that they can take the lead and proactively notify customers of service degradation or failure
- Drill down into each tier, while still seeing the relationships of health and performance metrics between tiers, to perform accurate root-cause analysis of intermittent failures and slowdowns
- Provide real-time data on application performance that can be communicated to executive management at the provider and customer organizations
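The breach detection and per-tier drill-down described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how any network-based APM product works: the record layout (transaction ID, tier, milliseconds), the function names, the baseline figures, and the 200 ms limit are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def tier_breakdown(records):
    """Mean time spent in each tier, from (txn_id, tier, ms) records."""
    per_tier = defaultdict(list)
    for _txn_id, tier, ms in records:
        per_tier[tier].append(ms)
    return {tier: mean(times) for tier, times in per_tier.items()}


def most_degraded_tier(current, baseline):
    """Tier whose mean latency grew the most relative to its own baseline."""
    ratios = {tier: current[tier] / baseline[tier]
              for tier in current
              if baseline.get(tier, 0) > 0}
    return max(ratios, key=ratios.get)


def breach_notice(records, baseline, max_total_ms):
    """Return data for a proactive notice, or None if the limit was met."""
    totals = defaultdict(float)
    for txn_id, _tier, ms in records:
        totals[txn_id] += ms          # end-to-end time per transaction
    avg_total = mean(list(totals.values()))
    if avg_total <= max_total_ms:
        return None
    current = tier_breakdown(records)
    return {"avg_total_ms": avg_total,
            "suspect_tier": most_degraded_tier(current, baseline)}


# Hypothetical observations for two transactions across three tiers.
records = [
    ("t1", "web", 20), ("t1", "database", 300), ("t1", "storage", 30),
    ("t2", "web", 22), ("t2", "database", 340), ("t2", "storage", 28),
]
# Assumed benchmarked (normal) mean latency per tier, in milliseconds.
baseline = {"web": 20, "database": 40, "storage": 30}

notice = breach_notice(records, baseline, max_total_ms=200)
print(notice)
```

Comparing each tier with its own benchmark, rather than simply picking the slowest tier, reflects the point made in the list: the relationships between tiers matter, because a tier that is normally slow is not necessarily the one that caused the breach.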
Gartner Research estimates that cloud computing will be a $150 billion business by 2013. Alongside this growth, Andy Burton, chair of the Cloud Industry Forum, cautions, "In the online world where the number of suppliers is growing at an astonishing rate ... we are returned to the basic fundamental principle of Caveat Emptor—or Buyer Beware." This warning is doubly important when it comes to mission-critical business applications. However, a delivery model as promising as the cloud deserves consideration, especially if service providers can provide appropriate levels of transparency into performance.
Network-based APM makes it possible for service providers to give their customers an unprecedented level of visibility into the behavior of cloud applications, provide proactive notifications about potential slowdowns, and take the initiative in communicating about service delivery problems. With this type of transparency into what is happening and why, backed by corresponding SLAs, the cloud can finally gain the level of credibility it needs to take on mission-critical enterprise applications. Cloud service providers, are you listening?
Tanya Bragin is a senior product manager at ExtraHop Networks. Previously, she was a senior consultant with Deloitte & Touche Enterprise Risk Services, deploying application performance management solutions for Fortune 100 clients. She received her master's degree in computer science from the University of Washington with a concentration in designing large-scale, service-oriented systems. You can contact the author at firstname.lastname@example.org.