The Key to SOA Success: Application Performance Management
Given its widespread adoption, service-oriented architecture is clearly an integral component of cloud computing and software-as-a-service. However, without the management tools to address such issues as application performance, the rapid development and deployment of SOA-based services could fail.
by Paul Ellis
Given its adoption, it’s clear that service-oriented architecture (SOA) is alive and well. What is truly intriguing, 12-24 months after the SOA hype subsided, is that SOA is now largely viewed as a given on a company’s journey to cloud computing or software-as-a-service (SaaS). As more organizations accept SOA as a mainstay of their IT infrastructure, especially as it relates to the cloud, a common emerging theme is how quickly new applications can be rolled out, particularly if they are aligned with a previous SOA initiative.
That’s the good news. The bad news is that if you don’t have the management tools to address such issues as application performance, the rapid development and deployment of SOA-based services could fail.
The Good News and the Unknown
With accelerated application development and delivery, measuring and managing application performance based on the actual user transaction experience is no longer a nice to have -- it’s a must have. Along with faster application development and deployment comes less testing and QA time. This drives an increasing reliance on faith that existing services will interact as intended with newly developed services to deliver a new business process, product, or service.
Unfortunately, all too many applications that function as intended in the confines of a test sandbox crumble under real-world stress. The possible combinations of production-environment variables (including transaction loads) can’t be fully simulated in test environments. Production becomes the ultimate test, and one that organizations can’t afford to fail.
As more enterprises open the door to customers, suppliers, and partners via portals and Web-based applications, the human visibility into customer satisfaction diminishes. Customer service representatives are no longer talking directly with a customer to place an order. Organizations no longer have the immediate feedback of hearing a customer’s voice and judging how pleased or displeased the customer is with the transaction. New self-service applications have replaced that. Customers generally accept that lack of interaction as long as their expectations are met. Why wouldn’t expectations be met? What could possibly go wrong?
Finally, what about those composite applications that invoke cloud services or depend on SaaS-related functions or data? It’s tough enough for organizations to manage what they can control inside the confines of their own data centers, but how do they manage off-premise services?
Laying Blame
We all now have extremely high expectations of transaction time. We are all conditioned to near-immediate response; our tolerance is measured in seconds. Nowhere is this more apparent than in the consumer market. Customers know alternate suppliers with the same or similar products and will defect and purchase from the competition in an instant if their service-level expectations are not met. These customers who used to vote with their feet now vote with their mouse.
Understanding and managing the actual customer transaction experience is critical to customer satisfaction, retention, and revenue growth.
Managing the Customer Experience
Managing performance and the impact on the customer experience is far more complex than producing historical reports on average response times and outages. Successful performance management must be based on real-time, end-to-end customer transaction monitoring and reporting across the entire SOA infrastructure including transactions that invoke mainframe, cloud, and SaaS-provided services. Historical data can, however, be a valuable input to correlation and probable-cause analysis.
Successful performance monitoring must help IT operations better execute its new charter of being the business relationship manager between the end user and the business unit. It must provide a comprehensive view of the entire customer experience with the ability to detect slow transaction times as they develop and provide alerts and probable cause analysis. Armed with this information, IT operations can proactively address the issue before the end user is seriously affected. Successful performance management provides a common language and reference data for use across the infrastructure for problem triage. Ideally, this is accomplished with actual, real-world customer transaction experience data and minimal overhead.
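As a rough illustration of detecting slow transaction times as they develop, the following Python sketch raises an alert when the average of the most recent response times rises above a threshold. The class name, the window size, and the threshold are all hypothetical choices made for this example; a production APM tool would use far richer data and analysis.

```python
from collections import deque


class SlowdownDetector:
    """Flags a developing slowdown from a rolling window of response times.

    Hypothetical helper for illustration only; not part of any APM product.
    """

    def __init__(self, threshold_s: float, window: int = 5):
        self.threshold_s = threshold_s       # alert level for the rolling average
        self.samples = deque(maxlen=window)  # keeps only the most recent samples

    def observe(self, elapsed_s: float) -> bool:
        """Record one transaction time; return True if an alert should fire."""
        self.samples.append(elapsed_s)
        window_full = len(self.samples) == self.samples.maxlen
        average = sum(self.samples) / len(self.samples)
        # Wait for a full window so a single slow transaction does not alert.
        return window_full and average > self.threshold_s


detector = SlowdownDetector(threshold_s=2.0, window=3)
alerts = [detector.observe(t) for t in [1.0, 1.5, 2.0, 3.0, 4.0]]
```

In this run the detector stays quiet for the first three transactions and begins alerting on the fourth, when the rolling average first exceeds two seconds. That gives operations a chance to act while the trend is still developing, before every transaction is slow.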
Managing Service-Level Agreements
Many organizations establish a contract in the form of a service-level agreement (SLA) to define both target and minimum acceptable performance levels for an application. Frequently, people think of SLAs when discussing external, user-facing applications in a business-to-consumer (B2C) environment -- for example, retail sales, on-line banking, or other self-service portals.
In fact, SLAs should be part of the DNA of all applications. External customer or supplier SLAs may be more stringent, but an enterprise’s own employees need predictable and reliable internal application performance as well. Although external-facing application failures or slowdowns can be costly in terms of lost revenue or damaged reputation, poor internal-facing application performance can also hurt productivity and employee morale.
Much like the distinction between different SLA targets for external users and internal users, SLAs can be established at various points in the business process based on factors such as the nature of the transaction or the specific user. A brokerage firm, for example, would likely place the highest service-level priority on a specific business process (such as executing a trade). A delay in any step of a trade could make a significant difference in the amount of money saved or lost based on the overall transaction time.
The SLA for such a process would be set at a very demanding level. On the other hand, the service level required for a less-critical function (such as printing a portfolio summary) would likely be set at a less-stringent level. Similarly, the brokerage firm may choose to set a higher service level for a customer executing frequent or high-volume, high-value trades versus an occasional user executing smaller or infrequent trades. Although both users have high expectations for fast, reliable service, one may have a higher business value to the enterprise.
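The brokerage example can be sketched in a few lines of Python. This is only an illustration: the process names, the customer tiers, and every threshold value below are assumptions invented for the example, not figures from any real SLA. Each entry carries both a target and a minimum acceptable response time, matching the two levels an SLA typically defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    target_s: float   # desired response time, in seconds
    minimum_s: float  # slowest response time still considered acceptable


# Hypothetical SLA table keyed by (business process, customer tier).
SLAS = {
    ("execute_trade", "high_value"): Sla(target_s=0.5, minimum_s=1.0),
    ("execute_trade", "standard"): Sla(target_s=1.0, minimum_s=2.0),
    ("print_summary", "high_value"): Sla(target_s=5.0, minimum_s=10.0),
    ("print_summary", "standard"): Sla(target_s=5.0, minimum_s=10.0),
}


def classify(process: str, tier: str, elapsed_s: float) -> str:
    """Compare one measured transaction time against its SLA."""
    sla = SLAS[(process, tier)]
    if elapsed_s <= sla.target_s:
        return "met_target"
    if elapsed_s <= sla.minimum_s:
        return "acceptable"
    return "breach"
```

With these assumed numbers, a 0.8-second trade counts as "met_target" for a standard customer but only "acceptable" for a high-value one, while a 12-second portfolio printout is a "breach" for either tier. The same measurement is judged differently depending on the business value attached to the process and the user.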
Another important aspect of application performance management (APM) in general, and SLA management specifically, is to have a common language and measurement for the SLA that reflects the actual customer experience. The measurement must be understood by both IT and the business-process owner. Although IT will need much more detailed technical information for identifying and avoiding problems, this level of detail is of little value or interest at the business-unit level. Therefore, the tools used for SLA management should be capable of producing useful and meaningful information for both the technologist and the business-process owner.
SOA Environment Complexity
Heterogeneity is a way of life in today’s IT environment. Solutions span multiple hardware and software platforms provided by multiple vendors. These solutions may be geographically distributed and loosely coupled, but at the same time, highly interdependent. The management of these environments therefore must have visibility to the entire infrastructure that drives the application.
Even in a simplified SOA infrastructure, a typical SOA environment likely has many elements that a transaction must span. Any one of these elements could fail completely or cause a slowdown for an application or transaction. When this happens, rapid triage is required to quickly identify the failing component and direct corrective resources to that area before a slowdown results in a failure. This reinforces the need for APM solutions that provide visibility to the entire transaction path while highlighting why silo-specific monitoring alone is not sufficient for rapid problem identification and resolution. The most effective solutions provide a view of the full environment with a probable-cause indicator that helps pinpoint the failing component to help resolve the issue.
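To make the idea of a probable-cause indicator concrete, here is a deliberately simple Python sketch. It assumes that per-component timings for a single transaction are already being collected along the whole path, and that a normal baseline is known for each component. The component names, the numbers, and the ratio-to-baseline heuristic are all assumptions for illustration; commercial APM tools use more sophisticated correlation.

```python
from typing import Dict, Optional


def probable_cause(timings_ms: Dict[str, float],
                   baselines_ms: Dict[str, float]) -> Optional[str]:
    """Return the component running furthest above its baseline.

    Returns None when no component is slower than its baseline.
    """
    worst: Optional[str] = None
    worst_ratio = 1.0  # a ratio of 1.0 means "exactly at baseline"
    for component, elapsed in timings_ms.items():
        ratio = elapsed / baselines_ms[component]
        if ratio > worst_ratio:
            worst, worst_ratio = component, ratio
    return worst


# Hypothetical timings for one transaction across the SOA path.
baselines = {"web": 40.0, "esb": 30.0, "dotnet_service": 100.0, "mainframe": 110.0}
timings = {"web": 40.0, "esb": 35.0, "dotnet_service": 900.0, "mainframe": 120.0}
suspect = probable_cause(timings, baselines)
```

Here the sketch singles out "dotnet_service", which is running at nine times its baseline. A monitor that watched only the Web tier or only the mainframe would see nothing unusual, which is the weakness of silo-specific monitoring.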
Beyond Java
Another variable that has come into play as SOA has become mainstream is that .NET has emerged as an enterprise platform for SOA environments, making it highly likely that a given transaction in its path through the SOA universe will span both J2EE and .NET platforms.
Our discussion is about managing an SOA, not managing parts of an SOA environment depending on what path the transaction takes. Ideally in this cross-platform environment, the enterprise will use the same toolset to monitor and manage the entire infrastructure. Using two different tools depending on the platform, even if they are from the same supplier, is not likely to deliver consistent results. You wouldn’t start a road trip in unfamiliar territory by trying to paste together multiple maps using different scales and in different languages and expect to get to your destination in the fastest time with the fewest roadblocks. An SOA performance management solution should facilitate using the same language and the same metrics regardless of the platform the transaction path takes.
Evolution and Growth
Although SOA is clearly a Main Street phenomenon today, many organizations are still in various stages of evolution with a broad range of legacy and distributed applications, Web Services, and emerging SOAs. As organizations evaluate methodologies and solutions available for APM, a little extra effort to look at portability and scalability of an APM solution can provide significant payback.
Organizations should take into consideration a solution that adds minimum overhead, works in the current environment, and will grow, scale, and adapt as the environment changes. Talented help-desk and support staff are difficult to find, train, and retain.
APM solutions that provide views that can be used and understood by the novice while supplying valuable detailed information to the technical experts can help maximize the investment in personnel. Likewise, solutions that grow with the evolving infrastructure provide consistency in operational practices regardless of where the organization is on the road to SOA adoption.
New Directions
SOA infrastructures have clearly matured and are at various stages of delivering on the promise of cost savings, efficiency, and business results. The importance of managing the user experience is paramount if SOA-based business processes are to be successful. Managing the user experience includes understanding the user, the business process, and how well that user is served through the business process. Application performance management is one of the three key infrastructure management tools organizations will need to successfully manage user experience and optimize SOA performance over the long haul.
Paul Ellis, CA Wily senior marketing manager for CA Wily application performance management and SOA initiatives, has over 20 years of IT experience spanning disciplines including worldwide marketing, product management, strategy and sales-related responsibility at such companies as IBM/Tivoli and Amdahl/Fujitsu Software Group. His background includes significant experience in infrastructure management software and on-demand applications, in addition to storage and communications hardware platforms.