Fundamentals of Application Performance Management

Not all APM solutions are created equal.

Application performance management (APM) technologies provide IT managers and administrators with powerful problem-solving tools. However, not all APM solutions offer equivalent capabilities. To maximize APM’s vast capabilities, and select the right solution for your organization’s initiative, it’s important to understand how APM works. We’ll start by examining a few common problems IT departments face, and how an effective APM solution can solve these problems.

Consider, for example, an online retailer that sees disturbing signs of a slowdown. The number of online purchases through the Web site is steadily shrinking. An administrator checks the Web site performance indicators and finds that end-user response times run to many seconds and transaction completions take many minutes. Customers, fed up with waiting, are giving up and going elsewhere.

The search begins for a culprit. As usual, it starts with the database administrators. Their response is that the problem is likely in the application. Managers and administrators meet, and fingers are pointed in all directions across the IT infrastructure, from the Web server to storage devices. Finger-pointing and buck-passing escalate. Morale across the IT teams drops. Meanwhile, the public loses confidence in the company and its concern for customers.

A second example: End users across the company accessing critical network resources are experiencing slow performance. The diagnosis is that the slowdown is due to CPU contention, so more processors are added. However, the problem does not go away. Perplexing questions persist: what level of utilization is most cost-effective? Should we purchase more storage? Should we add more or faster processors?

APM can answer these and other challenging questions. Let’s quickly review how an APM solution should work.

End-to-end Correlation: The Critical Requirement for APM

When a performance problem strikes, the most immediate issue you have is isolating its point of origin in the stack—Web server, application server, database, or storage. By identifying the source of performance bottlenecks, you can end the finger pointing and focus on solving a specific problem in a specific layer. This requires an end-to-end solution, which is defined as a technology that monitors, analyzes, and correlates everything that happens between the end user’s keystroke and your system’s response on the end user’s screen.

Performance management tools are available from database and other vendors; these products analyze the performance of a specific layer or tier of the transaction path. However, these solutions can’t answer the most urgent question: which layer is causing the performance bottleneck? What if the problem is not in the level or the tier of the transaction path that your solution is monitoring? Are multiple levels contributing to this problem, and if so, how and to what extent?

While many vendors provide multiple point products, each of which can measure performance at a different layer, these solutions cannot correlate the data from the different levels to give you the overview you need. In a sense, each of the point products speaks a different language. They aren’t integrated and cannot collectively provide a single, unified picture of your organization’s application stack.

Using a solution that provides end-to-end measurement in a live production environment is critical to the success of any APM initiative. Some vendors offer tools that can only be used in development environments with synthetic transactions, but these cannot duplicate what happens with dynamic real-world traffic. To obtain meaningful measurements, you’ll need a system that is operating at less than full capacity and performing actual business transactions. Synthetic transactions are just that: synthetic. They are not real transactions that end users originate.

Another potential difficulty with tools from some database vendors is that they provide immense amounts of data, obscuring the problem instead of simplifying it. It’s like trying to drink water from a fire hose. Specificity is necessary in order to isolate a performance problem.

Isolating and solving problems quickly and efficiently requires a fully integrated, end-to-end performance management solution that monitors all layers of your architecture, from Web server to storage, and correlates the information for you.

You don’t want a solution that affects the performance of your application because of excessive overhead. You don’t want your performance management software to become part of your performance problem. Identifying solutions that deliver maximum information, consume minimal CPU, and enable administrators to monitor overhead and keep it at acceptable levels can help ensure the success of an application performance initiative.

Making Sense of the Data: The Need for a Performance Warehouse

The most effective performance management solution will include three basic functionalities: performance measurement, alerts and reports, and analysis of collected data.

The solution will collect data from across the architecture and help you use it to identify and deal with the issues it discovers. Trouble in one layer may really be a symptom of something that’s happening in another layer. That is why correlation of data from different layers is essential, and why a performance warehouse technology is essential to effective APM.

It’s not enough to collect and store data from all layers of the architecture. A performance warehouse, where data can be assembled into a coherent sequence, helps you examine the chain of transactions that starts when the end user presses a key or clicks on a button. Your APM solution should use the performance warehouse to collect data and present it to you in a form that you can use immediately to find the root cause.
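To make the idea of assembling data into a coherent sequence concrete, here is a minimal sketch in Python. It assumes each tier's monitor reports records tagged with a shared transaction ID; the field names (txn_id, tier, start_ms, elapsed_ms) and the sample values are hypothetical and are not taken from any particular APM product.

```python
from collections import defaultdict

# Hypothetical per-tier measurements, arriving in no particular order.
# Field names and values are illustrative assumptions.
samples = [
    {"txn_id": "t1", "tier": "database", "start_ms": 40, "elapsed_ms": 900},
    {"txn_id": "t1", "tier": "web_server", "start_ms": 0, "elapsed_ms": 15},
    {"txn_id": "t2", "tier": "web_server", "start_ms": 5, "elapsed_ms": 12},
    {"txn_id": "t1", "tier": "app_server", "start_ms": 15, "elapsed_ms": 25},
]


def assemble_transactions(records):
    """Group per-tier records by transaction and order each group by start time."""
    by_txn = defaultdict(list)
    for rec in records:
        by_txn[rec["txn_id"]].append(rec)
    return {
        txn_id: sorted(recs, key=lambda r: r["start_ms"])
        for txn_id, recs in by_txn.items()
    }


chains = assemble_transactions(samples)
# The tiers that transaction t1 passed through, in chronological order.
t1_path = [rec["tier"] for rec in chains["t1"]]
print(t1_path)
```

A real warehouse would persist these records and handle clock differences between tiers; the sketch only shows why a shared key across layers is what makes correlation possible.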

An administrator should never have to stare at a screen, waiting for something to happen. The solution should be capable of alerting you in your choice of media—e-mail, screen alerts, pages, etc.—that performance in a certain area has fallen below a specified level. It may tell you that a service level agreement (SLA) is not being met or that a server has become unresponsive.
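The kind of threshold check that sits behind such an alert can be sketched as follows. This is an illustration only: the metric names and limits are hypothetical, and delivery by e-mail or pager is left out.

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric that exceeds its limit."""
    alerts = []
    for name, value in metrics.items():
        limit = thresholds.get(name)
        # Metrics without a configured limit are ignored.
        if limit is not None and value > limit:
            alerts.append(f"ALERT: {name} = {value} exceeds limit {limit}")
    return alerts


# Hypothetical SLA limits, in seconds.
sla_thresholds = {"checkout_response_s": 3.0, "db_query_s": 0.5}
# Hypothetical current readings; cpu_pct has no limit configured.
current = {"checkout_response_s": 7.2, "db_query_s": 0.2, "cpu_pct": 55}

alerts = check_thresholds(current, sla_thresholds)
for message in alerts:
    print(message)
```

Here only the checkout response time breaches its limit, so a single alert is raised and the administrator never has to watch the screen.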

On the other hand, if you suspect there’s a problem, you can use the historical data archived in the performance warehouse to perform trend analysis. The APM solution you choose should enable you to make useful comparisons between time periods. If you’re interested in resource utilization for a certain batch process, you should be able to determine what happened last week, last month, or last year.
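A comparison between two time periods can be as simple as the following sketch, which assumes the warehouse can return a list of utilization samples for each period. The sample values are invented for illustration.

```python
from statistics import mean


def compare_periods(baseline, recent):
    """Return the percentage change in mean value from baseline to recent."""
    base_avg = mean(baseline)
    recent_avg = mean(recent)
    return (recent_avg - base_avg) / base_avg * 100.0


# Hypothetical CPU utilization samples (percent) for one batch process.
last_month_cpu = [40.0, 42.0, 38.0, 40.0]
last_week_cpu = [50.0, 52.0, 48.0, 50.0]

change_pct = compare_periods(last_month_cpu, last_week_cpu)
print(f"Mean CPU utilization changed by {change_pct:.1f}%")
```

A sustained increase like this one is the sort of trend that is worth investigating before it turns into an outage.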

Faster Time to Solution

When you respond to an alert that tells you, for example, that a server has become unresponsive, you will probably be looking at a dynamic HTML dashboard that graphically identifies the problem tier. Click on an icon and the APM software takes you down through the layers. You can literally follow a transaction through the layers and see the response time on each tier.

For example, when an online shopper visits your company’s Web site, the APM program records the shopper’s IP address for reference and measures how long it takes for the data from the shopper’s laptop to reach the Web server. It then measures elapsed time for every stage of the transaction—J2EE execution on the application server, SQL statements executed on the database, accessing files and disks. It follows the transaction there and back—or, as we defined it earlier, end-to-end. You can easily find where the slowdowns are and can decide on the granularity of the measurements.
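As a rough sketch of how per-tier timings point to the slowdown, the following Python snippet takes the elapsed time recorded at each stage of one transaction and reports which tier accounts for the largest share. The tier names and timings are hypothetical.

```python
def slowest_tier(tier_times_ms):
    """Return the tier with the largest elapsed time and its share of the total."""
    total = sum(tier_times_ms.values())
    tier = max(tier_times_ms, key=tier_times_ms.get)
    share = tier_times_ms[tier] / total * 100.0
    return tier, share


# Hypothetical elapsed times (milliseconds) for one shopper's transaction.
transaction = {
    "network": 120,
    "web_server": 30,
    "app_server": 250,
    "database": 1500,
    "storage": 100,
}

tier, share = slowest_tier(transaction)
print(f"{tier} accounts for {share:.0f}% of total response time")
```

With a breakdown like this, the discussion starts from which tier consumed the time rather than from which team is to blame.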

The benefits of this technique are clear. In dealing with performance issues, identifying the problem seems to require the most time, during which finger pointing may rage unchecked. Find the problem quickly, and you have improved the whole process of restoring service levels, productivity, administrator morale, and customer satisfaction.

Another important characteristic of the solution you choose is that it should have the same look and feel regardless of which database you’re working with. You should not need separate training for Sybase, Oracle, SQL Server, or other database solutions.

Improving Hardware Decisions

Companies tend to throw money at performance problems, for example by buying more hardware or more CPUs in the hope of improving storage performance. Neither may be the right approach, depending on what the problem is. You may keep adding processors, as in the scenario described earlier, only to see performance fall off again. You can deal more successfully with this problem by using your APM solution to find out what’s really going on.

Drilling down to the storage layer lets you look at precise statistics about storage tier performance. These statistics equip you to make better-informed hardware decisions. For example, many companies use storage and server consolidation as an effective tool for reducing hardware costs. This raises the following questions: What level of storage utilization is ideal? How many processors are too many? Would fewer and faster processors be a better way to go? Should I add more disk capacity?

APM can give you the hard information you need to deal with these challenges, and can unquestionably help with hardware budgeting.

The Bottom Line: Choose Your APM Solution with Care

As we’ve seen, the ideal APM solution:

  • Provides end-to-end measurement across all tiers
  • Correlates performance data to give you a complete overview
  • Functions with low overhead in a production environment
  • Lets you drill down quickly to identify the problem source
  • Alerts you when performance thresholds are crossed
  • Works with popular database solutions
  • Centrally stores data in an easy-to-use performance warehouse

This combination of features and technologies gives you maximum flexibility and power to sustain service levels and plan hardware utilization and purchasing. It can minimize finger pointing and customer frustration. And it can simplify life for IT managers and administrators.