In-Depth
Lowering TCO with Proactive Performance Management and Optimization
Four techniques enterprises can use to implement performance and optimization solutions that reduce TCO.
Organizations face a challenging economic environment. Containing costs is a necessity, while demand for continuous systems availability and reliability grows exponentially. Web-enabled legacy applications have caused transaction volumes to explode, putting greater strain on IT resources. The corporate directive is to “do more with less,” but what is the best way to meet these demands while still fulfilling existing service level and performance agreements?
As the total cost of ownership (TCO) rises, business transactions become more costly. To address this economically demanding environment, companies must implement performance and optimization solutions that enable them to reduce TCO in four key ways:
- Model performance and plan for growth
- Manage application quality
- Streamline batch processing
- Optimize CICS processing
Model Performance and Plan for Growth
To reduce costs and process data efficiently, the key resource-consuming workloads must first be identified. Once targets for capacity optimization have been identified, testing tuning scenarios helps ensure that production response time and turnaround remain within agreed-upon limits. Capacity performance solutions automatically build representations of a company’s systems, enabling the IT manager to ask “what-if” questions as they arise. These solutions also provide predictive capabilities for quantifying the impact of change on business application response time.
Let’s consider managing the hardware costs of disaster recovery (DR) as an example. DR strategies can be tested to ensure that acceptable performance is achieved in a variety of situations. Although the impact to the CPU generally can be assessed with a spreadsheet, the impact on throughput and response time requires an understanding of queuing theory, which is core to analytic modeling. Capacity management solutions handle these complex computations and calculate multiple scenarios. They focus on resource-utilization and resource-capacity rates measured over time, relying on approaches such as trending, linear regression, simulation, or analytic modeling to predict the optimal balance between the resources required and the response time and throughput requirements.
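To make the queuing-theory point concrete, here is a minimal sketch of how an analytic model answers a “what-if” volume question. It assumes a simple M/M/1 (single-server) model with invented arrival rates and service times; it is a conceptual illustration of why response time grows nonlinearly with utilization, not how any particular capacity-planning product computes its results.

```python
# Minimal analytic-modeling sketch: M/M/1 response time as volume grows.
# Service time and arrival rates are illustrative assumptions only.

def mm1_response_time(arrival_rate, service_time):
    """Mean response time for a single-server (M/M/1) queue.

    R = S / (1 - U), where utilization U = arrival_rate * service_time.
    Returns None when the server would be saturated (U >= 1).
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return None  # demand exceeds capacity; no steady-state response time
    return service_time / (1.0 - utilization)

service_time = 0.020   # assumed mean service time per transaction (seconds)
current_rate = 30.0    # assumed current arrival rate (transactions/second)

# "What if transaction volume grows by 25% or 50%?"
for growth in (1.00, 1.25, 1.50):
    rate = current_rate * growth
    r = mm1_response_time(rate, service_time)
    if r is None:
        print(f"{growth:.0%} volume: saturated -- more capacity required")
    else:
        print(f"{growth:.0%} volume: utilization {rate * service_time:.0%}, "
              f"mean response time {r * 1000:.1f} ms")
```

In this toy scenario, a 50 percent volume increase quadruples mean response time even though the CPU is not yet saturated, which is exactly the effect a spreadsheet-only CPU assessment misses.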
“What if” questions answered by capacity-planning tools are crucial to maintaining acceptable performance at new volume levels. Capacity planners are often asked, “How much will this new application cost when it is rolled out to the users?” They quickly answer this question by modeling volume changes. Knowing exactly how much hardware is needed, and when it is needed, optimizes resource allocation and simplifies the budgeting process.
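The trending side of the approach can be sketched just as simply. The example below fits a straight line to invented monthly utilization figures and projects when a hypothetical planning threshold would be crossed; real capacity-planning tools do considerably more, but the mechanics of the projection are the same.

```python
# Hedged sketch of trending via linear regression: project when CPU
# utilization will cross a planning threshold. All figures are invented.

from statistics import mean

monthly_cpu_util = [0.52, 0.55, 0.57, 0.61, 0.63, 0.67]  # assumed history
months = list(range(len(monthly_cpu_util)))

# Ordinary least-squares fit: util ~= slope * month + intercept
x_bar, y_bar = mean(months), mean(monthly_cpu_util)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_cpu_util))
         / sum((x - x_bar) ** 2 for x in months))
intercept = y_bar - slope * x_bar

threshold = 0.85  # assumed planning limit before an upgrade is needed
months_to_threshold = (threshold - intercept) / slope

print(f"Growth of {slope:.1%} of capacity per month; "
      f"projected to reach {threshold:.0%} in about {months_to_threshold:.0f} months.")
```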
Manage Application Quality
Applications drive business and are the bread and butter of an organization, but application programs can also be highly resource-consumptive workloads. In many shops, more than half of all application performance problems originate from inefficient program code or application design. As applications and systems have become increasingly complex, the need for specialization has increased. DBAs, system analysts, business analysts, and other specialists all help to manage this complex application environment, but few, if any, are responsible for ensuring that applications and programs run as efficiently as possible.
To ensure application efficiency, it is imperative to use an automated method to manage application quality. Without it, the costs from application inefficiencies will continue to rise and those inefficiencies may never be identified as performance improvement opportunities.
Effectively tracking, targeting, and optimizing performance across the mainframe enterprise requires an application quality management (AQM) approach. AQM is a methodology for proactively optimizing mainframe application performance throughout the application lifecycle, yielding significant TCO savings through deferred upgrades and resource optimization. AQM allows organizations to manage performance proactively by automatically targeting candidates for performance analysis and by prioritizing and correlating performance opportunities for that analysis.
A well-armed performance team can proactively identify application performance opportunities as well as locate and solve problems quickly and efficiently when they do occur. Maintaining a proactive stance against performance bottlenecks requires accurate and timely collection of performance data. These opportunities must also be prioritized; otherwise, additional manual effort is needed to organize and track them. Finally, performance metrics must be tracked over time for trend analysis and performance value analysis. These trends and analyses provide crucial historical insight into the health and effectiveness of the AQM process in place.
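As a rough illustration of the prioritization step, the sketch below scores hypothetical workloads by weighting current CPU consumption by its growth over a stored baseline, so large and rapidly degrading workloads surface first. The record layout, scoring formula, workload names, and numbers are assumptions for illustration only, not any product’s actual data model or logic.

```python
# Illustrative AQM prioritization sketch: rank workloads for tuning analysis
# by comparing current CPU consumption against a stored baseline.

from dataclasses import dataclass

@dataclass
class WorkloadStats:
    name: str
    baseline_cpu_sec: float   # historical average CPU per day
    current_cpu_sec: float    # CPU observed in the latest collection interval

def tuning_priority(w: WorkloadStats) -> float:
    """Weight absolute consumption by growth over baseline."""
    growth = (w.current_cpu_sec - w.baseline_cpu_sec) / max(w.baseline_cpu_sec, 1.0)
    return w.current_cpu_sec * max(growth, 0.0)

workloads = [
    WorkloadStats("ORDER_ENTRY",     baseline_cpu_sec=1200, current_cpu_sec=2100),
    WorkloadStats("NIGHTLY_EXTRACT", baseline_cpu_sec=5400, current_cpu_sec=5500),
    WorkloadStats("KIOSK_INQUIRY",   baseline_cpu_sec=800,  current_cpu_sec=1900),
]

# Highest-scoring workloads are the first candidates for performance analysis.
for w in sorted(workloads, key=tuning_priority, reverse=True):
    print(f"{w.name:<16} priority score {tuning_priority(w):,.0f}")
```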
To see AQM in action, consider a large retailer with an online transaction that processes customer requests from retail kiosks. The database had been growing and the volume increasing, causing transaction response time and performance to degrade. With the AQM process in place, the inefficient online transaction was automatically targeted and prioritized for analysis. This led the performance group to a DB2 query with a poorly designed SQL statement; the statement had not been rebound recently, and RUNSTATS had not been run. Within hours of implementing the optimization enhancements, the query’s share of the transaction’s delay and CPU dropped from over 50 percent to less than one percent. The transaction then ran within acceptable service levels, alleviating the negative impact on the retail business during the busiest time of the financial year.
The AQM methodology arms an organization with a competitive edge in managing its application portfolio. Companies can tune their applications for optimal performance and realize the benefits of automation to continually identify and capture performance inefficiencies across the z/OS enterprise. Proactive application management identifies inefficiencies before they become critical performance problems. The performance baselines created by the AQM process also enable IT managers to improve application service levels, automate application tuning to reduce IT costs, and use the repository of historical and correlated data to assess and quantify AQM benefits.
Streamline Batch Processing
The batch cycle, while never a glamorous part of the z/OS processing environment, is still a key back-office workload that performs much of the core infrastructure processing for corporations today. As data volumes continue to grow, maintaining an optimized and streamlined batch process is essential.
Well-defined batch- and online-processing windows once had the flexibility to shift processing times to take advantage of well-known periods of low activity (valleys). These were times when data volumes were lighter and resources more plentiful. Today, while batch processing is still a key workload, data volumes have increased exponentially and online processing has expanded to a 24/7 window, changing the picture of yesterday’s peaks and valleys into a plateau of near-constant demand.
Online applications have now become the priority workloads around the clock, and the static settings that governed batch jobs often fail to deliver optimal performance. To manage this change, batch performance optimization must be accomplished by improving data access, increasing parallelism, and optimizing data in memory from step to step within a job, and from job to job in a stream.
Three key techniques can reduce TCO through batch optimization:
- Data Optimization: Efficient I/O access can often provide large reductions in batch elapsed time. File optimization changes (e.g., buffer and string adjustments) reduce accesses to data stores on DASD, drastically cutting run times by improving access to data in memory. By automating the management of VSAM and non-VSAM processing, batch optimization is handled dynamically and automatically, eliminating the need for manual JCL manipulation.
- Job-Step Parallelization: Job steps can be run in parallel to maximize data sharing. By piping data from one step to another (a conceptual sketch of piping follows this list), steps complete sooner and resources are freed earlier, making them available for other work.
- Job Parallelization (Job Piping): Parallelism can be exploited further by allowing jobs to pipe data into each other, so an entire job stream completes faster. By extending the piping capability across LPARs within a Sysplex, jobs can run in parallel on alternate partitions or processors to increase parallelism even more.
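The following conceptual sketch shows why piping shortens elapsed time: two “steps” run concurrently, with records flowing through a bounded pipe instead of through a completed intermediate data set. It models the idea only; real batch piping is performed transparently by the optimization product at the data-set level, not in application code, and the record counts here are invented.

```python
# Conceptual sketch of step piping: producer and consumer steps overlap,
# so the second step does not wait for the first to finish writing.

import queue
import threading

pipe = queue.Queue(maxsize=100)   # bounded buffer standing in for the pipe
SENTINEL = None

def step1_extract():
    for record in range(1, 10_001):   # pretend extract of 10,000 records
        pipe.put(record)
    pipe.put(SENTINEL)                # signal end-of-data to the consumer

def step2_summarize():
    total = count = 0
    while (record := pipe.get()) is not SENTINEL:
        total += record
        count += 1
    print(f"Summarized {count} records, total {total}")

producer = threading.Thread(target=step1_extract)
consumer = threading.Thread(target=step2_summarize)
producer.start(); consumer.start()    # both steps active at the same time
producer.join(); consumer.join()
```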
These techniques can be used separately or in combination to achieve the benefit of a widened batch window (a shortened batch cycle). One BMC Software customer implemented a data and job optimization solution to shorten its month-end batch cycle by more than five hours. This cut in half the time it took to complete month-end processing, which improved the availability of financial data and yielded two benefits: better customer service and additional business revenue.
Optimize CICS Processing
The Internet boom has dramatically increased distributed processing. This increased activity has created additional demand for online services, and when combined with increased enterprise database access, has increased reliance on CICS online processing. The challenge facing many organizations today is that static management of CICS environments often fails to deliver optimal performance. Peaks and valleys are now more frequent and less predictable. Thus, the need to automatically manage increasingly complex CICS environments is crucial to dynamically optimizing performance across workload peaks and valleys.
The process of optimizing CICS processing requires matching, in real time, the system environment to the processed workload. To be effective, an automation solution must dynamically manage and tune CICS system parameters and resources to address performance issues before they cause problems. This provides both increased region availability and optimization.
Availability: The management and optimization of key areas such as DB2 threads, VSAM, Transient Data, Temporary Storage, Trace facilities, Storage, and Transaction classes are essential to prevent CICS slowdowns and outages. Consider a region approaching a "short on storage" (SOS) constraint. With dynamic optimization, the EDSALIMIT would be proactively altered, preventing an SOS and possibly a region outage. This averts a potentially disastrous situation that would otherwise require manual and reactive intervention to tweak CICS parameters. The cost avoidance from this increased availability, often viewed as "intangible," becomes very "tangible" when a slowdown or an outage becomes a problem.
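The kind of proactive adjustment described above can be sketched as a simple threshold check. The function name, thresholds, growth factor, and region values below are hypothetical and do not correspond to any real CICS or vendor interface; the sketch only illustrates the decision a dynamic optimization product automates.

```python
# Hypothetical sketch of proactive SOS avoidance: raise a region's storage
# limit before usage reaches a short-on-storage condition.

def check_region_storage(region, used_bytes, edsa_limit_bytes,
                         warn_pct=0.85, growth_factor=1.25,
                         ceiling_bytes=2 * 1024**3):
    """Return a new EDSALIMIT if the region is approaching SOS, else None."""
    if used_bytes / edsa_limit_bytes < warn_pct:
        return None                                   # comfortably below the limit
    proposed = int(edsa_limit_bytes * growth_factor)  # raise the limit proactively
    new_limit = min(proposed, ceiling_bytes)          # never exceed the site ceiling
    print(f"{region}: usage at {used_bytes / edsa_limit_bytes:.0%} of EDSALIMIT, "
          f"raising limit to {new_limit // 1024**2} MB")
    return new_limit

# Example: a region using 440 MB of a 480 MB EDSALIMIT is adjusted before SOS,
# avoiding the manual, reactive intervention described above.
check_region_storage("CICSPRD1", used_bytes=440 * 1024**2,
                     edsa_limit_bytes=480 * 1024**2)
```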
Optimization: While increased availability functions provide cost avoidance, optimization techniques reduce CPU consumption. Reducing the CPU consumed by CICS regions optimizes resource utilization and directly benefits the company’s bottom line. For example, the number of MVS waits issued by CICS as it processes a transaction workload drives a significant CPU cost in highly active CICS environments. By dynamically optimizing the MVS waits on the quasi-reentrant (QR) TCB, transaction processing benefits from reduced CPU consumption and from improved throughput and response times. This allows critical work to complete faster and frees resources for other processes running in the system.
If a business depends on its CICS online processing for revenue, yet cannot effectively meet the demands of that environment, its revenue is in jeopardy. I worked with a company in this situation; it did not have the option of a CPU upgrade, yet needed to handle the peak demand of its online CICS workload. By implementing a solution that dynamically managed its CICS regions through peak and valley processing, an impending upgrade was avoided. The company also improved transaction throughput and response time, and reclaimed over 160 MIPS, saving over $500,000 in CPU processing costs annually.
Managing the online CICS environment is critical for today’s transaction processing. Achieving and maintaining a well-run CICS environment requires increased region availability and optimization. This ensures that regions perform at peak efficiency, within specified service levels, and with minimized system downtime and outages, reducing the costs of both the CICS environment and the dedicated resources required to manage it.
Conclusion
Today, doing more with less has become a necessity. To contain the ever-growing costs, the applications and subsystems consuming resources within the z/OS environment must be tuned and optimized. Managing TCO requires that z/OS resources be managed effectively using “what-if” planning for growth, that application quality be proactively managed, and that batch and online resources be automatically managed and optimized. These issues present a performance management challenge, and automated optimization solutions are essential to turning today’s challenge into tomorrow’s opportunity.