In-Depth
The Four Pillars of Batch Processing Management
Enterprise computing's poor cousin is batch processing. In the 1970s, the new interactive online database systems were considered the way of the future. Some predicted that online applications would eventually replace batch processing altogether. In recent years the Internet and World Wide Web have taken center stage. However, management still requires that important reports be available each morning. The overnight update of decision-assist tables is now essential to an organization due to the rise in popularity of data mining and business intelligence applications. For better or worse, batch processing is still alive and kicking.
But with the move towards 24x7 operations, the batch window has been steadily shrinking. Fewer and fewer hours of the day are available to do more and more batch work. Without an unlimited budget to acquire ever bigger and faster processors, it becomes more important for an organization to be able to tune its batch processing. A tuned suite of batch applications can result in CPU cost savings, increased customer satisfaction, and avoidance of hardware upgrades.
Tuning is not a simple task, although it doesn't have to be complex. To effectively tune batch, the organization needs to take a strategic, top-down approach. It needs a methodology that is repeatable, yet flexible. Situations change. Old applications fade away and new applications are born. Some applications are just modified until they are unrecognizable to their original designers.
There is no silver bullet to performance management, and methodologies that work in one organization may break down in others. It is worth noting that a particular tool or technology should never influence decisions on how application tuning is implemented. Rather, the organization should always choose the tool or technology that is most appropriate based on what needs to be done to improve performance. Each shop should define its performance goals and then work on improving those areas that will afford the biggest performance improvement based on those goals.
Performance management is an ongoing process. First, you must gather the performance data, which should consist of the CPU times for each job, the start and end times of each job (to determine elapsed times), and the number of EXCPs for each job step. These statistics can be gathered by the standard SMF facility installed in the OS/390 environment. Service Level Reporter (SLR) contains built-in tables that capture much of this data. You can build DB2 tables to store this data, or simply record it in sequential files. The important thing is that the data is easily retrievable when needed for analysis.
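As an illustration only, the performance data described above could be held in a record like the following and ranked by elapsed time. The field names, job names, and figures are hypothetical; they are not the layout of SMF records or SLR tables.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRecord:
    """Performance data for one job run (hypothetical layout)."""
    job_name: str
    cpu_seconds: float
    start: datetime
    end: datetime
    excps_by_step: dict  # step name -> EXCP count for that step

    @property
    def elapsed_seconds(self) -> float:
        # Elapsed time is derived from the recorded start and end times.
        return (self.end - self.start).total_seconds()

    @property
    def total_excps(self) -> int:
        return sum(self.excps_by_step.values())


# Made-up sample data for two jobs.
records = [
    JobRecord("ACCTRCV1", 12.0,
              datetime(2000, 1, 10, 1, 5), datetime(2000, 1, 10, 1, 15),
              {"STEP1": 30_000}),
    JobRecord("PAYROLL1", 42.5,
              datetime(2000, 1, 10, 1, 0), datetime(2000, 1, 10, 1, 30),
              {"STEP1": 80_000, "STEP2": 40_000}),
]

# Rank jobs so the longest-running ones are analyzed first.
by_elapsed = sorted(records, key=lambda r: r.elapsed_seconds, reverse=True)
for r in by_elapsed:
    print(f"{r.job_name}: elapsed={r.elapsed_seconds:.0f}s "
          f"cpu={r.cpu_seconds}s excps={r.total_excps}")
```

Whatever the storage format, keeping the three measures together per job makes it easy to sort by whichever one matters most to the shop's goals.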
Just as an architectural design requires a solid foundation on which to stand, so too an application performance methodology needs a strong foundation. Such a foundation is built using universal techniques, or pillars, if you will. This article presents four such pillars that can be used to develop an effective application performance management process.
To tune a batch window, you need to tune applications. An application is a set of related jobs. The jobs in an application share run cycles, are often dependent on each other to pass and receive data, and are usually built at the request of one user group, such as accounts receivable or the payroll department. A job always has at least one step, although it may have multiple steps. The steps are usually logically related and perform some unit of work essential to the application. Within the job, the steps are always executed sequentially, although in some cases steps may be bypassed due to condition testing.
The job step is the smallest unit of work in an application. The job step executes a program, either a custom program written specifically for the application, or a systems utility program like a database backup or IEBCOPY.
The First Pillar: I/O Management
The first pillar of application performance management is effective I/O management. The central processing unit executes instructions at incredible rates, in excess of millions of instructions per second. When the problem program makes a request to obtain data that is not already in central storage, the operating system requests an I/O operation from the I/O subsystem. To the CPU, waiting for an I/O to complete is like a human waiting 100 years for a letter in the mail. Although the actual I/O may take only milliseconds to complete in real time, thousands of such interruptions in a job, multiplied by hundreds or thousands of jobs, eventually consume a large amount of clock time. Hence, I/O is one of the largest contributors to elapsed time in an application, but it also affords one of the greatest opportunities for improvement.
There are two factors involved in improving I/O performance: reducing the number of I/O operations, and reducing the response time of each individual I/O operation.
The easiest way to reduce the number of I/O operations is to ensure that the data required by the problem program is already in central storage when it is needed. The smallest unit of transfer between a direct access storage device (DASD) and central storage is the block (used by non-VSAM data sets) or the control interval (used by VSAM data sets). A block (or control interval) is a grouping of records for the purpose of data transfer between DASD and central storage.
Storage areas called buffers are allocated to hold the data while in transit. The I/O subsystem will fill the available buffers with data during each I/O operation. From the buffers, the data is moved into central storage where it becomes available to the requesting program. Therefore, it makes sense to have buffers that will hold as many records as possible. Buffers that are too small will increase I/O activity because too few records are being transferred with each I/O. Optimizing buffers is one of the easiest and least expensive methods of improving batch performance, and can result in exceptional performance increases.
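The effect of blocking and buffering on the number of I/O operations can be estimated with simple arithmetic. The sketch below is a deliberately simplified model that assumes every I/O operation fills all available buffers with full blocks; the function name, record counts, and sizes are illustrative assumptions, not measurements from a real system.

```python
import math


def io_operations(record_count: int, record_length: int,
                  block_size: int, buffers: int = 1) -> int:
    """Estimate I/O operations needed to read a sequential data set.

    Simplified model (an assumption): each I/O transfers `buffers`
    full blocks, and each block holds as many whole records as fit.
    """
    records_per_block = block_size // record_length
    if records_per_block == 0:
        raise ValueError("block_size must hold at least one record")
    records_per_io = records_per_block * buffers
    return math.ceil(record_count / records_per_io)


# One million 80-byte records with small blocks and a single buffer.
small = io_operations(1_000_000, 80, block_size=800)
# The same data with large blocks and five buffers.
large = io_operations(1_000_000, 80, block_size=27_920, buffers=5)

print(f"small blocks: {small} I/Os, large blocks: {large} I/Os")
```

Under these assumptions the larger blocks and extra buffers cut the estimated I/O count from 100,000 to 574, which is why buffer tuning is usually the first place to look.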
The other major factor in improving I/O performance is to increase throughput by decreasing the response time of each individual I/O operation. Cache controllers help accomplish this by placing highly used data in a memory cache located in the DASD control units themselves. Thus, the data transfer is between storage areas, not between the disk hardware and central storage. But this may require hardware purchases. Less costly methods are also available. These include spreading the workload among several storage devices. This way the access heads are not overworked seeking records on the same volume for multiple jobs simultaneously.
Another method is to place critical data sets on less busy DASD, even to the extent of isolating them completely on their own dedicated volumes. This reduces contention with other I/O requests, which would otherwise increase the response time of each individual I/O. At the very least, keep the index and data portions of VSAM files on separate volumes. Otherwise the same access head has to satisfy requests for both the index and the data portions on one volume, resulting in unnecessary and time-consuming movement of the access head. This is especially useful for DB2 tables, which usually make extensive use of multiple indices.
Production data and test data should be separated so that the less critical test data sets do not interfere with production processing. The operating system does not differentiate between critical production I/O requests and less important test requests.
The Second Pillar: Scheduling
The second pillar of application performance management is intelligent scheduling of the batch workload. Most shops make use of automated scheduling tools, which improve job throughput by means of automated job submission. Even if scheduling is done in a more manual way, analysts will be able to make improvements using the principles presented in this section.
The two main ideas in improved scheduling are parallelism and cloning. As was mentioned above, a job can consist of one or more job steps, each step executing a problem program or system utility. Many times one job step will be dependent on the successful completion of a predecessor job step. But there may be times when a job step will be independent of previous job steps. For whatever reason, the application designers may have bundled these unrelated steps together in one job. Steps which could potentially run in parallel must now wait for previous steps to complete, thus extending the elapsed time of the job.
The objective of parallelism is to separate out as many steps as possible from this sequential scheduling flow, and run these steps in parallel. Depending on the number of initiators or available tape drives, this will allow for more work to be done simultaneously.
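The potential saving from parallelism can be estimated by comparing the sum of all step times with the longest chain of dependent steps. The following is a minimal sketch; the step names, durations (in minutes), and dependencies are hypothetical, and it assumes enough initiators are free to run every independent step at once.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical steps originally bundled into one job, in minutes.
durations = {"EXTRACT": 30, "SORT": 20, "REPORT": 15, "BACKUP": 40}

# Each step maps to the steps it must wait for. BACKUP is independent.
depends_on = {
    "EXTRACT": set(),
    "SORT": {"EXTRACT"},
    "REPORT": {"SORT"},
    "BACKUP": set(),
}


def sequential_elapsed(durations):
    """Elapsed time when every step waits for the one before it."""
    return sum(durations.values())


def parallel_elapsed(durations, depends_on):
    """Elapsed time when a step starts as soon as its predecessors end."""
    finish = {}
    for step in TopologicalSorter(depends_on).static_order():
        start = max((finish[d] for d in depends_on[step]), default=0)
        finish[step] = start + durations[step]
    return max(finish.values())


print("sequential:", sequential_elapsed(durations), "minutes")
print("parallel:  ", parallel_elapsed(durations, depends_on), "minutes")
```

In this made-up case, moving the independent BACKUP step into its own job reduces the elapsed time from 105 to 65 minutes without changing any program.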
Another facet of effective job scheduling is the concept of cloning. Cloning, as the name implies, is the act of copying a function so that it can do the same logical work, but on different data at the same time. For example, if a program updates a large partitioned DB2 table, you could run one very long-running job step. On the other hand, you could clone the program so that each clone updates only one partition in the table.
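The structure of cloning can be sketched as follows. This is an analogy only: in-memory lists stand in for table partitions, a thread pool stands in for separately scheduled clone jobs, and `update_partition` is a hypothetical placeholder for the program's real update logic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: three partitions of one large table.
partitions = [
    [100, 200, 300],
    [400, 500],
    [600, 700, 800, 900],
]


def update_partition(rows):
    """Stand-in for the update logic, applied to one partition."""
    return [amount + 1 for amount in rows]


# One long-running step: the program walks every partition in turn.
single_run = [update_partition(rows) for rows in partitions]

# Cloned: the same logic runs once per partition, concurrently.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    cloned_run = list(pool.map(update_partition, partitions))

print(cloned_run == single_run)
```

The key design point is that each clone touches only its own partition, so the clones do not contend with one another and the combined result matches the single long-running step.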
Parallelism and cloning are two simple yet effective ways of improving batch performance. They are not costly in technical terms, but they may require some thoughtful application re-design and a new mind set on the part of development staff. Each organization should investigate these two options and identify candidates where appropriate.
The Third Pillar: CPU Resources
The third pillar of application performance management is that of efficient use of CPU resources. If a program is inefficient in its coding constructs, the computer will happily continue to execute the wasteful instructions over and over, consuming valuable CPU cycles and multiplying the cost of running the job.
CPU efficiency is really a programming design issue. The best time to fix bad coding structures is at design time or, at the very latest, during code construction. It costs substantially more money to fix a code problem once the application has reached the production stage.
Performance experts have found many reasons why programs execute inefficiently. One reason is poor looping technique. Care should be taken that only appropriate activities are included as part of looping logic. For example, any code that doesn't need to be in the loop, such as assignment statements or condition tests that are not directly related to the loop logic, should be placed outside the loop. There the statements will be executed only once; if placed within the loop, they would be executed as many times as the loop is iterated, consuming unnecessary CPU cycles.
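A small sketch of moving loop-invariant work outside a loop is shown below. The function names and the tax calculation are made up for illustration; the same idea applies in any language.

```python
def total_with_tax_slow(amounts, rate_percent):
    """Recomputes the same factor on every iteration."""
    total = 0.0
    for amount in amounts:
        factor = 1 + rate_percent / 100  # does not depend on the loop
        total += amount * factor
    return total


def total_with_tax_fast(amounts, rate_percent):
    """Computes the factor once, before the loop starts."""
    factor = 1 + rate_percent / 100
    total = 0.0
    for amount in amounts:
        total += amount * factor
    return total


amounts = [100.0, 250.0, 75.5]
print(total_with_tax_slow(amounts, 8) == total_with_tax_fast(amounts, 8))
```

Both versions return the same result; the second simply avoids one division and one addition per record, which adds up over millions of records.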
Try to combine multiple loops into one. Instead of having many loops that each do one activity over the same data, attempt to have one loop that does all of the activities within the same construct. Because a loop has to initialize the index, increment the counter, and check for the end of the loop with every iteration, this saves loop overhead.
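As a minimal sketch of combining loops, the two functions below compute the same total and maximum over a list of CPU times. The function names and sample values are hypothetical.

```python
def stats_two_loops(cpu_times):
    """Two passes over the data: one for the total, one for the maximum."""
    total = 0.0
    for t in cpu_times:
        total += t
    longest = 0.0
    for t in cpu_times:
        if t > longest:
            longest = t
    return total, longest


def stats_one_loop(cpu_times):
    """One pass over the data doing both activities."""
    total = 0.0
    longest = 0.0
    for t in cpu_times:
        total += t
        if t > longest:
            longest = t
    return total, longest


cpu_times = [12.0, 42.5, 3.25]
print(stats_one_loop(cpu_times))
```

The combined version pays the loop control overhead once per element instead of twice, while producing the same result.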
It is also a good idea to place the busiest loops inside when nesting loops (as in two-dimensional table initializations). The loop that will execute the most times should be placed within the loop that will execute the lesser number of times. The same applies to nested IF logic: the most frequently occurring condition should be tested first, to avoid extra instructions being executed.
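The benefit of testing the most frequent condition first can be shown by counting comparisons. In this sketch the transaction codes and their frequencies are invented for illustration, and `count_tests` is a hypothetical helper that mimics a chain of IF tests evaluated in a given order.

```python
def count_tests(codes, test_order):
    """Count the comparisons an IF chain makes when it tests
    each code against the candidates in `test_order`."""
    tests = 0
    for code in codes:
        for candidate in test_order:
            tests += 1
            if code == candidate:
                break  # condition matched; later tests are skipped
    return tests


# Made-up workload: updates dominate, deletes are rare.
codes = ["UPDATE"] * 90 + ["INSERT"] * 8 + ["DELETE"] * 2

rare_first = count_tests(codes, ["DELETE", "INSERT", "UPDATE"])
common_first = count_tests(codes, ["UPDATE", "INSERT", "DELETE"])

print("rare condition first:  ", rare_first, "tests")
print("common condition first:", common_first, "tests")
```

For these 100 transactions, testing the rare condition first costs 288 comparisons, while testing the common condition first costs 112.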
Watch for unnecessary data type conversions. Modern high level languages allow programmers to easily move data from one variable type to another. But under the covers, the compiler generates many instructions to make this translation happen. These instructions consume CPU cycles. If the data conversion is not essential to the functioning of the program, this is wasted effort that can affect the batch window and increase the cost of running the job.
A final area where code can be made more efficient is in ensuring that the most efficient search and sort algorithms are used. A binary search of sorted data is more efficient than a sequential search. Can the data be pre-sorted by a utility before being passed to the problem program? Does the development staff take advantage of pre-written sorts as opposed to everyone reinventing the wheel (and less effectively, at that)?
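To illustrate the point about algorithm choice, the sketch below counts the probes made by a sequential search and by a binary search over the same sorted keys. The function names and the key values are assumptions made for the example.

```python
def sequential_search(sorted_keys, target):
    """Check each key in turn; return (index, probes)."""
    probes = 0
    for i, key in enumerate(sorted_keys):
        probes += 1
        if key == target:
            return i, probes
    return -1, probes


def binary_search(sorted_keys, target):
    """Halve the search range on every probe; return (index, probes)."""
    probes = 0
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


keys = list(range(0, 200_000, 2))  # 100,000 sorted keys
target = keys[-1]                  # worst case for the sequential search

seq_index, seq_probes = sequential_search(keys, target)
bin_index, bin_probes = binary_search(keys, target)
print(f"sequential: {seq_probes} probes, binary: {bin_probes} probes")
```

The binary search depends on the keys being in order, which is one reason pre-sorting the input with a utility can pay off.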
Automated tools are excellent at drilling down into a program's CPU usage and identifying where time is being wasted. Quite often such tools show that a simple change to the program logic can save many CPU cycles.
Summary of the Four Pillars
1. Effective I/O Management
- Reduce number of I/O operations - use optimal buffers.
- Reduce response time of each I/O - use cache controllers; spread data sets over DASD volumes; isolate critical data sets; keep index and data portions separate; keep production and test data separate.
2. Intelligent Scheduling of Batch Workload
- Parallelism - run job steps in parallel where possible.
- Cloning - copy program functions.
3. Efficient Use of CPU Resources
- Use only appropriate activities in loops.
- Combine multiple loops into one if possible.
- Place busiest loops inside when nesting.
- Test most frequently occurring conditions first.
- Avoid unnecessary data conversion.
- Use the most efficient sorts and searches.
4. Use Quality Metrics
- Track and report number of job failures regularly.
- Initiate a formal process of root cause analysis.
- Use information management tools.
- Use automatic restart where possible.
The Fourth Pillar: Quality Metrics
The fourth and final pillar of application performance management is that of quality metrics. You cannot control what you can't (or don't) measure. What gets measured, gets focus. What is being measured in the case of application performance management is the quality of the application code, as evidenced by the job failure rate.
Each job failure (or ABEND) directly impacts the batch window. First, there is the time delay while the problem is being investigated. If a database has been corrupted, the operator will have to restore it from a point-in-time backup to recover database integrity. The operator will then have to restart or rerun the failed job. All of this consumes valuable batch window time.
By tracking the number of ABENDs, whether daily, weekly, or monthly, the operations group can show statistically which applications are of high quality, and which ones aren't. Regular reporting to management on which applications cause the most problems, perhaps in the form of a top ten hit list, allows the operations group to focus on the areas that require attention from the development and support groups.
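A top ten hit list of this kind can be produced with very little code once failures are recorded. The sketch below uses a made-up failure log of (application, completion code) pairs; the application names and the log layout are hypothetical.

```python
from collections import Counter

# Made-up failure log: (application, ABEND completion code).
abend_log = [
    ("PAYROLL", "S0C7"),
    ("ACCTRCV", "S0C4"),
    ("PAYROLL", "SB37"),
    ("INVENTRY", "S0C7"),
    ("PAYROLL", "S0C7"),
    ("ACCTRCV", "S0C7"),
]

# Count failures per application and keep the ten worst offenders.
failures_by_app = Counter(app for app, _code in abend_log)
top_ten = failures_by_app.most_common(10)

for rank, (app, count) in enumerate(top_ten, start=1):
    print(f"{rank:2}. {app:<10} {count} ABENDs")
```

Producing the same report for each reporting period lets the operations group see whether corrective actions are actually reducing the failure counts.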
Analysts can then do a root cause analysis on the reasons for the failures and apply the necessary fixes to ensure that such failures do not occur again. Over time, the operations group can bring the total number of failures down to a level where the number of ABENDs does not adversely affect the batch window any longer.
Such measurement activities are greatly assisted by information management tools that can track job failure information and automatically trap an ABEND message as it hits the system log, building a failure record and creating standard reports based on your criteria.
Whatever tools are used, it is imperative that the operations group have a process in place to ensure that ABENDs are tracked and investigated, and that corrective action is applied so that future quality is not compromised.
Every organization that uses batch processing should have a performance management process in place to increase customer satisfaction and decrease costs. Such a process requires a solid foundation of proven techniques. The four pillars of application performance management presented here are an excellent start to building such a foundation.
About the Author:
Craig Hodgins is an Advisory I/T Availability Professional with IBM Canada in Markham, Ontario, Canada. In his 18 years with IBM he has worked in operations, systems and applications programming, and technical support. He is currently working in the OS/390 ServerPac development group. Hodgins can be reached at chodgins@vnet.ibm.com.