In-Depth
A Package Deal: Performance Packages Deliver Prime Tape Library Performance
After migrating from 3480 to 3490 tape drives, Randy Singleton, Data Center Manager for State Auto Insurance in Columbus, Ohio, reports that he and his staff began noticing their old 3480 magnetic tape cartridges (MTCs) were running more errors on the new drives. Until Singleton implemented a tape library performance service, they were compelled to manually check 3480 cartridges for errors. After implementation, problem cartridges and problem drives were automatically flagged for maintenance, and megabytes per temporary write error (MB/TWE) increased from below the industry average of 4,000 to over 70,000.
This was a dramatic increase that may never have been realized had there been no alternative to manually analyzing library sections and individual MTCs. That alternative, a tape library performance software package or service, has become a useful tool for mainframe data centers.
Some of these software tools can be used as objective performance indicators for comparison with hardware vendor performance reports to improve performance beyond what the hardware vendor is willing to take responsibility for. They can be used to avoid problems during migrations and to fix temporary problems during migration. They can also be used by outsourcing services to vet and cleanse incoming tape libraries. Once satisfactory tape library performance is achieved with a performance tool, some packages and services can be used as tape library management tools to maintain performance without taxing data center staff resources.
Library Problems and Package Solutions
The data center of a large transportation company decided to improve its mainframe tape library’s performance by replacing all old magnetic tape cartridges. To do so, the center’s staff needed to determine the age and usage of 200,000 individual MTCs in several library sections. The solution was to use an independent tape library software package and service to analyze the library sections and to recommend those suitable for replacement. After replacement, library performance increased from 1,500 megabytes per temporary write error to 60,000 MB/TWE.
Another insurance company used a tape library performance software package and service to solve a hardware problem. The service’s report recommendations were cross-referenced with the company’s hardware vendor maintenance report. A tape controller appeared to be malfunctioning. That information was passed along to the hardware vendor, whose customer service engineer confirmed the problem and fixed it. As a result, library performance doubled.
A drug manufacturer with an older tape library was experiencing hardware problems. The library’s performance was below 1,000 MB/TWE. The library’s hardware vendor did not have a service tool for generating regular maintenance reports. As an unbiased means of measuring performance indicators for the purposes of analyzing hardware problems, the company used a tape library performance package and service to generate weekly tape library reports. Those reports were then shown to the hardware vendor to point out performance problems in the onsite equipment. Library performance improved.
From Problem-Solving to Preventive Peace of Mind
The key to both tape library problem solving and tape library preventive maintenance is the automated tracking of performance indicators. A performance package makes tape library preventive maintenance realistic for the data center by placing minimal demands on data center staff resources. In addition to the automated tracking and reporting of performance indicators, at least one performance package also provides analysis and recommendations by application engineers experienced in 3480/90 tape library management.
Data centers that do practice preventive maintenance typically rely on more than one report to maintain and enhance performance. In the opinion of Mike Bump, Manager of Production Support in the data center of CNF Service Company (formerly Consolidated Freight), Portland, Oregon, the key to managing a tape library is using more than one performance indicator (or report). Bump uses three indicators in the CNF data center: offsite service monthly reports; customized, daily error-reporting software; and reports from the IBM Service Director. CNF has 62,500 active MTCs in its data center.
With three tape management catalogues containing a total of 400,000 MTCs, Sungard Computer Services in Voorhees, N.J., combines onsite daily error reports from its hardware vendor with monthly tape-library performance reports from an offsite service. "The monthly analysis brings potential problems with drives or media to our attention that might not be noticed over a single day or a week, but become noticeable trends over a month’s time," reports John Gallagher, the data center’s input/output services manager.
Data centers, like the two above, use a comprehensive package to minimize time spent manually collating hardware and media performance data and to perform these specific tape library performance maintenance and enhancement functions:
• Compare data on the drive and individual MTCs, to determine the nature of a tape library problem.
• Focus on key comparative statistics, such as MB/TWE, to find media and hardware problems.
• Analyze trends over time with monthly reports – evaluate performance over time, as well as "snapshots" of performance.
• Cross-validate with other performance indicators and avoid having "all your eggs in one basket." Verify what the hardware system is doing and what the tape management system is doing.
• Segment the library, compare data on batches of media and flag defective cartridges.
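The first function in the list above, comparing drive data with cartridge data to decide whether a problem is hardware or media, can be illustrated with a rough sketch. It assumes the error log has already been reduced to (drive, cartridge) pairs; the function name, the identifiers and the threshold of three are hypothetical and are not taken from any particular package.

```python
from collections import defaultdict

def suspects(error_events, min_partners=3):
    """Split logged errors into suspect drives and suspect cartridges.

    error_events: iterable of (drive_id, volser) pairs, one per logged error.
    Heuristic (an assumption, not an industry rule): a drive that errors
    with many different cartridges points to hardware; a cartridge that
    errors on many different drives points to media.
    """
    carts_per_drive = defaultdict(set)
    drives_per_cart = defaultdict(set)
    for drive, volser in error_events:
        carts_per_drive[drive].add(volser)
        drives_per_cart[volser].add(drive)
    bad_drives = sorted(d for d, carts in carts_per_drive.items()
                        if len(carts) >= min_partners)
    bad_carts = sorted(v for v, drives in drives_per_cart.items()
                       if len(drives) >= min_partners)
    return bad_drives, bad_carts

# Hypothetical error log: drive D1 fails with three different tapes,
# while tape B9 fails on three different drives.
events = [
    ("D1", "A1"), ("D1", "A2"), ("D1", "A3"),
    ("D2", "B9"), ("D3", "B9"), ("D4", "B9"),
]
bad_drives, bad_carts = suspects(events)
```

With this sample log, D1 is flagged as a suspect drive and B9 as a suspect cartridge. A real package would weigh error counts against the volume of data processed rather than rely on a simple count of partners.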
Performance Package Automation Is a Must
Today, fewer people are staffing the data center. That has implications for how hardware and media maintenance is conducted. Manually checking individual magnetic tape cartridges for errors or bad sections is a labor-intensive process. Additionally, the storage capacity of large tape libraries of 50,000 to 200,000 or more tapes doubled from 1995 to 1997 and is projected to increase by a factor of six between 1995 and the end of 1999, according to industry statistics. So, fewer people are being asked to manage the storage of increasing quantities of data with little time left over for preventive maintenance of the manual type.
As a result of this situation, several automated means of tracking tape library performance indicators and reducing staff time spent tracking errors have emerged to join the traditional hardware-oriented maintenance solutions like the IBM Service Director. They include:
• Off-the-shelf utility software packages that can be customized to the needs of individual data centers.
• Report writer software solutions, typically part of a tape library management system, that can be customized by the individual data center.
• Offsite services, ranging from the manual data gathering and evaluation of tape library performance indicators to the automated gathering and summary of 3480/90 drive data into monthly management reports with trends in performance indicators.
These solutions operate differently with different features, and they focus on different performance indicators. And, depending on the data center’s particular situation, some offer more actual labor savings than others.
Generic Sense Information
3480/90 drives generate "sense bytes." Sense bytes are the error indicators that come back from a drive when there is a problem. Every mainframe hardware device uses sense information and has some kind of error reporting system. Sense information defines what each specific error is. IBM developed EREP (the Environmental Record Editing and Printing program) for 3480/90 tape devices to read sense information, interpret it and print it out. But EREP does not collate and format sense information for analysis, interpretation and problem identification. That is done either manually or by a tape library performance software package.
There are actually 32 sense bytes on 3480/90 tape drives. Those sense bytes conform to certain standards, which make the sense information capable of reporting a data check, a hardware unit check and a channel check. Different manufacturers do different things with those 32 sense bytes. Some of those sense bytes list out an error fault symptom code. IBM has its own fault symptom code, while another hardware manufacturer, StorageTek, has a different fault symptom code. Regardless of device manufacturer differences, all fault symptom codes go into EREP, and EREP sends the fault symptom codes to the operating system.
Many types of hardware generate sense data. But only in the mainframe 3480/90 arena does the combination of available sense bytes and EREP make the pulling and reporting of performance indicators possible. For example, when a tape drive error occurs, it is flagged to the operating system. Then the operator must determine the nature of the error: catastrophic, retryable or temporary. To do that he must pull the sense information trail left by the error. Only in the 3480/90 mainframe system is the sense byte data sufficient to differentiate between these error types. Midrange systems do not generate enough sense data to make that determination. In the newer mainframe 3590 system, the sense bytes necessary for gathering that data have yet to be activated.
Another example of how sense data works in the 3480/90 environment is the buffered sense log. When a regular error occurs, a regular error message is offloaded to sense bytes. There are also buffered-log sense bytes, which are counts of how many megabytes the drive or control unit has written. Those offload to the operating system whenever they fill up. The combination of these two types of sense data yields MB/TWE, a key library performance indicator that combines error information with the amount of data processed. That combination is called a buffered sense log, one type of important statistical information on library performance.
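The arithmetic behind MB/TWE is simple division: megabytes written divided by the number of temporary write errors logged over the same period. The sketch below assumes the two counts have already been extracted from the sense data; the drive names and figures are made up for illustration.

```python
import math

def mb_per_twe(mb_written, temp_write_errors):
    """Megabytes written per temporary write error (MB/TWE)."""
    if mb_written < 0 or temp_write_errors < 0:
        raise ValueError("counts must be non-negative")
    if temp_write_errors == 0:
        return math.inf  # no temporary write errors logged in the period
    return mb_written / temp_write_errors

# Hypothetical totals for one reporting period:
# drive -> (megabytes written, temporary write errors)
drive_totals = {
    "DRV0A0": (120_000, 2),
    "DRV0A1": (90_000, 60),
}

per_drive = {drive: mb_per_twe(mb, twe)
             for drive, (mb, twe) in drive_totals.items()}

# The library-wide figure divides total megabytes by total errors,
# rather than averaging the per-drive ratios.
library_mb = sum(mb for mb, _ in drive_totals.values())
library_twe = sum(twe for _, twe in drive_totals.values())
library_figure = mb_per_twe(library_mb, library_twe)
```

In this made-up sample, the first drive runs at 60,000 MB/TWE and the second at 1,500, so the second drive is the one that drags the library-wide figure down.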
Establishing Performance Indicators
Here is how 3480/90 performance indicators pulled from 3480/90 drive sense data can be used for preventive maintenance of storage media:
• Establish a performance benchmark using MB/TWE and media standards. Overlay the total performance number (MB/TWE) along with average error counts of each device on a periodic basis to spot trends in performance. Important indicators are temporary write errors (TWE), correctable errors (ECC), erase gaps and transient errors (speed variations and block count errors).
• Use industry standards for media evaluation to develop a "pull list" of tapes marked as poor performing cartridges. These cartridges should be removed from the library when they reach scratch status.
• Establish profile information concerning efficient library usage. Important indicators are capacities of single volumes, single volume versus multi-volume datasets, block size, density, expiration date and last date used. This data can be useful for eliminating small datasets and inefficient block sizes or densities, and for planning scratch tape and storage requirements.
• Segment the library and chart the number of opens since birthdate, opens for the current period, write and read errors, and birthdates of each segment. Overlay this data on a periodic basis to isolate poor performance or inefficient usage in smaller volume ranges, rather than the overall library. In many instances a poor-performing library can be narrowed down to just a few volume ranges of media.
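Two of the steps above, segmenting the library by volume range and building a pull list, can be sketched as follows. The record layout, the use of the first three characters of the volume serial as the segment key, and the 4,000 MB/TWE cutoff are all assumptions made for illustration; a real evaluation would also chart opens, read errors and birthdates.

```python
from dataclasses import dataclass

@dataclass
class Cartridge:
    volser: str              # volume serial, e.g. "A00001"
    mb_written: float
    temp_write_errors: int
    scratch: bool            # True once the tape has reached scratch status

def segment_key(volser, width=3):
    """Group cartridges by the leading characters of the volume serial."""
    return volser[:width]

def segment_report(cartridges):
    """Return MB/TWE per library segment."""
    totals = {}
    for c in cartridges:
        key = segment_key(c.volser)
        mb, twe = totals.get(key, (0.0, 0))
        totals[key] = (mb + c.mb_written, twe + c.temp_write_errors)
    return {key: (mb / twe if twe else float("inf"))
            for key, (mb, twe) in totals.items()}

def pull_list(cartridges, min_mb_per_twe):
    """Poor performers that can be removed because they are at scratch status."""
    return sorted(
        c.volser for c in cartridges
        if c.scratch
        and c.temp_write_errors > 0
        and c.mb_written / c.temp_write_errors < min_mb_per_twe
    )

# Hypothetical sample library.
tapes = [
    Cartridge("A00001", 8000, 1, False),
    Cartridge("A00002", 6000, 0, True),
    Cartridge("B10001", 3000, 6, True),
    Cartridge("B10002", 1000, 4, False),
]
by_segment = segment_report(tapes)
to_pull = pull_list(tapes, min_mb_per_twe=4000)
```

Here the "B10" range stands out at 400 MB/TWE against 14,000 for "A00", and only B10001 lands on the pull list, because B10002 has not yet reached scratch status.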
With this information in hand, the data center can develop a management report that will summarize most of the above indicators on just a few pages. This summary should include year-to-date averages and monthly comparisons that indicate whether a library is meeting set performance and efficiency benchmarks. Recommendations for improvement or comments on the library should also be included.
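A minimal sketch of the monthly comparison and year-to-date figures in such a summary is shown below. The function name, the row layout and the benchmark value are hypothetical; the year-to-date figure is computed from running totals rather than by averaging the monthly ratios, which is a design choice and not something the reporting services are known to prescribe.

```python
def monthly_summary(months, benchmark):
    """Build one summary row per month.

    months: list of (label, mb_written, temp_write_errors) in calendar order.
    benchmark: the MB/TWE level the library is expected to meet.
    """
    rows = []
    ytd_mb = 0
    ytd_twe = 0
    previous = None
    for label, mb, twe in months:
        ytd_mb += mb
        ytd_twe += twe
        current = mb / twe if twe else float("inf")
        ytd = ytd_mb / ytd_twe if ytd_twe else float("inf")
        rows.append({
            "month": label,
            "mb_per_twe": current,
            "previous_mb_per_twe": previous,  # None for the first month
            "ytd_mb_per_twe": ytd,
            "meets_benchmark": current >= benchmark,
        })
        previous = current
    return rows

# Hypothetical two-month history.
report = monthly_summary(
    [("Jan", 100_000, 50), ("Feb", 120_000, 20)],
    benchmark=4000,
)
```

Using running totals keeps a month with very little tape activity from distorting the year-to-date number.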
About the Author: George Pannhausen is a Senior Applications Engineer who oversees data center applications for EMTEC DataStoreMedia’s (formerly BASF Magnetics Corp.) offsite tape-library performance service, System Performance Evaluator.
***
SIDEBAR 1:
A Sound Insurance Policy
Randy Singleton, Data Center Manager for State Auto Insurance, reports his data center began using a tape library performance management service after migrating from 3480 to 3490 drives. The data center had 13,000 3480 MTCs in its tape library, and some of the original 3480 tapes were eight years old. He and his staff began noticing 3480 MTCs were running more errors on the new drives because the technology was different.
Those operating conditions were forcing Singleton and his staff to manually check tape cartridges for errors. "Before [using this offsite service] we had to check tapes manually, and try to determine if it was the drive or the tape. We would look at a tape that had a lot of errors and, to be on the safe side, decide to discard it, not really knowing if it was the drive or the tape," he says. "I didn’t realize how long at night it would take to read the tapes that apparently contained bad sections," Singleton recalls.
The solution to Singleton’s problem was a tape library performance management service that tracks individual tape drives, as well as individual tape cartridges. Now, there is no need to manually check tape cartridges for errors or wonder about their hardware-or-media origin.
Every month, Singleton gets a summary printout of tape library performance and the detailed performance data on a tape cartridge that is compatible with his CA-1 Tape Management System. "[The service] keeps track of everything and prints it out. We have a printout that comes to the customer engineer for the data center’s hardware. That makes him aware of any tape drive problems that show up in the summary," says Singleton.
In the beginning, the service’s pull-list of problem cartridges averaged 100 tapes per month. Two years later, the pull-list is down in the single digits. "Over time we have identified and removed the problem 3480 tapes from the library," says Singleton. But they still have some of the original 10-year-old 3480 MTCs in the library: those that continue to perform up to industry standards.
G.P.
***
SIDEBAR 2:
Home Office Improvement: Michigan Retailer Uses Tape Management for Thousands of VSE Tapes
ACO Hardware is known to do-it-yourselfers throughout Michigan as the friendly place to purchase what they need for those weekend home improvement projects. In its 39 locations, the Farmington Hills-based retailer stocks tens of thousands of hardware items to assist homeowners in their improvement successes.
But the ACO computer operations staff, which manages the mainframe applications that keep track of all of those warehouse items and the 39 stores in the chain, had a real problem last year. Tape management is a critical part of ACO’s bet-your-business inventory system. The job of keeping track of 5,000 tapes for the tape drives – two 3430s, two 3490s and one 3420 – for its IBM 9221 model 30 mainframe became difficult when upgrades to their VSE-based tape management product were not delivered.
Ned Doerr, now Manager of Technical Services, who was initially at ACO on a consulting basis, did some research into tape management systems. Doerr believed that TAPE2000, an automated tape library management system from Software Engineering of America (SEA) of Franklin Square, N.Y., would be a good fit.
In evaluating the SEA product, Doerr found that TAPE2000 provided a superior CICS interface compared to that of the CA-Dynam/T product, which the ACO computer operations staff found to be "cumbersome" to use. TAPE2000 was also fully compatible with the most recent IBM release of VSE/ESA, which ACO is now running after having been delayed in upgrading to the latest release of the mainframe operating system.
One of the key features of TAPE2000, from Doerr’s perspective, was the fact that TAPE2000 can run parallel with any existing tape management system. This meant the ACO computer operations staff could continue to work with the old vendor’s system until they were completely comfortable using the SEA product.
Installation
When Doerr joined ACO officially as manager of technical services in January 1999, there were just three weeks remaining in the previous tape management vendor’s contract. He had less than a month to complete the TAPE2000 implementation; as it turned out, he needed less than one weekend. "We had been told that TAPE2000 can be installed and operational in hours, with no hooks or IPLs required ... but seeing is believing," he says.
Doerr said the computer operations manager and staff quickly came to appreciate the flexibility of the TAPE2000 system. With the GUI, they can quickly move from screen to screen to get different views of the tape library system. TAPE2000 features ISPF, VTAM, GUI Windows and CICS interfaces. It requires no re-IPLs and allows multiple locations to be managed in a single database. TAPE2000 provides the operations staff with complete online vaulting, tracking and management. It stores all information online and edits data as it is entered. It protects against overwriting tape files, offering immediate scratch protection and verification. It also eliminates the need to run batch programs to find errors or process update requests.