System Sort: A Powerful Weapon in Your Y2K Arsenal
Commercial-strength sort packages have always done much more than just sort quickly and efficiently. They can also perform a host of data manipulation functions as easily as they sort. Modern sort packages can select, reformat, and convert data. They can also eliminate duplicate records, summarize/aggregate data, and write reports. This versatility is what makes sort packages such powerful tools in all types of mainframe applications from billing to statistical studies. And this same versatility is now making these packages invaluable aids to fast database loads and data warehouse staging on UNIX and Windows NT.
What seems to have been overlooked in all the shouting about Year 2000 problems is that most sites already own this powerful data manipulation tool, and many of their programmers know how to use it. With a little imagination, the sort package, the traditional power tool, can be put to work on Y2K problems through the following techniques:
- Date Field Conversion
- Test Data Mining and Aging
- Program Substitution
Best of all, these powerful data manipulation tools are available on all major platforms, both mainframe and client/server, with similar functionality and syntax.
Remediation Through Windowing
Commercial sort packages on the mainframe make full use of one of the most important concepts in solving Year 2000 problems: the century window. This simple technique lets you tell the sort which two-digit years belong to the 20th century and which to the 21st. With windowing, only your applications need to change; your data does not.
Windowing can be a very cost-effective solution. The Canadian Imperial Bank of Commerce (CIBC) actually tracked its costs. CIBC began its Y2K remediation by expanding the date fields in its data to four-digit years. Then, in 1996, it started implementing the far less time-consuming and less expensive windowing approach. Because of its windowing initiative, CIBC was able to reduce its project staff by 50 and save $5 million in project costs.
Implementing a Century Window
How does a "century window" work? First, you select the 100-year span that is right for your organization. For example, choosing a century window of 1960 to 2059 implies that an organization expects all of its two-digit-year fields from "60" to "99" to be 1960 through 1999, and all two-digit years between "00" and "59" to be 2000 through 2059 and collate after 1999.
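The arithmetic behind a 1960-2059 window can be sketched in a few lines of Python. This is an illustration only, assuming the window is identified by its first year; the function name `window_year` is hypothetical and says nothing about how any sort package implements its century window internally.

```python
def window_year(yy: str, window_start: int = 1960) -> int:
    """Map a two-digit year string onto the 100-year window beginning at window_start."""
    if len(yy) != 2 or not yy.isdigit():
        raise ValueError(f"not a two-digit year: {yy!r}")
    century = window_start - window_start % 100   # 1900 for a 1960 start
    year = century + int(yy)
    if year < window_start:                        # below the window start: next century
        year += 100
    return year

print(window_year("60"), window_year("99"), window_year("00"), window_year("59"))
# 1960 1999 2000 2059
```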
Once you’ve selected your century window, implementation will differ, depending on your operating system. On mainframe systems such as MVS, VSE and VM, where the Y2K problems in legacy applications are numerous, commercial sort packages have added new options such as CENTWIN and special Y2K data formats, which work together to treat two-digit year values as four-digit years. On UNIX, where the problem is less severe, the century window is normally implemented as an alternate collating sequence.
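On UNIX the window is described as an alternate collating sequence. One way to picture that, assuming a 1960 window start, is a sort key that ranks each two-digit year by its offset from the start of the window, so "00" collates after "99". The key function below is hypothetical and illustrates the idea only; it does not reproduce any package's collating-sequence syntax.

```python
def windowed_key(yy: str, window_start: int = 1960) -> int:
    """Collating key: the year's position (0-99) inside the century window."""
    return (int(yy) - window_start % 100) % 100

years = ["02", "98", "60", "59", "75", "00"]
print(sorted(years, key=windowed_key))
# ['60', '75', '98', '00', '02', '59']
```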
Additional Century Window Facilities
Because the century window becomes part of the application and has to stay valid as the years go by, most sort packages also allow a sliding window. You specify only a number of years, and the sort subtracts that number from the current year to set the start of the century window. For example, specifying "20" in 1996 would create a century window of 1976 through 2075. In 2006, the window would "slide" to 1986 through 2085.
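The sliding-window calculation is simple enough to show directly. In this sketch the function name `sliding_window` and the optional `current_year` override are assumptions added for illustration and testing; a real sort package would take the current year from the system date.

```python
import datetime
from typing import Optional, Tuple

def sliding_window(years_back: int, current_year: Optional[int] = None) -> Tuple[int, int]:
    """Return the first and last year of a window that starts years_back years ago."""
    if current_year is None:
        current_year = datetime.date.today().year
    start = current_year - years_back
    return start, start + 99          # a century window always spans 100 years

print(sliding_window(20, 1996))  # (1976, 2075)
print(sliding_window(20, 2006))  # (1986, 2085)
```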
As large organizations started to use the century window, they found that they needed the technique extended to other sort-related data manipulation functions. One of these organizations is IMS Health, a leading provider of advanced decision support tools and other knowledge-based solutions to the health care community. When IMS Health began to implement its Year 2000 strategy, it found that windowing the sort function alone was not enough. According to Meryl Raskin, a senior systems engineer on the Y2K project team, IMS Health also needed to extend windowing to record selection, because its sort applications often sorted only a subset of records, which were selected or rejected before sorting by comparing two-digit date fields.
If the century window hadn’t been extended to record selection, "our solution would have been much more circuitous, and probably would have involved processing in two steps," says Raskin. But with the extension, IMS Health was able to keep most of its sort applications intact, saving the IMS project team "a lot of work," according to Raskin.
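To see why record selection needs the window too, compare a plain two-digit comparison with a windowed one. The record layout (an eight-byte ID followed by a two-digit year), the 1960 window, the 1999 cutoff, and the helper names are all hypothetical.

```python
def windowed(yy: str, window_start: int = 1960) -> int:
    """Expand a two-digit year within the century window."""
    year = window_start - window_start % 100 + int(yy)
    return year + 100 if year < window_start else year

# Hypothetical 10-byte records: ID in positions 1-8, two-digit year in positions 9-10.
records = ["CUST000197", "CUST000299", "CUST000300", "CUST000403"]

def include(rec: str, cutoff: int = 1999) -> bool:
    """Keep records whose windowed year is at or after the cutoff."""
    return windowed(rec[8:10]) >= cutoff

naive = [r for r in records if r[8:10] >= "99"]   # plain two-digit comparison
selected = [r for r in records if include(r)]     # windowed comparison
print(naive)     # ['CUST000299']
print(selected)  # ['CUST000299', 'CUST000300', 'CUST000403']
```

The plain comparison silently drops the 2000 and 2003 records, which is exactly the kind of selection error a windowed INCLUDE or OMIT avoids.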
The COBOL Dilemma
Another company that is using the century window technique with all the standard enhancements is Olan Mills, the world’s leading producer of family portraits. Olan Mills uses the CENTWIN option for sorting and record selection along with the special Y2K data formats. Because Olan Mills does 80 percent to 90 percent of its processing under VSE/ESA 2.2 and uses a lot of COBOL programs, the company has many COBOL-invoked sorts that need to be remediated for the Year 2000. This is a problem in the VSE environment because many sites have not had the time to convert all of their programs in "old" COBOL (DOS/VS COBOL) to "new" COBOL (COBOL/VSE), which was released only a few years ago.
According to Pete Clark, the Technical Support Manager at Olan Mills, the sort package they are using has made this aspect of his Y2K work much easier because the vendor added an enhancement that allowed the century window to be used by COBOL-invoked sorts in both the old and new versions of the language. Since the century window is easy to implement, Clark and his staff are using it almost exclusively. "We stuck with the two-digit century window technique, and so far it’s working for us like a champ."
Converting Two-Digit Fields to Four Digits
Although sort-related Y2K problems can often be solved with the century window, some applications require that two-digit year fields be expanded into four-digit year fields. It is in such cases that the sort package can become a very powerful weapon in your Y2K arsenal.
The reformatting capabilities of sort packages on all platforms, including UNIX and Windows NT, can convert large quantities of data very quickly, making the packages ideal tools for Y2K field expansion. An added advantage in sort packages is that you can do heavy-duty data manipulation — without sorting. You only need to tell the sort to do a copy instead, and your data manipulation jobs will run faster because the sort will have less work to do.
Let’s look at an example of the simplest way to do a reformat/convert in an MVS application. You would identify input and output files, tell the sort to do a copy, and then write an OUTREC (output record) statement that expands the records. In the OUTREC statement, you would describe the input fields to be copied by starting position and length, and insert a literal string of "19" before the two-digit twentieth-century date field. With values included, here is an example of an OUTREC statement:
OUTREC FIELDS=(1,30,C'19',31,50)
Here the literal "19" is inserted after position 30, making the output record 82 bytes long. The position (1 and 31) and length (30 and 50) specifications refer back to the input records, which are 80 bytes long.
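In Python terms, that OUTREC statement copies positions 1-30, adds the literal, and copies positions 31-80. The sketch below assumes 80-byte character records handled as strings (mainframe data would normally be EBCDIC), and the function name is hypothetical.

```python
def outrec_insert_19(rec: str) -> str:
    """Copy positions 1-30, insert the literal '19', then copy positions 31-80."""
    if len(rec) != 80:
        raise ValueError("expected an 80-byte record")
    # Sort positions are 1-based; Python slices are 0-based and end-exclusive,
    # so input positions 1-30 are rec[0:30] and positions 31-80 are rec[30:80].
    return rec[0:30] + "19" + rec[30:80]

record = "A" * 30 + "96" + "B" * 48        # two-digit year in positions 31-32
expanded = outrec_insert_19(record)
print(len(expanded), expanded[30:34])      # 82 1996
```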
A slightly more complex, but far more powerful conversion technique on the mainframe combines the copy function, the CENTWIN parameter, and a special Y2K format (Y2C) that expands the data:
SORT FIELDS=COPY, CENTWIN=1960
Here a copy instead of a sort will be performed, and the century window will be 1960 through 2059. And because of the OUTREC statement, which contains the special Y2K format Y2C, all two-digit date fields between "60" and "99" will be expanded to "1960" through "1999" and all fields between "00" and "59" will be expanded to "2000" through "2059."
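The effect of the copy, CENTWIN=1960, and Y2C combination can be imitated as follows. This is a behavioral sketch, not the package's implementation: it assumes, purely for illustration, that the date field sits in positions 31-32 as in the earlier OUTREC example, and the function names are hypothetical. Because the job is a copy, records come out in input order.

```python
def y2c_expand(yy: str, centwin: int = 1960) -> str:
    """Expand a two-digit year to four digits within the window starting at centwin."""
    year = centwin - centwin % 100 + int(yy)
    if year < centwin:
        year += 100
    return str(year)

def copy_convert(records, centwin=1960):
    """Copy records in input order (no sort), expanding the year in positions 31-32."""
    for rec in records:
        yield rec[0:30] + y2c_expand(rec[30:32], centwin) + rec[32:]

sample = ["A" * 30 + yy + "B" * 48 for yy in ("60", "99", "00", "59")]
print([r[30:34] for r in copy_convert(sample)])  # ['1960', '1999', '2000', '2059']
```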
If you have date fields in formats other than character, such as packed decimal, different Y2K formats can be used to either expand the data, or keep it in two digits for use with a century window.
Bridging: A Conversion/Testing Strategy
At some sites, where jobs must be changed to accommodate expanded records, a copy-convert operation like the one described directly above becomes the basis of a simple yet powerful conversion/testing strategy called "bridging." An example will illustrate this technique.
Let’s say we have the following five-step job, and all the steps use a common input file:
Because the data in this job will be coming from an outside source and will contain four-digit year fields beginning in 1999, all the steps must be converted to accommodate four-digit year fields now.
Of course, we could fix all the programs and put the copy-convert step at the beginning of the job, but that would make testing very difficult. If something went wrong, it would be hard to isolate the step with the problem. Instead, savvy programmers have been working backward systematically. After fixing STEPE, they run the following job, using a copy-convert step as a "bridge":
In this job, STEPA through STEPD use normal two-digit date fields, but STEPE, after the copy-convert step, expects a four-digit year. If all goes well, they revise STEPD, and go on to the next test run, moving the "bridge":
After several more iterations, the final test job will look like this:
This technique is being used successfully to speed conversion and testing in cases where windowing is not an option, and both the data and the applications must be converted.
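The sequence of test jobs that bridging produces can be summarized programmatically. In this sketch the step names STEPA through STEPE follow the example, while "CONVERT" is a hypothetical name for the copy-convert bridge step; steps before the bridge still read two-digit years, and steps after it expect four-digit years.

```python
STEPS = ["STEPA", "STEPB", "STEPC", "STEPD", "STEPE"]

def bridging_test_jobs(steps):
    """Yield the step order for each test run, moving the bridge one step earlier each time."""
    for fixed_from in range(len(steps) - 1, -1, -1):
        # steps[:fixed_from] are unconverted; steps[fixed_from:] are already fixed.
        yield steps[:fixed_from] + ["CONVERT"] + steps[fixed_from:]

for job in bridging_test_jobs(STEPS):
    print(" -> ".join(job))
# First run: STEPA -> STEPB -> STEPC -> STEPD -> CONVERT -> STEPE
# Last run:  CONVERT -> STEPA -> STEPB -> STEPC -> STEPD -> STEPE
```

Because only one step changes between runs, a failure can be traced to the step most recently moved to the right of the bridge.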
Mining Data and Aging Dates for Testing
Sort package functionality can also be used to create test data sets and age dates within them easily. Sort packages traditionally provide options that allow you to "mine" or sample data for testing. On the mainframe, two options that can help you do this are SKIPREC and STOPAFT. SKIPREC tells your sort package to skip a certain number of records before sorting or copying from a file. STOPAFT specifies the number of records to be sorted or copied from the beginning of a file. Record selection (INCLUDE and OMIT statements) can provide even more sophisticated "mining."
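A rough Python analogue of SKIPREC, INCLUDE, and STOPAFT helps show how they combine. The processing order here (skip, then filter, then stop after a number of accepted records) is an assumption for the sketch; check your sort package's documentation for the order it actually uses. The function and record names are hypothetical.

```python
from itertools import islice

def mine(records, skiprec=0, stopaft=None, include=None):
    """Sample records in the spirit of SKIPREC, INCLUDE and STOPAFT (assumed order)."""
    selected = islice(records, skiprec, None)       # like SKIPREC: skip leading records
    if include is not None:
        selected = filter(include, selected)        # like INCLUDE: keep matching records
    if stopaft is not None:
        selected = islice(selected, stopaft)        # like STOPAFT: stop after n records
    return list(selected)

records = [f"REC{i:03d}" for i in range(10)]
print(mine(records, skiprec=2, stopaft=3))  # ['REC002', 'REC003', 'REC004']
print(mine(records, skiprec=1, stopaft=2, include=lambda r: int(r[3:]) % 2 == 0))
# ['REC002', 'REC004']
```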
Another useful option is CHANGE, which allows you to change and/or expand a value in a record, and there are special Y2K data formats which can expand data, as described above.
An example will help us understand the mining/aging process. Let’s say we have an 8 GB file with 80-byte records and a mixture of the following two-digit year data:
We can select or "mine" about 80 megabytes from these records, and age the dates with standard sort package statements in a copy job like the following:
The STOPAFT statement tells the sort package to use only the first million records in the file, and the CHANGE option will change all "96" values in positions 47 and 48 to "99," all "97" values to "00," and all "98" values to "01," so that the dates in the file are aged to span the millennium change.
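The aging table described above can be expressed as a simple lookup. This sketch assumes 80-byte character records with the year in positions 47-48 and passes unmatched values through unchanged; whether a real CHANGE option does the same, or needs an extra parameter to do so, depends on the package. The names `AGING` and `age_record` are hypothetical.

```python
# Aging table from the example: 96 -> 99, 97 -> 00, 98 -> 01.
AGING = {"96": "99", "97": "00", "98": "01"}

def age_record(rec: str) -> str:
    """Age the two-digit year in positions 47-48 (rec[46:48]); other values pass through."""
    yy = rec[46:48]
    return rec[:46] + AGING.get(yy, yy) + rec[48:]

record = "X" * 46 + "97" + "Y" * 32
print(age_record(record)[46:48])  # 00
```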
Since we are only changing a two-digit year field, this job can only be used to test remediation with windowing. To expand the date fields in addition to "mining" data and aging it, a slightly more complex job using the special Y2K data format Y2C and the century window (CENTWIN) parameter can be specified:
In this case, the dates are aged on input by the INREC (input record) statement, whose format is identical to that of the OUTREC statement in the previous example. The next statement (SORT) tells the sort package to do a copy, use a century window of 1970 through 2069, and stop after "mining" a million records. The century window (CENTWIN) is included only so that the special Y2K format (Y2C) can be used in the OUTREC statement, which here expands the aged two-digit year fields to four digits: "99" to "1999," "00" to "2000," and "01" to "2001."
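Put together, the mine, age, and expand job behaves roughly like the pipeline below. It is a sketch under the same assumptions as before (character records, year in positions 47-48, hypothetical names), with aging done first, as INREC does, and expansion done last, as the OUTREC with Y2C does, using a 1970 window.

```python
from itertools import islice

AGING = {"96": "99", "97": "00", "98": "01"}

def expand(yy: str, centwin: int = 1970) -> str:
    """Expand a two-digit year within the 100-year window starting at centwin."""
    year = centwin - centwin % 100 + int(yy)
    if year < centwin:
        year += 100
    return str(year)

def age_and_expand(records, stopaft=1_000_000, centwin=1970):
    """Stop after `stopaft` records, age the year in positions 47-48, then expand it."""
    for rec in islice(records, stopaft):
        yy = rec[46:48]
        aged = AGING.get(yy, yy)                           # like INREC: age the date
        yield rec[:46] + expand(aged, centwin) + rec[48:]  # like OUTREC with Y2C

sample = ["X" * 46 + yy + "Y" * 32 for yy in ("96", "97", "98")]
print([r[46:50] for r in age_and_expand(sample)])  # ['1999', '2000', '2001']
```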
Substituting a "Sort" Application for a COBOL Program
The options in a sort package can also provide an excellent alternative to converting a program in COBOL or another language for Y2K compliance. As we’ve already seen in this article, a sort package can select data and reformat it, and it can do both these functions in a copy application, that is, without sorting. In addition, sort packages can:
- Group records and output them in multiple files, all differently formatted.
- Write reports with title pages, headers, trailers, and sectioning with record counts and numeric field summing on the section, page, and report levels.
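The sectioning described in the last bullet, with record counts and numeric sums per section and for the whole report, amounts to sorting and then breaking on a key. The Python sketch below shows only that control-break logic; the record shape (key, amount) and the output line format are hypothetical, and titles, headers, and page-level totals are left out.

```python
from itertools import groupby

def section_report(records):
    """records: (section_key, amount) pairs. Sort, then emit section and report totals."""
    lines = []
    report_count = report_total = 0
    for key, group in groupby(sorted(records), key=lambda r: r[0]):
        amounts = [amount for _, amount in group]
        lines.append(f"SECTION {key} COUNT={len(amounts)} TOTAL={sum(amounts)}")
        report_count += len(amounts)
        report_total += sum(amounts)
    lines.append(f"REPORT COUNT={report_count} TOTAL={report_total}")
    return lines

for line in section_report([("EAST", 10), ("WEST", 5), ("EAST", 7)]):
    print(line)
# SECTION EAST COUNT=2 TOTAL=17
# SECTION WEST COUNT=1 TOTAL=5
# REPORT COUNT=3 TOTAL=22
```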
Although sort packages are not a substitute for COBOL or other languages, sort/copy applications are much easier and faster to write than programs, and they do not need to be compiled. If the features in a sort package are the same as those used in a program that has been lost or that must be converted, it is usually much faster to write a sort/copy application than to rewrite the COBOL or other program from scratch.
But perhaps the greatest benefit in substituting a sort package application for a program is that it is very likely to run much faster and more efficiently than the original program. And the sort package is a "new" tool for your data-intensive applications, which can be used well beyond the year 2000.
Many sites are already implementing windowing as a cost-effective way to make current sort applications Y2K-compliant. Others are using the sort package’s powerful data manipulation features to solve many types of conversion problems, even those not directly related to sorting. These solutions include:
- Reformatting two-digit date fields as four-digit date fields by using OUTREC statements or special Y2K data formats.
- Using "bridging" in conversion and testing jobs with several steps.
- "Mining" test data and aging the dates.
- Replacing COBOL and other programs with the powerful data manipulation functions available in sort packages.
The system sort can indeed be one of the most powerful weapons in your Y2K arsenal.
ABOUT THE AUTHOR:
Pat Salisbury is a Software Services Manager at Syncsort Incorporated (Woodcliff Lake, N.J.) where she has specialized in sorting techniques and led a wide variety of sort-related projects for almost two decades.