In-Depth

Fault Free Software

Automated software inspections can quickly locate software faults — even lingering Y2K faults — that exhaustive test cycles might altogether miss, helping companies produce higher reliability software, on time and within budget.

We all know that disruptions caused by faulty software can be costly and even devastating in terms of lost business, damage to customers and declines in stock value, which in turn may lead to litigation. Reports of serious disruptions caused by software faults are all too commonplace:

  • New Jersey Department of Motor Vehicles computer problem — a new software program failed in the first hour after installation, forcing all 45 field offices to turn away thousands seeking licenses, registrations and other services.
  • Oxford Health Plans billing problem — billing and payment software problems resulted in failure to collect hundreds of millions of dollars from member hospitals and doctors, causing large losses and a drop in Oxford’s market value of $3 billion in one day.
  • Maiden flight 501 of the European Space Agency’s new Ariane 5 heavy-lift rocket — due to a simple exception failure the rocket exploded 40 seconds into the mission, losing an uninsured scientific payload valued at roughly 500 million dollars.

Why Does Software Fail?

In the case of the Ariane rocket failure the problem was identified as a software exception in the inertial reference system. The exception was "caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating-point number, which was converted, had a value greater than what could be represented by a 16-bit signed integer. This resulted in an operand error. The data conversion instructions were not protected from causing an operand error." [Lions, Jacques-Louis, ARIANE 5 Flight 501 Failure: Report by the Inquiry Board, World-Wide Web, http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html, 1996]. The operand error occurred due to an unexpectedly high value of an internal function result.

Software fails mainly for two reasons: logic errors in the software and exception failures. Exception failures can account for up to two-thirds of all system crashes [Maxion, Roy A. and Olszewski, Robert T. "Improving Software Robustness with Dependability Cases." 28th International Symposium on Fault-Tolerant Computing, pages 346-355. Munich, Germany; 23-25 June 1998. IEEE Computer Society Press] and hence are worthy of serious attention. In general, an exception is any unexpected condition or event, usually environment- or data-driven, that would cause an otherwise operational program to fail. Many different types of conditions can cause exceptions, including an empty data file, insufficient memory, a type mismatch, a wrong command-line argument, a protection violation and bad data returned from another program. These kinds of conditions can be guarded against, yet frequently they are not. Perhaps the most famous of these is the Year 2000 date fault.
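The Ariane 5 conversion fault can be sketched in a few lines. The sketch below is an illustration in Python, not the original Ada code; the function names are hypothetical, and Python integers are used to simulate the 16-bit range check that the unprotected conversion lacked.

```python
# A minimal sketch of the fault pattern behind the Ariane 5 loss: a
# 64-bit floating-point value narrowed to a 16-bit signed integer
# without checking that the value fits.

INT16_MIN, INT16_MAX = -32768, 32767  # representable range of a 16-bit signed integer

def unguarded_convert(value: float) -> int:
    # The unprotected conversion: fails (the "operand error") when the
    # value exceeds what a 16-bit signed integer can represent.
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("operand error: value exceeds 16-bit range")
    return result

def guarded_convert(value: float) -> int:
    # A protected conversion: one defensible policy is to saturate
    # out-of-range values at the limits instead of failing.
    return max(INT16_MIN, min(INT16_MAX, int(value)))

print(guarded_convert(100000.0))  # 32767: out-of-range input handled, no crash
```

The guard is trivial once the hazard is recognized; the point of inspection is to find the conversions where no one thought to add it.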

Why Testing Alone Is Not Enough

The production of highly reliable software, on time and within budget, is a constant challenge for the software industry. Just as we know how disruptive failures can be, we all know how often schedules slip. Not surprisingly, one of the largest time and resource drains in a development lifecycle is testing, often comprising as much as 50 percent of a project’s life cycle.

Testing is a cyclical process that leads to incremental improvements, but it is impossible to "test in" quality. Errors and bugs still find their way into applications, and testing can rarely check all possible scenarios. There are always more test cases to be tried, so testing is never finished, only abandoned. It is for this reason that many organizations, rather than test to exhaustion, are enhancing their test efforts through software inspection.

Software Inspection — More Effective Than Testing

More than 20 years ago, senior researchers at IBM began identifying methodologies and technologies for what is now called software inspection. Over time, inspection has been proven to be a very effective means of removing software faults. In fact, early results from Aetna Insurance Company reported that inspections found 82 percent of their errors in a COBOL program. Software inspection can lead to the discovery of defects before a lengthy and costly testing cycle begins, and often uncovers defects that testing might completely miss because of shortcomings in the testing plan. Ed Yourdon reported that he inspected 200 LOC in 45 minutes and found 25 faults, five of which could not have been caught in test. As a result, Yourdon judged inspections to have been more effective than test. [Yourdon, E. Structured Walkthroughs. 2nd edition, Englewood Cliffs, NJ; Prentice Hall, 1979].

Inspections can be done manually, having staff review all lines of code and identify problem areas, or they can be performed by automated software running on relatively inexpensive workstations, which is faster, more systematic, and less expensive.

Inspection Plus Testing: Double the Strength in Half the Time

To prove just how effective inspection can be, let’s run the numbers. Assume an application containing one million lines of code, of which only 0.2 percent, or 2,000 lines, were changed during a maintenance or new development effort. With a 3 percent error rate, we can expect about 60 errors. The 3 percent figure is based on a study of software reengineering projects that found that programmers routinely make three errors for every 100 modifications. Error types include omissions, modifying code that should not be changed and making incorrect changes. Although overall productivity varied greatly, this study found little difference in the error rate between experienced and inexperienced programmers. Here is how error types will be classified in this example: showstoppers prevent the application from operating until they are resolved; mission-impact errors have a significant impact on application operations or cause damage to downstream operations; minor errors result in inconvenience; and unresolved errors cause intermittent problems or remain dormant, waiting to be triggered.

Assuming no testing, the 60 errors would remain. Of these, perhaps five percent, or three, are showstoppers, and another 20 percent, or 12, would have a mission impact. Thirty percent, or 18, would be classified as minor, and the remaining 45 percent, or 27, would remain unresolved. Now suppose testing occurs, but without prior inspection. Assuming a 30 percent testing efficiency rate, the number of errors would be reduced by 18, from 60 to 42. Assuming the same percentage impacts, two would be showstoppers, eight would have a mission impact, 13 would be considered minor and 19 would be unresolved.

Let’s build in an inspection cycle, assuming it to be about 60 percent efficient. We do inspection first, so our initial 60 errors are reduced by 36 to only 24. Testing catches 30 percent of these, lowering the remaining errors to 17, of which only one is a showstopper, three have a mission impact, five are minor and eight are unresolved. The 60 percent inspection efficiency rate is taken from the published experience of the Hewlett-Packard Company, while the 30 percent testing efficiency rate assumes the use of production data as the basis for regression tests.
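The arithmetic above can be reproduced in a short script. The 60 percent inspection and 30 percent testing efficiency figures are the assumptions stated in the text, not universal constants.

```python
# Reproduces the worked example: 60 initial errors, inspection at 60
# percent efficiency, testing at 30 percent efficiency (both figures
# are the article's stated assumptions).

initial_errors = 60

# Testing alone catches 30 percent of the errors.
after_testing_only = round(initial_errors * (1 - 0.30))   # 42 remain

# Inspection first removes 60 percent; testing then removes 30 percent
# of whatever inspection left behind.
after_inspection = round(initial_errors * (1 - 0.60))     # 24 remain
after_both = round(after_inspection * (1 - 0.30))         # 17 remain (24 * 0.7 = 16.8)

reduction_pct = round((initial_errors - after_both) / initial_errors * 100)
print(after_testing_only, after_inspection, after_both, reduction_pct)  # 42 24 17 72
```

The 72 percent figure quoted below falls straight out of this calculation: 43 of the original 60 errors are removed.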

Using inspection plus testing reduces errors by 72 percent, far more than testing alone. It also saves time and expensive test cycle resources - both critical success factors for any project team. If testing alone were relied upon to generate the same results (reducing 60 errors to only 17), testing would have to achieve a 72 percent efficiency level, more than twice the rate commonly experienced in a production environment.

This data should not be misread as de-emphasizing testing in favor of inspections. The combination of inspection and testing provides the greatest benefits. By eliminating a large percentage of errors before testing, the inspection process reduces the number of errors that must be caught during testing (24 rather than 60), thereby reducing testing cycles and the number of errors that will slip into production.

As a result, software inspection has been increasingly used by major software development organizations as an effective means of ensuring the delivery of high-reliability software on time and within budget. An online bibliography of papers on the subject is available at http://www.ics.hawaii.edu/~johnson/FTR/Bib/bib-master.html#Blakely91

Automated Software Inspection

As one can imagine, one of the biggest hindrances to widespread deployment of software inspection is that it is a highly manual process; as a result, it is used very selectively on mission-critical sections of the software. Fortunately, technology is now becoming available that can replace humans with computers in the process, making it feasible for the first time to apply inspection to a complete application.

There are some key requirements for the automated software inspection technology:

  • Second-generation technology-based, incorporating sophisticated language-processing techniques such as alias analysis, data flow analysis and control flow analysis;
  • Accurate enough to take much of the "pressure" off of testing — i.e., able to genuinely compress and improve the efficiency of the testing cycle while reducing the number of errors introduced into production;
  • Automated, to replace the human inspector in the process;
  • Extensible, to cover the wide variety of languages and environments found in the modern enterprise; and
  • Scalable, to be applied to large chunks of application code, in the neighborhood of 1,000,000 to 10,000,000 lines of code.

Because it is fast and inexpensive to implement, automated software inspection can very quickly and cost-effectively help in moving to fault-free software. In fact, because automated inspection can be up to 100 times faster than manual code analysis methods, an entire application can be inspected for a large class of lingering faults in less than a week.

Automated Software Inspection: Application to Year 2000 Compliance

As we have said, manual inspection, applied to tens of millions of lines of code, is a costly proposition. It requires an additional team of programmers to inspect the work of your primary team and, even with a large team, it is prohibitively time-intensive. Thus, in a project where time is critical, automation can be the "make or break" scheduling choice.

As time runs short for Year 2000 compliance efforts, many companies are expecting their testing plans to resolve for them the unfinished business of locating the remaining Year 2000 defects in their code. But as we’ve seen, testing takes time — lots of it — and still-defective code found through testing must be remediated and then tested again. The sad fact is, testing alone may not give companies enough time to achieve the Year 2000 compliance level they are seeking. However, by radically reducing the number of defects in previously remediated code before testing begins, automated software inspection can greatly increase the odds of success for a company’s Year 2000 test cycle.

Meeting schedules isn’t the only benefit of Year 2000 inspection, however. In addition to speeding the compliance process, reports from the automated software inspection tool provide a valuable audit trail in support of a company’s Year 2000-compliance efforts in the event of litigation. The use of an automated software inspection process becomes itself a strong demonstration of intent to minimize Year 2000 problems, and evidence of the company’s genuine effort to do so.

Flexible technology solutions will allow an organization to define its own unique Year 2000 compliance policy - for example, some companies have a policy requiring all programs to have 4-digit dates, while others have a policy of only repairing computational problems in their programs. Automated software inspection can then systematically and automatically review lines of code for the specific Year 2000 compliance criteria a company has established. As a result, valuable software engineers can be redirected to focus specifically on the code that really needs their skilled attention.
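As a toy illustration of a policy-driven check, the sketch below flags source lines where a year-like variable meets a two-digit literal. The pattern, the function name and the sample code are all hypothetical; a real inspection tool relies on data flow and control flow analysis rather than a regular expression.

```python
import re

# Hypothetical two-digit-year policy check: flag lines where a variable
# whose name suggests a year is compared against, or assigned, a one- or
# two-digit literal. Purely illustrative, not a real tool's rule set.
SUSPECT = re.compile(r"\b\w*(year|yy)\w*\s*(==?|<=?|>=?)\s*\d{1,2}\b",
                     re.IGNORECASE)

def flag_lines(source: str) -> list[int]:
    """Return the 1-based line numbers that violate the toy policy."""
    return [i for i, line in enumerate(source.splitlines(), 1)
            if SUSPECT.search(line)]

sample = ("if exp_year < 50:\n"
          "    century = 1900\n"
          "full_year = 2000 + exp_year\n")
print(flag_lines(sample))  # [1]: the windowing comparison on a 2-digit year
```

Flagged lines would then be routed to engineers for review against the company’s own compliance policy, which is exactly the triage role the paragraph above describes.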

Yet as simple and as expeditious as automated Year 2000 software inspection may seem, there are still some important rules of thumb. First, it is imperative that the automated software inspection tool set and its process be different from the tool set and process originally used to analyze date defects prior to remediation. Similarly, staff members who oversee the automated software inspection analysis should not be the same analysts, programmers, consultants or auditors who worked on the remediated code — complete objectivity is crucial. This second point applies to any software inspection process.

Most Year 2000 efforts are well underway and remediated code has been prepared. If this is the case for your company, now is the right time to perform inspection, checking for defects before committing code to a lengthy testing cycle. After product integration, automated software inspection can be used again. Many changes have been made, but are all the changes consistent? Did any new errors slip in? According to a survey of 300 companies conducted in March 1998 by Market Perspectives Inc., 12 percent reported Year 2000 re-contamination problems. Automated software inspection is fast and inexpensive enough to allow you to re-inspect the same code again and again.

Automated software inspection is also a useful buffer against suspect code that has been sent overseas or to a third party for remediation. This code is exposed to risks that may not exist in-house, including substandard quality assurance procedures. Automated inspection provides an additional and inexpensive line of defense.

Finally, automated software inspection provides companies with a highly effective spot audit capability. For example, while there is little opportunity for a company to test all their suppliers’ code, automated inspection can provide a rapid and easily implemented means of assessing a supplier’s compliance efforts.


About the Author:

Timothy Chou is the Chief Operating Officer at Reasoning Inc. (Mountain View, Calif.), responsible for software development, software manufacturing, product support, product management and Reasoning’s transformation services.
