Case Study: Finding and Fixing Security-Related Code Defects

Finding code problems was the challenge; a service provider's analysis held the answer

The order came down to Sharp Labs of America from company headquarters in Japan: test all code for software defects.

At issue were the multi-function peripherals (MFPs) the Digital Imaging Systems Department at Sharp Labs develops. These devices, such as a combination printer, scanner, and copier, are largely aimed at small offices, and pack a lot of code to handle all the functionality. Sharp supports a number of these devices, and the code bases for some are now 10 years old.

Of course, each year of maintenance just keeps adding and revising code—fixing errors, resolving issues, maintaining backward compatibility. As a result, the code “just got harder and harder to read and understand,” says Mary Bourret, the senior manager of the Digital Imaging Systems Department at Sharp Labs of America. “I figured it out, and it was going to take us something like six months” just to scan all existing code, to meet the headquarters directive.

Sharp’s code-scanning challenges, and the corporate imperative to produce cleaner code, were far from unique. The U.S. Department of Defense and the Software Engineering Institute estimate 5 to 15 defects plague every 1,000 lines of code. Those defects add up, with obvious cost repercussions. According to a study by the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST), errors in software cost the U.S. economy $59.5 billion per year, or about 0.6 percent of the gross domestic product. As a result of its study, NIST said “the path to higher software quality is significantly improved software testing.”
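To put the defect-density estimate in concrete terms, here is a minimal sketch of the arithmetic. The 500,000-line code base is a hypothetical figure chosen for illustration, not the size of Sharp's code, and the function name is likewise an assumption.

```python
from typing import Tuple

# Range cited in the article: 5 to 15 defects per 1,000 lines of code.
DEFECTS_PER_KLOC_LOW = 5
DEFECTS_PER_KLOC_HIGH = 15


def estimated_defects(lines_of_code: int) -> Tuple[int, int]:
    """Return the (low, high) defect estimate for a code base of the given size."""
    low = lines_of_code * DEFECTS_PER_KLOC_LOW // 1000
    high = lines_of_code * DEFECTS_PER_KLOC_HIGH // 1000
    return low, high


# Hypothetical 500,000-line code base (illustrative only).
low, high = estimated_defects(500_000)
print(f"Expected defects: {low} to {high}")
```

At that assumed size, the estimate works out to between 2,500 and 7,500 defects.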

Catching code defects saves money, and the earlier the better. Given the potential for cost reduction, research firm Gartner recommends organizations implement code testing at all levels of the development cycle, since code-fixing costs skyrocket after deployment.

The Challenge of Manual Scanning

Even before Sharp Labs received its code-testing mandate, it had been looking for ways to better test its code’s reliability and security, especially since it had a new group in India handling software maintenance. “While they’re really good engineers, when we set up that group, we didn’t really set it up with really good coding standards, and we were somewhat concerned with the code quality,” Bourret says.

Initially, her group tried some Windows and Linux off-the-shelf code-scanning tools, such as lint, but the tools were too imprecise. “We’d turn on a few rules, and we’d end up with 30,000 errors,” she notes. Developers didn’t have time to sift through all the results.

As she was scrambling for a solution—in autumn 2002—her boss sent an e-mail noting a code-checking outsourcing service from a company called Reasoning. Bourret decided to try it. In March 2003, she sent a small amount of code to Reasoning, and “we were really happy with the results,” she says. “In particular, it was not just that they find the defect, but the way they report the defect back to you makes it really easy to understand if it’s a true defect or a false positive,” while also explaining to the development team where the fix needs to be made. “We had very few false positives as well, which was nice,” and Reasoning returned the code quickly; she’s never waited longer than two weeks to get code back.

How exactly does the process work? Companies send their source code to Reasoning, which uses static analysis tools to “generate a list of potential defects,” says Bill Payne, former CEO of Reasoning. Still, from this baseline, “what you don’t know are which ones are real [and] which ones aren’t.”

Thus Reasoning also has a process to weed out “the non-true defects,” he says. “So when we bring the list back to the client, it’s only a list of the 100 percent vulnerabilities.” For obvious competitive reasons, he doesn’t go into detail on how the weeding-out process works, except to say he’s honed the algorithms by working with data collected by Carnegie Mellon University’s Computer Emergency Readiness Team.

Why outsource code-checking? “Traditionally, engineers do not write for security vulnerabilities. They don’t test their own code as they write it, they don’t put checks in their code, and that’s been traditional,” says Payne. He notes that while many companies have put developers through security education, that hasn’t eradicated insecure coding practices. “A lot of people think if we retrain the programmers, they’ll write secure code. But then they’ve had the ability to write non-defective code for 30 years, and they haven’t done that yet either.”

The Effect of 50 Developers Upon a Code Base

After Sharp Labs had success with its initial, small batch of code, it tried a second, larger batch. “This was the code that had been around for 10 years and had been maintained by probably 50 different engineers,” says Bourret. “There, Reasoning found maybe 98 errors, and almost all of that was in some area that had been under maintenance or had been rewritten two or three times.” Reasoning provided an Excel spreadsheet detailing the errors, which Bourret says Sharp could easily import into its bug-tracking application. “We already had a whole process set up for a bug: how it’s fixed and tested, then regressed before released,” and its engineers duly tackled the errors.

While Sharp Labs is primarily interested in reliability scans of the code, it also has Reasoning check code security, says Bourret. “For the price difference, it’s worth it to us to have that peace of mind.”

So how important were those 98 errors? Sharp’s engineers determined none were critical; they wouldn’t crash a machine outright, for example. When the code went to the product quality assurance (QA) team at Sharp’s headquarters, however, the cleaner code was apparent. The QA team vets all new or revised code before approving its general release. To check code, the team loads it onto MFPs, then runs them through a battery of tests without ever resetting them, a prelude to the updated code being released to the real world and used every day by hundreds of thousands of machines.

The QA team’s verdict on the Reasoning-checked code: it was “the most stable of any release in 10 years,” says Bourret. “They ran it for five days … without a problem. Previous to that, it would crash, on average, once every three days.”

Sharp’s engineers weren’t negligent; most of the problems simply eluded obvious detection, and related to memory management issues or memory leaks. So Sharp Labs’ engineers created new coding standards to avoid similar problems in the future. “The funny thing is, probably if we’d mined our product database” of code defects, says Bourret, the engineers would have discovered memory management was a recurring issue.

Sometimes, however, it takes a third-party perspective to focus scarce resources on problems. Ironically, the development team hadn’t wanted a third-party service looking at its code—developers worried it would just return thousands of potential errors. Bourret says they warmed to the idea given the precise way errors were communicated.

When it comes to helping developers write more-secure code, Payne sees some common mistakes. “We cover about 70 percent of everything that CERT’s ever had in terms of the types of vulnerabilities,” he notes. The leading problem in the code Reasoning scans, “by more than two to one,” is buffer overflows. Next is unvalidated fields: a vulnerability caused by a developer not verifying that what goes into a field is what should go into the field, such as a name being a name rather than a string of numbers. Other, more minor problems include race conditions (“where a hacker can grab a process between when it opens and the next process grabs it,” says Payne), the use of temporary filenames, and weak random-number generation.
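Three of the problems Payne lists can be illustrated with a short sketch. The code below is not Reasoning's analysis or Sharp's code; it simply shows, under assumed names and rules, what checking a field, creating a temporary file safely, and generating an unpredictable value might look like in Python. The name pattern, the "scan-" prefix, and all function names are hypothetical.

```python
import os
import re
import secrets
import tempfile

# Hypothetical rule: a name starts with a letter and contains up to 64
# letters, spaces, hyphens, or apostrophes.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z' -]{0,63}")


def validate_name(value: str) -> str:
    """Reject input that is not a plausible name, such as a string of numbers."""
    if not NAME_PATTERN.fullmatch(value):
        raise ValueError(f"not a valid name: {value!r}")
    return value


def write_scratch_file(data: bytes) -> str:
    """Create and open a temporary file in one step.

    mkstemp picks an unpredictable name and opens the file exclusively,
    which closes the window between choosing a filename and opening it.
    """
    fd, path = tempfile.mkstemp(prefix="scan-")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def new_session_token() -> str:
    """Return 16 random bytes as hex, drawn from the OS's secure generator."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    print(validate_name("Mary Bourret"))
    scratch = write_scratch_file(b"temporary data")
    print(os.path.exists(scratch))
    os.remove(scratch)
    print(len(new_session_token()))
```

Buffer overflows, the leading problem in Payne's list, are left out because Python checks buffer bounds at run time; they arise mainly in languages such as C, where the programmer manages buffer sizes by hand.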

When Printers (Don’t) Attack

Back at Sharp, corporate headquarters issued another mandate: use AppScan, a tool from Sanctum (the company was recently purchased by Watchfire), to scan all Web pages and applications developed in-house, looking for vulnerabilities.

Headquarters’ new mandate didn’t come out of the blue. “My concern had to do with the embedded software that’s on the chip that’s in the printer,” says Bourret. “All printers have processors in them, and those processors could actually be turned into attack machines, I presume, because they’re a processor, and while we’ve never had an attack like that—no printer has ever been turned against its company—the very fact that there is a processor and thus could be invaded” made her want to test it. To quiet mass fears of a distributed attack based upon compromised printer processors, she notes someone would have to be physically standing next to the machine to attack it, and also says Sharp builds in a number of security measures, including a periodically run checksum that verifies the processor is only running legitimate code.
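The article does not say how Sharp's periodic checksum works. As a rough sketch of the general idea only, the code below hashes a code image and compares the result with a known-good value. The use of SHA-256, the function names, and the sample image bytes are all assumptions made for illustration.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> str:
    """Return the SHA-256 digest of a code image as a hex string."""
    return hashlib.sha256(image).hexdigest()


def is_legitimate(image: bytes, expected_digest: str) -> bool:
    """Check that the image still matches the digest recorded at release time."""
    # compare_digest avoids leaking, through timing, how much of the digest matched.
    return hmac.compare_digest(image_digest(image), expected_digest)


if __name__ == "__main__":
    # Hypothetical firmware bytes; a real check would read the stored image.
    released = b"firmware image, release build"
    expected = image_digest(released)

    print(is_legitimate(released, expected))            # unmodified image
    print(is_legitimate(released + b"\x90", expected))  # tampered image
```

A device would run such a check on a timer and refuse to continue, or raise an alert, when the comparison fails.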

Still, outsourcing the Web application code scanning “made me more comfortable,” she says, because when it came to Web-application security, “we didn’t have the knowledge of what to look for.”

Today, Sharp prioritizes code for outsourced scanning and plans to work its way through the whole code base, periodically revisiting all in-maintenance code “since that’s where we find a lot of defects get added,” says Bourret. “Usually you have your less-skilled engineers doing your maintenance activity, because they’re just coming up to speed on the product, so you give them things that can be done in a week.”

When it comes to Reasoning, Bourret’s only reservation is that the service isn’t free. “I wish they were in-house, so we wouldn’t have to have the budget, so I could just say, ‘Hey, come over here and do this for us.’”
