In-Depth

Case Study: Stopping Leaks of Program Code

Using pattern matching with information taxonomy tools to track sensitive information leaving the company

What happens when sensitive information in electronic form gets stolen or inappropriately piped through the corporate firewall? Answer: it’s probably gone for good, possibly finding its way to the Internet.

Loss of documents or information can devastate a company. For the video game industry, the threat hit home in September 2003 after an attack on Valve Software, makers of the immensely popular game Half-Life. The entire source code for the game’s sequel—over five years in the making—was stolen after attackers snuck Trojan code onto Valve’s computers. “Our speculation is that these were done via a buffer overflow in Outlook’s preview pane,” Gabe Newell, managing director of Valve, said in a Web posting after the attacks.

The motives of the attackers remain unknown, though speculation runs from competitors to diehard gamers wanting an edge to hackers looking for security flaws in multiplayer mode. The final effect of the theft—on Valve as well as future players of the now-delayed Half-Life 2—remains unknown.

Beyond the theft itself, the attack surprised many because it was so targeted. While security experts say such attacks aren’t rare, public disclosure is. Today, new tools can help detect such attacks and, in some cases, stop them. “No one thing can stop the flow of information out of the company, but if you put enough nets out there, you’re going to find something,” says Mark Rizzo, vice president of operations at Perpetual Entertainment, and a former network architect at Electronic Arts. “Fortunately, to date we haven’t had an incident,” he observes.

Perpetual Entertainment is a startup company developing online games and the platform to support them. Naturally, Rizzo says, “we want to protect [that] as much as possible.” To do that, he’s been testing Content Alarm 1.0, an appliance from Tablus, for more than a month. “Because it’s so easy for people to send out snippets of code or design, we thought it was something to look at.”

Many companies fear losing control of sensitive information, even if it’s not stolen. “In reality, a good portion of the security person’s life is stopping the accidental release of information,” says Jim Nisbet, president of Tablus. With the numerous regulations affecting many industries, some companies are taking radical steps. “We’ve talked to a couple of companies that have taken departments off their corporate intranet because they don’t want it to go out to the corporate Internet,” he says.

Content Alarm sits passively on the network, watching for inappropriate content leaving the firewall. It runs on a hardened version of Linux, and works “with near-router performance,” says Nisbet. “We can keep up with bursts of near-gigabit speed without dropping any packets.”

The box watches the TCP stream, including traffic on non-standard ports, captures and reassembles the information, then passes it to a content analyzer, which opens any compressed files, examines files in their native text representation, and tests document characteristics. The software also notes details such as the destination IP address and checks the traffic against security policies. If something isn’t in compliance, the box generates a Simple Network Management Protocol (SNMP) alert.
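The analysis stage can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Tablus’s implementation: it assumes the TCP payload has already been reassembled into bytes, treats ZIP as the only compressed format, and uses a plain list of marker strings plus a set of allowed destination addresses as the “policy.” The names `Alert`, `extract_texts`, and `analyze` are invented for this example, and a real appliance would send each alert as an SNMP trap rather than return it.

```python
import io
import zipfile
from dataclasses import dataclass


@dataclass
class Alert:
    """One policy violation found in a reassembled stream (hypothetical)."""
    dest_ip: str
    member: str
    reason: str


def extract_texts(payload, name="<stream>"):
    """Yield (name, text) pairs, opening ZIP archives (and nested ZIPs) recursively."""
    if zipfile.is_zipfile(io.BytesIO(payload)):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                yield from extract_texts(archive.read(info), f"{name}/{info.filename}")
    else:
        # Treat anything that is not an archive as text; undecodable bytes are replaced.
        yield name, payload.decode("utf-8", errors="replace")


def analyze(payload, dest_ip, sensitive_markers, allowed_dests):
    """Return an Alert for each marker found in traffic bound for a non-allowed destination."""
    if dest_ip in allowed_dests:
        return []
    alerts = []
    for member, text in extract_texts(payload):
        for marker in sensitive_markers:
            if marker in text:
                # A real appliance would raise an SNMP trap here; this sketch just collects it.
                alerts.append(Alert(dest_ip, member, f"contains {marker!r}"))
    return alerts
```

Recursing into archive members matters because a file zipped inside another ZIP would otherwise pass through unexamined; each alert records which archive member triggered it.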

Tablus’s founders come from sniffer and knowledge management (KM) companies, a background that shows in the appliance. “We have an agent that runs on a file server or close to a file server, and basically it crawls sensitive documents” and learns from the expressions and content it finds, says Nisbet. Administrators tell the box which sorts of files are sensitive, and which aren’t. Think of it as a KM approach: classifying information into taxonomies to know what’s sensitive. Content Alarm can also watch such things as the public Web site; when a previously sensitive press release gets posted there, the box adjusts accordingly.
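Administrators telling the crawler which files are sensitive, and which aren’t, can be pictured as include and exclude rules applied while walking a file tree. The sketch below is hypothetical: the glob-style patterns, the rule that exclusions win over inclusions, and the function names `is_protected` and `crawl` are assumptions for illustration, not Content Alarm’s actual configuration. An exclusion such as `docs/press/*` stands in for content, like a published press release, that is no longer sensitive.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def is_protected(rel_path, sensitive, ignore):
    """Decide whether a POSIX-style relative path should be treated as sensitive.

    Ignore patterns take precedence over sensitive patterns (an assumption).
    In fnmatch, '*' also matches '/', so 'src/*' covers every subdirectory.
    """
    if any(fnmatchcase(rel_path, pattern) for pattern in ignore):
        return False
    return any(fnmatchcase(rel_path, pattern) for pattern in sensitive)


def crawl(root, sensitive, ignore):
    """Walk a file tree and return the sorted relative paths of files to protect."""
    root = Path(root)
    protected = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if is_protected(rel, sensitive, ignore):
                protected.append(rel)
    return protected
```

A call such as `crawl("/srv/source-export", ["src/*", "docs/*"], ["*/Makefile", "docs/press/*"])` (a made-up path) would return only the files whose contents the agent should go on to learn from.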

Why not just subject all corporate content to keyword filters, looking for sensitive information? Simple pattern matching helps catch some things, but when applied to “confidential, or restricted distribution information, the precision on that is just really ugly,” says Nisbet. It helps if the software actually knows what the documents are talking about.
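Nisbet’s point about keyword precision can be illustrated by contrasting a keyword filter with content fingerprinting. The sketch below uses word shingling (hashing every run of five consecutive words), a well-known near-duplicate detection technique swapped in here purely for illustration; Tablus has not said its product works this way. The keyword list, the five-word window, and the function names `shingles`, `keyword_flag`, and `overlap` are assumptions.

```python
import hashlib
import re


def shingles(text, k=5):
    """Return hashed fingerprints of every run of k consecutive lowercase words."""
    words = re.findall(r"\w+", text.lower())
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()[:16]
        for i in range(len(words) - k + 1)  # empty if the text has fewer than k words
    }


def keyword_flag(text, keywords=("confidential", "restricted")):
    """Naive filter: flag any text that contains a listed keyword."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)


def overlap(outbound, index, k=5):
    """Fraction of the outbound text's shingles that also appear in the sensitive index."""
    grams = shingles(outbound, k)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)
```

Build `index` by taking the union of `shingles(doc)` over the sensitive documents. A stock e-mail footer that says “confidential” trips `keyword_flag` yet scores 0.0 on `overlap`, while a pasted fragment of protected source code that never uses the word scores high; the keyword filter gets both cases wrong.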

Content Alarm, ironically, can also protect organizations with KM software implementations. “Some content owners, and security people, are petrified by KM technology, because in KM you’re helping people find things, and often you’ll get the content owners alarmed—‘You can find this?’ Often it was protected by obscurity,” says Nisbet.

When it comes to testing, whatever the vendor claims, “I’m one to be somewhat skeptical and want to beat the living you-know-what out of it,” says Perpetual Entertainment’s Rizzo.

He pointed Content Alarm at the source control software used in-house to manage information and revisions. “I told it to protect everything, even though there may be a few things in there that I don’t care about,” he says; the point was to see how it would cope with 40,000 files. “I was quite impressed, once it got through the initial run-through of files,” he says, adding that it also kept up well with modifications to, or deletions of, files.

Training Content Alarm did take time. “You need to do a bit of work to go into sub-directories or pattern matching so it’s not always giving you information about things leaving the building you don’t want to know about,” he says.

Once trained, the software does a good job of parsing complex language, says Rizzo. For example, developers often run long development scripts set to e-mail them at home when the scripts finish. Those e-mails contain pieces of the “make” file (a non-proprietary description of what you’re doing), and Content Alarm was able to distinguish between pieces of a “make” file and other, sensitive development documentation with similar language. “The ability to not identify false positives is quite amazing,” he says.

Rizzo also applauds the administrator tool for first showing excerpts of potential violations. Even though many businesses have security policies allowing them to read any e-mail sent or received at work, no one wants to seem Orwellian. By showing excerpts of potential violations, and referencing the path and file name the content might have come from, “it gives you the ability to scan through the audits without actually being intrusive and looking at the e-mail,” he says—unless, of course, more digging is required.

Features Rizzo would like to see include the ability to detect whether other kinds of digital content, such as parts of images, have been cut and pasted into other pictures and sent out. He also wants the device not just to detect infractions but to stop them. “You can say that as a rule no one is allowed to use HTTPS or SSH, but [the appliance] is not going to catch encrypted traffic. It would be great if the device itself was a proxy,” he says, so that it would have the final say on whether something gets to leave the enterprise.

Still, Rizzo likes knowing what’s leaving the firewall. “Most people, if they’re really intent on stealing something from you, they’re going to make a few mistakes, and they’ll make enough that you can dial into that.”

About the Author

Mathew Schwartz is a Contributing Editor for Enterprise Systems and writes its Security Strategies column, as well as being a long-time contributor to the company’s print publications. Mr. Schwartz is also a freelance security and technology writer.
