Test Data: Security Loophole?

When it comes to testing, using live data without access controls puts IT at risk. Special tools and processes can help.

It's not always the threat of security breaches from the outside that needs your attention. Sometimes the problem comes from inside your company, and it isn't always intentional.

Developers, quality assurance (QA) testers, and usability engineers must ensure new applications and upgrades perform properly. To do that, they need test data. Creating that test bed can be tedious and time-consuming, so programmers end up using real data.

For data privacy reasons, and to prevent large numbers of people from seeing sensitive information, it might seem obvious that real customer information should never be used when testing. Think of the potential for fraud or a class-action lawsuit if, for example, employees or contractors testing an insurance Web site use real-world medical and personal records. Think of the liability.

No matter what company policy is, your testing department might be using the real stuff anyway because it works better, warns Pete Lindstrom, an analyst for Hurwitz Group in Framingham, Mass. Copying production data records (or a subset of records) is convenient and fast. Account numbers in the new test bed will be known to conform to check-digit algorithms, dates are randomly distributed, and there's probably a good cross section of data conditions. "What very often occurs—but I don't know of specific instances, because [developers] dance around the issue—is that they use copies of the live data," Lindstrom says. "See, developers are too smart for their own good in a lot of ways; it would just never occur to them that they're a security risk."

What's Sensitive?
Much of today's corporate data—whether medical records, financial statements, or just contact information on customers who frequently purchase golf balls—is sensitive.

"The example that I use is, if you're in the testing department, and you're testing the payroll system, you really don't want to have them know the CEO's salary," says Dick Heiman, an analyst for IDC in Framingham, Mass.

Keeping information as secure as possible is also necessary given the current regulatory climate. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) mandate that information be secured against deliberate or inadvertent disclosure or misuse. Violations of a patient's right to privacy could result in civil and criminal penalties, including fines and jail time, or disqualification from the Medicare program.

In addressing the data security problem, it's easy to lapse into believing that only employees in the accounting department can release accounting information. Sometimes overlooked is the seat-of-the-pants hot dog programmer who uses production data to quickly test an application "fix." Less sensitive to the confidentiality of the data, that programmer can carelessly (however unintentionally) leave reports on top of a desk for all to see or, worse, transmit the data to an outside agency during an interface test.

Creating Almost-Real Data
While nothing beats testing with real data, really good "fake" data may be the solution. Creating it, however, can be tough. "Generating good test data sets is hard to do—people run out of time and tend to scrimp on that, and the result is that your tests often aren't any good," says IDC's Heiman.

For application testing, designers and usability experts need an accurate sampling of data. As more and more fields of sensitive information need testing, quickly generating data that's safe yet useful for testing becomes increasingly difficult.

One option is to write homegrown software to alter the data so that it's no longer real, but "real enough" for testing purposes. Actual Social Security numbers, for example, could be overwritten by other numbers.

To alter data, companies "have to write a utility, or a program, or a SQL query, to collect the data," explains Moungi Slim, product manager for file and data management products for Compuware Corp. in Farmington Hills, Mich. The company recently released a tool, called File-AID/Data Solutions 3.3, for altering sensitive data so that it's useful enough for test purposes.

"What Compuware helps you do is keep stuff that looks very much like real data but transfers or translates it into specific data such that privacy is maintained," says Hurwitz Group's Lindstrom.

Compuware's tool starts at $15,000 and tackles "data disguise" in three ways: encryption, translation, and aging.

Encryption alters information by using a key. Whoever possesses the key can reverse the information back to its original state.

Translation goes a step beyond encryption by using a one-way hashing algorithm to consistently replace each value with another value that has no traceable connection to the original data.
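To make the idea of translation concrete, here is a minimal Python sketch. It is not Compuware's algorithm, which the article does not describe; it assumes a keyed one-way hash (HMAC-SHA256) and a hypothetical function name, `translate_digits`, and it keeps the original punctuation so the result still looks like a Social Security number.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be held by the data gatekeepers.
SECRET_KEY = b"test-bed-only-key"

def translate_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace each digit with one derived from a keyed one-way hash.

    Non-digit characters (such as the dashes in an SSN) are kept, so the
    output has the same layout as the input. The same input always maps
    to the same output, but the output cannot be reversed to the input.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    used = 0  # index of the next digest byte to consume
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)

fake_ssn = translate_digits("123-45-6789")
```

Because the replacement is consistent, a value that appears in several files is disguised the same way in each. Note that a translated number could, by chance, match some other real person's number, so a production tool would likely need additional safeguards.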

Aging is a process for altering sensitive dates. "You almost want to say 'de-aging' instead of aging," says John Stevens, testing manager for Blue Cross Blue Shield of Minnesota (BCBS-MN), an early user of File-AID/Data Solutions 3.3. Stevens uses aging to take old records that were already scrubbed (to be HIPAA-compliant) and alter relevant dates so they can be used for testing. For instance, some insurance systems look up records from the past two years only, so old data must be "de-aged" to appear within the proper time period.
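The date shifting Stevens describes might look something like the following Python sketch. The function name `age_records`, the field names, and the sample dates are all hypothetical; the sketch assumes that every date in a data set should move by the same amount so that the intervals between dates (service to payment, for example) stay realistic.

```python
from datetime import date

def age_records(records, date_fields, newest_target):
    """Shift every listed date field by one common offset.

    The offset is chosen so the most recent date in the set lands on
    newest_target; the gaps between dates are left unchanged.
    """
    newest = max(rec[f] for rec in records for f in date_fields)
    shift = newest_target - newest  # a timedelta; positive when data is old
    return [
        {**rec, **{f: rec[f] + shift for f in date_fields}}
        for rec in records
    ]

# Hypothetical scrubbed claims that are too old for a two-year lookup.
old_claims = [
    {"claim": "A", "service_date": date(1998, 3, 10), "paid_date": date(1998, 4, 1)},
    {"claim": "B", "service_date": date(1997, 11, 20), "paid_date": date(1997, 12, 15)},
]
aged = age_records(old_claims, ["service_date", "paid_date"], date(2003, 1, 15))
```

After the shift, the newest record is dated on the target day and everything else trails it by the original number of days, so a system that looks back two years can find the records again.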

Hurwitz's Lindstrom says File-AID/Data Solutions 3.3 is remarkable for its simplicity. "There's no heavy-duty encryption required, no heavy-duty access control required, it's just the translation from live data to fake data."

Real-World Test Bed Security

Here's how Blue Cross Blue Shield of Minnesota (BCBS-MN) ensures that test data, whether real or fake, stays secure:

1. Prepares the test data for testers. Specific data gatekeepers scrub data to make it HIPAA-compliant, then store the clean data on test servers, so individual testers don't have to scrub the data.

2. Generates good, fake data. BCBS-MN testing manager John Stevens recommends drawing fake names from a list of 50 common last names (Smith, Jones, etc.), and using first names that match the gender of the subscriber record.

3. Reuses test data whenever possible. The company keeps scrubbed data in play as long as possible, using File-AID/Data Solutions to help alter data dates and keep it useful.

4. Secures test data as if it were production data. BCBS-MN puts a fence around all data and uses elaborate, role-based security to ensure individuals see only what they're supposed to see. Automated systems require approval before a tester needing access to more sensitive data gets it, which also creates an audit trail.

5. Educates, then swears testers to privacy. Anyone with access to sensitive information must attend training, sign a non-disclosure agreement, and sign a privacy statement. Without training, an employee can access only fake data.

6. Updates the security policy. BCBS-MN makes explicit which company roles get automatic access to which information after training.

7. Assigns gatekeepers. An automated system intercepts first-time requests for information from a sensitive database and requires gatekeepers—the employee's manager, security, and the database's owner—to each sign off.

8. Uses strong authentication. For more sensitive information, the company requires the use of secure ID tokens.

—M.S.
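Step 2 of the list above, generating fake names, can be sketched in a few lines of Python. The name lists here are abbreviated stand-ins (Stevens suggests about 50 common last names), and the function `fake_name` and the subscriber ID format are hypothetical. The sketch assumes names should be chosen deterministically so that the same subscriber record gets the same fake name on every run.

```python
import hashlib

# Abbreviated, illustrative lists; a real list would be longer.
LAST_NAMES = ["Smith", "Jones", "Johnson", "Williams", "Brown", "Davis"]
FIRST_NAMES = {
    "F": ["Mary", "Linda", "Susan", "Karen"],
    "M": ["James", "John", "Robert", "Michael"],
}

def fake_name(subscriber_id: str, gender: str) -> tuple[str, str]:
    """Pick a gender-matched first name and a common last name.

    The choice is driven by a hash of the subscriber ID, so it is
    repeatable for a given record without storing a lookup table.
    """
    digest = hashlib.sha256(subscriber_id.encode("utf-8")).digest()
    firsts = FIRST_NAMES[gender]
    first = firsts[digest[0] % len(firsts)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return first, last

first, last = fake_name("S-1001", "F")
```

Since many subscribers map to the same small pool of names, a fake name by itself says nothing about who the original person was.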

Can't Beat the Real Thing
For BCBS-MN, privacy isn't just about regulations such as HIPAA. "We've been doing this for years, because in many ways, this is just good business sense—you need to protect the privacy of your members and business customers," says Stevens. But, he notes, "You want your test world to look as much like your production world as you can afford."

There are times, however, when nothing but production data will do.

BCBS-MN often loads real data into its test environment at a certain point in the testing cycle. "HIPAA doesn't say that nobody can look at anything. HIPAA says you should only be able to see the data if you need it to do your job. Testers need to be able to see data," says Stevens. "The next sentence is: 'And we put a big fence around it.' The reality is that that fence is the same that we put around it in production, because you need to test that your security works."

That fence is access control via IBM's Tivoli Access Manager for Business Integration. A user's access depends on his or her role in the company. For instance, a claims adjuster should be able to see only claims information. Any request for access to sensitive information kicks off an automatic process in which the user's manager, then security, then the information owners must each approve before access is granted.
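The ordered sign-off described above can be illustrated with a small Python sketch. This is not the Tivoli Access Manager API; the class `AccessRequest`, the role labels, and the user names are hypothetical, and the sketch assumes only that approvals must arrive in a fixed order and that each one is recorded for auditing.

```python
from dataclasses import dataclass, field

# Required sign-off order, per the process described in the article.
APPROVAL_ORDER = ("manager", "security", "data_owner")

@dataclass
class AccessRequest:
    user: str
    resource: str
    # Each approval is kept as (role, approver), forming an audit trail.
    approvals: list = field(default_factory=list)

    @property
    def granted(self) -> bool:
        return len(self.approvals) == len(APPROVAL_ORDER)

    def approve(self, role: str, approver: str) -> None:
        """Record one approval, rejecting any that arrives out of order."""
        if self.granted:
            raise ValueError("request is already fully approved")
        expected = APPROVAL_ORDER[len(self.approvals)]
        if role != expected:
            raise ValueError(f"next approval must come from {expected}, not {role}")
        self.approvals.append((role, approver))

request = AccessRequest(user="tester1", resource="claims_db")
request.approve("manager", "m_lee")
request.approve("security", "s_ortiz")
request.approve("data_owner", "d_patel")
```

Access is granted only once all three roles have signed off, and the `approvals` list doubles as a record of who approved what.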

For accessing more sensitive information, the company requires a username and password as well as stronger authentication via a secure ID token. "It's a 'watch fob' thing that gives an eight-digit number, and changes it every six seconds," says Stevens. IT also instructs users never to leave their computers logged on and unattended. Stevens recommends, just to be safe, that Windows users enable the built-in screen savers that automatically activate after inactivity and require a password to deactivate.

Access to the BCBS-MN mainframes, where data is actually stored, requires not only an on-screen sign-on but also, via the older RACF (Resource Access Control Facility) mainframe security software, that users be sitting at a mainframe terminal.

Using real data serves two purposes: It allows testers to make sure the systems work, and ensures that appropriate privacy controls are in place (thus testing for HIPAA compliance).

When BCBS-MN tests with real data, only subsets of production data are used. That's because there are 89 IMS databases that together contain 600 gigabytes of information, including claims as well as subscriber and group benefits. Stevens says the Compuware tool is also useful for helping to bridge the multiple databases while keeping information in a HIPAA-approved format. For instance, the database key of every record must be scrambled so that the key can't be used to trace information back to a person.
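One way to picture key scrambling that still "bridges" several databases is the Python sketch below. It uses plain lists of dictionaries rather than IMS databases, and the function names, field names, and secret are hypothetical. The assumption is that the same keyed one-way hash is applied to the key wherever it appears, so records that were related before scrambling remain related afterward.

```python
import hashlib
import hmac

SECRET = b"gatekeeper-held-secret"  # hypothetical; kept out of the test bed

def scramble_key(record_key: str, secret: bytes = SECRET) -> str:
    """Map a record key to an opaque token that is stable across tables."""
    return hmac.new(secret, record_key.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def scramble_table(rows, key_field):
    """Return copies of the rows with the key field replaced by its token."""
    return [{**row, key_field: scramble_key(row[key_field])} for row in rows]

subscribers = [
    {"subscriber_id": "S-1001", "plan": "gold"},
    {"subscriber_id": "S-1002", "plan": "basic"},
]
claims = [
    {"claim_id": "C-1", "subscriber_id": "S-1001", "amount": 120},
    {"claim_id": "C-2", "subscriber_id": "S-1002", "amount": 75},
    {"claim_id": "C-3", "subscriber_id": "S-1001", "amount": 40},
]

test_subscribers = scramble_table(subscribers, "subscriber_id")
test_claims = scramble_table(claims, "subscriber_id")
```

Every claim in `test_claims` still points at a subscriber in `test_subscribers`, but none of the original keys appear in the test data.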

The Compuware tool is especially useful because the BCBS-MN production environment is old and complex. Creating made-up data would be difficult because mainframes feed data to client/server systems, which then feed it to Web applications. "We have 25 years of systems, and the reality is that building data that works in the legacy environment is very hard to do because it takes so much knowledge," Stevens says.

Who Plays Gatekeeper?
Ensuring that live data isn't inappropriately used in test environments requires that someone at your company have ultimate responsibility for security. Where should the security buck stop? "That's a problem with the entire security space; it's very difficult to have a point person," notes Lindstrom. He says the gatekeeper often varies by industry—financial services firms will use their security teams, for instance. Lindstrom recommends that companies with chief privacy or security officers turn to them. Otherwise, companies with serious QA groups might designate a project leader to watch the data, or else individual developers or business analysts involved in the project.

While there are no easy answers, there are guidelines you can follow to reduce your risk of data insecurity during testing:

1. Make sure your company's written security policy records the different access levels granted to different roles, so that everyone—from security to the test team—has the rules and procedures in writing.

2. Swear the gatekeepers to secrecy, and educate them about privacy. Gatekeepers at BCBS-MN, and any others needing access to the company's sensitive data, must get training, then sign a non-disclosure agreement and an even longer privacy statement. "People who develop and test systems have access to those systems, and we need to educate them and rely on them to keep that privacy," says Stevens.

3. Realize that application testing is a team effort. It isn't up to the test group to run test bed security; that's a job for security. To keep the BCBS-MN test bed secure, notes Stevens, "I'm leveraging off a whole bunch of other people in a lot of different roles." Security is in charge of access control, even if Stevens is in charge of testing it.

For testers, the ability to quickly generate real-looking fake data is good news. Yet many organizations still need to test with real data. Either way, a sensible mix of real and fake data, kept in a well-secured testing environment and governed by clearly articulated privacy policies that employees must explicitly agree to, will help any company meet more stringent regulations. It can also help you avoid the unpleasant spectacle of having information publicly compromised.