Dispelling Log Data Retention Myths

Retaining data isn't enough. IT faces a host of regulations that address maintaining log files—tracking who did what and when—along with requiring access to them in very short order.

Regulations: if only there were just one. In reality, many organizations, especially in the finance and healthcare industries, face multiple regulations, and each comes with its own data-retention requirements. Take the Health Insurance Portability and Accountability Act (HIPAA), which requires companies retain patient information for up to six years. Or the Sarbanes-Oxley Act of 2002, which mandates financial-record retention for seven years. Sarbanes-Oxley auditors will also be carrying the Federal Financial Institutions Examination Council's IT examination handbook, which requires financial companies maintain appropriate security controls, as well as audit trails and logs for all data entered or processed. In other words, organizations need to not only store information, but be able to find it again in a timely manner.

Manually reviewing every piece of data before deciding whether to store it is out of the question; so is storing absolutely everything—the cost would be prohibitive. Yet how do organizations even begin to approach log data, which the average organization generates reams of every day?

To talk about technologies for automatically managing what to save or discard, Security Strategies spoke with Adam Frankl, vice president of marketing at Addamark. The company creates software to collect, analyze, and retain log information for complying with audits or running investigations.

How do organizations begin to approach the data-retention problem?

Organizations need to collect all this information for information security reasons, and also for regulations in the U.S. and Europe.

The basic idea behind Addamark was that log data—machine-generated information about what’s going on on the system—is becoming increasingly important. One of our fundamental hypotheses was that (1) the data was important, and (2) most systems do a bad job at gathering and storing it.

Given the amount of data some organizations generate, isn’t combing through it a nightmare?

A bank came to us [and said] they generate five gigabytes of log files per day, and would like to maintain that data for a minimum of six months. Putting it on tapes in a secure closet somewhere is useless—you might as well be throwing the tapes away. When the bank had a discrepancy, their vice president of IS figured out that they would have to load 1,000 tapes at an hour a tape, and then … they discovered tapes 652 and 354 were missing. Those tapes weren't where they were supposed to be, or maybe they were mislabeled. [End result: a full restore would have been impossible anyway.]

How do you automate that process?

What we’ve done is created a solution called Addamark Omnisight, which allows you to collect log data from every log-producing device across the enterprise, retain it in a … cost-effective manner, and maintain it online for [such things as] reviews and reports. We have name-brand customers that have the system in production today … [ranging from] Lehman Brothers and Goldman Sachs to Rockwell Automation to Blue Cross/Blue Shield and Yahoo. They’re using this to maintain the log files and get actionable information on a daily basis.

How many different regulations apply to data retention?

There are 20 or 30 different federal regulations that concern log data, but we've found [several—including Sarbanes-Oxley, HIPAA, and the Gramm-Leach-Bliley Act (GLBA)] that have a combination of specific requirements, industries used to dealing with regulators, and which have teeth.

[For example, take the] Federal Financial Institutions Examination Council (FFIEC) [guidelines]. It applies to all U.S. commercial and national banks, and [standardizes requirements across a variety of institutions, including the FDIC, Treasury Department, and Securities and Exchange Commission]. Starting in December 2003, banks [got] their first round of audits against these regulations.

How detailed are the regulations?

[FFIEC] is a 120-page document … it's a best-practices document; information security professionals say anyone following it would be at a good state of information security. And it puts a requirement on banks that they collect, retain, and review 12 types of log data.

Now if you’re [running] firewalls or intrusion detection systems [IDS], you can filter the data coming in [that you want to see]. But the FFIEC regulations take a different tack on data—it’s not that it’s useful in preventing outsider attacks, but it’s a legal record. A log record could be your only legal [recourse if sued], or if the Feds want the data for an investigation, you have to produce it. Because it’s a legal record and not a business record, there are different requirements.

There are three things log data needs to be court-admissible: it must be complete, accurate, and verifiable … If log data isn't complete, it's not admissible. So … it puts an additional burden on all these institutions to maintain these legal records.
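The interview doesn't describe how Addamark makes records verifiable, but a common technique for demonstrating that an append-only log is complete and unaltered is a hash chain, where each entry's digest covers the previous entry's digest. A minimal illustrative sketch (the function names here are hypothetical, not any vendor's API):

```python
import hashlib

GENESIS = "0" * 64  # starting hash for an empty chain


def chain_records(records, prev_hash=GENESIS):
    """Build a tamper-evident chain: each record's hash covers the
    record text plus the previous hash, so altering or deleting any
    entry invalidates every hash that follows it."""
    chained = []
    for rec in records:
        h = hashlib.sha256((prev_hash + rec).encode()).hexdigest()
        chained.append((rec, h))
        prev_hash = h
    return chained


def verify_chain(chained, prev_hash=GENESIS):
    """Recompute every link; any mismatch means the log was modified."""
    for rec, h in chained:
        if hashlib.sha256((prev_hash + rec).encode()).hexdigest() != h:
            return False
        prev_hash = h
    return True
```

In practice the final hash would be periodically archived or signed by a third party, so completeness can be checked against a value the log's custodian cannot quietly rewrite.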

These data-storage requirements are relatively new, aren’t they?

Before Enron, the popular theory was you delete records as soon as you don't have a need for them, because they're just waiting for a hostile subpoena to inspect them. Post-Enron, the [government has] a different view: [the assumption is that] anyone deleting corporate records has something going on, [plus] it's very difficult to delete records completely.

So many organizations are taking the opposite tack, saying we'll not only be in compliance but we'll visibly be in compliance—storing e-mail, instant messages (IMs), and log files.

[Recently] there was a $675 million fine against Bank of America, [partially] because one of their fairly low-level employees was doing trading on reports that hadn’t been published yet, but they had log records that were supposed to be [internally] reviewed … [and] which should have been turned over to the federal government [but they couldn’t produce them]. So … a new SEC [regulation] says, when we request records, they must be produced the same business day. Banks have regulators with huge power over them.

What about storing instant messages?

Brokers and dealers have the longest requirements—up to seven years for instant messages and e-mail.

So a data-retention best practice is hanging onto all these things?

Well, hanging onto it and putting it into a closet just doesn't work. There's another concept we're just becoming aware of, which is the investigation backlog. Ten years ago there was an application backlog … What's going on now in information security departments is there's an investigation backlog. The IS department has a limited number of people, and they don't have the tools they need, so they get a request for information … and most are pass-ons.

So what many of our customers are doing is using our tool to radically cut the time needed to [investigate]. At one of our customers, a major bank, the head of information security got a call from the head of HR saying the company was about to terminate a senior executive, and he'd like to review all of [that executive's] activities on computer systems for the past 30 days, and he wanted it by the end of the next business day. And the head of information security said it was impossible—it would take 20 person-days just to assemble the raw data. By bringing in our system, however, they were able to complete the request in a matter of minutes, not days.

How does the software work?

Omnisight has a number of different components … but the heart is … the scalable log server. What we’ve done is … created a database that’s optimized … Data is time-stamped first, arranged in pretty standard columns, also it’s append-only—never updated or changed—and it comes in huge volumes. So … we’ve created a database that works just on log data, and we divide the data up by columns and store and compress each column separately.

So you’re compressing similar data?

Yes, and that gives us a number of advantages. One terabyte (TB) of raw logs—a medium-size company's logs for one year—could expand to 4 TB of data when stored conventionally, but for us it compresses down to a 100 GB disk footprint. What this means is we've eliminated a huge cost of keeping terabyte information around, because we compress up to 40 times more efficiently than a conventional system.

If a company looks at building a TB data warehouse, you’re looking at millions of dollars—just for the disks. What this also allows us to do is run our queries against compressed data, whereas in a data warehouse, you first have to [decompress] a whole table.

In the Omnisight system, you can run a query, and it goes only to the time period of your query and decompresses only the columns needed to run the query. This means we can be one to two orders of magnitude faster than a conventional SQL server.
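The speedup described above comes from two kinds of pruning: skipping time partitions outside the query window, and decompressing only the one column the query touches. A self-contained sketch under those assumptions (the partition layout and names are illustrative, not the product's actual format):

```python
import zlib


def query(partitions, start, end, column, predicate):
    """Scan a time-partitioned columnar log store.

    `partitions` is a list of dicts with `start`/`end` timestamps and a
    `columns` map of column name -> compressed bytes. Partitions outside
    [start, end) are skipped without ever being decompressed, and within
    a matching partition only the requested column is decompressed."""
    hits = []
    for part in partitions:
        if part["end"] <= start or part["start"] >= end:
            continue  # time-pruned: this partition is never touched
        values = zlib.decompress(part["columns"][column]).decode().split("\n")
        hits.extend(v for v in values if predicate(v))
    return hits
```

A conventional row store would have to decompress and scan every field of every record in the table; here the work is proportional only to the time window and the single column queried, which is where the order-of-magnitude claim comes from.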

So you compress individual columns filled with very similar information?

Exactly, that’s why if you store it by column you can compress it so well.

Does this approach to storage assist companies’ internal investigations?

Most of the companies we’re dealing with now have built a homegrown solution to try and rapidly look through log files … but generally they’re hampered by having to look at one type of file at a time—IMs, e-mails, firewall logs.

The problem is it takes a huge amount of time, more than you might expect, to view each file type [and they can’t compare two different types of files, side by side, easily]. Say an individual did something suspicious but not illegal—why is he logging onto that system? Most companies have no way of seeing [across different file types] what else he’s doing at the same time, so they can’t [easily build a complete picture].

I think most employees are honest … but information security people would be foolish to think that all employees are honest all the time. If they do internal investigations they can settle things more quickly than if they wait for the Feds to show up. That’s a big incentive for our financial customers as well.
