Get Smart about Archiving Backup Data

Backing up everything isn't just wasteful and expensive -- data that "lives forever" exposes your enterprise to additional risks. We explain a smarter, safer, and less expensive approach to disaster recovery.

By Jim McGann, Vice President, Index Engines

User-created documents and e-mail are a necessary component of daily business. Agreements, contracts, proposals and other intellectual property flow through corporate networks daily. Some of this content is valuable and important to archive for legal and compliance purposes. However, the majority of it is not useful and can present a liability if retained.

Corporate policy is the bridge between what information is archived and what should be purged. IT organizations are in the business of protecting and making copies of this content for disaster recovery (DR) purposes. IT must begin to work with legal teams to understand these policies and manage data differently.

Archiving all corporate data forever is not smart. It is not a policy that any legal or compliance team would endorse. However, this is what happens when IT organizations archive legacy disaster-recovery backup tapes into long-term, offsite storage. Backup processes were created for disaster recovery and were designed to copy everything; corporate policies do not apply to DR backups. However, the minute the backup tapes or disk images are cycled out of the disaster recovery process and placed in offsite storage as part of your enterprise's long-term archive, they become a liability. This process saves everything. All data in this case lives forever, a records manager's or general counsel's worst nightmare.

IT organizations archive legacy backup tapes and disk images every day because the media cannot simply be recycled: some of the data in a backup may be needed for ongoing litigation or must be retained under regulatory requirements. However, most of it, probably more than 90 percent, is not required according to policy, and permanently storing it will grow into a liability over time.

Take, for example, the case of your organization being sued, and suppose the case involves ex-employees' e-mail exchanges that occurred over two years ago. The only copies of these e-mail messages are on legacy backups. If policy had been applied two years ago, these e-mail messages could have been purged because no active litigation or compliance requirements for these employees existed. However, because the messages were archived on legacy tapes, the courts will require them to be produced.

Many legal cases exist in which organizations have battled unsuccessfully to avoid producing data from legacy tapes. Judges see this content as historical records, and if the company saved them, it needs to produce them when requested. To avoid liability in the future, IT needs to manage this exposure and apply policy to corporate data as it replicates that data over and over for disaster recovery.

How can IT organizations keep a disaster recovery process that already works (it protects your organization from data loss) and apply policy at the same time? As data cycles out of disaster recovery, policy must be applied. The backup process must become an archiving process. This is the only way to manage data according to corporate policies and protect the organization from long-term liabilities.

Corporate policies are defined by your records management and legal teams. These teams should be well versed in regulations such as SOX, HIPAA, and Dodd-Frank. If they struggle with these issues, consider using a consulting firm to help your enterprise put sound policies in place. Policies should not be a mystery. They should be well-thought-out, documented, and delivered to IT for implementation.

E-mail is used to communicate contracts, agreements, and proposals. E-mail records are typically where the "smoking guns" are buried. This is why most corporate policies involve e-mail -- they define what should be retained and purged according to user mailboxes or based on specific content within the e-mail messages. A typical e-mail retention policy requires that about 10 percent of the employees' e-mail be archived. This means that IT should be purging 90 percent of the user e-mail content that is currently being backed up. Of course, your company may have policies that require a higher volume of e-mail to be captured in an archive, but even if 50 percent is the correct figure, you are saving more than that today if you're not applying policies to backups.
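To make the arithmetic above concrete, here is a minimal Python sketch. The function name and the 1,000 GB volume are purely illustrative assumptions, not figures from any real environment; only the 10 percent and 50 percent retention fractions come from the discussion above.

```python
def retention_split(backed_up_gb: float, retain_fraction: float):
    """Return (gb_to_archive, gb_to_purge) for a given retention fraction.

    backed_up_gb is a hypothetical volume of e-mail held in backups;
    retain_fraction is the share that policy says must be archived.
    """
    if not 0.0 <= retain_fraction <= 1.0:
        raise ValueError("retain_fraction must be between 0 and 1")
    keep = backed_up_gb * retain_fraction
    return keep, backed_up_gb - keep


# Hypothetical 1,000 GB of backed-up e-mail under two possible policies.
for fraction in (0.10, 0.50):
    keep, purge = retention_split(1000.0, fraction)
    print(f"retain {fraction:.0%}: archive {keep:.0f} GB, purge {purge:.0f} GB")
```

Even under the more conservative 50 percent policy, half of the volume in this illustration would never need to reach long-term storage.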

How do you apply policy to backup data? The challenge with backup data is that it is stored in a proprietary format, which in the past required the original backup software for restoration. If you wanted to apply policy to data that had already been backed up, you would have to restore the content from tape and then search for the relevant files and e-mail. However, direct indexing of backup tapes and images is now available. This is the key to applying policy and extracting only what you need from backup in order to satisfy corporate policy.

As tapes or tape images cycle out of the disaster recovery rotation, policy must be applied. Indexing all the metadata and content is critical to applying policy. Policy cannot be applied by reviewing the backup catalogs; a catalog is only a high-level listing of files and does not give you the depth of knowledge required, especially within MS Exchange and Lotus Notes databases. A deep index of backup content can be searched and the relevant data extracted. In fact, this can be accomplished without the use of the original backup software, so legacy tapes and content can be reviewed. The cost of making these backup images searchable, so policy can be applied, is far less than the liability of archiving all the content.

When new backups are created, policy can be applied. Indexing the backup image, on tape or disk, allows this to happen. Search for specific user mailboxes, or documents containing sensitive intellectual property, or e-mail messages that discuss the manufacturing of a product under litigation. These policies are defined and delivered to you by legal and compliance. Applying these policies against the backup content now becomes the next step as backups cycle out of rotation for disaster recovery. Relevant files and e-mail are extracted into an archive, and the tapes are recycled; IT purges the balance of the content that is no longer required.
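The workflow just described (index the backup, search it against policy, extract what matches, purge the rest) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: IndexedItem, RetentionPolicy, and apply_policy are hypothetical names invented for this sketch, not the interface of any indexing or backup product, and a real index would carry far richer metadata than is shown here.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """One entry in a hypothetical index built from a backup image."""
    path: str
    mailbox: str  # owning mailbox; empty string for non-mail files
    text: str     # content extracted during indexing


@dataclass
class RetentionPolicy:
    """Hypothetical policy as legal and compliance might hand it to IT.

    Mailbox names and keywords are expected in lowercase.
    """
    mailboxes_on_hold: set = field(default_factory=set)
    keywords: set = field(default_factory=set)

    def must_retain(self, item: IndexedItem) -> bool:
        # Retain everything from a mailbox that is under a hold.
        if item.mailbox.lower() in self.mailboxes_on_hold:
            return True
        # Otherwise retain only content that mentions a policy keyword.
        content = item.text.lower()
        return any(keyword in content for keyword in self.keywords)


def apply_policy(items, policy):
    """Split indexed items into (archive, purge) lists."""
    archive, purge = [], []
    for item in items:
        (archive if policy.must_retain(item) else purge).append(item)
    return archive, purge


# Illustrative data only; the mailbox and product names are made up.
index = [
    IndexedItem("mail/0001", "jdoe", "Lunch on Friday?"),
    IndexedItem("mail/0002", "asmith", "Defect report for the Widget X line"),
    IndexedItem("docs/picnic.doc", "", "Quarterly picnic plan"),
]
policy = RetentionPolicy(mailboxes_on_hold={"legalhold"}, keywords={"widget x"})
archive, purge = apply_policy(index, policy)
print(f"archive {len(archive)} item(s), purge {len(purge)} item(s)")
```

The point of the sketch is the split itself: only items that match a documented policy rule move into the archive, and everything else is eligible for purging once the tape leaves the disaster recovery rotation.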

The benefits of this process are many. Long-term corporate liability is managed. IT is not saving too much data. Offsite tape storage costs are reduced significantly. IT can recycle backup tapes and not continually purchase new tapes. Organizations can significantly reduce expenses by implementing this process. It is achievable due to direct indexing of backup data, making archiving possible.

Integrating intelligent archiving into the backup process is the way of the future. No longer will companies save everything on legacy backup tapes. IT and legal teams will work together to manage data more intelligently and avoid the issues that occur when data lives forever.

Jim McGann is vice president of information discovery at Index Engines, a tape discovery and remediation company based in Holmdel, NJ. McGann is a frequent speaker on electronic discovery and has authored multiple articles for legal technology and information management publications. You can contact the author at
