Q&A: Extracting the Mystery Out of Backup Tapes

Over the years, IT has built up a collection of tapes, especially for backup. The problem is -- what's on all those tapes?

Why is it important to find out what is on those tapes? What are the risks of not knowing, and what is the legal risk of holding on to tapes you no longer need? To learn more, we talked with Jim McGann, vice president of information discovery at Index Engines, an enterprise discovery technology company based in Holmdel, NJ.

Enterprise Strategies: Recently, Index Engines launched a "mystery tape" program. Can you tell us what mystery tapes are, and how your program deals with them?

Jim McGann: Mystery tapes are backup tapes that have unknown content. Even though many organizations have now mostly migrated to disk-to-disk (D2D) backups, tapes are still being made off the back end for long-term archiving. IT organizations have saved tapes that cycle out of disaster recovery rotation, and companies inherit tapes from mergers and acquisitions. This can add up to tens or hundreds of thousands of tapes that are typically stored in expensive offsite storage vaults to protect them for the long haul.

Mystery tapes cause great headaches for corporate IT because their contents are unclear, especially if the labeling is cryptic, faded, or non-existent, which is often the case as the tapes age. Index Engines' mystery tape program gives IT a chance to "look and learn" what is on a mystery tape. A Web-based interface allows search and access to the content. Any relevant data can be easily extracted and delivered as required. This is a cost-effective model for small volumes of mystery tapes. It has been a very popular program.

Where do mystery tapes come from?

No doubt, all organizations have the good intentions of properly labeling and tracking tapes, and they endeavor to manage the content with labels, bar codes, and catalog information. However, as tapes age, the details related to specific servers or networks do not stay the same. The context around the tape begins to change and knowledge of its contents begins to fade. Reading the label on a 10-year-old tape and knowing whether the tape contains any sensitive content that must be archived for legal and compliance purposes is simply not feasible.

Why should IT care about mystery tapes?

Until recent years, IT didn't need to access user data stored on mystery tapes unless there was a natural or man-made disaster. Typically, restoration of files and e-mail messages had proven rare for tapes older than six months. In the past, legal and compliance departments were seldom required to produce tape-bound legacy content to support cases and litigation. They were able to mount the so-called "burden argument," claiming that the cost to restore and search tape content was prohibitive and burdensome.

However, the game has changed. Newly available technology makes access to legacy tape content much cheaper. Data on tapes is now affordably accessible, even for old tapes where the original backup software is no longer available.

Lawyers and judges are educated about available technology and know full well that this legacy content is accessible. As such, they are requiring companies to produce content as needed to support litigation. Many cases exist where fines and sanctions were issued when this legacy data was not produced. In fact, one well-known judge has said, "If you can't find user data on your primary network, go to your backup tapes. Just don't come into my court without the data required to support the case." The burden argument has truly disappeared.

How has technology impacted access to tape data?

The high cost of accessing data on tape, especially older tapes, mostly resulted from the infrastructure required to restore the content. For example, if you created a tape using NetBackup version 6, and you had since migrated to Commvault and retired NetBackup, you would no longer have an environment in which to restore anything off NetBackup tapes. In cases like this, the tapes had to be sent to a third-party specialist to restore everything from the tape and deliver the raw data, so you could search and find what you needed. This process was expensive and very time-consuming.

New technology now provides direct access to tape data without the need to recreate the original backup environment. Using direct-indexing technology, legacy tapes can be scanned, indexed, and searched to find specific files and e-mail in Exchange and Notes databases. Once relevant content is found, even a single e-mail, it can be cherry-picked from tape more quickly and at a significantly lower cost than a full restoration.

What are the true costs of keeping mystery tapes?

The actual monetary costs of managing legacy backup tapes include off-site storage costs as well as maintenance and support of legacy infrastructures including systems and software that are no longer being used. Third-party vendor fees are sometimes needed to restore and recover the data in response to legal and compliance requests. These hard costs add up to hundreds of thousands, even millions, of dollars annually.

However, the biggest potential risk and expense lurks in the liability posed by the actual content on these tapes. Corporations that avoid responsibly managing legacy content are, in effect, keeping potentially harmful user data. This data can be demanded by the courts if litigation arises and can damage an organization's position in lawsuits and investigations. The cost of this liability can run into the hundreds of millions. Organizations need to manage and control this potential liability by taking the mystery out of mystery tapes and proactively facing their content.

Why not keep tapes as an archive?

Tapes serve a useful purpose for disaster recovery in the short term. However, they are not meant to serve as a long-term archive. Tapes generally cycle out of disaster recovery rotation after a well-defined timeframe such as 30, 60, or 90 days. At this point, they should be processed to extract only the required data, which should then be secured according to corporate policy. This typically represents a small portion of the tape content -- less than 5 percent. Saving the entire contents of the tape in a non-searchable backup format is not advisable because getting access to the data becomes complex as the tapes age. Therefore, saving legacy backup tapes for long-term archiving and creating significant stockpiles of tapes should be avoided at all costs.

What if I only have a small number of mystery tapes?

Index Engines' cloud service allows for easy access to tape content. Simply send the tapes to our processing lab to be scanned and indexed. The Web-based interface allows for search and access to the tapes' content. For an additional fee, any relevant data can be easily extracted and delivered to you as required. We believe it's a cost-effective model for small volumes of mystery tapes.

A good portion of the data will be duplicates, system files, and e-mail messages with no value or relevance. Most of this data can be deleted, and what you will probably find is that you have reduced the number of tapes (and thus the amount of data) by more than half. You will have solved the mystery tapes problem.

How do you clean up the stockpiles of legacy mystery tapes?

Restoring all the content from legacy tapes is not cost-effective. You will only need to keep a small portion of the content because most of it is useless data. Traditional tape restoration is a wasteful and expensive process, especially if you don't have the original backup environment.

The best way to clean up a stockpile of legacy tapes is to use direct-indexing technology. This allows all tapes to be scanned and indexed without any of the original backup software. Once the tapes are indexed, it is easy to apply policy by searching the tape content and extracting what is required, including specific user mailboxes from certain date ranges. Such systems can also pull out intellectual property, manufacturing documents, or any other content that is deemed important for legal and compliance purposes.
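Editor's illustration: the idea of applying policy to an index can be pictured with a minimal Python sketch. It is not Index Engines' product or API. The record fields, the `select_for_retention` helper, and the sample entries are all hypothetical, made up only to show how a search by mailbox owner and date range, with duplicates skipped, might select the small portion of content worth keeping.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexedItem:
    """One entry in a hypothetical tape index (all fields are assumptions)."""
    tape_id: str
    owner: str          # mailbox or file owner
    kind: str           # "email" or "file"
    item_date: date
    content_hash: str   # identical content yields an identical hash


def select_for_retention(items, custodians, start, end):
    """Return unique items owned by the given custodians, dated within [start, end]."""
    seen_hashes = set()
    kept = []
    for item in items:
        if item.owner not in custodians:
            continue  # policy names specific mailboxes only
        if not (start <= item.item_date <= end):
            continue  # outside the date range of interest
        if item.content_hash in seen_hashes:
            continue  # same content already selected from another tape
        seen_hashes.add(item.content_hash)
        kept.append(item)
    return kept


# Made-up sample index covering three tapes.
index = [
    IndexedItem("T001", "alice", "email", date(2008, 3, 14), "a1"),
    IndexedItem("T002", "alice", "email", date(2008, 3, 14), "a1"),  # duplicate copy
    IndexedItem("T002", "bob", "file", date(2008, 5, 2), "b7"),
    IndexedItem("T003", "alice", "email", date(2011, 1, 9), "c3"),   # out of range
]

kept = select_for_retention(index, {"alice"}, date(2008, 1, 1), date(2008, 12, 31))
print([(item.tape_id, item.content_hash) for item in kept])
```

In this toy index, only one of the four entries meets the policy. The rest are a duplicate, another user's file, or outside the date range, which mirrors the point that only a small portion of tape content needs to be retained.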

Direct tape indexing and extraction are far less expensive than traditional restoration. In fact, most people find that this technology costs less than the annual offsite storage fees it eliminates. Once the tapes are processed and remediated, those offsite storage expenses are no longer needed.

Jim McGann is vice president of information discovery at Index Engines, an enterprise discovery technology company based in Holmdel, NJ. McGann is a well-respected, nationally known expert in the areas of electronic discovery and tape. He has written several articles and white papers and speaks frequently on related topics. Contact him at [email protected]
