5 Projects for Purging Data and Managing Long-Term Liability

These steps will help you begin building and executing a defensible deletion strategy.

By Jim McGann

Organizations are facing exponential data growth and are struggling to keep up. Some data retains its importance to the business over time, but most of it loses value and languishes on networks and legacy backup tapes for years. Because the price of storage has fallen, many assume it is cheaper and easier to keep everything forever, so data is stockpiled and archived for decades. That assumption is now being questioned as the legal and regulatory climate has evolved and turned stored data into a significant liability.

Defensible deletion strategies are fast becoming commonplace in organizations that face frequent litigation and regulatory requirements. Identifying data that has no business value but could become a liability is now a key component of information governance programs across industries. IT organizations are tasked with better understanding corporate data assets, but the sheer scope of the job often delays the start, as companies struggle to engage all stakeholders, obtain their sign-off, and agree on a plan of attack.

With this in mind, we present five projects you can start immediately to clean house by targeting some of the most sensitive and potentially high-risk data, so you can manage risk and liability. Treating each as a manageable, attainable project provides an initial road map and turns a daunting task into a logical process of analysis and action.

Project #1: Remove obsolete PSTs

Users save all their e-mail and are afraid to delete anything in case they need it in the future, and much of it accumulates in Outlook personal storage (PST) files. Although most e-mail messages contain routine business communication, an important subset of a user's messages contains agreements, directives, and other sensitive correspondence.

Data mapping is a process that provides the information you need to make decisions about PST files. A data map tells you the exact location of every PST file, who owns it, when it was last modified or accessed, and more. Working with the legal department, you can examine the list of PST files and, based on policy, potentially purge them in large volumes. Common criteria for purging PST files include files that:

  • Have not been accessed or modified in more than three years
  • Are owned by former employees
  • Are redundant or duplicate copies
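
The criteria above can be expressed as a simple filter over data-map records. The following Python sketch is illustrative only: the `PstRecord` fields, the `purge_candidates` function, the three-year window as 3 × 365 days, and the use of a content hash to detect duplicates are assumptions, not features of any particular data-mapping product. It flags candidates for legal review rather than deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PstRecord:
    """One data-map entry for a PST file (hypothetical schema)."""
    path: str
    owner: str
    last_modified: datetime
    last_accessed: datetime
    content_hash: str  # e.g. SHA-256 of the file, used to spot duplicates


def purge_candidates(records, former_employees, now,
                     max_idle=timedelta(days=3 * 365)):
    """Return {path: [reasons]} for PST files that meet any purge criterion.

    The output is a list for legal review, not an instruction to delete.
    """
    candidates = {}
    seen_hashes = set()
    # Visit the most recently used copies first, so that among duplicates
    # the copy that is kept is the one people actually touch.
    ordered = sorted(records,
                     key=lambda r: max(r.last_modified, r.last_accessed),
                     reverse=True)
    for rec in ordered:
        reasons = []
        if now - max(rec.last_modified, rec.last_accessed) > max_idle:
            reasons.append("stale")
        if rec.owner in former_employees:
            reasons.append("former employee")
        if rec.content_hash in seen_hashes:
            reasons.append("duplicate")
        seen_hashes.add(rec.content_hash)
        if reasons:
            candidates[rec.path] = reasons
    return candidates
```

Keeping the most recently used copy among duplicates is one reasonable choice; a policy could just as well prefer the copy stored on a central server.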

PST files that do not fit the classifications above -- and, as a result, are not easily deleted -- can be reviewed and managed according to policies based on their contents. One option is to move PST files from user desktops to a central server, where they are easier to manage. This lets users keep access to legacy e-mail while ensuring that the files can be monitored for legal or compliance purposes. In this environment, an up-to-date index or data map of the PST file content is critical to managing this e-mail according to policy. Once rogue PST files are purged and a data map is created for the remaining ones, managing them becomes significantly easier.

Project #2: Purge data from former employees

Although all employees leave an organization at some point, their data typically lives on forever. Purging data from a desktop or laptop is usually standard procedure, but it doesn't address the residual files that are scattered about networks and servers. As this data ages, it becomes more difficult to find and manage. As a result, this content sits in various repositories and can easily become a liability over time.

Working with your legal department, you can define a list of former employees who have no legal or compliance preservation requirements. An accurate data map lets you determine the location of all their content, even as it ages and recedes into backups and archives. A data map and index also allow for additional analysis, such as:

  • Has the data been accessed in the past few years? If so, current employees may still be using it, and it may have business value.

  • Does the content contain any intellectual property that should be archived for long-term retention?
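
The two questions above can be turned into a triage step. This Python sketch assumes a hypothetical `FileEntry` record carrying an owner, a last-access time, and content tags (for example an `"ip"` tag produced by a content scan), and it treats "the past few years" as two years. The schema, the tag name, and the threshold are placeholders for your own policy, and the hold check is a safety net in case the list of former employees is not yet final.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileEntry:
    """One data-map entry (hypothetical schema)."""
    path: str
    owner: str
    last_accessed: datetime
    tags: frozenset = frozenset()   # e.g. {"ip"} from a content scan


def triage_former_employee_files(entries, former_employees, on_hold, now,
                                 recent=timedelta(days=2 * 365)):
    """Sort files owned by former employees into hold/archive/review/purge buckets."""
    buckets = {"hold": [], "archive": [], "review": [], "purge": []}
    for e in entries:
        if e.owner not in former_employees:
            continue                          # current employees are out of scope
        if e.owner in on_hold:
            buckets["hold"].append(e.path)    # preservation obligations come first
        elif "ip" in e.tags:
            buckets["archive"].append(e.path) # intellectual property: retain long-term
        elif now - e.last_accessed <= recent:
            buckets["review"].append(e.path)  # still in use; may have business value
        else:
            buckets["purge"].append(e.path)   # candidate for defensible deletion
    return buckets
```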

Until recently, finding the answers to these questions was a long and expensive task, often leading to a "do nothing" strategy that allows the data to stay dormant and hopefully forgotten. With current technology, however, defensibly deleting ex-employee data can save a significant amount of network storage expense and help you manage long-term liability.

Project #3: Eliminate redundant legacy tape backups

When backup tapes outlive their usefulness for disaster recovery, they typically get moved to offsite storage. This includes copies of disk-based backup images as well as physical cartridges for organizations that back up directly to tape. Either way, tapes pile up in offsite storage vaults, and they contain a significant volume of sensitive user files and e-mail. Often less than 1 percent of the data on these tapes needs to be kept long-term. The starting point for this project is cataloging and organizing the tapes so you can classify them and begin eliminating redundancy.

Tapes by their nature are highly redundant. Incremental backups create repetitive copies of user data every night, and those copies are sent to tape. For example, if six incremental backups ran during the week and a full backup ran at the end of the week, you will generally want to keep only the full backup tapes. Incremental backups represent a large percentage of overall tape content, so purging them makes a large stockpile significantly smaller. Once you have purged the incremental backup tapes, the next step is to remediate the remaining legacy tapes.
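
The rule of keeping only full backups can be applied mechanically once tapes are cataloged. The Python sketch below assumes a hypothetical catalog in which each tape records its backup source (server or job), whether it holds a full or incremental backup, and the backup date. It flags an incremental tape only when a later full backup of the same source exists; incrementals newer than the latest full backup are left alone.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TapeSet:
    """One cataloged tape (hypothetical schema)."""
    tape_id: str
    source: str       # server or backup job the tape belongs to
    kind: str         # "full" or "incremental"
    backup_date: date


def superseded_incrementals(catalog):
    """Return sorted IDs of incremental tapes superseded by a later full backup."""
    # Find the most recent full backup for each source.
    latest_full = {}
    for tape in catalog:
        if tape.kind == "full":
            prev = latest_full.get(tape.source)
            if prev is None or tape.backup_date > prev:
                latest_full[tape.source] = tape.backup_date
    # An incremental is superseded only if a full backup of the same source came later.
    return sorted(
        tape.tape_id
        for tape in catalog
        if tape.kind == "incremental"
        and tape.source in latest_full
        and tape.backup_date < latest_full[tape.source]
    )
```

Grouping by source matters: a full backup of one server says nothing about the incrementals of another.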

Project #4: Review your legal hold and preservation archive

Legal hold and preservation requests often result in over-collection. It is commonplace to archive a full mailbox, or even an entire e-mail server, to satisfy a request from the legal department or a compliance officer.

Even organizations that have an existing archive may be challenged by the legal hold and preservation process. Most archive applications were not designed to manage the volume of data seen by today's enterprise, nor were they built to meet the heavier legal and compliance demands of today's business environment. Finding requested archived data quickly can be extremely difficult, and the cost of maintaining the archive drains valuable dollars from operating budgets.

Implementing an intelligent archive to manage these hold and preservation requests allows e-mail to be preserved easily and then released when the hold expires. Indexing the e-mail server directly makes it possible to capture only the specific messages required for a legal hold, avoiding the over-collection that is commonplace in today's archiving systems. Hold and preservation requests can also be made self-service for the legal team, so it can pick and choose what to archive.
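
To illustrate targeted capture, here is a Python sketch that selects messages from an index by custodian, date range, and keyword, and then releases them when a hold expires unless another hold still covers them. The `Message` and `HoldRequest` types and both functions are hypothetical; real hold criteria are defined by counsel and are usually richer than a keyword list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """One entry in an e-mail index (hypothetical schema)."""
    message_id: str
    custodian: str
    sent: datetime
    subject: str
    body: str


@dataclass(frozen=True)
class HoldRequest:
    """Hypothetical hold criteria supplied by the legal team."""
    matter: str
    custodians: frozenset
    start: datetime
    end: datetime
    keywords: tuple   # matched case-insensitively against subject and body


def messages_to_preserve(index, hold):
    """Return IDs of only the messages the hold requires, not whole mailboxes."""
    selected = []
    for msg in index:
        if msg.custodian not in hold.custodians:
            continue
        if not hold.start <= msg.sent <= hold.end:
            continue
        text = f"{msg.subject}\n{msg.body}".lower()
        if any(k.lower() in text for k in hold.keywords):
            selected.append(msg.message_id)
    return selected


def release_hold(holds, matter):
    """Remove one matter's hold and return IDs that no remaining hold covers.

    `holds` maps matter name -> set of preserved message IDs; it is modified in place.
    """
    released = holds.pop(matter, set())
    still_held = set().union(*holds.values())
    return released - still_held
```

Checking the remaining holds before release matters because one message can be relevant to several matters at once.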

Project #5: Create a data map for tiered storage via data classification

Data mapping is the foundation for any defensible deletion strategy. The basis for a data map is an index of the user files and e-mail, that is, the unstructured or unmanaged content, that exist in the enterprise. The index can simply be a high-level metadata view of the content, detailing owners, dates, and data types. This information allows data to be classified by type, sensitivity, age, or any other policy component. Even an initial effort, such as locating, identifying, and tagging sensitive data on a network, is a significant step toward improved information governance. A unified, actionable data map of online, near-line, and offline data is a mandatory first step for any storage project.
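
As an illustration of a metadata-only index, the following Python sketch walks a directory tree with `os.walk` and records owner, size, dates, and file type from `os.stat`, without reading file contents. The `MapEntry` schema and the function names are assumptions. Note that `st_uid` is a POSIX user ID (on Windows it is typically 0, so ownership would have to come from another source), and access times may be unreliable on file systems mounted with options such as `noatime` or `relatime`.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MapEntry:
    """Metadata-only view of one file (hypothetical schema)."""
    path: str
    owner_uid: int        # POSIX user ID; not meaningful on Windows
    size: int             # bytes
    modified: datetime
    accessed: datetime
    file_type: str        # lower-cased extension, e.g. "pst"


def build_data_map(root):
    """Walk `root` and record file metadata without reading file contents."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.stat(full_path)
            except OSError:
                continue  # file vanished or is unreadable: skip it, keep walking
            ext = os.path.splitext(name)[1].lower().lstrip(".") or "(none)"
            entries.append(MapEntry(
                path=full_path,
                owner_uid=st.st_uid,
                size=st.st_size,
                modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                accessed=datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
                file_type=ext,
            ))
    return entries


def summarize_by_type(entries):
    """Count files and total bytes per file type, a first pass at classification."""
    summary = {}
    for e in entries:
        count, total = summary.get(e.file_type, (0, 0))
        summary[e.file_type] = (count + 1, total + e.size)
    return summary
```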

Beyond these examples, data mapping can facilitate the defensible deletion of user content in other scenarios. For example, a data map can point to user data that has not been modified or accessed for more than five years; this data can easily be moved to a less expensive storage tier, including cloud storage. Data that has not been accessed for more than seven years can be defensibly purged, provided neither legal nor compliance has placed a hold on it.
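
The five- and seven-year rules above can be written as a small policy function. In this Python sketch the thresholds come from the policy just described, approximated as 365-day years; treating a modification as activity for both rules, and keeping any held item in place rather than moving it, are conservative assumptions. `tier_action` is a hypothetical name.

```python
from datetime import datetime, timedelta

TIER_AFTER = timedelta(days=5 * 365)   # five years idle -> cheaper storage tier
PURGE_AFTER = timedelta(days=7 * 365)  # seven years idle -> purge candidate


def tier_action(last_modified, last_accessed, on_hold, now):
    """Return the storage action for one item under the five/seven-year policy."""
    if on_hold:
        return "keep"  # a legal or compliance hold overrides every age-based rule
    # Treat either a read or a write as activity (conservative assumption).
    idle = now - max(last_modified, last_accessed)
    if idle > PURGE_AFTER:
        return "purge"
    if idle > TIER_AFTER:
        return "move to low-cost tier"
    return "keep"
```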

A defensible deletion strategy is achievable if you take an incremental approach and start with projects that are easy to deploy and have immediate impact. The examples above are projects that have been implemented at Index Engines client sites. The process and methodology are in place, and the benefits are significant: organizations that implement defensible deletion strategies are not only saving on storage and IT expenses but also managing long-term liability and risk.

Jim McGann is vice president of marketing for Index Engines, an electronic-discovery provider based in New Jersey. McGann is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery, and records management. You can contact the author at
