How To Whip Your Enterprise Archive into Shape for E-discovery
Follow these steps and you'll better understand the current state of your enterprise archive.
By Richard Harding
The promise of cost-effective data management is driving many organizations toward enterprise information archives (EIA). According to a December 2011 poll by Gartner, 20 percent of respondents had already implemented on-premises EIAs and 25 percent indicated they were looking into implementing them. Additionally, 10 percent of the respondents indicated they had implemented cloud-based EIA, while another 20 percent said they were considering it. Despite these trends toward EIA, some companies are finding that their archives fail them at the exact moment they are needed most, and that extracting data for electronic discovery proves costly in a multitude of ways.
EIAs store semi-structured data, such as e-mail messages and other end-user files. Initially, many enterprises adopted archiving to deal with proliferating e-mail stores by migrating older messages to the archive and storing them on cheaper disks. Information technology departments often took the lead in selecting and implementing these archives. As regulatory and compliance needs evolved (e.g., the Federal Rules of Civil Procedure changes in 2006 requiring preparation for electronic discovery), archives were implemented to capture, preserve, and make their contents available for legal review.
Although the core role of archives and their importance have changed, archives are often not considered mission-critical systems. These systems are commonly deployed, maintained, and largely ignored until another upgrade (such as enterprise messaging) occurs. Mergers, acquisitions, divestitures, and changes in outsourcing contracts can bring additional archives into the enterprise. During these transitions, the active users and their current data are migrated, but the orphaned archives (containing historic users and their years of data) are often put aside in the corner of the data center and forgotten until the worst happens: an investigation or litigation request makes it critical to access the archived data.
Usually, there is an urgent need to search and collect data from the archive to comply with a discovery request. The visibility of the effort extends from the data center, through the General Counsel's office and into the boardroom. The enterprise archive is ready to have its moment in the spotlight, but that's exactly when things start to go horribly wrong.
First, there is the immediate issue of finding and preserving the relevant data in the archive. Although the request seems simple, it is often thwarted by corrupt or incomplete indexes in the archive. Depending on the technology, the indexes may be stored as flat files or in a database, but the result of the corruption is the same: the search is incomplete or it fails completely. Although an index rebuild might solve the problem, it usually takes weeks or longer, and the rebuilt index might lose pointers to stored data -- not an acceptable timeframe or solution.
Another waiting surprise lies in the actual length of the archive's data retention period. This issue often results in the retention of data long thought retired or, conversely, the expiration and deletion of data that was expected to be stored in the archive. Either instance is often the result of an incomplete archive implementation. For example, retention schedules may have been developed after the archive solution was implemented and may not have been properly integrated back into the configuration. Perhaps a disposal feature was not enabled or the enterprise never made an active choice on retention.
Regardless of the underlying error, the result is that data is held for the wrong time period -- a costly mistake that leads to overproduction or to the time-consuming process of restoring data from backup tapes. Alternatively, the enterprise may not be able to access the data at all.
Another lurking issue is working with an archive that does not scale to the needed performance level. Archives store terabytes of data, so it seems ludicrous to suggest that there would be a scalability issue, but few consider the search and export functionality in the scalability discussion. Archives are often engineered to handle peak data import, and when attempts are made to export a large amount of data, scaling problems become evident. A massive export may take weeks or months to complete -- if it is completed at all.
The strain on system resources can cause ancillary problems and generate logs full of message and file export exceptions. Countless hours are then dedicated to resolving the exceptions or breaking the search and export into smaller sets without making any errors in the numerous search criteria.
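One way to reduce the risk of errors when breaking a large search and export into smaller sets is to generate the batches programmatically and then verify that they cover the original request exactly once. The following is a minimal sketch in Python, not a feature of any particular archive product: the function names, the batch dictionary layout, and the batch sizes are all hypothetical and would need to be adapted to the search interface your archive actually exposes.

```python
from datetime import date, timedelta
from itertools import product


def date_windows(start, end, days):
    """Split the inclusive range [start, end] into consecutive,
    non-overlapping inclusive windows of at most `days` days."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_export_batches(custodians, start, end,
                        custodians_per_batch, days_per_batch):
    """Build one batch per (custodian group, date window) pair.
    The dictionary layout is an assumption made for illustration."""
    groups = chunk(sorted(set(custodians)), custodians_per_batch)
    windows = date_windows(start, end, days_per_batch)
    return [{"custodians": g, "from": w[0], "to": w[1]}
            for g, w in product(groups, windows)]


def verify_coverage(batches, custodians, start, end):
    """Return True only if every custodian-day in the original request
    is covered by exactly one batch (no gaps, no overlaps)."""
    total_days = (end - start).days + 1
    expected = {(c, start + timedelta(days=d))
                for c in set(custodians) for d in range(total_days)}
    seen = []
    for batch in batches:
        for custodian in batch["custodians"]:
            day = batch["from"]
            while day <= batch["to"]:
                seen.append((custodian, day))
                day += timedelta(days=1)
    return len(seen) == len(set(seen)) and set(seen) == expected


# Hypothetical request: three custodians over ten days.
batches = plan_export_batches(
    ["asmith", "bjones", "cdoe"],
    date(2012, 1, 1), date(2012, 1, 10),
    custodians_per_batch=2, days_per_batch=4,
)
```

Keeping the coverage check separate from the planning step means the same check can be applied to batches that were assembled by hand, which is where criteria errors are most likely to creep in.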
There are also potential process issues. Documentation for the operation of the archive solution is commonly out-of-date, incomplete, or missing. Furthermore, data and custodian maps are not available to point to the relevant data.
Given that these are real-world issues facing a multitude of enterprises on a daily basis, how can an enterprise prepare for the eventuality that its archived data will need to be available for search and export? The best course is to manage the archive's health actively and to understand the limitations of the archive solutions in place. In some instances, that understanding may call for further action, but at least the limitations will be known.
To ensure the health and performance of your archive:
- Understand and/or develop regular maintenance requirements and system health checks for the archive solution. This may require developing and tracking metrics that can be verified independent of the archive solution, such as the number of items archived per period, items expired, etc.
- Audit system configurations to ensure compliance with stated retention and destruction policies. Create and execute test plans to simulate policy action.
- Regularly monitor policy and system logs.
- Benchmark search and export performance. Define typical thresholds for the number of custodians (users) and the date range to search. Understand the time requirements at each step of the process.
- Review and update the documentation and procedures for the search, export, and delivery of results. Ensure your process is defensible in preparation for litigation.
- Create or update data maps and custodian maps. Determine whether data is stored in one archive or in several. Understand the challenges of identifying user data for long time periods -- access to historic user directories may be necessary.
- Create visibility for any issues identified by these actions to key stakeholders in the enterprise, such as Legal, Compliance or Records Retention.
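As an illustration of the retention audit in the list above, the check can be run outside the archive against an exported inventory of items. The sketch below is a hedged example in Python: the `ArchivedItem` record, its fields, and the `audit_retention` function are hypothetical, and it assumes you can obtain each item's archive date, its intended retention period in days, and whether it is still retrievable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedItem:
    """Hypothetical inventory record; field names are assumptions."""
    item_id: str
    archived_on: date
    retention_days: int   # intended retention from the written policy
    present: bool         # True if the item is still retrievable


def audit_retention(items, today):
    """Compare each item's actual state with what the policy implies.

    over_retained: retention period has elapsed, but the item is still held.
    expired_early: retention period has not elapsed, but the item is gone.
    """
    findings = {"over_retained": [], "expired_early": []}
    for item in items:
        age_days = (today - item.archived_on).days
        disposal_due = age_days >= item.retention_days
        if disposal_due and item.present:
            findings["over_retained"].append(item.item_id)
        elif not disposal_due and not item.present:
            findings["expired_early"].append(item.item_id)
    return findings


# Hypothetical sample inventory with a three-year (1,095-day) policy.
sample = [
    ArchivedItem("msg-001", date(2005, 1, 1), 1095, present=True),
    ArchivedItem("msg-002", date(2012, 1, 1), 1095, present=False),
    ArchivedItem("msg-003", date(2011, 1, 1), 1095, present=True),
]
report = audit_retention(sample, today=date(2012, 6, 1))
```

Items under a legal hold would need to be excluded from the over-retained list before the results are shared with Legal, Compliance, or Records Retention; the sketch does not model holds.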
Following these steps will set you on the path to better understanding the current state of your enterprise archive. Many enterprises will be able to accomplish the tasks internally. For others, it may be helpful to harness external resources to evaluate the current state of your archiving program, but use caution when considering archiving solution vendors or value-added resellers (VARs). In many instances, leveraging such resources only results in an upgrade to the latest version without a comprehensive assessment. Consider identifying a trusted e-discovery partner with in-depth knowledge of the technology and legal issues.
Like many technologies, archiving has evolved from what was once a niche solution into a multifaceted, enterprise-level option for a variety of business needs. With increased options, however, come increased risks if improperly managed. Before you take the next step in the evolution of archiving, make sure you thoroughly understand your archive and take the appropriate measures to ensure its long-term viability. Doing so will pay dividends in the long run.
Richard Harding is a senior consultant for Kroll Ontrack. In his role, he partners with corporate and law firm clients to effectively manage electronically stored information (ESI), whether as part of litigation readiness, information management, technical implementations, or litigation response, including identifying, preserving, collecting, analyzing, and producing ESI. You can contact the author at firstname.lastname@example.org.