How Data Maps Can Control eDiscovery Costs While Recouping Storage Capacity

How data maps can help you manage your storage environment efficiently and effectively.

By Jim McGann

Current regulatory, compliance, and legal requirements are forcing organizations to get a better understanding of their data assets, manage them according to policy, and purge what is no longer required. Gartner states that "The challenge is intensifying as data grows at a rate of 40% to 60% every year." (Gartner, Organizational Collaboration and the Right Retention Policies Can Minimize Archived Data and Storage Demands, Alan Dayley and Sheila Childs, June 2012).

IT organizations have spent many years implementing massive storage capacity in order to keep up with data growth. This has been done blindly, in that IT has had no sense of the content users create or its value to the organization. That trend has changed; IT organizations are reviewing what it costs to maintain and manage storage resources and have realized that it is a significant component of their budgets.

Streamlining storage capacity and purging what is no longer required will help control costs and simplify resources. In fact, leading analysts state that up to 60% of storage capacity can be recouped by purging unnecessary user data (aged content no one needs, ex-employee data, etc.). Using data maps, IT organizations can gain visibility into their data stores so they understand what exists, where it is located, who owns it, and why it is being kept.

Traditional data maps were mostly generated by face-to-face interviews. Working with IT professionals, application managers, and other key resources, a map detailed what types of data reside on which servers and who owns them. Using this type of data map you can easily define the sensitivity of the content and use this information for ongoing compliance and legal efforts. However, traditional data maps don't reach the granular level that today's legal and risk management requirements demand.

Using high-speed, enterprise-class indexing technology, data maps can now provide more comprehensive knowledge and profiles of user content. Detailed metadata, such as file type, owner, created/modified/accessed dates, and location, is now available to profile the content. Summary reports also help classify data and provide high-level views into massive storage repositories. This new level of information allows organizations to better manage legal and compliance policies, as well as streamline storage capacity to reduce the volume of content maintained online.
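To make the idea of a metadata profile concrete, here is a minimal Python sketch that walks a directory tree and builds a summary report by file type. It is an illustration only, not how any commercial indexing product works; the function names `file_record` and `summarize_by_type` are hypothetical, the owner is reported as a numeric user ID, and creation time is left out because its availability varies by platform.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def file_record(path):
    """Collect basic metadata for one file (hypothetical record layout)."""
    st = os.stat(path)
    return {
        "path": path,
        # File type approximated by the lowercased extension.
        "type": os.path.splitext(path)[1].lower() or "(none)",
        # Numeric owner ID; mapping it to a person is out of scope here.
        "owner_uid": getattr(st, "st_uid", None),
        "size": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
    }


def summarize_by_type(root):
    """Return {file type: {"files": count, "bytes": total size}} for a tree."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            rec = file_record(os.path.join(dirpath, name))
            summary[rec["type"]]["files"] += 1
            summary[rec["type"]]["bytes"] += rec["size"]
    return dict(summary)
```

Calling `summarize_by_type` on a share's mount point would give a high-level view of which file types dominate it. A real data map would store the per-file records in an index rather than rescanning the tree for every report.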

Because indexing is the foundation of the data map, it must be executed efficiently and at high speeds so it is not a bottleneck in the process. Terabytes, even petabytes, of data must be processed in a reasonable time and with minimal resources. Additionally, the indexing platform must be scalable to support enterprise-class environments.

Indexing will be layered on top of the storage infrastructure to provide the intelligence needed to manage data effectively. As such, the index itself must be minimized in size without compromising quality. Enterprise data maps should also support all classes of storage environments, including such data sources as LAN servers, e-mail databases, and even legacy backup tapes. The legacy data on tape, a snapshot of legacy user content, is, in fact, a significant liability for enterprises and needs to be managed according to policy. Significant legal cases have occurred in recent years in which judges have forced organizations to retrieve data from tape because it was the only copy that remained within the corporate networks. The risk and liability of unknown and unmanaged content is keeping legal and compliance teams up at night.

When it comes to data mapping, it is logical to begin with the most sensitive sources of data. Many organizations focus on specific departmental servers and systems such as finance, senior executives, R&D, etc. Others focus on legacy content, such as old backup tapes, which is largely unknown and poses more of a risk than online data sources. Either way, the implementation of indexing nodes within the network environment will allow you to connect to these sources and commence indexing.

Reports are a great place to start data mapping. Profiling based on servers or departments, lists of owners, date ranges, data types, or age of the content are all valid starting points. Many organizations use data maps to fine-tune and update current corporate policies. Understanding what exists and profiling this content can help you refresh or validate corporate policies. For example, if an organization has a policy that restricts users from creating PSTs on their desktops, a data map will quickly audit the network and determine if any PSTs exist and if the policy is being implemented correctly.
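The PST audit described above can be sketched in a few lines of Python. This is an assumption-laden illustration: it scans a single mounted directory tree rather than querying an index, it identifies PSTs purely by the `.pst` extension, and the function name `find_pst_files` is hypothetical.

```python
import os


def find_pst_files(root):
    """Return a sorted list of paths under root whose names end in .pst."""
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            # Case-insensitive match, so "MAIL.PST" is caught as well.
            if name.lower().endswith(".pst"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

An empty result suggests the policy is being followed on that share; any hits are candidates for follow-up with the file owners.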

The most common use of a data map is to manage user files and email according to legal and compliance policies, in order to limit risk and control long-term liability. If legal and compliance do not have a solid policy with respect to user content, data maps will shed light on this content and help refine or define policies. It is next to impossible to define a policy without any sense of what exists on the network.

Another common use case for data mapping is to control eDiscovery costs and effort. When a legal hold request is issued, identifying, collecting, and preserving the responsive data can be complex and time-consuming. Using a data map, the relevant content can be identified quickly.

Data maps also give an IT organization the ability to classify and manage data based on policy and storage rules. Tiering user content based on metadata criteria gleaned from the data map is a valuable capability that will streamline resources. For example, user files that have not been accessed in over five years can be found and moved to less expensive storage, such as offsite cloud repositories. Tiering data is an effective way to reduce IT budgets and save internal resources.
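As a rough illustration of the tiering example, the following Python sketch lists files whose last-access time is older than a cutoff. The function name `stale_files` and the `years` and `now` parameters are hypothetical, and the sketch assumes last-access times are trustworthy, which is not always the case (some file systems are mounted so that access times are updated rarely or not at all).

```python
import os
import time

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def stale_files(root, years=5, now=None):
    """Return sorted paths under root not accessed within the last `years` years."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    stale = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            # st_atime is the last-access timestamp reported by the file system.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

The returned list is only a set of candidates. Actually moving the files to cheaper storage, and checking them against any active legal hold first, would be separate steps governed by policy.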

Data maps provide significant value to the organization. Providing a deeper understanding of data assets is key to proper records and storage management. Without a data map, managing content is next to impossible, and as a result, organizations spend significant time and resources stockpiling and storing vast amounts of user data. Using innovative enterprise-class indexing technology, data mapping is simplified and automated, making it a core component of every IT data center.

Jim McGann is vice president of marketing for Index Engines, an electronic discovery provider based in New Jersey. McGann is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery, and records management. You can contact the author at