The Elusive Integrated Archive Stack

We can’t expect vendors to solve all our problems. Sometimes we have to do the heavy lifting ourselves.

A short while ago, the Storage Networking Industry Association issued the results of a poll of over 200 IT professionals and characterized the results as "an archive crisis." The findings suggested that the length of time companies were holding on to data was now moving into the range of 80 to 100 years. A significant percentage of the respondents said that they were very concerned about their ability to read their data at the end of that time. In an interesting twist, most said they do nothing today to migrate this retention data around various components of their storage infrastructure.

In reading the survey and the SNIA summary, I agreed that there was an archive crisis. The crisis, however, was not in the limitations of technology, but in the attitude and behavior of the IT practitioners. What people were really complaining about was that archiving data was more difficult than most of their other tasks in the storage realm. Tools needed to be cobbled together from different sources and there was no integrated, automated stack provided by vendors to do the heavy lifting.

The "crisis," in my view after reading the SNIA report, was much simpler: we have begun expecting the vendor community to do all of the work for us.

There is no doubt that archiving is hard work. It requires identifying the data to be retained, selecting and implementing suitable data containers (formats such as OpenDocument, Microsoft’s Office Open XML, and Adobe’s PDF, to name just three), identifying some method for collecting the data per policy, and creating a migration strategy that specifies when, where, and how data will be moved (and moved again over time) until it is ready for the shredder.

As I’ve noted in previous columns, the ways and means for managing data do exist, but most require some effort and integration on the part of the consumer. Complaining about the lack of silver-bullet solutions to these issues—the lack of a one-stop-shop solution—isn’t going to move anyone closer to archive nirvana.

I was reminded of this two weeks ago as I sat in a briefing from CA about some integration work they were announcing in the area of information governance. As reported on ESJ.com last week (http://esj.com/Enterprise/article.aspx?EditorialsID=2834), CA has taken assets from two of their recent acquisitions (e-mail archiving company iLumin and records manager MDY) to create the Information Governance Suite.

The suite includes CA Records Manager, which is records-management software acquired from MDY and formerly known as MDY Federated RM, and CA Message Manager, which is e-mail archiving software acquired from iLumin and formerly called Assentor Enterprise. The suite provides centralized policy management features for controlling physical, electronic, and e-mail records as well as e-mail management tools for handling integrated records management, mailbox management, e-discovery, and supervision capabilities.

CA’s Infrastructure-Agnostic Approach

CA has worked to integrate these offerings so that the archiving engine from iLumin works and plays well with the "federated, in place, records management" philosophy of MDY to deliver an "infrastructure-agnostic" means for managing business information in accordance with various regulatory and legal mandates.

Kristi Perdue, director of product marketing for this integration at CA, insisted that the build would "ensure consistent compliance policy adherence in the organization" without requiring a rip and replace of existing technology and without disrupting existing work processes. "We provide a platform for governance; the storage is up to you."

She stressed the DOD certification of the solution, referring to a process that tests archive and data management products against a standard specification developed at DOD. The specification exists to prevent the acquisition of data management and information governance products that don’t cooperate with each other, with business apps, or with hardware infrastructure. Vendors must resubmit their wares to the DOD process each time they issue a new release.

To be clear, DOD certification does not mean that the CA product, or any other vendor’s product, is somehow magically certified to place corporate data into compliance with regulations. Perdue was up front about this point, which was refreshing—especially considering other vendors that boast about "compliance certifications" for their wares that just don’t stand up to the truth test.

Basically, the solution works because it is commonsensical. There is no secret sauce here for data classification, no artificial intelligence that cherry picks the right data for inclusion in the management scheme (though an "active content search" is on the roadmap). The data to be managed is mostly chosen by user identity.

In one scenario, each user fills out a profile card describing the job he or she performs. Based on this identity, the data the user creates is automatically passed to the appropriate management mechanism according to policies created by risk managers.
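To make the identity-based idea concrete, here is a minimal sketch in Python of how a profile card might drive policy selection. It is not CA's implementation. The role names, retention periods, repository names, and the functions `policy_for` and `route` are all hypothetical, invented only to illustrate the scenario described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    retention_years: int   # hypothetical value, not taken from any regulation
    repository: str        # hypothetical management mechanism


@dataclass(frozen=True)
class Profile:
    user: str
    role: str              # the job described on the user's profile card


# Hypothetical policies a risk manager might define, keyed by job role.
POLICIES_BY_ROLE = {
    "broker": Policy("supervised-email", 7, "email-archive"),
    "hr": Policy("personnel-records", 30, "records-manager"),
}

# Fallback for roles the risk managers have not classified.
DEFAULT_POLICY = Policy("general", 1, "in-place")


def policy_for(profile: Profile) -> Policy:
    """Select a policy from the user's role alone; content is never inspected."""
    return POLICIES_BY_ROLE.get(profile.role.strip().lower(), DEFAULT_POLICY)


def route(profile: Profile, item_name: str) -> str:
    """Describe where an item created by this user would be managed."""
    policy = policy_for(profile)
    return (f"{item_name} -> {policy.repository} "
            f"({policy.name}, keep {policy.retention_years}y)")


if __name__ == "__main__":
    print(route(Profile("alice", "Broker"), "trade-confirmation.msg"))
    print(route(Profile("bob", "HR"), "memo.doc"))
```

The point of the sketch is that the lookup keys on who created the data, not on what the data contains, which is why no content classification engine is needed.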

Another dimension of the solution: CA is reluctant to replicate data. Their approach is federated, Perdue emphasized. They can make a copy of data to a remote repository, but believe that management of data in place, provided that there are sufficient lock-downs and controls, does the job just as well.

Perdue said that this announcement was intended to demonstrate the potential for integration of the archive and governance stack. CA’s already well-established security products could also find an effective role to play in information governance.

I like what CA is trying to do here. It is similar to the capabilities being built into BridgeHead Software’s products and also into FileTek’s product family, which includes StorHouse, Clearview ECM, and another identity-based classification technology, TrustedEdge. It seems that the software houses are getting a leg up on the hardware shops for a change, enabling services for information governance that can be used across all hardware platforms.

Your observations are welcome: jtoigo@toigopartners.com.

Reader Feedback

Following last week’s column about VMware and Storage, Steve Marfisi of emBoot, Inc. wrote to me saying he enjoyed the column and agreed that the interviewee (Jim Price, CEO of Fairway Consulting Group in Sunrise, FL) knew his stuff, "But I would like to clarify one statement he made."

Price had observed that in a physical server environment, storage is exposed as raw LUNs, and the server can boot from any kind of storage over any kind of connection except for software iSCSI and NFS. He was quoted as saying, "You can boot from hardware iSCSI, but nobody does it that way: software iSCSI rules. In a VM environment, you can boot from any datastore."

Marfisi responds, "Servers can, indeed, boot from software iSCSI—our customers do this every day with our netBoot/i and winBoot/i products. As well, there are several open-source solutions to doing this. In all cases, these are certainly software iSCSI initiators (in our case, loaded via PXE bootstrap or integrated into ROM). The servers can be physical systems, or virtual machines—doesn't matter—in both cases, they are using software initiators."

Thank you for this valuable clarification, Steve. We may look at boot issues in a future column.