De-duplication and the Allure of the One-Stop Shop

Is IBM's purchase of a de-duplication process tactical or strategic?

IBM has announced its third storage acquisition in as many months: Diligent Technologies, a Tel Aviv, Israel-based provider of de-duplication and virtual tape library technology.

IBM's Cindy Grossman, vice president of the Tape and Archive Storage Systems unit of Big Blue's Systems and Technology Group, described the rationale for the acquisition: that de-dupe is "a critical element that clients are asking for" to cope with "exponential data growth," which is being driven in large part by the data retention requirements embedded in many regulatory and legal governance mandates. To further validate her claim, she cited analyst data from November 2007 that showed de-duplication and virtual tape as top priorities on the shopping lists of IT managers.

At first, the acquisition seemed simple to understand: IBM wanted to own a technology that companies were keen to buy. They could simply have partnered with others to deliver the technology in concert with IBM-branded gear, but Grossman observed that customers wanted "pre-integrated" solutions as opposed to multi-vendor solutions.

Doron Kempel, Chairman and CEO of Diligent Technologies, observed that his company, which had sold its wares as software-based appliances for about the past five years, was already installed at "200 of the Fortune 500." In addition to IBM, its products have also been re-sold by Hitachi Data Systems and Sun Microsystems, and by smaller players such as Overland Storage.

Why buy the cow if you have ready access to the milk? At first, I suspected that this was a play by IBM to isolate HDS and Sun, and maybe Overland, from their de-dupe source: Diligent. However, assurances were made that IBM wanted to retain the relationship with HDS as a technology distributor. "With respect to Sun," though, the continued sale of Diligent products "remains to be seen," a spokesperson observed. Later, Kempel stated that Sun accounted for one to two percent of Diligent's revenues in 2007, but that Sun's contribution was expected to rise to 15 percent this year. Fifteen percent of sales revenues is nothing to scoff at, which left me with at least some confusion about whether the views of IBM and Diligent about continuing relations with Sun were entirely synchronized.

Even within IBM, there seemed to be some discontinuity. Greg Tevis, a software architect representing Tivoli Storage Technical Strategy within the IBM Software Group, noted that IBM Tivoli was already preparing to put integral de-duplication functionality, not based on Diligent's hardware appliance or Hyperfactor algorithms, into its management software. Tevis said there had been a lot of "soul searching" prior to the acquisition of a de-dupe appliance vendor given this fact.

The most confusing upshot of the briefing was whether IBM viewed de-dupe as tactical or strategic. Kempel laid out the proposition that de-dupe is a form of "intelligent compression" in which replicated bits or strings of bits are removed from block data. He said that the logical place for de-dupe is in archival storage and virtual tape libraries; it was not appropriate or intended for primary storage, nor for tape.
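To make the "intelligent compression" idea concrete, here is a minimal sketch of block-level de-duplication. It assumes fixed-size blocks and SHA-256 fingerprints, a generic textbook approach and not a description of Diligent's Hyperfactor algorithms, whose internals were not disclosed in the briefing. The function names and the 4-byte block size are illustrative only.

```python
import hashlib

def deduplicate(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep each distinct block once.

    Returns (store, recipe): store maps a fingerprint to the block's bytes,
    and recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # repeated blocks are not stored again
        recipe.append(fingerprint)
    return store, recipe

def rehydrate(store, recipe) -> bytes:
    """Rebuild the original byte stream from the block store and recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)

data = b"ABCDABCDABCDWXYZ"          # 16 bytes, with one block repeated three times
store, recipe = deduplicate(data)
stored_bytes = sum(len(block) for block in store.values())
```

In this toy input, four logical blocks collapse to two stored blocks. Note that the recipe and the block store are both required to get the data back, which is why the repetitive streams typical of backup and archive are the natural target.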

The panel said that, as a service for compressing data and squeezing more of it onto the same capacity disk, de-dupe was a green technology as well, and cost-effective because it would slow the need to acquire more disk drives to handle the increasing amount of data. The key differentiator repeatedly cited was that Diligent's de-dupe was an "in-line" function -- which I take to mean in the data path between capture storage and retention storage. They cited the speeds and feeds -- 4800 megabits per second for one node, or double that for the new two-node clusters that will soon be introduced -- to extol the superiority of the Diligent approach. This performance is achieved, they explained, by collapsing what for other competitors is a two-step process of "de-duplicate then ingest" (write onto target disk) into a single in-line process using their secret sauce, Hyperfactoring.
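As a rough check on those speeds and feeds, the sketch below converts the quoted figure into megabytes per second and terabytes per hour. It assumes decimal units (1 megabit = 10^6 bits), that "double" means exactly twice the single-node rate, and that the rate is sustained; none of these assumptions was stated in the briefing.

```python
MEGABIT = 10**6  # bits per megabit, decimal convention (an assumption)

def ingest_rate(megabits_per_second: float, nodes: int = 1):
    """Return (megabytes per second, terabytes per hour) for a sustained rate."""
    bits_per_second = megabits_per_second * MEGABIT * nodes
    bytes_per_second = bits_per_second / 8
    megabytes_per_second = bytes_per_second / 10**6
    terabytes_per_hour = bytes_per_second * 3600 / 10**12
    return megabytes_per_second, terabytes_per_hour

one_node = ingest_rate(4800)            # the single-node figure quoted above
two_nodes = ingest_rate(4800, nodes=2)  # assumes linear scaling across two nodes
```

Under these assumptions, one node works out to about 600 megabytes per second, or a little over 2 terabytes per hour, and the two-node cluster to twice that.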

With all due respect for the innovative technology, all of this sounded rather tactical. IBM/Diligent is simply helping companies pack more bits into the same space, realizing a "10x to 20x" improvement in capacity allocation efficiency. The strategic move would be to de-duplicate AFTER data has been classified and sorted, and in the best of all possible worlds, expose de-dupe as a service that can be selected for certain types of data and turned off for others that don't need it or shouldn't be exposed to it for legal, business, or operational reasons. Only in this way do we begin to move closer to capacity utilization efficiency improvement -- which is what IBM's service-oriented architecture mantra is supposed to be about.
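What de-dupe as a selectable service might look like can be sketched in a few lines. Everything here is hypothetical: the data classes, the policy table, and the sample data sets are invented for illustration, and the only figures taken from the briefing are the "10x to 20x" reduction ratios.

```python
from dataclasses import dataclass

# Hypothetical policy table: which classes of data may be de-duplicated.
DEDUPE_POLICY = {
    "backup": True,
    "email_archive": True,
    "legal_hold": False,    # e.g. kept as written for legal reasons
    "database_log": False,  # e.g. excluded for operational reasons
}

@dataclass
class DataSet:
    name: str
    data_class: str
    size_tb: float

def should_dedupe(dataset: DataSet) -> bool:
    """Apply the policy; unclassified data is left alone by default."""
    return DEDUPE_POLICY.get(dataset.data_class, False)

def physical_capacity_tb(datasets, ratio: float) -> float:
    """Disk needed when only policy-approved data is reduced by `ratio`."""
    total = 0.0
    for dataset in datasets:
        if should_dedupe(dataset):
            total += dataset.size_tb / ratio
        else:
            total += dataset.size_tb
    return total

datasets = [
    DataSet("nightly", "backup", 100.0),
    DataSet("case-files", "legal_hold", 10.0),
    DataSet("scratch", "unclassified", 5.0),
]
at_10x = physical_capacity_tb(datasets, ratio=10)
at_20x = physical_capacity_tb(datasets, ratio=20)
```

In this made-up mix, 115 TB of logical data needs 25 TB of disk at 10x and 20 TB at 20x. The point of the sketch is the decision step: the reduction is applied because a classification says it should be, not because the data happened to pass through an appliance.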

Grossman accepted a question from me on this point and observed that "information lifecycle management" is of great interest to IBM, but IBM Tivoli's Tevis sounded more like a kindred soul, suggesting that the goal of Tivoli Storage Manager's de-dupe was predicated on just such a vision of storage services.

Until a comprehensive vision of IBM storage architecture is articulated, this acquisition -- like those that preceded it (namely, XIV and FilesX) -- will continue to look like a tactical play: IBM buying another hot-button technology du jour less because of its strategic value to customers than for its short-term money-making potential. According to Grossman, customers want a one-stop shop, and whether it is strategic or even good for them in the long run, IBM is working to sell it to them today.

A question that wasn't asked and that should be addressed at some point is whether de-duplication creates any potential "gotchas" from the standpoint of regulatory compliance. Could it be argued that deflating data negatively affects the nonrepudiability of that data? Going further into the realm of risk, how is de-duplicated data recovered expeditiously if the software appliance used to deflate it is compromised by a disaster?

These and other questions will be addressed in a future column. For now, your questions or comments are welcomed:

About the Author

Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.