Going Cheap on Disaster Recovery

De-duplication may seem like a way to go cheap on DR, but it can yield some expensive consequences.

A solid disaster recovery capability is difficult to maintain, but a blessing when you need it. According to survey data released by Sepaton just before the holiday season, data protection -- a key component of any effective business continuity strategy -- continues to be an investment priority for the coming year, despite the economy.

Sepaton conducted the survey in November and culled about 145 responses from companies that fit its criteria: "enterprise-class" firms with at least 1,000 employees and at least 50 TB of primary data to protect. The key findings are interesting.

First, despite current economic pressures, nearly 75 percent of enterprise respondents expected their data protection budgets to either stay the same or increase in 2009. The respondents regarded data protection as one of their "top priorities" this year.

Second, according to the responses, most enterprises are protecting extremely large and quickly growing volumes of data, with 48 percent reporting that they have more than 200 TB of data to protect, and 30 percent claiming a compound growth rate of this data pile exceeding 30 percent per year.
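To put those figures in perspective, here is a rough sketch of the compounding arithmetic. It assumes a starting point of exactly 200 TB and a steady 30 percent annual growth rate; the survey reports both numbers only as thresholds, so the inputs are illustrative, and the function name is my own.

```python
import math

def projected_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth (annual_growth=0.30 means 30%)."""
    return start_tb * (1 + annual_growth) ** years

# Illustrative inputs only: the survey says "more than 200 TB" and
# "exceeding 30 percent per year", not exact values.
for year in range(0, 6):
    print(f"year {year}: {projected_tb(200, 0.30, year):,.0f} TB")

# Years needed for the data pile to double at a constant 30% growth rate.
doubling_years = math.log(2) / math.log(1.30)
print(f"doubling time: {doubling_years:.1f} years")
```

Under these assumptions the protected data set doubles in well under three years, which goes some way toward explaining the appetite for anything that promises to shrink it.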

Interestingly, server virtualization (using products such as VMware) was, according to the survey, putting significant strains on data protection environments: 74 percent of respondents reported that VMware virtual environments had significantly increased their data storage and/or the complexity of their data protection environment.

Although the majority of study respondents reported that they were currently using physical tape for data protection, only half of IT professionals said they expected to be using tape one year from now. They planned to increase their use of disk-based technologies -- disk-to-disk, virtual tape library (VTL) appliances, or VTL gateways.

Sepaton, whose name is "no tapes" spelled backwards, posits its own disk-based de-duplication engine as an alternative to tape. Writing data, whether raw copy or backup data, to disk enabled with Sepaton de-dupe allows a lot of data to be squeezed into the same physical space. Surprisingly, the strengths of this approach are framed less in terms of restore speed than in terms of cost-of-ownership savings.

This argument jibes with another finding in the report: A key focus of spending in 2009 will be on cost containment technologies for data protection. Respondents said that they were keen to invest in technologies that reduced total cost of ownership by providing higher levels of data protection, controlling data growth, and scaling readily in terms of capacity and performance. Sepaton summarized that enterprises see new data protection technologies such as data de-duplication as "essential for maintaining service levels and regulatory compliance."

The report states, "Data de-duplication ranks highest as the technology planned for deployment in the enterprise data center." More than 90 percent of respondents were either currently using de-duplication or wanted to use it. Of those who did not have de-duplication, 55 percent were allocating budget toward this technology this year.

In the final analysis, if the views of about 145 technology consumers can be taken as a realistic bellwether of market trends, de-duplication remains a hot technology for 2009. This is widely viewed as good news for companies such as Sepaton, IBM (which acquired Diligent Technologies last fall), Data Domain, and approximately 15 other vendors of de-dupe technologies. However, some of the findings left me with a case of indigestion.

Concern #1: Working in the Dark. For all the hype about de-duplication, the technology itself, emanating from as many as 18 hardware vendors and almost as many software-only vendors, remains steeped in mystery. Sepaton, more than others, has opened its corporate files to explain how their process works, but the actual algorithms used by vendors for compressing and re-indexing bits -- so that fewer bits are needed to describe the same file -- remain secret. Some vendors tell me that this will change when the U.S. patent officials catch up on the paperwork and intellectual property receives protection.

True enough, but it places consumers in a situation where they are risking their most irreplaceable data assets by moving them into yet another standards-free "Twilight Zone" of technology. We are just now beginning to hear war stories about the last "big thing" -- server virtualization -- and its unforeseen (and largely negative) impact on cost-containment, compliance, continuity, and carbon footprint reduction: the four "Cs" being discussed in boardrooms of the enterprise companies I visit.
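Because the vendors' actual algorithms are not public, the idea raised under Concern #1, that fewer bits can describe the same file, can only be illustrated generically. The sketch below is a textbook-style, fixed-size-chunk scheme using SHA-256 hashes. It is not Sepaton's or any other vendor's method, and the chunk size, function names, and in-memory dictionary "store" are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products differ

def dedupe(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once, return a recipe of hashes."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # stored only the first time it is seen
        recipe.append(digest)
    return recipe

def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from the recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)

store = {}
backup = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # three identical chunks plus one distinct
recipe = dedupe(backup, store)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(f"logical bytes: {len(backup)}, stored bytes: {stored_bytes}")

# Compare the rebuilt stream against the original, byte for byte.
restored = rehydrate(recipe, store)
print("restored matches original:", restored == backup)
```

Note what even this toy version makes plain: the file no longer exists anywhere as a contiguous copy, only as an index of hashes plus a pool of shared chunks. Getting the data back depends entirely on that index and on the rebuild logic, which is the part vendors keep to themselves.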

Concern #2: Herd mentality rules. Surveys like this one, combined with a generally supportive press (hamstrung for advertising dollars), suggest that everyone who's anyone is deploying de-dupe. They help shape the herd mind.

Concern #3: Legal issues. Another significant issue not being discussed is the legality of de-duplicated data. Can data, once de-duplicated, be accepted into a court of law or a regulatory agency's administrative hearing as a "full and unaltered copy of the original" as required by some regulatory mandates? We have only the assurances of vendors that this will not be a problem.

I am not convinced, and neither are some of the financial companies I have visited recently. In anticipation of the regulatory flurry that will likely build in the next Congress following the subprime and derivatives fiasco, they are excluding SEC data from de-duplication processes. This suggests to me that when the risk managers understand the risk of de-dupe (surprisingly few do), they nix the idea of applying it to their data.

More disturbing is the collateral damage that zealous de-dupe hype is inflicting on stalwarts of data protection such as tape backup. Given the state of tape backup software, combined with the abysmal lack of skills among tape users in troubleshooting and optimizing their backup environments, the technology has become a whipping boy of the de-dupe crowd. Costs per GB remain below 45 cents to store data on tape, making it one of the most efficient media for storing data ever invented and an essential ingredient of any strategy for driving cost out of storage. Even that benefit has been dismissed by disk-based backup enthusiasts who emphasize the labor costs associated with the technology.

Labor costs are derived from poor backup software and poor management of the tape library. Both are being addressed, I am delighted to say, by key vendors. CA's beta of ARCserve Release 12.5 is underway. Have a look at their beta registration site if you want to see what has me so fired up. I like the intelligence, especially the management tools borrowed from their SRM package BrightStor, that has been added to the product. I think it is a dramatic improvement to the usability and efficacy of tape backup software and a harbinger of things to come from CA's competitors. Note that it even includes a software VTL and software de-dupe engine for those who want to add value around their tape backup solution.

As for optimizing library operations, I have seen great strides in this area both from library makers such as Spectra Logic, which now provides soup-to-nuts tools for tape management as part of its hardware products, and from Crossroads Systems, whose Read Verify Appliance (RVA) is now capable of examining library operations and providing action steps to users that will improve performance and reduce costs. RVA is offered as a service, too, by the way, which gives you the ability to improve tape operations without needing to become an expert in the rarefied nuances of the technology itself.

All things considered, these innovations and improvements should give tape the leg up over just about any other data protection technology on the market, but you’d be hard pressed to find anyone reporting on them. If tape is in decline as a preferred technology for data protection, it is the media -- and perhaps the tape industry itself through its lack of aggressive self-promotion -- that is killing it. Tape vendors need to learn from OSTA's complete failure to defend optical technologies against the disk array folks, a failure that cost Plasmon its future last year.

De-duplication, in and of itself, is not a bad thing. It is just another way of storing data. Used intelligently, and in combination with a program of data classification and smart archiving, it may well have a role to play in the business continuity strategy. Data classification determines what data needs to be protected and how it can be staged for quick restoration of individual files before their corruption translates into a disaster for the company. Intelligent archiving reduces the volume of data that is replicated nightly for protection, thereby lessening the burden on backup. Absent these other systemic components, de-dupe is nothing but a trash compactor. It may seem like a way to go cheap on DR, but it can yield some expensive consequences down the line.

In my next column, I will tell you about a medical company that has deployed de-duplication as a complementary technology to their tape-based data protection solution. It can be done, but as this story will illustrate, not all de-dupe solutions (they tried several) are equal.

In the meantime, feel free to drop me some email with your views on this column:
