In-Depth
Simplifying Your Data Protection Strategy
Sifting through the options (and hype) of redundant backup can be daunting. Our storage analyst simplifies things with three key questions.
A New Year’s resolution I’m hearing from a lot of companies this year is to finally do something meaningful about data protection. Everyone is looking for a way to safeguard irreplaceable data assets against predictable and unpredictable loss scenarios, ranging from failed disk drives and application or user goofs to disasters of the smoke-and-rubble variety. In other words, they want to know how to make a copy of their data.
The only real way to protect data is to make a copy of the original. While other aspects of business technology afford you a choice between two strategic options for protection—redundancy or replacement—data itself can only be protected via a strategy of redundancy.
The sheer number of options for making a copy partly explains why so many folks find the process of selecting and implementing a data protection strategy so daunting. Sifting through the hype to arrive at the strategy that is right for you can be overwhelming.
There is, of course, the old, ongoing battle between vendors about the merits of disk-to-disk copying versus disk-to-tape. Choosing one is like following Alice down the rabbit hole. On the disk mirroring side, we hear debates about the merits of different mirroring approaches, such as point-in-time mirror splits versus write journaling. If you choose tape, you will find a rat’s nest of different tape strategies, each with its own champions. Do you do full backups to tape with incremental updates of changed data, or do you use snapshot-based schemes?
Confusing matters further is the industry’s unfortunate tendency to introduce murkier concepts, such as continuous data protection (CDP), into the discussion. CDP means just about anything a vendor wants it to mean.
What’s needed is a dose of common sense. If you want to set up a good data protection strategy, some truly basic questions need to be answered. Only when you have answered these questions can you shop intelligently for the right hardware and software components.
Question 1: What are the data assets you are trying to protect?
This is a biggie, since the type and volume of data, its growth rates, etc., all factor into the choice of a data-copy solution.
In many shops I visit, data is not differentiated or segregated by any rational sorting scheme. In truth, all data is not equally important or critical. What is needed for regulatory compliance isn’t necessarily the same data you will need to recover your business in the wake of a fire. Before you go shopping for a data-copy solution, you need to understand what you are protecting.
Question 2: How soon after a disaster do you need access to data?
Categorizing data may be accomplished in part by determining the time-to-data requirements imposed by your recovery scenario. Some applications are very important, and their data needs to be made available ASAP following any interruption. Other applications are less important, and you might have the luxury of time in restoring their data sets. Still other apps are nonessential and may not impose any time-to-data requirement whatsoever.
Different data-copy strategies have different data-restore capabilities in terms of speeds and feeds. You might just find that your data guides you to more than one strategy: certain data requires real-time restore or failover to a redundant data set, while other data can take the more leisurely path afforded by traditional tape.
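To make the exercise concrete, here is a minimal sketch, in Python, of how you might record time-to-data tiers and the copy strategy each suggests. The tier thresholds, data set names, and strategy mappings are all hypothetical illustrations, not recommendations from any vendor.

```python
# Hypothetical time-to-data (recovery time) tiers and the copy strategy
# each might suggest. Thresholds and mappings are illustrative only.
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    max_hours_to_data: float  # how long the business can wait for restore

def suggest_strategy(ds: DataSet) -> str:
    """Map a time-to-data requirement to a candidate copy strategy."""
    if ds.max_hours_to_data <= 1:
        return "synchronous disk mirror / failover"
    if ds.max_hours_to_data <= 24:
        return "disk-to-disk copy with periodic snapshots"
    return "traditional tape backup"

for ds in [DataSet("order entry", 0.5),
           DataSet("e-mail", 8),
           DataSet("project archives", 72)]:
    print(f"{ds.name}: {suggest_strategy(ds)}")
```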
Question 3: How much money do you have to spend?
This is often the real gating factor when it comes down to "solutioneering" for data copy. Despite the self-serving ROI calculators coming from some vendors of disk-to-disk solutions, tape remains the price king of backup. We are talking about 44 cents or less per GB for tape media: even RAID arrays built on cheap SATA disk can’t compete with that.
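A quick back-of-the-envelope calculation shows why. In the sketch below, the 44-cents-per-GB tape figure is the one quoted above; the SATA disk price is an assumed placeholder that you should swap for a real quote.

```python
# Back-of-the-envelope media cost comparison. The $0.44/GB tape figure
# comes from the column; the SATA figure is an assumption for illustration.
TAPE_COST_PER_GB = 0.44   # tape media, per the column
DISK_COST_PER_GB = 2.00   # assumed cheap SATA array cost; plug in your own

capacity_gb = 10_000  # 10 TB of protected data

tape_total = capacity_gb * TAPE_COST_PER_GB
disk_total = capacity_gb * DISK_COST_PER_GB

print(f"Tape media for {capacity_gb:,} GB: ${tape_total:,.2f}")
print(f"SATA disk for {capacity_gb:,} GB: ${disk_total:,.2f}")
```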
Moreover, tape affords flexibility in restore that many disk-to-disk solutions do not. You don’t need to have the same array box at your recovery site as you currently have on the floor of your IT shop in order to restore your data from tape. In most mirroring schemes, all of the boxes need to be identical or from the same vendor. Often your replication software is designed only to work with the vendor’s own gear.
Combining Technologies
When the rubber meets the road, a data protection strategy may be best served by a combination of technologies. One way to think about it is to divvy up your threat scenarios into two groups: call them annoyances and emergencies.
Annoyances are events in which discrete files become corrupted or are accidentally deleted. Disk-based copy schemes might provide the best fit for fast restore of discrete files.
Emergencies are those events we don’t like to think about. We show up for work, but the building isn’t there. Now we are talking about data restore with a capital R. Your local disk mirror is toast; it’s time for tape to be called into the game.
To make any of this work, of course, you will need to establish policies and procedures for data copy based on an analysis of the data itself. It is only by understanding your data that you can select the right components of a data-copy strategy, and only by ongoing analysis and classification that you can keep the growth of data from overwhelming your data-copy capability. Everything is connected, as the saying goes. If you want to keep your data-protection capability up to snuff, you need to establish a continuous archive strategy alongside your data-copy strategy.
As data ages and its frequency of access declines, you need to move it out of production disk and into an archive repository (or the circular file). Letting data accumulate is the number one reason why backups don’t complete within operational windows. There are tools for archiving databases, e-mail, and workflow content, but no really good, high-granularity tools for user files, which tend to comprise the largest junk drawer of data in your cabinet. To wrangle user files into an archive scheme, you need to get users to buy in.
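In the absence of a polished tool, even a simple script can surface archive candidates among user files by age. Here is a minimal sketch; the share path and the 180-day threshold are assumptions, and bear in mind that last-access timestamps are not reliably maintained on every filesystem.

```python
# Minimal sketch: find user files not accessed in N days, as candidates
# for archiving. Path and threshold are illustrative assumptions; note
# that access times (st_atime) are not reliably updated on all systems.
import os
import time

SHARE_ROOT = "/srv/userfiles"   # hypothetical user file share
STALE_DAYS = 180                # assumed archive threshold

cutoff = time.time() - STALE_DAYS * 24 * 3600

for dirpath, _dirnames, filenames in os.walk(SHARE_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if st.st_atime < cutoff:
            print(f"archive candidate: {path}")
```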
One way to win that buy-in, if your corporate culture permits, is to do what NASA did at Goddard Space Flight Center a couple of years ago. Give your users a monthly storage allocation, and levy a dollar charge per GB or TB (for example) for data volume exceeding the allocation. At the same time, give them free and unlimited use of tape. That way, they are compelled by their budgets to think very hard about what data they really need to store on expensive arrays and what they can afford to relegate to cheap tape.
The result is a sort of user-guided hierarchical storage management scheme, which is much more efficient than moving data around based on date last accessed or date last modified. It isn’t rocket science, but it worked at NASA.
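For illustration, a toy model of the chargeback arithmetic might look like the sketch below. The allocation and per-GB rate are invented figures, not anything NASA published.

```python
# Toy chargeback model for a user-guided archiving scheme. The allocation
# and per-GB rate are assumed figures for illustration only; disk overage
# is billed while tape usage is free, nudging users to archive.
ALLOCATION_GB = 50       # assumed free monthly disk allocation per user
OVERAGE_RATE = 5.00      # assumed charge in dollars per GB over allocation

def monthly_charge(disk_gb_used: float) -> float:
    """Bill only for disk consumption above the allocation; tape is free."""
    overage = max(0.0, disk_gb_used - ALLOCATION_GB)
    return overage * OVERAGE_RATE

for user, gb in [("alice", 42), ("bob", 75)]:
    print(f"{user}: {gb} GB on disk -> ${monthly_charge(gb):.2f} this month")
```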
Next week I’ll look at evolving copy-on-write schemes that may provide data protection alternatives at competitive prices. For now, I welcome your feedback at [email protected]
About the Author
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.