Time to Data: When IT Comes to Disaster Recovery, Always Wait for Backup
Without current copies of the data used by critical business applications, formulating strategies for recovering hardware platforms is a pointless undertaking. "Time to Data" is essential.
According to many observers, the successful recovery of mission-critical business processes in the wake of a disaster boils down to a single factor: "time to data." Everyone understands that without current copies of the data used by critical business applications, formulating strategies for recovering hardware platforms and networks is a pointless undertaking. What might be less clear is the importance of making back-up data available for use by restored systems in the shortest possible timeframe.
According to Dan Broadway, Director of ClearPath Storage for Unisys in Mission Viejo, Calif., many of his customers operate "high-volume, transaction-oriented businesses that are very sensitive to the duration of unplanned downtime. For applications, such as car reservation systems, airline ticketing systems and certain financial applications, outage costs are extremely high and can build up to significant losses of revenue in a very brief period of time."
For this reason, observes Broadway, these companies are doing anything and everything they can to shorten post-disaster recovery timeframes. Many of his customers "are replicating servers and storage to provide automatic failover in the event that one system experiences a problem."
The Problem with Traditional Data Backup
According to Broadway and others, the traditional strategy for data recovery, restoring data from back-up tapes, no longer meets the needs of companies that stand to lose between several hundred thousand and several million dollars for every hour that their systems are down. Don Swatik, Vice President for Product Management at EMC Corporation (Hopkinton, Mass.), underscores the point: "The volume of data is increasing exponentially and, based on a survey we conducted last year, the primary issue this raises from a disaster recovery standpoint is how quickly you can restore your data. Restoral from tape backups is just too slow."
In a traditional back-up scenario, production data is copied to tape, which is then removed to secure off-site storage. In the event of a disaster that forces the relocation of corporate systems to an alternate processing site, tapes must be retrieved from storage, transported to the remote site, sorted, and loaded into tape libraries or autoloaders. Only once these time-consuming steps are complete can the data be transferred to disk storage devices, as quickly as the tape drives allow.
Even in the best of circumstances, say Swatik and others, the process takes too much time. Assuming that the right tapes are obtained from off-site storage, that the tapes themselves are undamaged, and that the data they contain is recoverable (very big "ifs," according to some observers), restoring terabyte-sized databases from tape is a time-consuming process. The restore itself simply adds more expensive recovery time on top of the time already consumed by the manual steps of the traditional approach.
Says Swatik, "not all, but a growing percentage of companies are looking for alternatives" to traditional tape backup that will shorten time to data. As a result, alternative strategies such as electronic tape vaulting and remote mirroring are increasingly finding a business case.
Electronic Tape Vaulting
Broadway observes that, in addition to building its own storage products, Unisys has cultivated strong relationships with two storage product vendors over the years: disk array manufacturer, EMC Corporation, and tape vendor, Storage Technology Corporation (StorageTek, Louisville, Colo.). These relationships are being leveraged to deliver new data restore solutions to companies with critical downtime sensitivities.
StorageTek and other tape vendors have been improving remote tape solutions, called electronic tape vaults, for the past decade. The objective of electronic tape vaulting is to enable data backups, processed locally, to be written onto tape units located at a remote system recovery facility (sometimes called a hot site). The data is transported across a wide area network (WAN) link to a tape device at the remote site.
Most electronic tape vaulting strategies leverage channel extension technology offered by vendors such as Computerm Corporation (Pittsburgh) and Computer Network Technology (CNT) Corporation (Minneapolis). One extender is installed on the channel that would normally be used to connect the tape peripheral to the host. A second extender, at a remote site, is cabled to the tape peripheral itself. Connecting the two extenders is a WAN link. In operation, data streams from back-up processes are directed over the WAN to the remote device, and the commands and responses expected from the peripheral device are carried back to the host system across the same link. Channel extenders from different vendors vary in how they reduce the effects of link latency and optimize the bandwidth provided on the WAN link. In a properly configured solution, the remote tape device should operate at rated speeds.
Electronic tape vaulting shortens time to data in disaster recovery by eliminating the manual procedures involved in retrieving tapes from off-site storage and transporting them to the hot site. According to Jim Grogan, Vice President of Alliances with disaster recovery back-up facility vendor, SunGard Recovery Services (Wayne, Pa.), electronic tape vaulting has come of age. "We have offered electronic tape vaulting since the early 1990s, and we signed our first vault customer in 1992. It was 1996 before we signed our second customer for the service. Now, a significant percentage of our customers use this service."
Grogan argues that the increased interest in electronic tape vaulting derives from "a dramatic change in the cost model for the service." He notes that tape vaulting was prohibitively expensive at first, owing to network costs: "A contract that included tape vaulting was 10 times more expensive than a traditional contract." However, deregulation in the telecommunications industry has caused a three- to four-fold reduction in the cost of high-bandwidth WAN facilities, such as T1 and T3, Grogan says.
He adds, "There has also been significant maturation in the vaulting products themselves." In a January 1999 press release, StorageTek announced that the disaster recovery services companies SunGard and Comdisco (Rosemont, Ill.) planned to use the latest StorageTek Timberline tape systems to support the more than 28,000 companies that use StorageTek products and want StorageTek technology to provide their remote electronic vault.
Not Fast Enough for All
According to Grogan, electronic tape vaulting provides the means for recovery to begin while customer recovery teams are in transit to their hot site. In some cases, SunGard personnel can begin recovering data from tape to disk as soon as the customer formally declares a disaster. Depending on the volume of data to be recovered, restoral may be complete by the time the customer arrives.
In the case of some customers, however, even data restorals from electronic tape vaults constitute an unacceptable "time to data" delay. Leveraging technology from EMC Corporation, SunGard offers an alternative in the form of remote mirroring.
Mirroring replicates production system data in real-time or near-real-time, "depending on how much money you have to throw at the problem," according to Dale Miller, Vice President of the consulting firm Trilliant Group. He notes that restoring a multi-terabyte database from tape takes a significant amount of time: "With DLT 7000 tape systems, you are talking 7 Mb/s data transfer rates. Newer devices from StorageTek can deliver 18 to 25 Mb/s. The problem with tape is getting the data off the tape and onto disk within an acceptable amount of time. With mirroring, the transfer is instantaneous."
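The transfer rates Miller quotes can be turned into restore times with simple arithmetic. The sketch below is illustrative only: it assumes that "Mb/s" means megabytes per second, that a single drive streams at its rated speed with no compression and no parallel streams, and that units are decimal. The helper name restore_hours is hypothetical.

```python
def restore_hours(data_terabytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream data_terabytes from tape at a sustained rate.

    Assumption: decimal units, so 1 TB = 1,000,000 MB.
    """
    megabytes = data_terabytes * 1_000_000
    seconds = megabytes / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Rates taken from the quote above, read as megabytes per second.
    for rate in (7, 18, 25):
        hours = restore_hours(1.0, rate)
        print(f"{rate} MB/s: {hours:.1f} hours per terabyte")
```

Under these assumptions, one terabyte takes roughly 40 hours to restore at 7 MB/s and about 11 hours at 25 MB/s, before any time spent retrieving, transporting or loading tapes. If the quoted figures were megabits per second, the times would be eight times longer.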
EMC’s Swatik agrees with Miller. He views the value proposition of remote mirroring as a no-brainer: "Mirrored data is synchronized with production data. Tape contains an older generation of the data. You back up data on tape when you hope you’ll never have to read it. If you think you may need to use the data, you ought to have it on disk."
The underlying issue with tape backups is not resolved through electronic vaulting, mirroring advocates assert. Companies using tape backups generally need to restore tape data first, then work frantically to add data that wasn’t part of the last backup. Bringing data current with the pre-disaster production system introduces additional time-to-data delays.
Grogan says that the truth of this assertion is validated by the increasing number of SunGard customers opting for remote mirroring solutions. SunGard’s offering is based on EMC’s Symmetrix Remote Data Facility (SRDF), which is available for use with the vendor’s Symmetrix disk arrays.
He observes, "SRDF has been around for several years, and we were a beta site for the technology. Originally, the product suffered from a usability perspective. You had no read access while the mirror was active, so customers were unable to see any proof that the mirror was actually working. Now, EMC has added capabilities that let you see the mirror, work with it and test with it. Combined with decreasing costs for WAN links, the improved features of SRDF are boosting interest in the technology as a back-up option. We have doubled the number of customers using the technology in the last couple of months."
Not everyone is sold on remote mirroring, however. Jack Wells, Product Manager of SAMS:Alexandria, a tape back-up product from Sterling Software (Boulder, Colo.), is cautious about portraying remote mirroring as a panacea for recovery. He notes that "mirroring can be subject to hardware errors" and adds, "There are data integrity errors to consider as well. If a database writes a bad block and corrupts its data on the original array, the mirrored array can simply copy this error and corrupt its data store as well." He notes that other problems can arise if proper procedures are not observed when replacing array drives or performing other tasks.
"If by those concerns you are implying that a mirror reproduces errors faithfully," responds EMC’s Swatik, "I would say you are right. But errors are also reproduced on tape backups."
Mirroring does introduce latency into production systems, Swatik concedes, especially if the production array and the mirror array are placed at a distance from each other. "You have issues of the speed of light if you extend the distance by greater than 1,000 miles," he says, "which is why we recommend staying within the rated distance of the interface (up to about 10 kilometers with Fibre Channel, for example). If you need to extend the distance further, we recommend making the data transfer in hops."
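The "speed of light" issue Swatik mentions can be estimated with a short calculation. The sketch below assumes a signal speed in optical fiber of roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum) and ignores switching, protocol and queuing delays, so real links will be slower. The function and constant names are hypothetical.

```python
KM_PER_MILE = 1.609344
# Assumption: signals in optical fiber travel at roughly 200,000 km/s.
LIGHT_IN_FIBER_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds, over a link."""
    return 2 * distance_km / LIGHT_IN_FIBER_KM_PER_S * 1000


if __name__ == "__main__":
    for label, km in (("10 km", 10.0), ("1,000 miles", 1000 * KM_PER_MILE)):
        print(f"{label}: {round_trip_ms(km):.2f} ms round trip")
```

Under these assumptions, a 10-kilometer link adds about a tenth of a millisecond per round trip, while a 1,000-mile link adds about 16 milliseconds. If every write must wait for an acknowledgment from the far end, that delay is paid on each write.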
What EMC calls multi-hopping is what others in the industry refer to as surrogate mirroring. The strategy entails deploying three (or more) arrays to provide a remote mirroring function.
In operation, the production array is mirrored to a nearby "intermediate" mirror array using synchronous mirroring techniques. With synchronous mirroring, new writes to the production array are prohibited until old data writes have been completed on both the production array and its mirror. If arrays are too far apart, the distance that data must travel can introduce latency to the process – slowing the operation of reads and writes to unacceptable levels. By keeping the arrays near to one another, the latency accrued to synchronous mirroring is manageable.
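The synchronous behavior described above can be modeled in a few lines. This is a toy sketch, not a description of how any vendor's array works: the class name, the dictionary-based "arrays" and the latency figures are all assumptions made for illustration.

```python
class SynchronousMirror:
    """Toy model: a write completes only after both arrays hold the data."""

    def __init__(self, link_round_trip_ms: float, local_write_ms: float = 1.0):
        self.production = {}  # block number -> data on the production array
        self.mirror = {}      # block number -> data on the mirror array
        self.link_round_trip_ms = link_round_trip_ms
        self.local_write_ms = local_write_ms  # hypothetical local write time

    def write(self, block: int, data: bytes) -> float:
        """Apply the write to both arrays; return the latency the host sees."""
        self.production[block] = data
        # The host is not released until the mirror has the data as well,
        # so the link round trip is added to every write.
        self.mirror[block] = data
        return self.local_write_ms + self.link_round_trip_ms


if __name__ == "__main__":
    near = SynchronousMirror(link_round_trip_ms=0.1)
    far = SynchronousMirror(link_round_trip_ms=16.0)
    print("near mirror write latency (ms):", near.write(0, b"payload"))
    print("far mirror write latency (ms):", far.write(0, b"payload"))
    print("copies identical:", near.production == near.mirror)
```

The two arrays never diverge in this model, which is the appeal of synchronous mirroring. The cost is that the link delay is added to every write, which is why the arrays are kept close together.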
Remote mirroring is actually a function of the intermediate array communicating its data via a WAN link with a third array, called a "remote mirror." The remote mirror receives data from the intermediate array using an asynchronous mirroring technique. This process occurs separately from the production array and does not impact its performance. The consequence of asynchronous mirroring is that the remote mirror is typically slightly out of synchronization with the production array, but much less so than a tape backup, which may be out of synchronization with production data by 24 hours or more.
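The three-array arrangement can be sketched as a toy model in which the first hop is synchronous and the second hop drains a queue in the background. The class and method names are hypothetical, and the queue stands in for whatever mechanism a real product uses to forward writes over the WAN.

```python
from collections import deque


class MultiHopMirror:
    """Toy model: synchronous first hop, asynchronous second hop."""

    def __init__(self):
        self.production = {}
        self.intermediate = {}
        self.remote = {}
        self.pending = deque()  # writes not yet applied to the remote mirror

    def write(self, block: int, data: bytes) -> None:
        # First hop is synchronous: both nearby arrays are updated together.
        self.production[block] = data
        self.intermediate[block] = data
        # Second hop is deferred: the write is queued for the WAN link.
        self.pending.append((block, data))

    def ship(self, max_writes: int) -> int:
        """Forward up to max_writes queued writes to the remote mirror."""
        shipped = 0
        while self.pending and shipped < max_writes:
            block, data = self.pending.popleft()
            self.remote[block] = data
            shipped += 1
        return shipped

    def lag(self) -> int:
        """Number of writes the remote mirror is behind production."""
        return len(self.pending)


if __name__ == "__main__":
    arrays = MultiHopMirror()
    for block in range(3):
        arrays.write(block, b"data")
    arrays.ship(max_writes=2)
    print("remote is behind by", arrays.lag(), "write(s)")
```

In this model the production host never waits on the WAN link. The price is the lag: if the primary site were lost at this moment, the queued writes that had not yet been shipped would be missing from the remote mirror.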
Shortening Time to Data
Both remote mirroring and electronic tape vaulting are decided improvements over traditional tape back-up methods from the standpoint of "time to data." The specific recovery requirements of a company will determine which (if either) method is appropriate.
While expensive to implement, these strategies promise to become more cost-efficient as high-bandwidth, SONET-based WAN access is extended to company premises in the near future. Sprint, with its ION offering, and AT&T, with its INC service, are targeting the end of this year for the deployment of OC3/OC12 SONET rings in most major metropolitan areas of the United States. Both companies see high-speed, low-latency remote mirroring as one likely use that companies will make of the service once it is available.
About the Author: Jon William Toigo is an independent writer and consultant specializing in business automation solutions. He is the author of eight books, including The Holy Grail of Data Storage Management and Disaster Recovery Planning, 2nd Edition. He can be reached via e-mail at firstname.lastname@example.org or through his Web site at www.toigoproductions.com.