More on replication, crash-consistent images, and Revivio's technology
In an earlier column I discussed a new technology offered by Revivio, a storage start-up in Lexington, MA. I was so impressed by the technology, which seeks to reduce the costs and improve the functionality of traditional mirror splitting for data protection, that I subsequently wrote a lengthier assessment in the form of a white paper, which can be downloaded from the company’s web site (http://www.revivio.com).
After reading both, a long-time friend of this column (and a very smart storage guy) named Fabrice Helliker posed several good questions, which I put to Michael Rowan and Kirby Wadsworth at Revivio. This is an important dialog that anyone confronting problems with data protection in their own shops may want to consider.
Helliker, vice president of engineering for tape backup software vendor BakBone Software, wrote: “[Revivio’s] concept is based on real-time, time-stamped, block-level replication. This is a useful and valid tool in the data recovery toolkit. But it appears that it has a few reasonably serious limitations that weren’t addressed in your write-up.
“Replication, when done without the cooperation of the application, produces images on disk which may be inconsistent. Databases, for instance, do a lot of caching in memory and flush to disk at their leisure. This can mean that [Revivio’s Time Addressable Storage] TAS (which does constant snapshots) will only produce ‘crash consistent’ images. [That is,] it is like pulling the plug on the machine and rebooting. You will have a database there, but you really can’t guarantee what level of consistency it will have. This doesn’t seem to be something that a DB admin would really want, as to some extent it is worse than having an image that was produced at a set time, say from more traditional replication/mirroring/backup.
“A generic problem that the replication products have (TAS still falls under that heading) is that they address only a smallish sector of the data recovery requirements. Partial restores are the most common types of restores: an operator wants to restore a single table or file. Replication is all or nothing. Being point in time doesn’t really give you any advantage here. With databases, you can’t even use the strategy of remounting the volume somewhere else and working on it.
“Anyhow, those were my immediate thoughts on the subject, but I wanted to bounce them off you in case I missed something obvious.”
Considering the trusted source of these comments, I weighed Helliker’s view carefully and prepared my response. Then it dawned on me that these are issues that Revivio had better be able to address itself if it is going to compete effectively in this market—so I forwarded the e-mail to them.
Here is their response, which came from Kirby Wadsworth, Chief Marketing Officer, with input from CTO Michael Rowan:
“First of all, Revivio isn't replication—it's live storage, operating in real time. The TAS model does also allow for replicating the TimeStore over vast distance—giving customers the ability to instantly access a point-in-time image of NY’s data in Chicago. This brings some real advantages to the replication model—distance, streaming ‘go back’ capability, offline remote access, etc. So, while TAS is not itself replication, it extends to include replication functionality.
“Revivio can indeed deliver an image that is ‘crash consistent,’ but it can also deliver an image from any previous point in time—including the exact same point in time saved on tape by traditional backup. The difference is that Revivio delivers that point in time image instantly rather than in the minutes or hours it would take to move a static single point in time image off tape.
“It’s perhaps also worth pointing out that databases don’t actually cache unprotected data in memory (to flush at their leisure)—they are two phase transactional systems and the logging that occurs with rollback and redo logs occurs synchronously, which is how databases provide crash-consistent recovery in the face of system failures (e.g., high availability).
“When the customers we talk to are offered restoration from a single point in time image created by taking the database down (or putting it into backup mode) with that method’s inherent hours of downtime vs. getting the business back online within minutes using Revivio’s instant access to infinite previous points in time, the customers always choose the latter: DBAs and sys admins alike.
“Until recently, the problem with PITR (point in time recovery) was that the user could not go to a PITR that spanned disks, even with crash-consistent recovery. TAS allows you to atomically roll many related disks (e.g., a database system spanning dozens or even hundreds of spindles) to any point in time with crash-consistent recovery, safely. Additionally, because Revivio protects the entire database instantiation (tables, log files, etc.), when instantly revived to a crash-consistent point in time using Revivio (assuming you are using a modern DB), the database acts exactly as it would if the system crashed: it makes every attempt to recover.
“So, Revivio customers have choices: they can easily and quickly create quiesce points to give them full transactional salvage points, and Revivio will mark the TAS timeline at that point. In the event of a failure, the customer can choose which makes the most business sense: to roll back 2 minutes with crash-consistent recovery, or roll back 2 hours to the last quiesce point. In addition, because recovering with Revivio is fast and non-destructive, they even have the opportunity of doing both, attempting a crash-consistent recovery first and then falling back to the last quiesce point if the first recovery fails.”
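For readers who think better in code, the recovery choice Wadsworth describes can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Revivio's implementation: the names TimeStore, image_at, mark_quiesce, recover, and try_start are hypothetical, writes are assumed to arrive in timestamp order, and try_start stands in for whatever crash recovery the database itself performs when it is started on a revived image.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class TimeStore:
    """Hypothetical journal of timestamped block writes across several volumes."""
    writes: list = field(default_factory=list)  # (ts, volume, block, data), in time order
    marks: list = field(default_factory=list)   # timestamps of quiesce points, in time order

    def write(self, ts, volume, block, data):
        self.writes.append((ts, volume, block, data))

    def mark_quiesce(self, ts):
        self.marks.append(ts)

    def image_at(self, ts):
        """Return {(volume, block): data} as of time ts, for all volumes at once."""
        image = {}
        for w_ts, volume, block, data in self.writes:
            if w_ts > ts:
                break  # assumes writes were appended in timestamp order
            image[(volume, block)] = data
        return image

    def last_quiesce_before(self, ts):
        """Return the latest quiesce mark at or before ts, or None."""
        i = bisect_right(self.marks, ts)
        return self.marks[i - 1] if i else None


def recover(store, target_ts, try_start):
    """Try the crash-consistent image at target_ts first; if the database
    cannot start on it, fall back to the image at the last quiesce point."""
    image = store.image_at(target_ts)
    if try_start(image):
        return target_ts, image
    q = store.last_quiesce_before(target_ts)
    if q is None:
        raise RuntimeError("no quiesce point available")
    return q, store.image_at(q)


# Example: a log volume and a data volume rolled back together.
store = TimeStore()
store.write(1, "data", 0, "row-v1")
store.write(1, "log", 0, "commit-1")
store.mark_quiesce(2)
store.write(3, "log", 0, "commit-2")
store.write(4, "data", 0, "garbage")  # simulated bad write

when, image = recover(store, 4, lambda im: im[("data", 0)] != "garbage")
print(when, image)
```

Because every volume is read from the same journal at the same timestamp, the log and data volumes in the sketch always roll back together, which is the property Wadsworth claims for related disks. The sketch does not re-check the quiesce-point image; a real procedure would presumably verify that one as well.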
Implicit in the above exchange is a more basic issue: tape versus disk mirroring. Helliker wrote that he is often asked how BakBone (or any other tape backup software product) can compete against replication-type products. “My view, however, is don’t. Replication/Mirroring/TAS serves a useful purpose that traditional backup cannot address. However, they come with their own problems [and] should be viewed as complementary to traditional backup, not a replacement.”
Revivio’s Wadsworth agrees, to a point. “TAS is complementary to traditional single-point-in-time storage methods such as mirror splits and tape backup. We also believe that over time TAS will become the preferred method of instantly accessing previous point in time images for use in data restore and recovery.”
Our thanks go out to Helliker and Wadsworth for sharing their dialog with Storage Strategies.
Jon William Toigo is chairman of The Data Management Institute, the CEO of data management consulting and research firm Toigo Partners International, as well as a contributing editor to Enterprise Systems and its Storage Strategies columnist. Mr. Toigo is the author of 14 books, including Disaster Recovery Planning, 3rd Edition, and The Holy Grail of Network Storage Management, both from Prentice Hall.