Pushbutton Disaster Recovery

Do you know if your disaster recovery processes are up to date?

With the mean season finally in full swing, disaster recovery and business continuity initiatives have returned to front-burner status at many companies. The uptick in severe weather activity has also pushed many vendors to turn up the volume on the disaster recovery value of their wares.

One pitch I found to have a better signal-to-noise ratio than most came from Continuity Software, a firm that has been around since 2005 but only this year set up new headquarters in Boston. CEO Gil Hecht spent an hour on the phone last week briefing me on the product. I must admit that I liked what I heard.

First off, Continuity Software is not a hardware company. Their product, RecoverGuard, is not tied to proprietary data mirroring, continuous data protection, or similar schemes. What it does is much simpler: it examines the disaster recovery processes that currently exist in an environment and lets you know if they are still current with the many business-process and hardware-platform changes that occur in any healthy business IT environment. When it finds a mismatch, it informs you either by a dashboard presentation or in a report.

Usually, ongoing testing is required to ensure that continuity strategies are synchronized with the business environment, but actual alignment is only as good as the last test (and what was fixed based on test results). Continuity Software’s RecoverGuard is touted as the next evolution, permitting real-time, ongoing testing and reporting of any misalignments that may exist.

According to Hecht, who quotes Gartner to make this point, more than 70 percent of disaster recovery solutions will not work if they ever need to be executed. The reasons, as he defines them, are threefold. First, the more complex the environment to be protected, the greater the likelihood of mistakes that cause gaps. Closely related to this is the heterogeneity of the environment: large enterprises tend to have a variety of technological solutions, which include multiple operating systems, multiple databases (DBs), multiple storage platforms, and so forth. "Consequently," he notes, "the IT environment in the DR site may not stay consistent with the primary production data center." Finally, he says, the disaster recovery solution needs to reflect essential dependencies that exist in the production environment between multiple layers of technology, and even seemingly small misalignments at any of these layers may have much greater impacts.

To address this situation, Hecht proposes fielding his RecoverGuard software on a server and letting it discover the topology of your IT infrastructure. This discovery process is agentless and uses standard protocols to collect information on the gear currently deployed, network and fabric connections, and a host of other information about the assets in both the production and remote environments. The product is then used to map dependencies between assets and to identify possible gaps between the assets deployed at both locations.

Finding DR Issues

At the core of this particular function is a patent-pending gap-analysis algorithm tied to a knowledge base containing best-practice information and vendor-recommended configurations and settings. When gaps between recommended and actual settings are discovered, they are highlighted via an easy-to-understand dashboard or printed reports.
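To make the idea concrete, here is a minimal sketch of this kind of comparison. It is not RecoverGuard's patent-pending algorithm, whose details were not disclosed in the briefing; the setting names, the sample values, and the `find_gaps` helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, NamedTuple


class Gap(NamedTuple):
    setting: str
    recommended: str
    actual: str  # "<missing>" when the setting was not discovered


def find_gaps(recommended: Dict[str, str], actual: Dict[str, str]) -> List[Gap]:
    """Return one Gap per recommended setting that is absent or different."""
    gaps = []
    for setting, want in sorted(recommended.items()):
        have = actual.get(setting, "<missing>")
        if have != want:
            gaps.append(Gap(setting, want, have))
    return gaps


# Hypothetical knowledge-base entry and hypothetical discovered configuration.
knowledge_base = {"replication_mode": "synchronous", "multipath": "enabled"}
discovered = {"replication_mode": "asynchronous"}

for gap in find_gaps(knowledge_base, discovered):
    print(f"{gap.setting}: recommended {gap.recommended}, found {gap.actual}")
```

A real tool would presumably draw the recommended values from vendor documentation and the actual values from discovery, but the core step is the same: any setting that is missing or differs from the recommendation becomes a line item on the report.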

Following our chat, Hecht forwarded materials that identified specific gaps that had been detected by his product at shops that either currently use or are actively evaluating it. RecoverGuard had spotted such issues as improperly configured application server clusters that were not replicating data to alternate locations. This would create a significant problem if a failover were to occur.

In another documented case, RecoverGuard discovered database logs that were not being replicated by EMC’s SRDF product because of changes that had occurred in the location of data in the production environment. Had this not been detected, databases would have been corrupted during attempts to restore systems at the recovery site.

In yet another example, disk capacity had been changed in the production site, but comparable changes had not been made in the recovery site, presenting the possibility that space would be insufficient to recover operations using the remote facility. RecoverGuard also discovered replication that had been established between dissimilar storage platforms in the production and DR locations. The configuration was not only creating poorly synchronized datasets—it was causing performance delays on production applications that no one had been able to troubleshoot successfully.
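The capacity mismatch is the easiest of these checks to picture. The sketch below assumes you already have per-volume sizes for both sites; the volume names, the sizes, and the `capacity_shortfalls` helper are hypothetical, and the code says nothing about how RecoverGuard itself performs the check.

```python
from typing import Dict, List


def capacity_shortfalls(production_gb: Dict[str, int],
                        recovery_gb: Dict[str, int]) -> List[str]:
    """List volumes whose recovery-site capacity is smaller than production."""
    problems = []
    for volume, prod_size in sorted(production_gb.items()):
        dr_size = recovery_gb.get(volume, 0)  # a missing volume counts as 0 GB
        if dr_size < prod_size:
            problems.append(
                f"{volume}: production {prod_size} GB, recovery {dr_size} GB"
            )
    return problems


# Hypothetical inventory: production was expanded, the recovery site was not.
production = {"db_data": 500, "db_logs": 100}
recovery = {"db_data": 250, "db_logs": 100}

for line in capacity_shortfalls(production, recovery):
    print(line)
```

Treating a volume that is absent from the recovery site as zero capacity is a deliberate choice here: a volume with no counterpart is at least as serious as one that is too small.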

The last point is important. RecoverGuard may provide considerable value beyond the validation of data replication strategies. At one customer, a cellular communications company, the product identified 33 data replication errors as well as over $100,000 worth of wasted disk space on expensive EMC storage arrays. The product could morph into something capable of providing a comprehensive health check of infrastructure, facilitating infrastructure optimization and capacity planning.

I like the approach and will be tracking RecoverGuard and Continuity Software as their story develops. This is saying a lot, because most of the software I have examined for validating recoverability has been very limited in its capabilities (supporting only a few server platforms, storage arrays, or operating systems), full of "gotchas" (for example, providing no means to validate patch levels), or disruptive in its operation (pausing replication processes to check the status of data at both sides of the mirror).

DR product vendors have traditionally found it difficult to create software tools that do not require consumers to conform to the vendor’s preferred methods for safeguarding assets. RecoverGuard is completely agnostic in this regard: whatever mix of hardware and software functionality you are using to replicate data for protection purposes, RecoverGuard will support it from a monitoring and management perspective.

If you are looking to evaluate your state of disaster preparedness, or just want a single-pane-of-glass solution for monitoring your readiness on an ongoing basis, RecoverGuard is worth a look. Your comments are welcome: jtoigo@toigopartners.com.